Dmytro Mishkin
Towards Data Science
6 min read · Aug 18, 2017


What are adversarial examples? Do they exist for humans?

Last month I was thinking about adversarial examples for CNNs.

An adversarial example is when you change several pixels in an image of a dog and the classifier recognizes the modified image as a shovel.

Image taken from https://www.youtube.com/watch?v=Iwei8Lah0h8
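To make this concrete, below is a minimal sketch of one classic way to craft such a perturbation, the Fast Gradient Sign Method (FGSM). The pretrained model, the epsilon value, and the assumption that inputs are scaled to [0, 1] are my own illustrative choices, not the exact setup from the video above.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# FGSM-style sketch: nudge every pixel one tiny step in the direction
# that increases the classification loss. Model and epsilon are illustrative.
model = models.resnet18(pretrained=True).eval()

def fgsm_attack(image, true_label, epsilon=0.01):
    """image: (1, 3, H, W) float tensor assumed to be in [0, 1]; true_label: (1,) long tensor."""
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), true_label)
    loss.backward()
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()

# Usage sketch: a dog image, barely changed, may no longer be classified as a dog.
# adv = fgsm_attack(dog_batch, dog_label)
# print(model(adv).argmax(dim=1))
```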

Despite various explanations of their nature and existence, there is still no good “defense” against them. Google Brain even ran three Kaggle competitions devoted to adversarial examples.
Let’s step away from the complex math behind them and think about what adversarial examples ARE in layman’s terms, and what they are not.

Filling an image with noise is not an adversarial example, although the result is hard to classify. Image from https://www.semanticscholar.org/paper/Spatially-adaptive-Total-Variation-image-denoising-Rojas-Rodriguez/58df7233d49ea3908483cb1267057ba62158aed1

First, not all modifications of input images that lead to misclassification are “adversarial”. For example, you can fill an image with salt-and-pepper noise so heavily that it becomes hard to see what is actually depicted there.
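For illustration, here is what such a corruption looks like in code; the noise amount is an arbitrary choice. Note that it degrades the image for humans too, which is exactly why it is not adversarial.

```python
import numpy as np

def salt_and_pepper(image, amount=0.3):
    """Set a random fraction of pixels to pure black or pure white.
    image: uint8 array of shape (H, W) or (H, W, C); `amount` is illustrative."""
    noisy = image.copy()
    mask = np.random.rand(*image.shape[:2])
    noisy[mask < amount / 2] = 0        # "pepper"
    noisy[mask > 1 - amount / 2] = 255  # "salt"
    return noisy
```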

Second, one could simply blend two half-transparent images while keeping only one “true” label.

Baboon and Lena blended. Image from https://stackoverflow.com/questions/12242443/how-do-i-blend-two-textures-with-different-co-ordinates-in-opengl-es-2-0-on-ipho

While, again, this could lead to misclassification, it is quite obvious that there are TWO images here. It is somewhat like asking the network, “Have you stopped drinking cognac in the morning? Yes or no?”
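The blending itself is nothing more than a per-pixel weighted average; a tiny sketch, where the equal 0.5 weight is just an assumption:

```python
import numpy as np

def blend(img_a, img_b, alpha=0.5):
    """Per-pixel weighted average of two equally sized uint8 images."""
    mixed = alpha * img_a.astype(np.float32) + (1 - alpha) * img_b.astype(np.float32)
    return np.clip(mixed, 0, 255).astype(np.uint8)
```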

Another example is the now-popular Laurel-vs-Yanny sound, as suggested by Oudeicrat Annachrista:
https://www.theguardian.com/global/video/2018/may/16/what-do-you-hear-in-this-audio-clip-yanny-or-laurel-takes-internet-by-storm-video
If you catch the high frequencies, you hear one word; if the low ones, the other.

What is usually understood as an adversarial example is an image (or another signal) whose difference from the original image is imperceptible to a human AND (sometimes OR) poses no difficulty for a human to recognize correctly. The first image of this post is a good illustration of this.
Funnily enough, when competing against a CNN on ImageNet, Andrej Karpathy was not far from the truth when he wrote:

In principle, all of the misclassified images could have had a barely noticeable purple pixel on the top left and this is the reason for all mistakes.

Next, why OR? Well, a recent work used “LOVE” and “HATE” stickers to fool a sign detection system. While this is definitely a human-perceptible difference from the clean “STOP” sign, such images would not fool a human from a close distance.

Image from https://arxiv.org/pdf/1707.08945.pdf

Actually, I would add a third class of adversarials: rotated images. The recognition rate for a rotated object drops significantly, while human performance is not hurt that much.
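A quick way to observe this is to rotate an input and watch the top-1 prediction change. A rough sketch, assuming an ImageNet-pretrained model and an already-preprocessed input tensor:

```python
import torch
import torchvision.models as models
import torchvision.transforms.functional as TF

model = models.resnet18(pretrained=True).eval()

def top1_under_rotation(image_tensor, angles=(0, 30, 60, 90)):
    """image_tensor: preprocessed (1, 3, H, W) input; returns the top-1 class per angle."""
    results = {}
    with torch.no_grad():
        for angle in angles:
            rotated = TF.rotate(image_tensor, angle)
            results[angle] = model(rotated).argmax(dim=1).item()
    return results

# A human still recognizes the rotated object; the classifier often does not.
```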

So, why is it so easy to fool a deep learning based classifier?
My feeling is that networks learn the wrong features.

Update: three days after this post, the paper “Towards Interpretable Deep Neural Networks by Leveraging Adversarial Examples” appeared. It is based on exactly the same assumption and uses adversarials for feature visualization.

Update 2: the recent work “There Is No Free Lunch In Adversarial Robustness (But There Are Unexpected Benefits)” showed that networks which were trained to resist adversarial attacks indeed learn very different features. So “adversarial” images look like images of a different class even to a human, as it should be:

“Adversarial” images for a robust network look really similar to the classes they are pretending to be, e.g. the “attacked” airplane really looks like a bird (top-right image). Image from https://arxiv.org/pdf/1805.12152v2.pdf

And it is not only a deep learning problem. Recently, Stack Overflow posted survey results which clearly show that developers who indent with spaces make more money than those who use tabs.
Stack Overflow is definitely quite a representative community, so there is no problem with the data. Will you make more money if you switch to spaces? Common sense tells you that it is quite unlikely. But “spaces” is definitely a “good” feature for a classifier. And this definitely helps us to fool a model based on such a feature.
So why are spaces so important? Evelina Gabasova did some research and has an explanation:

I’m quite convinced that the difference in salaries of tab and space users is mainly due to the type of company and the environment they work in. Environments where people use Git and contribute to open source are more associated both with higher salaries and spaces, rather than with tabs.

There are two types of features which help classification. The first are causal, the real ones. For a car, a real feature would be everything which gives it the physical ability to go. For a plane, it would be the wings, the general shape, and the engines. They could be occluded in a particular photo, but changing several pixels cannot really determine their existence (unless the whole plane takes up 5 x 5 pixels).

A “STOP” sign is a red octagon with a white border and the white letters “STOP” on it. Graffiti, if not fully covering the sign, does not change that. However, if you detect a STOP sign using the values of several pixels, which works perfectly on the training set, then you are in trouble. Your model coincides with reality only for a limited subset of images.

These values of several pixels (and space-based indents) are examples of the second type of feature: indicators. They are easy to measure and are usually strongly correlated with the actual causal features. Why? Well, because that is how they are constructed. Instead of measuring real causal things, which can be hard to define, we measure things which are easy to measure and which work (in most cases).
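A toy sketch of why indicators are fragile (the data is synthetic and purely illustrative): a linear model happily latches onto a feature that is perfectly correlated with the label during training, and falls apart once that correlation is broken.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
label = rng.integers(0, 2, n)

# A noisy but genuinely informative ("causal") feature.
causal = label + rng.normal(0.0, 0.8, n)
# An "indicator" that happens to match the label perfectly in the training data.
indicator = label.astype(float)

X_train = np.column_stack([causal, indicator])
clf = LogisticRegression().fit(X_train, label)

# At test time the spurious indicator is flipped (the "sticker on the sign").
X_test = np.column_stack([causal, 1.0 - indicator])
print("train accuracy:", clf.score(X_train, label))
print("test accuracy with the indicator broken:", clf.score(X_test, label))
```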

Here is where adversarials for humans come in. How do you know if a person is successful? Well, if she has a luxury car and clothes, she might be rich. If a person wears a pilot uniform, she is probably a pilot. If a waiter tells you that the most expensive dish is not so good today, he is an honest person.

An adversarial example for humans: despite the uniform, Leo is not a pilot. “Catch Me If You Can”

Fraudsters know this and use it. Look at “success coaches” or any good movie about con artists.

What everyday fraud recognition and CNN-based learning have in common is that both, networks judging images and humans judging people, have too little data to build a model that is not overfitted to personal experience. And even where one could actually check something in practice (especially for images :), it is hard for a network to acquire a physical model of the world to check what makes a plane a plane.

So I am a bit skeptical about defenses against adversarial attacks until we combine the image recognition engine with some physical and/or causal model, possibly a learned one. The same goes for “low-pass filter based” defenses. As long as the features are just indicators, one will be able to design an attack that fools the classifier.

P.S. Another way which may work is to give the network the possibility of answering “I have never experienced such before. I don’t know”. This would help to avoid the “black hearts” situation from Interstate 60.

Image from the movie “Interstate 60”. An adversarial example for humans.
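One simple (and admittedly crude) approximation of such an “I don’t know” answer is to abstain whenever the softmax confidence is low; the threshold here is an arbitrary assumption, and proper open-set recognition methods go further.

```python
import torch
import torch.nn.functional as F

def predict_or_abstain(model, image_tensor, threshold=0.7):
    """Return the top-1 class index, or None when the model is not confident enough."""
    with torch.no_grad():
        probs = F.softmax(model(image_tensor), dim=1)
    confidence, cls = probs.max(dim=1)
    if confidence.item() < threshold:
        return None  # "I have never experienced such before. I don't know."
    return cls.item()
```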



Computer Vision researcher and consultant. Co-founder of Ukrainian Research group “Szkocka”.