Countering Adversarial Images Using Input Transformations

From statwiki
Revision as of 02:28, 15 November 2018 by Skoundin (talk | contribs)


As the use of machine intelligence has increased, robustness has become a critical feature for guaranteeing the reliability of deployed machine-learning systems. However, recent research has shown that existing models are not robust to small, adversarially designed perturbations of the input. Adversarial examples are inputs to machine learning models that an attacker has intentionally designed to cause the model to make a mistake. Adversarial examples are not specific to images; they also arise in malware detection, text understanding, and speech. In the example from Goodfellow et al., a small perturbation applied to the original image of a panda changes the prediction to gibbon.


Hence there is an urgent need for defenses that increase the robustness of learning systems to such adversarial examples.


The paper studies strategies that defend against adversarial-example attacks on image-classification systems by transforming the images before feeding them to a convolutional network classifier. Generally, defenses against adversarial examples fall into two main categories:

1. Model-Specific – They enforce model properties such as smoothness and invariance via the learning algorithm.

2. Model-Agnostic – They try to remove adversarial perturbations from the input.

This paper focuses on increasing the effectiveness of model-agnostic defense strategies. The following image transformation techniques have been studied:

1. Image Cropping and Re-scaling (Graese et al., 2016)

2. Bit Depth Reduction (Xu et al., 2017)

3. JPEG Compression (Dziugaite et al., 2016)

4. Total Variance Minimization (Rudin et al., 1992)

5. Image Quilting (Efros & Freeman, 2001)

These image transformations have been studied against adversarial attacks such as the fast gradient sign method (Kurakin et al., 2016a), DeepFool (Moosavi-Dezfooli et al., 2016), and the Carlini & Wagner (2017) attack. In the experiments, the strongest defenses are those based on Total Variance Minimization and Image Quilting: these defenses are non-differentiable and inherently random, which makes it difficult for an adversary to get around them.

Problem Definition/Terminology

Gray Box Attack: The model architecture and parameters are public.

Black Box Attack: The adversary does not have access to the model.

Non-Targeted Adversarial Attack: The goal of the attack is to modify a source image so that it is classified incorrectly by the machine learning classifier.

Targeted Adversarial Attack: The goal of the attack is to modify a source image so that it is classified as a specific target class by the machine learning classifier.

The paper discusses non-targeted adversarial examples for image recognition systems. Given an image space X, a classifier h(.), and a source image x ∈ X, a non-targeted adversarial example of x is a perturbed image x' ∈ X such that h(x) ≠ h(x'). Given a set of N images {x_1, …, x_N} and a target classifier h(.), an adversarial attack aims to generate {x'_1, …, x'_N} such that each x'_n is an adversarial example of x_n.

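The success rate of an attack is given as:

[math]\frac{1}{N}\sum_{n=1}^{N} I[h(x_n) \neq h(x'_n)][/math]

which is the proportion of predictions that were altered by the attack (I[.] denotes the indicator function).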

The success rate is generally measured as a function of the magnitude of the perturbations performed by the attack, using the normalized L2-dissimilarity:

[math]\frac{1}{N}\sum_{n=1}^{N} \frac{\|x_n - x'_n\|_2}{\|x_n\|_2}[/math]

A strong adversarial attack has a high success rate while its normalized L2-dissimilarity, given by the above equation, is low.
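
The two evaluation quantities can be illustrated with a minimal sketch. It assumes that predictions are given as plain lists of class labels, that images are flattened lists of pixel values, and that the normalized L2-dissimilarity is the mean of ||x - x'||_2 / ||x||_2 over the image set, as in the original paper. The function names are hypothetical and not taken from the authors' code.

```python
import math


def success_rate(clean_preds, adv_preds):
    """Fraction of predictions that differ between clean and adversarial inputs."""
    if len(clean_preds) != len(adv_preds):
        raise ValueError("prediction lists must have the same length")
    changed = sum(1 for c, a in zip(clean_preds, adv_preds) if c != a)
    return changed / len(clean_preds)


def l2_norm(vector):
    """Euclidean norm of a flat list of numbers."""
    return math.sqrt(sum(v * v for v in vector))


def normalized_l2_dissimilarity(clean_images, adv_images):
    """Mean of ||x - x'||_2 / ||x||_2 over pairs of flattened images."""
    total = 0.0
    for x, x_adv in zip(clean_images, adv_images):
        diff = [a - b for a, b in zip(x, x_adv)]
        total += l2_norm(diff) / l2_norm(x)
    return total / len(clean_images)
```

An attack is then summarized by plotting `success_rate` against `normalized_l2_dissimilarity` for increasing perturbation strength.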

Defense: A defense is a strategy that aims to make the prediction on an adversarial example h(x') equal to the prediction on the corresponding clean example h(x).

Adversarial Attacks

For the experiments, the following four attacks have been studied.

1. Fast Gradient Sign Method (FGSM; Goodfellow et al. (2015)): Let x be a source input with true label y, and let l be the differentiable loss function used to train the classifier h(.). The corresponding adversarial example is then given by:

[math]x' = x + \epsilon \cdot \text{sign}(\nabla_x l(x, y))[/math], where [math]\epsilon[/math] is the step size.

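2. Iterative FGSM (I-FGSM; Kurakin et al. (2016b)): iteratively applies the FGSM update for M iterations: [math]x^{(m)} = x^{(m-1)} + \epsilon \cdot \text{sign}(\nabla_{x^{(m-1)}} l(x^{(m-1)}, y))[/math], with [math]x^{(0)} = x[/math] and [math]x' = x^{(M)}[/math].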

3. DeepFool (Moosavi-Dezfooli et al., 2016): projects x onto a linearization of the decision boundary defined by h(.) for M iterations. DeepFool.PNG

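4. Carlini-Wagner's L2 attack (CW-L2; Carlini & Wagner (2017)): an optimization-based attack that combines a differentiable surrogate for the model's classification accuracy with an L2-penalty term. Let Z(x) be the operation that computes the logit vector (i.e., the output before the softmax layer) for an input x, and let Z(x)_k be the logit value corresponding to class k. The untargeted variant of CW-L2 finds a solution to the unconstrained optimization problem: [math]\min_{x'} \left[ \|x - x'\|_2^2 + \lambda_f \max\left(-\kappa, Z(x')_{h(x)} - \max\{Z(x')_k : k \neq h(x)\}\right) \right][/math], where [math]\kappa[/math] is a margin parameter and [math]\lambda_f[/math] trades off the perturbation norm against the hinge loss of predicting a different class.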
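
To make the gradient-sign attacks concrete, here is a minimal sketch of FGSM and I-FGSM. It uses a toy logistic-regression classifier as a stand-in for the convolutional network, which is an assumption made purely so that the gradient can be written in closed form. The names `w`, `b`, `loss_gradient`, and `predict` are hypothetical, and clipping to the valid pixel range is omitted.

```python
import math


def sign(value):
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def loss_gradient(x, y, w, b):
    """Gradient of the logistic loss with respect to the input x, for y in {0, 1}."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


def fgsm(x, y, w, b, eps):
    """One FGSM step: move every coordinate by eps in the direction that raises the loss."""
    grad = loss_gradient(x, y, w, b)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


def i_fgsm(x, y, w, b, eps, num_iters):
    """Iterative FGSM: apply the FGSM update num_iters times."""
    x_adv = list(x)
    for _ in range(num_iters):
        x_adv = fgsm(x_adv, y, w, b, eps)
    return x_adv


def predict(x, w, b):
    """Class predicted by the toy linear classifier."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)
```

With `w = [1.0, -2.0]`, `b = 0.0`, and the input `[0.5, 0.1]` labelled 1, a single FGSM step with `eps = 0.2` is enough to flip the toy classifier's prediction.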

The figure below shows adversarial images and the corresponding perturbations at five levels of normalized L2-dissimilarity for all four attacks. Strength.PNG


Five image transformations that alter the structure of these perturbations have been studied: (1) Image Cropping and Rescaling, (2) Bit Depth Reduction, (3) JPEG Compression, (4) Total Variance Minimization, and (5) Image Quilting.

Image Cropping-Rescaling (Graese et al., 2016), Bit Depth Reduction (Xu et al., 2017), JPEG Compression and Decompression (Dziugaite et al., 2016):

Image cropping-rescaling has the effect of altering the spatial positioning of the adversarial perturbation. In this study, images are cropped and rescaled at training time.

Bit Depth Reduction (Xu et al., 2017) performs a simple type of quantization that can remove small (adversarial) variations in pixel values from an image. Images are reduced to 3 bits in the experiment.

JPEG Compression and Decompression (Dziugaite et al., 2016) removes small perturbations by performing simple quantization.
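
The quantization step can be sketched as follows. This is one common way to reduce bit depth, assuming 8-bit input pixels in the range 0 to 255. The exact rounding rule used by Xu et al. is not given here, so the function `reduce_bit_depth` and its parameters are illustrative only.

```python
def reduce_bit_depth(pixels, bits=3, max_value=255):
    """Quantize pixel values to 2**bits evenly spaced levels in [0, max_value]."""
    levels = (1 << bits) - 1  # highest quantization index, e.g. 7 for 3 bits
    quantized = []
    for p in pixels:
        index = round(p / max_value * levels)  # nearest quantization level
        quantized.append(round(index / levels * max_value))  # back to pixel scale
    return quantized
```

Nearby pixel values such as 100 and 102 fall onto the same level, which is how small perturbations are discarded.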

Total Variance Minimization :

Image Quilting (Efros & Freeman, 2001): Image Quilting is a non-parametric technique that synthesizes images by piecing together small patches taken from a database of image patches. The algorithm places appropriate patches from the database at a predefined set of grid points and computes minimum graph cuts in all overlapping boundary regions to remove edge artifacts. Image Quilting can be used to remove adversarial perturbations by constructing a patch database that contains only patches from "clean" images (without adversarial perturbations); the patches used to create the synthesized image are selected by finding the K nearest neighbors (in pixel space) of the corresponding patch from the adversarial image in the patch database, and picking one of these neighbors uniformly at random. The motivation for this defense is that the resulting image consists only of pixels that were not modified by the adversary: the database of real patches is unlikely to contain the structures that appear in adversarial images.
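
The patch-selection step described above can be sketched as follows, assuming patches are flattened lists of pixel values and the database is small enough for a brute-force nearest-neighbor search. Grid placement and the graph-cut blending of overlapping boundaries are deliberately left out. The function names are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two flattened patches."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def select_patch(query_patch, patch_database, k, rng):
    """Pick uniformly at random among the k clean patches closest to the query."""
    ranked = sorted(patch_database, key=lambda p: squared_distance(query_patch, p))
    return rng.choice(ranked[:k])


def quilt_patches(adv_patches, patch_database, k, seed=None):
    """Replace every patch of an adversarial image with a clean database patch."""
    rng = random.Random(seed)
    return [select_patch(q, patch_database, k, rng) for q in adv_patches]
```

Because every output patch comes from the clean database, the perturbed pixels never reach the classifier, and the random choice among the K neighbors makes the transformation non-deterministic.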


Five experiments were performed to test the efficacy of the defenses. Setup: Experiments are performed on the ImageNet image classification dataset. The dataset comprises 1.2 million training images and 50,000 test images, each belonging to one of 1000 classes. The adversarial images are produced by attacking a ResNet-50 model. The strength of an adversary is measured in terms of its normalized L2-dissimilarity. To produce the adversarial images, the normalized L2-dissimilarity of each attack was controlled as follows:

- FGSM. Increasing the step size [math]\epsilon[/math] increases the normalized L2-dissimilarity.

- I-FGSM. We fix M=10 and increase [math]\epsilon[/math] to increase the normalized L2-dissimilarity.

- DeepFool. We fix M=5 and increase [math]\epsilon[/math] to increase the normalized L2-dissimilarity.

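- CW-L2. We fix [math]\kappa[/math]=0 and [math]\lambda_{f}[/math]=10, and multiply the resulting perturbation by an appropriately chosen [math]\epsilon \geq 1[/math] to alter the normalized L2-dissimilarity.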

The hyperparameters of the defenses were fixed in all the experiments. Specifically, the pixel dropout probability was set to [math]p[/math]=0.5 and the regularization parameter of the total variation minimizer to [math]\lambda_{TV}[/math]=0.03.

1. Gray Box - Image Transformation at Test Time: This experiment applies a transformation to adversarial images at test time before feeding them to a ResNet-50 that was trained to classify clean images. The figure shows the results for the five transformations and their corresponding top-1 accuracy. Some interesting observations from the plot: the crop ensemble gives the best accuracy, around 40-60 percent, and the accuracy of the Image Quilting defense hardly deteriorates as the strength of the adversary increases. sFig4.png

2. Black Box - Image Transformation at Training and Test Time: A ResNet-50 model was trained on the ImageNet training images. Before feeding the images to the network for training, standard data augmentation (from He et al.) along with bit depth reduction, JPEG compression, TV minimization, or image quilting was applied to the images. The classification accuracy on adversarial images is shown in the figure below. sFig5.png

3. Black Box - Ensembling: Four networks (ResNet-50, ResNet-101, DenseNet-169, and Inception-v4) were studied, along with ensembles of defenses, as shown in the table below. The adversarial images are produced by attacking a ResNet-50 model. The results in the table show that the best ensemble of defenses gives an accuracy of about 71% against all the attacks. sTab1.png

4. Gray Box - Image Transformation at Training and Test Time: In this experiment, the adversary has access to the network and its parameters (but does not have access to the input transformations applied at test time). Using the networks trained in experiment 2 (Black Box - Image Transformation at Training and Test Time), novel adversarial images were generated by the four attack methods. The results for this experiment are shown in the figure below. Networks using these defenses classify up to 50% of the images correctly. sFig6.png

5. Comparison With Prior Work: The results of the experiments are compared with the state-of-the-art ensemble adversarial training approach proposed by Tramèr et al. (2017). The table below provides the results of this comparison. sTab2.png


The results of the study suggest that there exists a range of image transformations that have the potential to remove adversarial perturbations while preserving the visual content of the image: one merely has to train the convolutional network on images that were transformed in the same way. A strong input defense should be non-differentiable and randomized, a strategy previously shown to work effectively by Wang et al. Two of the defenses show


1. The terminology of black box, white box, and gray box attacks is not precisely defined or made clear.

2. White box attacks, where the adversary has full access to the model as well as the pre-processing, could have been considered.

3. Though the authors did considerable work in showing the effect of four attacks on the ImageNet dataset, much stronger attacks (Madry et al.) [9] could have been evaluated.

4. The authors claim that the success rate is generally measured as a function of the magnitude of the perturbations performed by the attack, using the L2-dissimilarity, but this claim is not supported by any references.


1. Chuan Guo, Mayank Rana, Moustapha Cissé, and Laurens van der Maaten. Countering Adversarial Images Using Input Transformations.