# Countering Adversarial Images Using Input Transformations

## Motivation

As the use of machine intelligence has increased, robustness has become a critical property for guaranteeing the reliability of deployed machine-learning systems. However, recent research has shown that existing models are not robust to small, adversarially designed perturbations of the input. Adversarial examples are inputs to machine-learning models that an attacker has intentionally designed to cause the model to make a mistake. Adversarially perturbed examples have been deployed to attack image classification services (Liu et al., 2016) [11], speech recognition systems (Cisse et al., 2017a) [12], and robot vision (Melis et al., 2017) [13]. The existence of these adversarial examples has motivated proposals for approaches that increase the robustness of learning systems to such examples. In the example below (Goodfellow et al.) [17], a small perturbation applied to the original image of a panda changes the prediction to gibbon.

## Introduction

The paper studies strategies that defend against adversarial-example attacks on image-classification systems by transforming the images before feeding them to a convolutional network classifier. Generally, defenses against adversarial examples fall into two main categories:

1. Model-Specific – They enforce model properties such as smoothness and invariance via the learning algorithm.

2. Model-Agnostic – They try to remove adversarial perturbations from the input.

This paper focuses on increasing the effectiveness of model-agnostic defense strategies. The following image transformation techniques are studied:

1. Image Cropping and Rescaling (Graese et al., 2016)

2. Bit-Depth Reduction (Xu et al., 2017)

3. JPEG Compression (Dziugaite et al., 2016)

4. Total Variance Minimization (Rudin et al., 1992)

5. Image Quilting (Efros & Freeman, 2001)

These image transformations are evaluated against adversarial attacks such as the fast gradient sign method (Kurakin et al., 2016a), DeepFool (Moosavi-Dezfooli et al., 2016), and the Carlini & Wagner (2017) attack. In the experiments, the strongest defenses are those based on total variance minimization and image quilting: these defenses are non-differentiable and inherently random, which makes it difficult for an adversary to circumvent them.

## Previous Work

Recently, a lot of research has gone into countering adversarial threats. Wang et al. [4] proposed an adversary-resistant technique that obstructs attackers from constructing impactful adversarial samples by randomly nullifying features within samples. Tramèr et al. [2] presented the state-of-the-art ensemble adversarial training method, which augments the training data with perturbations transferred from other models. Their model, an Inception-ResNet-v2, finished first among 70 submissions to the NIPS 2017 competition on Defenses against Adversarial Attacks. Graese et al. [3] showed how input transformations such as shifting, blurring, and noise can render the majority of adversarial examples non-adversarial. Xu et al. [5] demonstrated how feature squeezing methods, such as reducing the color bit depth of each pixel and spatial smoothing, defend against state-of-the-art attacks. Dziugaite et al. [6] studied the effect of JPEG compression on adversarial images.

## Problem Definition/Terminology

Gray Box Attack: the model architecture and parameters are public, but the input transformations used by the defense are not.

Non-Targeted Adversarial Attack: the goal of the attack is to modify a source image so that it is classified incorrectly by the machine-learning classifier.

Targeted Adversarial Attack: the goal of the attack is to modify a source image so that it is classified as a specific target class by the machine-learning classifier.

The paper discusses non-targeted adversarial examples for image recognition systems. Given an image space $\mathcal{X}$, a classifier $h(\cdot)$, and a source image $x \in \mathcal{X}$, a non-targeted adversarial example of $x$ is a perturbed image $x' \in \mathcal{X}$ such that $h(x) \neq h(x')$. Given a set of $N$ images $\{x_1, \ldots, x_N\}$ and a target classifier $h(\cdot)$, an adversarial attack aims to generate $\{x'_1, \ldots, x'_N\}$ such that each $x'_n$ is an adversarial example of $x_n$.

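The success rate of an attack is given by:

$$\frac{1}{N}\sum_{n=1}^{N} \mathbb{I}\left[h(x_n) \neq h(x'_n)\right],$$

where $\mathbb{I}[\cdot]$ is the indicator function. This is the proportion of predictions that were altered by the attack.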

The success rate is generally measured as a function of the magnitude of the perturbations performed by the attack, using the normalized L2-dissimilarity:

$$\frac{1}{N}\sum_{n=1}^{N} \frac{\|x_n - x'_n\|_2}{\|x_n\|_2}.$$

A strong adversarial attack has a high success rate while its normalized L2-dissimilarity is low.
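Both metrics can be computed as in the following minimal Python sketch. It assumes each image is flattened into a list of pixel values and that `h` is any callable returning a class label; the toy classifier and data are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]  # an image flattened into a list of pixel values


def success_rate(h: Callable[[Vector], int],
                 clean: Sequence[Vector], adv: Sequence[Vector]) -> float:
    """(1/N) * sum_n I[h(x_n) != h(x'_n)]: fraction of predictions the attack altered."""
    changed = sum(1 for x, x_adv in zip(clean, adv) if h(x) != h(x_adv))
    return changed / len(clean)


def l2_norm(v: Vector) -> float:
    return math.sqrt(sum(t * t for t in v))


def normalized_l2_dissimilarity(clean: Sequence[Vector], adv: Sequence[Vector]) -> float:
    """(1/N) * sum_n ||x_n - x'_n||_2 / ||x_n||_2."""
    ratios: List[float] = [
        l2_norm([a - b for a, b in zip(x, x_adv)]) / l2_norm(x)
        for x, x_adv in zip(clean, adv)
    ]
    return sum(ratios) / len(ratios)


def toy_h(v: Vector) -> int:
    # Hypothetical classifier: class 1 if the pixel sum is positive.
    return int(sum(v) > 0)


clean = [[1.0, 1.0], [-1.0, -1.0]]
adv = [[1.0, -1.5], [-1.0, -1.0]]
print(success_rate(toy_h, clean, adv))  # 0.5: only the first prediction flips
print(normalized_l2_dissimilarity(clean, adv))
```

A "strong" attack in this vocabulary is one whose `success_rate` stays high while `normalized_l2_dissimilarity` stays low.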

Defense: a defense is a strategy that aims to make the prediction on an adversarial example, $h(x')$, equal to the prediction on the corresponding clean example, $h(x)$.

Four attacks are studied in the experiments:

1. Fast Gradient Sign Method (FGSM; Goodfellow et al. (2015)) [17]: given a source input $x$ with true label $y$, let $\ell$ be the differentiable loss function used to train the classifier $h(\cdot)$. The corresponding adversarial example is:

   $$x' = x + \epsilon \cdot \mathrm{sign}\left(\nabla_x \ell(x, y)\right).$$

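2. Iterative FGSM (I-FGSM; Kurakin et al. (2016b)) [14] iteratively applies the FGSM update, where $M$ is the number of iterations:

   $$x^{(m)} = x^{(m-1)} + \epsilon \cdot \mathrm{sign}\left(\nabla_{x^{(m-1)}} \ell(x^{(m-1)}, y)\right), \quad m = 1, \ldots, M,$$

   with $x^{(0)} = x$ and $x' = x^{(M)}$.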

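3. DeepFool (Moosavi-Dezfooli et al., 2016) [15] projects $x$ onto a linearization of the decision boundary defined by $h(\cdot)$ for $M$ iterations. For a binary classifier with a differentiable decision function $f$ (so that the decision boundary is $f = 0$), each step is:

   $$x^{(m)} = x^{(m-1)} - \epsilon \cdot \frac{f(x^{(m-1)})}{\|\nabla f(x^{(m-1)})\|_2^2}\, \nabla f(x^{(m-1)}),$$

   with $x^{(0)} = x$ and $x' = x^{(M)}$. The multi-class version projects onto the nearest of the linearized boundaries between the current class and every other class.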

4. Carlini-Wagner's L2 attack (CW-L2; Carlini & Wagner (2017)) [16] is an optimization-based attack that combines a differentiable surrogate for the model's classification accuracy with an L2-penalty term. Let $Z(x)$ be the operation that computes the logit vector (i.e., the output before the softmax layer) for an input $x$, and $Z(x)_k$ the logit value corresponding to class $k$. The untargeted variant of CW-L2 finds a solution to the unconstrained optimization problem:

   $$\min_{x'} \left[\|x - x'\|_2^2 + \lambda_f \max\left(-\kappa,\; Z(x')_{h(x)} - \max\{Z(x')_k : k \neq h(x)\}\right)\right].$$
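To make the FGSM, I-FGSM, and DeepFool updates concrete, here is a minimal sketch against a hypothetical two-feature logistic model (not the paper's ResNet), chosen because its input gradients have closed forms: the cross-entropy gradient is $(\sigma(f(x)) - y)\,w$ and $\nabla f = w$. Assumptions: I-FGSM is shown without the clipping step some implementations add, and `deepfool_binary` stops as soon as the label flips (the original DeepFool stopping rule). Because one step on a linear model lands exactly on the boundary, $\epsilon$ is set slightly above 1 to cross it. CW-L2 is omitted since it needs an iterative optimizer.

```python
import math
from typing import List

Vec = List[float]


def sign(t: float) -> float:
    return float((t > 0) - (t < 0))


class ToyLogisticModel:
    """Hypothetical stand-in for h(.): binary logistic model with f(x) = w.x + b."""

    def __init__(self, w: Vec, b: float) -> None:
        self.w, self.b = w, b

    def f(self, x: Vec) -> float:
        # Decision function; the decision boundary is f(x) = 0.
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def predict(self, x: Vec) -> int:
        return 1 if self.f(x) > 0 else 0

    def loss_grad(self, x: Vec, y: int) -> Vec:
        # Gradient w.r.t. x of the binary cross-entropy loss l(x, y).
        p = 1.0 / (1.0 + math.exp(-self.f(x)))
        return [(p - y) * wi for wi in self.w]


def fgsm(model: ToyLogisticModel, x: Vec, y: int, eps: float) -> Vec:
    # x' = x + eps * sign(grad_x l(x, y))
    return [xi + eps * sign(gi) for xi, gi in zip(x, model.loss_grad(x, y))]


def i_fgsm(model: ToyLogisticModel, x: Vec, y: int, eps: float, M: int) -> Vec:
    # Apply the FGSM update M times, always w.r.t. the true label y (no clipping).
    x_adv = list(x)
    for _ in range(M):
        x_adv = fgsm(model, x_adv, y, eps)
    return x_adv


def deepfool_binary(model: ToyLogisticModel, x: Vec, eps: float, M: int) -> Vec:
    # x <- x - eps * f(x) / ||grad f||^2 * grad f, with grad f = w for a linear model.
    x_adv = list(x)
    original = model.predict(x)
    for _ in range(M):
        if model.predict(x_adv) != original:
            break  # stop once the label flips, as in the original DeepFool loop
        fx = model.f(x_adv)
        sq_norm = sum(wi * wi for wi in model.w)
        x_adv = [xi - eps * fx / sq_norm * wi for xi, wi in zip(x_adv, model.w)]
    return x_adv


model = ToyLogisticModel(w=[1.0, -2.0], b=0.5)
x, y = [1.0, 1.0], 0
print(model.predict(x))                                         # 0
print(model.predict(fgsm(model, x, y, eps=0.2)))                # 1
print(model.predict(i_fgsm(model, x, y, eps=0.05, M=10)))       # 1
print(model.predict(deepfool_binary(model, x, eps=1.02, M=5)))  # 1
```

FGSM and I-FGSM spend a fixed $L_\infty$ budget per step, whereas DeepFool moves only as far as the (linearized) boundary, which is why it tends to reach low L2-dissimilarity.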

The figure below shows adversarial images and the corresponding perturbations at five levels of normalized L2-dissimilarity for all four attacks.

## Defenses

A defense is a strategy that aims to make the prediction on an adversarial example equal to the prediction on the corresponding clean example. Five image transformations that alter the structure of these perturbations are studied: (1) image cropping and rescaling, (2) bit-depth reduction, (3) JPEG compression, (4) total variance minimization, and (5) image quilting.

Image Cropping-Rescaling (Graese et al., 2016) [3], Bit-Depth Reduction (Xu et al., 2017) [5], and JPEG Compression and Decompression (Dziugaite et al., 2016) [6]:

Image cropping-rescaling alters the spatial positioning of the adversarial perturbation. In this study, images are cropped and rescaled at training time as part of data augmentation; at test time, predictions are averaged over random image crops.
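Crop-rescaling with test-time averaging over random crops can be sketched as follows for a single-channel image stored as nested lists. Assumptions: the crop size, number of crops, nearest-neighbor rescaling, and the toy `predict_proba` classifier are hypothetical choices, not the settings used in the paper.

```python
import random
from typing import Callable, List

Image = List[List[float]]  # single-channel image, H x W


def random_crop(img: Image, crop: int, rng: random.Random) -> Image:
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - crop)   # randint is inclusive on both ends
    left = rng.randint(0, w - crop)
    return [row[left:left + crop] for row in img[top:top + crop]]


def rescale_nearest(img: Image, out_h: int, out_w: int) -> Image:
    # Nearest-neighbor resize back to the classifier's input size.
    h, w = len(img), len(img[0])
    return [[img[i * h // out_h][j * w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def crop_ensemble_predict(img: Image, predict_proba: Callable[[Image], List[float]],
                          crop: int, n_crops: int, seed: int = 0) -> List[float]:
    """Average class probabilities over n_crops random crop-and-rescale views."""
    rng = random.Random(seed)
    h, w = len(img), len(img[0])
    total: List[float] = []
    for _ in range(n_crops):
        view = rescale_nearest(random_crop(img, crop, rng), h, w)
        probs = predict_proba(view)
        total = list(probs) if not total else [t + p for t, p in zip(total, probs)]
    return [t / n_crops for t in total]


def toy_predict_proba(img: Image) -> List[float]:
    # Hypothetical two-class "classifier": P(class 0) = mean brightness.
    m = sum(map(sum, img)) / (len(img) * len(img[0]))
    return [m, 1.0 - m]


img = [[0.25] * 8 for _ in range(8)]
print(crop_ensemble_predict(img, toy_predict_proba, crop=6, n_crops=5))  # [0.25, 0.75]
```

Each crop shifts where a spatially structured perturbation lands relative to the network's filters, and averaging over crops smooths out views in which the perturbation still happens to work.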

Bit-depth reduction (Xu et al.) performs a simple type of quantization that can remove small (adversarial) variations in pixel values from an image. In the experiments, images are reduced to 3 bits.
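Bit-depth reduction is simple enough to show directly. This sketch assumes pixel values in [0, 1] and uses the 3-bit setting, so every pixel snaps to one of 8 evenly spaced levels; a perturbation that does not push a pixel across a rounding boundary disappears entirely.

```python
from typing import List

Image = List[List[float]]  # pixel values in [0, 1]


def reduce_bit_depth(img: Image, bits: int = 3) -> Image:
    """Quantize every pixel to one of 2**bits evenly spaced levels in [0, 1]."""
    levels = 2 ** bits - 1
    # round() maps each value to the nearest level (exact ties go to the even level).
    return [[round(v * levels) / levels for v in row] for row in img]


clean = [[3 / 7, 1.0], [0.0, 5 / 7]]
perturbed = [[3 / 7 + 0.02, 1.0 - 0.03], [0.0 + 0.04, 5 / 7 - 0.02]]
print(reduce_bit_depth(clean) == reduce_bit_depth(perturbed))  # True
```

The same rounding that removes small perturbations also discards legitimate fine detail, which is one reason this defense is weak once the adversary can adapt to it.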

JPEG compression and decompression (Dziugaite et al., 2016) removes small perturbations in a similar way, by quantizing the image's frequency (DCT) coefficients.
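Real JPEG is more involved, but the mechanism that matters here, quantizing the DCT coefficients of 8×8 blocks, can be sketched in pure Python. This is a simplified stand-in, not JPEG: it omits color conversion, chroma subsampling, and entropy coding, and uses a single uniform quantization step `q` (a hypothetical parameter) instead of JPEG's quantization tables. In the example, a low-amplitude checkerboard, whose energy lies in high-frequency coefficients, is rounded away while the flat block underneath survives.

```python
import math
from typing import List

Block = List[List[float]]
N = 8  # JPEG operates on 8 x 8 blocks


def _alpha(u: int) -> float:
    # Normalization that makes the DCT orthonormal.
    return math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)


def _basis(pos: int, freq: int) -> float:
    return math.cos((2 * pos + 1) * freq * math.pi / (2 * N))


def dct2(block: Block) -> Block:
    """Orthonormal 2-D DCT-II of an 8 x 8 block."""
    return [[_alpha(u) * _alpha(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                         for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeffs: Block) -> Block:
    """Inverse of dct2 (orthonormal 2-D DCT-III)."""
    return [[sum(_alpha(u) * _alpha(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def quantize_block(block: Block, q: float) -> Block:
    """Round every DCT coefficient to a multiple of q, then transform back."""
    coeffs = dct2(block)
    return idct2([[round(c / q) * q for c in row] for row in coeffs])


# A flat block plus a +/-0.01 checkerboard "perturbation" (pure high frequency).
noisy = [[0.5 + 0.01 * (-1) ** (i + j) for j in range(N)] for i in range(N)]
restored = quantize_block(noisy, q=0.2)
print(round(restored[0][0], 6))  # 0.5
```

Coarser quantization (larger `q`) removes stronger perturbations but also blurs more of the clean image content.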

Total Variance Minimization [9]: this defense combines pixel dropout with total variation minimization. It randomly selects a small set of pixels and reconstructs the “simplest” image that is consistent with the selected pixels. The reconstructed image does not contain the adversarial perturbations because these perturbations tend to be small and localized. Specifically, we first select a random set of pixels by sampling a Bernoulli random variable $X(i, j, k)$ for each pixel location $(i, j, k)$; we maintain a pixel when $X(i, j, k) = 1$. Next, we use total variation minimization to construct an image $z$ that is similar to the (perturbed) input image $x$ for the selected set of pixels, whilst also being “simple” in terms of total variation, by solving:

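$$\min_{z} \; \|X \odot (z - x)\|_2 + \lambda_{TV} \cdot TV_p(z),$$

where $\odot$ denotes element-wise multiplication and $TV_p(z)$ represents the $L_p$ total variation of $z$:

$$TV_p(z) = \sum_{k=1}^{K}\left[\sum_{i=2}^{H}\|z(i,:,k) - z(i-1,:,k)\|_p + \sum_{j=2}^{W}\|z(:,j,k) - z(:,j-1,k)\|_p\right],$$

with $H$, $W$, and $K$ the height, width, and number of channels of the image.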

The total variation (TV) measures the amount of fine-scale variation in the image z, as a result of which TV minimization encourages removal of small (adversarial) perturbations in the image.
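The following is a minimal sketch of the dropout-plus-TV defense for a single-channel image ($K = 1$), under several stated assumptions: it uses $p = 1$ (anisotropic TV) with each absolute difference smoothed to $\sqrt{d^2 + \delta}$ so that plain gradient descent applies; it squares the data term $\|X \odot (z - x)\|_2$ for a smoother objective; and `lam`, `step`, `iters`, and `delta` are hypothetical values (`lam` is not directly comparable to $\lambda_{TV} = 0.03$ because the data term differs). Only the dropout probability default of 0.5 mirrors the experimental setting. A dedicated TV solver would normally replace the gradient loop.

```python
import math
import random
from typing import List, Tuple

Image = List[List[float]]  # single-channel image, values in [0, 1]


def smoothed_tv(z: Image, delta: float) -> float:
    """Anisotropic (p = 1) total variation with each |d| smoothed to sqrt(d^2 + delta)."""
    h, w = len(z), len(z[0])
    tv = 0.0
    for i in range(h):
        for j in range(w):
            if i > 0:
                tv += math.sqrt((z[i][j] - z[i - 1][j]) ** 2 + delta)
            if j > 0:
                tv += math.sqrt((z[i][j] - z[i][j - 1]) ** 2 + delta)
    return tv


def tv_defense(x: Image, drop_prob: float = 0.5, lam: float = 0.1, step: float = 0.05,
               iters: int = 300, delta: float = 1e-2, seed: int = 0) -> Tuple[Image, Image]:
    """Minimize sum(keep * (z - x)^2) + lam * smoothed_tv(z) by gradient descent."""
    rng = random.Random(seed)
    h, w = len(x), len(x[0])
    # Bernoulli mask X: 1.0 keeps pixel (i, j) in the data term, 0.0 drops it.
    keep = [[1.0 if rng.random() >= drop_prob else 0.0 for _ in range(w)] for _ in range(h)]
    z = [row[:] for row in x]
    for _ in range(iters):
        # Gradient of the (squared) data term, nonzero only at kept pixels.
        grad = [[2.0 * keep[i][j] * (z[i][j] - x[i][j]) for j in range(w)] for i in range(h)]
        # Gradient of lam * sqrt(d^2 + delta) for every vertical and horizontal edge.
        for i in range(h):
            for j in range(w):
                for di, dj in ((1, 0), (0, 1)):
                    ni, nj = i + di, j + dj
                    if ni < h and nj < w:
                        d = z[ni][nj] - z[i][j]
                        g = lam * d / math.sqrt(d * d + delta)
                        grad[ni][nj] += g
                        grad[i][j] -= g
        z = [[z[i][j] - step * grad[i][j] for j in range(w)] for i in range(h)]
    return z, keep


x = [[0.2] * 3 + [0.8] * 3 for _ in range(6)]
x[2][1] = 1.0  # a small, localized "perturbation"
z, keep = tv_defense(x)
print(smoothed_tv(z, 1e-2) < smoothed_tv(x, 1e-2))  # True
```

Dropped pixels are filled in purely by the TV term, so an isolated spike is pulled toward its neighbors, while the large edge between the two halves costs comparatively little under an L1-type penalty and is largely preserved.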

Image Quilting (Efros & Freeman, 2001) [8]: image quilting is a non-parametric technique that synthesizes images by piecing together small patches taken from a database of image patches. The algorithm places appropriate patches from the database at a predefined set of grid points and computes minimum graph cuts in all overlapping boundary regions to remove edge artifacts. Image quilting can be used to remove adversarial perturbations by constructing a patch database that only contains patches from "clean" images (without adversarial perturbations); the patches used to create the synthesized image are selected by finding the K nearest neighbors (in pixel space) of the corresponding patch from the adversarial image in the patch database, and picking one of these neighbors uniformly at random. The motivation for this defense is that the resulting image consists only of pixels that were not modified by the adversary: the database of real patches is unlikely to contain the structures that appear in adversarial images.
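A stripped-down version of the quilting defense is sketched below. Assumptions: patches are placed on a non-overlapping grid, so the minimum-graph-cut seam step is omitted; image sides must be divisible by the patch size `s`; and the toy "clean" database is built from random images. As in the description, each grid patch is replaced by one of its `k` nearest database patches (squared L2 distance in pixel space), chosen uniformly at random, so every output pixel comes from the clean database.

```python
import heapq
import random
from typing import List, Sequence, Tuple

Image = List[List[float]]
Patch = Tuple[float, ...]  # an s x s patch flattened row by row


def extract_patches(img: Image, s: int, stride: int) -> List[Patch]:
    h, w = len(img), len(img[0])
    return [tuple(img[i + a][j + b] for a in range(s) for b in range(s))
            for i in range(0, h - s + 1, stride)
            for j in range(0, w - s + 1, stride)]


def build_database(clean_images: Sequence[Image], s: int) -> List[Patch]:
    # Every s x s patch (stride 1) from images assumed free of adversarial perturbations.
    db: List[Patch] = []
    for img in clean_images:
        db.extend(extract_patches(img, s, stride=1))
    return db


def quilt(x: Image, db: List[Patch], s: int, k: int, seed: int = 0) -> Image:
    h, w = len(x), len(x[0])
    if h % s or w % s:
        raise ValueError("this sketch needs image sides divisible by the patch size")
    rng = random.Random(seed)
    out = [[0.0] * w for _ in range(h)]
    for i in range(0, h, s):
        for j in range(0, w, s):
            query = tuple(x[i + a][j + b] for a in range(s) for b in range(s))
            # K nearest database patches in pixel space (squared L2 distance).
            nearest = heapq.nsmallest(
                k, db, key=lambda p: sum((u - v) ** 2 for u, v in zip(p, query)))
            chosen = rng.choice(nearest)  # pick one neighbor uniformly at random
            for a in range(s):
                for b in range(s):
                    out[i + a][j + b] = chosen[a * s + b]
    return out


rng = random.Random(42)
clean_imgs = [[[rng.random() for _ in range(4)] for _ in range(4)] for _ in range(2)]
db = build_database(clean_imgs, s=2)
adv = [[v + 0.01 for v in row] for row in clean_imgs[0]]
defended = quilt(adv, db, s=2, k=3)
```

The random choice among `k` neighbors is what makes the defense stochastic, and the nearest-neighbor lookup is what makes it non-differentiable.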

# Experiments

Five experiments were performed to test the efficacy of defenses.

Setup: experiments are performed on the ImageNet image classification dataset, which comprises 1.2 million training images and 50,000 test images, each corresponding to one of 1,000 classes. The adversarial images are produced by attacking a ResNet-50 model with the attacks described above. The strength of an adversary is measured in terms of its normalized L2-dissimilarity. To produce adversarial images at different levels of normalized L2-dissimilarity, each attack was configured as follows:

- FGSM. We increase the step size $\epsilon$ to increase the normalized L2-dissimilarity.

- I-FGSM. We fix M=10, and increase $\epsilon$ to increase the normalized L2-dissimilarity.

- DeepFool. We fix M=5, and increase $\epsilon$ to increase the normalized L2-dissimilarity.

- CW-L2. We fix $\kappa$ = 0 and $\lambda_{f}$ = 10, and multiply the resulting perturbation by an appropriately chosen $\epsilon \geq 1$ to alter the normalized L2-dissimilarity.

The hyperparameters of the defenses are fixed in all experiments. Specifically, the pixel dropout probability is set to $p$ = 0.5 and the regularization parameter of the total variation minimizer to $\lambda_{TV}$ = 0.03.

The figure below illustrates the setups of the experiments that follow: the network is trained either on (a) regular images or (b) transformed images. The different settings are marked 7.1, 7.2, and 7.3 in the figure.

## Gray Box: Image Transformation at Test Time

This experiment applies transformations to adversarial images at test time, before feeding them to a ResNet-50 that was trained to classify clean images. The figure below shows the Top-1 accuracy obtained with each of the five transformations. A few interesting observations from the plot:

- All of the image transformations partly eliminate the effects of the attacks.
- The crop ensemble gives the best accuracy, around 40-60%.
- The accuracy of the image quilting defense hardly deteriorates as the strength of the adversary increases.

## Black Box: Image Transformation at Training and Test Time

A ResNet-50 model was trained on transformed ImageNet training images. Before feeding the images to the network for training, standard data augmentation (He et al.) was applied along with bit-depth reduction, JPEG compression, TV minimization, or image quilting. The figure below shows the classification accuracy on the same adversarial images as in the previous experiment. Because the adversary cannot use this trained model to generate new adversarial images, this is treated as a black-box setting. The results show that training convolutional neural networks on images that are transformed in the same way as at test time dramatically improves the effectiveness of all transformation defenses: nearly 80-90% of the attacks are defended successfully, even when the L2-dissimilarity is high.

## Black Box: Ensembling

Four networks (ResNet-50, ResNet-101, DenseNet-169, and Inception-v4) were studied, along with ensembles of defenses, as shown in the table below. The adversarial images are produced by attacking a ResNet-50 model. The results in the table show that the best ensemble of defenses achieves an accuracy of about 71% against all the attacks. The attacks reduce the accuracy of the best defenses by at most 6%.
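The summary does not spell out how the members of a defense ensemble are combined. One common choice, assumed here rather than taken from the paper, is to average the class probabilities of several (transformation, model) pairs and take the argmax. All names below are hypothetical.

```python
from typing import Any, Callable, List, Sequence, Tuple

Probs = List[float]
Member = Tuple[Callable[[Any], Any], Callable[[Any], Probs]]  # (transformation, classifier)


def ensemble_predict(x: Any, members: Sequence[Member]) -> int:
    """Average class probabilities over (transformation, model) pairs; return the argmax."""
    total: Probs = []
    for transform, predict_proba in members:
        probs = predict_proba(transform(x))
        total = list(probs) if not total else [t + p for t, p in zip(total, probs)]
    avg = [t / len(members) for t in total]
    return max(range(len(avg)), key=avg.__getitem__)


def identity(img: Any) -> Any:
    return img


def model_a(img: Any) -> Probs:
    return [0.6, 0.4]  # weakly prefers class 0


def model_b(img: Any) -> Probs:
    return [0.1, 0.9]  # strongly prefers class 1


print(ensemble_predict("img", [(identity, model_a), (identity, model_b)]))  # 1
```

Averaging probabilities lets a confident member outweigh a hesitant one, unlike a plain majority vote.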

## Gray Box: Image Transformation at Training and Test Time

In this experiment, the adversary has access to the network and its parameters (but does not have access to the input transformations applied at test time). Novel adversarial images were generated by the four attack methods against the networks trained in the black-box experiment with image transformations at training and test time. The results, shown in the figure below, indicate that bit-depth reduction and JPEG compression are weak defenses in such a gray-box setting. By contrast, networks using the other defenses classify up to 50% of images correctly.

## Comparison With Ensemble Adversarial Training

The results are compared with the state-of-the-art ensemble adversarial training approach proposed by Tramèr et al. [2] (2017). Ensemble adversarial training fits the parameters of a convolutional neural network on adversarial examples that were generated to attack an ensemble of pre-trained models. The model released by Tramèr et al. [2] is an Inception-ResNet-v2 trained on adversarial examples generated by FGSM against Inception-ResNet-v2 and Inception-v3 models. The table below compares ensemble adversarial training with the preprocessing techniques studied in this paper. The results show that ensemble adversarial training works better against FGSM attacks (which it uses at training time), but is outperformed by each of the transformation-based defenses on all other attacks.

# Discussion/Conclusions

The paper proposed reasonable approaches to countering adversarial images. The authors evaluated total variance minimization and image quilting and compared them with previously proposed ideas such as image cropping-rescaling, bit-depth reduction, and JPEG compression and decompression on the challenging ImageNet dataset. Previous work by Wang et al. [10] shows that a strong input defense should be non-differentiable and randomized; two of the defenses, namely total variation minimization and image quilting, possess both properties. Suggested future work includes applying the same techniques to other domains such as speech recognition and image segmentation. The input transformations could also be studied in combination with the ensemble adversarial training of Tramèr et al. [2].

# Critiques

1. The terminology of black-box, white-box, and gray-box attacks is not precisely defined.

2. White-box attacks, where the adversary has full access to the model as well as the preprocessing techniques, could have been considered.

3. Although the authors did considerable work in showing the effect of four attacks on the ImageNet dataset, much stronger attacks (Madry et al.) [7] could have been evaluated.

4. The authors claim that the success rate is generally measured as a function of the magnitude of the perturbations performed by the attack, using the L2-dissimilarity, but the claim is not supported by any references; none of the previous work has used this metric.

# References

[1] Chuan Guo, Mayank Rana, Moustapha Cissé, and Laurens van der Maaten. Countering adversarial images using input transformations.

[2] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses.

[3] Abigail Graese, Andras Rozsa, and Terrance E. Boult. Assessing threat of adversarial examples of deep neural networks. CoRR, abs/1610.04256, 2016.

[4] Qinglong Wang, Wenbo Guo, Kaixuan Zhang, Alexander G. Ororbia II, Xinyu Xing, C. Lee Giles, and Xue Liu. Adversary resistant deep neural networks with an application to malware detection. CoRR, abs/1610.01239, 2016a.

[5] Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. CoRR, abs/1704.01155, 2017.

[6] Gintare Karolina Dziugaite, Zoubin Ghahramani, and Daniel Roy. A study of the effect of JPG compression on adversarial images. CoRR, abs/1608.00853, 2016.

[8] Alexei Efros and William Freeman. Image quilting for texture synthesis and transfer. In Proc. SIGGRAPH, pp. 341–346, 2001.

[9] Leonid Rudin, Stanley Osher, and Emad Fatemi. Nonlinear total variation based noise removal algorithms. Physica D, 60:259–268, 1992.

[10] Qinglong Wang, Wenbo Guo, Kaixuan Zhang, Alexander G. Ororbia II, Xinyu Xing, C. Lee Giles, and Xue Liu. Learning adversary-resistant deep neural networks. CoRR, abs/1612.01401, 2016b.

[11] Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. CoRR, abs/1611.02770, 2016.

[12] Moustapha Cisse, Yossi Adi, Natalia Neverova, and Joseph Keshet. Houdini: Fooling deep structured prediction models. CoRR, abs/1707.05373, 2017.

[13] Marco Melis, Ambra Demontis, Battista Biggio, Gavin Brown, Giorgio Fumera, and Fabio Roli. Is deep learning safe for robot vision? Adversarial examples against the iCub humanoid. CoRR, abs/1708.06939, 2017.

[14] Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial examples in the physical world. CoRR, abs/1607.02533, 2016b.

[15] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: A simple and accurate method to fool deep neural networks. In Proc. CVPR, pp. 2574–2582, 2016.

[16] Nicholas Carlini and David A. Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy, pp. 39–57, 2017.

[17] Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In Proc. ICLR, 2015.