# Countering Adversarial Images Using Input Transformations

## Motivation

As the use of machine intelligence has increased, robustness has become a critical feature for guaranteeing the reliability of deployed machine-learning systems. However, recent research has shown that existing models are not robust to small, adversarially designed perturbations of the input. Adversarial examples are inputs to machine-learning models that an attacker has intentionally designed to cause the model to make a mistake. Adversarially perturbed examples have been deployed to attack image classification services (Liu et al., 2016), speech recognition systems (Cisse et al., 2017a), and robot vision (Melis et al., 2017). The existence of these adversarial examples has motivated proposals for approaches that increase the robustness of learning systems to such examples. In the example below (Goodfellow et al.), applying a small perturbation to the original image of a panda changes the prediction to gibbon.

## Introduction

The paper studies strategies that defend against adversarial-example attacks on image-classification systems by transforming the images before feeding them to a convolutional network classifier. Generally, defenses against adversarial examples fall into two main categories:

1. Model-Specific – They enforce model properties such as smoothness and invariance via the learning algorithm.

2. Model-Agnostic – They try to remove adversarial perturbations from the input.

This paper focuses on increasing the effectiveness of model-agnostic defense strategies. The following image transformation techniques are studied:

1. Image Cropping and Rescaling (Graese et al., 2016)

2. Bit Depth Reduction (Xu et al., 2017)

3. JPEG Compression (Dziugaite et al., 2016)

4. Total Variance Minimization (Rudin et al., 1992)

5. Image Quilting (Efros & Freeman, 2001)

These image transformations are studied against adversarial attacks such as the fast gradient sign method (Kurakin et al., 2016a), DeepFool (Moosavi-Dezfooli et al., 2016), and the Carlini & Wagner (2017) attack. The experiments show that the strongest defenses are based on total variance minimization and image quilting, because these defenses are non-differentiable and inherently random, which makes it difficult for an adversary to get around them.

## Previous Work

Recently, a lot of research has gone into countering adversarial threats. Wang et al. [4] proposed an adversary-resistant technique that obstructs attackers from constructing impactful adversarial samples by randomly nullifying features within samples. Tramèr et al. [2] proposed the state-of-the-art ensemble adversarial training method, which augments the training data with perturbations transferred from other models; their Inception ResNet v2 model trained with this method finished first among 70 submissions in the first development round. Graese et al. [3] showed how input transformations such as shifting, blurring, and noise can render the majority of adversarial examples non-adversarial. Xu et al. [5] demonstrated how feature-squeezing methods, such as reducing the color bit depth of each pixel and spatial smoothing, defend against state-of-the-art attacks. Dziugaite et al. [7] studied the effect of JPEG compression on adversarial images.

## Problem Definition/Terminology

Gray-box attack: The model architecture and parameters are public, but the input transformations applied by the defense are not.

Non-targeted adversarial attack: The goal of the attack is to modify a source image so that it is classified incorrectly by the machine-learning classifier.

Targeted adversarial attack: The goal of the attack is to modify a source image so that it is classified as a specific target class by the machine-learning classifier.

The paper discusses non-targeted adversarial examples for image recognition systems. Given an image space $\mathcal{X}$, a classifier $h(\cdot)$, and a source image $x \in \mathcal{X}$, a non-targeted adversarial example of $x$ is a perturbed image $x' \in \mathcal{X}$ such that $h(x) \neq h(x')$. Given a set of $N$ images $\{x_1, \ldots, x_N\}$ and a target classifier $h(\cdot)$, an adversarial attack aims to generate $\{x'_1, \ldots, x'_N\}$ such that each $x'_n$ is an adversary of $x_n$.

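The success rate of an attack is given by

$$\frac{1}{N}\sum_{n=1}^{N} \mathbb{I}\left[h(x_n) \neq h(x'_n)\right],$$

where $\mathbb{I}[\cdot]$ is the indicator function; this is the proportion of predictions that were altered by the attack.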
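As a minimal sketch (assuming NumPy and integer class labels; the function name `attack_success_rate` is illustrative, not from the paper), the success rate is the mean of the indicator that a prediction changed:

```python
import numpy as np


def attack_success_rate(clean_preds, adv_preds):
    """Fraction of images whose predicted label is changed by the attack.

    clean_preds[n] holds h(x_n) and adv_preds[n] holds h(x'_n).
    """
    clean_preds = np.asarray(clean_preds)
    adv_preds = np.asarray(adv_preds)
    if clean_preds.shape != adv_preds.shape:
        raise ValueError("prediction arrays must have the same shape")
    # Average of the indicator I[h(x_n) != h(x'_n)] over the N images.
    return float(np.mean(clean_preds != adv_preds))


# Two of the four predictions are altered by the attack.
print(attack_success_rate([3, 7, 7, 1], [3, 2, 7, 9]))  # 0.5
```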

The success rate is generally measured as a function of the magnitude of the perturbations performed by the attack, using the normalized L2-dissimilarity:

$$\frac{1}{N}\sum_{n=1}^{N} \frac{\|x_n - x'_n\|_2}{\|x_n\|_2}.$$

A strong adversarial attack has a high success rate while its normalized L2-dissimilarity, given by the above equation, is low.
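The normalized L2-dissimilarity can be sketched the same way, again assuming NumPy arrays with one image per leading index; `normalized_l2_dissimilarity` is a hypothetical helper name:

```python
import numpy as np


def normalized_l2_dissimilarity(clean, adv):
    """Mean over images of ||x_n - x'_n||_2 / ||x_n||_2.

    clean and adv are arrays of shape (N, ...) holding x_n and x'_n.
    """
    clean = np.asarray(clean, dtype=np.float64)
    adv = np.asarray(adv, dtype=np.float64)
    n = clean.shape[0]
    # Flatten each image so the L2 norm runs over all of its pixels.
    diff_norms = np.linalg.norm((clean - adv).reshape(n, -1), axis=1)
    clean_norms = np.linalg.norm(clean.reshape(n, -1), axis=1)
    return float(np.mean(diff_norms / clean_norms))


x = np.array([[3.0, 4.0], [0.0, 2.0]])      # ||x_1|| = 5, ||x_2|| = 2
x_adv = np.array([[3.0, 3.0], [0.0, 2.0]])  # only x_1 is perturbed, by norm 1
print(normalized_l2_dissimilarity(x, x_adv))  # (1/5 + 0/2) / 2 = 0.1
```

Pairing this value with the success rate gives one point on the accuracy-versus-dissimilarity curves used throughout the experiments.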

Defense: A defense is a strategy that aims to make the prediction on an adversarial example, $h(x')$, equal to the prediction on the corresponding clean example, $h(x)$.

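For the experiments, the following four attacks are studied:

1. Fast Gradient Sign Method (FGSM; Goodfellow et al., 2015): Given a source input $x$ with true label $y$, let $\ell$ be the differentiable loss function used to train the classifier $h(\cdot)$. The corresponding adversarial example is $x' = x + \epsilon \cdot \operatorname{sign}(\nabla_x \ell(x, y))$, where $\epsilon$ is the step size.

2. Iterative FGSM (I-FGSM; Kurakin et al., 2016b): iteratively applies the FGSM update $x^{(m)} = x^{(m-1)} + \epsilon \cdot \operatorname{sign}(\nabla_{x^{(m-1)}} \ell(x^{(m-1)}, y))$ for $m = 1, \ldots, M$, starting from $x^{(0)} = x$ and returning $x' = x^{(M)}$, where $M$ is the number of iterations.

3. DeepFool (Moosavi-Dezfooli et al., 2016): projects $x$ onto a linearization of the decision boundary defined by $h(\cdot)$ for $M$ iterations.

4. Carlini-Wagner's L2 attack (CW-L2; Carlini & Wagner, 2017): an optimization-based attack that combines a differentiable surrogate for the model's classification accuracy with an L2-penalty term. Let $Z(x)$ be the operation that computes the logit vector (i.e., the output before the softmax layer) for an input $x$, and let $Z(x)_k$ be the logit value corresponding to class $k$. The untargeted variant of CW-L2 finds a solution to the unconstrained optimization problem $\min_{x'} \left[ \|x - x'\|_2^2 + \lambda_f \max\left(-\kappa,\; Z(x')_{h(x)} - \max_{k \neq h(x)} Z(x')_k\right) \right]$, where $\kappa$ is a confidence margin and $\lambda_f$ weights the classification term against the L2 penalty.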
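The FGSM and I-FGSM updates can be sketched on a toy model whose input gradient has a closed form. This assumes a binary logistic-regression classifier with cross-entropy loss (not the ResNet-50 used in the paper), for which $\nabla_x \ell = (\sigma(w^\top x + b) - y)\,w$. The helper names are hypothetical, and the projection and pixel-range clipping that some I-FGSM variants apply are omitted:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss_grad_x(x, y, w, b):
    """Gradient of the binary cross-entropy loss w.r.t. the input x for
    a logistic-regression classifier p(y=1 | x) = sigmoid(w.x + b)."""
    p = sigmoid(np.dot(w, x) + b)
    return (p - y) * w


def fgsm(x, y, w, b, eps):
    # Single step: x' = x + eps * sign(grad_x loss(x, y)).
    return x + eps * np.sign(loss_grad_x(x, y, w, b))


def i_fgsm(x, y, w, b, eps, num_iters):
    # Repeat the FGSM update M = num_iters times, recomputing the
    # gradient at the current adversarial image each time.
    x_adv = np.array(x, dtype=np.float64, copy=True)
    for _ in range(num_iters):
        x_adv = x_adv + eps * np.sign(loss_grad_x(x_adv, y, w, b))
    return x_adv


w = np.array([1.0, -2.0, 0.5])
b = 0.0
x = np.array([0.5, 0.1, 0.2])  # logit w.x + b = 0.4, so class 1
y = 1

x_fgsm = fgsm(x, y, w, b, eps=0.1)
x_ifgsm = i_fgsm(x, y, w, b, eps=0.1, num_iters=3)
print(np.dot(w, x_fgsm) + b)   # about 0.05: still class 1
print(np.dot(w, x_ifgsm) + b)  # about -0.65: prediction flipped to class 0
```

On this toy model one FGSM step of size 0.1 lowers the logit from 0.4 to about 0.05 without changing the prediction, while three I-FGSM steps push it to about -0.65 and flip the label.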

The figure below shows adversarial images and the corresponding perturbations at five levels of normalized L2-dissimilarity for all four attacks.

## Defenses

Five image transformations that alter the structure of these perturbations are studied: (1) image cropping and rescaling, (2) bit depth reduction, (3) JPEG compression, (4) total variance minimization, and (5) image quilting.

Image Cropping-Rescaling (Graese et al., 2016), Bit Depth Reduction (Xu et al., 2017), JPEG Compression and Decompression (Dziugaite et al., 2016):

Image cropping-rescaling has the effect of altering the spatial positioning of the adversarial perturbation. In this study, images are cropped and rescaled at training time.
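A minimal sketch of random crop-and-rescale, together with the "crop ensemble" mentioned in the gray-box experiment, which is assumed here to average class probabilities over several random crops at test time. The crop size, the nearest-neighbour resize, and the `predict_proba` callable are illustrative assumptions, not details from the paper:

```python
import numpy as np


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]


def random_crop_rescale(img, crop_size, rng):
    """Take a random crop_size x crop_size crop and scale it back to the
    original size, which shifts where the adversarial perturbation lands."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop_size + 1)
    left = rng.integers(0, w - crop_size + 1)
    crop = img[top:top + crop_size, left:left + crop_size]
    return resize_nearest(crop, h, w)


def crop_ensemble_predict(img, predict_proba, crop_size, num_crops, rng):
    """Average the class probabilities returned by predict_proba over
    num_crops independently sampled crops."""
    probs = [predict_proba(random_crop_rescale(img, crop_size, rng))
             for _ in range(num_crops)]
    return np.mean(probs, axis=0)


def toy_predict_proba(im):
    # Hypothetical two-class "classifier" driven by mean brightness.
    m = im.mean()
    return np.array([m, 1.0 - m])


rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
print(crop_ensemble_predict(image, toy_predict_proba, crop_size=28,
                            num_crops=10, rng=rng))
```

In practice `toy_predict_proba` would be replaced by the trained network's softmax output; because each call samples new crops, the prediction is itself randomized.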

Bit depth reduction (Xu et al., 2017) performs a simple type of quantization that can remove small (adversarial) variations in pixel values from an image. In the experiments, images are reduced to 3 bits.
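Bit depth reduction can be sketched as uniform quantization of pixel intensities. This assumes pixels scaled to [0, 1] and simple rounding to the nearest level; it illustrates the idea and is not the paper's code:

```python
import numpy as np


def reduce_bit_depth(img, bits=3):
    """Quantize pixel values in [0, 1] to 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1  # 3 bits -> levels 0/7, 1/7, ..., 7/7
    return np.round(np.clip(img, 0.0, 1.0) * levels) / levels


pixels = np.array([0.30, 0.62, 0.95])
pixels_perturbed = pixels + np.array([0.01, -0.02, 0.015])  # small perturbation
print(reduce_bit_depth(pixels))            # identical for both inputs
print(reduce_bit_depth(pixels_perturbed))
```

With only 8 levels, each level covers a band of width 1/7 (about 0.14), so perturbations much smaller than that usually map to the same quantized value; perturbations near a band edge can still flip a pixel to the neighbouring level.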

JPEG compression and decompression (Dziugaite et al., 2016) removes small perturbations by performing simple quantization.
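Full JPEG needs an image codec, so the sketch below swaps in a simplified, JPEG-style round trip on a grayscale image: an 8x8 block DCT, uniform quantization of the coefficients with a single hypothetical step size, and the inverse DCT. Real JPEG additionally uses colour conversion, per-frequency quantization tables, and entropy coding, all omitted here:

```python
import numpy as np


def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix; row k is the k-th basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)
    return mat


def jpeg_like_round_trip(img, step=16.0, block=8):
    """Simplified JPEG-style compression/decompression of a grayscale
    image with values in [0, 255]: blockwise DCT, uniform quantization
    of the coefficients, inverse DCT."""
    h, w = img.shape
    if h % block or w % block:
        raise ValueError("image sides must be multiples of the block size")
    c = dct_matrix(block)
    out = np.empty((h, w), dtype=np.float64)
    for r in range(0, h, block):
        for s in range(0, w, block):
            patch = img[r:r + block, s:s + block].astype(np.float64) - 128.0
            coeffs = c @ patch @ c.T                 # forward 2-D DCT
            coeffs = np.round(coeffs / step) * step  # lossy quantization
            out[r:r + block, s:s + block] = c.T @ coeffs @ c + 128.0
    return np.clip(out, 0.0, 255.0)


rng = np.random.default_rng(0)
flat = np.full((16, 16), 128.0)                               # flat grey image
flat_noisy = flat + rng.uniform(-0.5, 0.5, size=flat.shape)  # tiny perturbation
flat_restored = jpeg_like_round_trip(flat_noisy)
print(np.abs(flat_restored - flat).max())  # the perturbation is quantized away
```

Here every 8x8 block of the perturbation has L2 norm at most 4, so no DCT coefficient reaches half the quantization step and all of them round to zero. Real adversarial perturbations are not guaranteed to be this small relative to the quantization step.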

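Total Variance Minimization (Rudin et al., 1992): This defense randomly drops pixels from the input, using the pixel dropout probability $p$ given in the experimental setup, and then reconstructs the "simplest" image, in terms of total variation, that is consistent with the remaining pixels; $\lambda_{TV}$ weights the total-variation regularizer.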
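A rough sketch of the pixel-dropout plus TV-reconstruction idea for a single grayscale channel. Assumptions that are not from the paper: a squared data term on the kept pixels, a smoothed anisotropic TV penalty so that plain gradient descent applies, and the step size, iteration count, and smoothing constant `eps`. The defaults `drop_prob=0.5` and `lam=0.03` mirror $p$ and $\lambda_{TV}$ from the experimental setup, but because this objective is scaled differently, the same numbers will not behave identically:

```python
import numpy as np


def smoothed_tv(z, eps):
    """Differentiable stand-in for anisotropic total variation: sum of
    sqrt(d**2 + eps) over horizontal and vertical neighbour differences d."""
    dx = np.diff(z, axis=1)
    dy = np.diff(z, axis=0)
    return np.sqrt(dx ** 2 + eps).sum() + np.sqrt(dy ** 2 + eps).sum()


def smoothed_tv_grad(z, eps):
    """Gradient of smoothed_tv with respect to every pixel of z."""
    dx = np.diff(z, axis=1)  # dx[i, j] = z[i, j + 1] - z[i, j]
    dy = np.diff(z, axis=0)  # dy[i, j] = z[i + 1, j] - z[i, j]
    gx = dx / np.sqrt(dx ** 2 + eps)
    gy = dy / np.sqrt(dy ** 2 + eps)
    grad = np.zeros_like(z)
    grad[:, 1:] += gx
    grad[:, :-1] -= gx
    grad[1:, :] += gy
    grad[:-1, :] -= gy
    return grad


def tv_defense(img, drop_prob=0.5, lam=0.03, step=0.1, num_iters=300,
               eps=1e-3, rng=None):
    """Randomly drop pixels, then run gradient descent on
    0.5 * ||keep * (z - img)||^2 + lam * smoothed_tv(z)
    to find a low-variation image that agrees with the kept pixels."""
    rng = np.random.default_rng() if rng is None else rng
    img = np.asarray(img, dtype=np.float64)
    # keep[i, j] = 1 for pixels that survive dropout (probability 1 - drop_prob).
    keep = (rng.random(img.shape) >= drop_prob).astype(np.float64)
    # The objective is convex, so starting from the input mainly affects speed.
    z = img.copy()
    for _ in range(num_iters):
        z -= step * (keep * (z - img) + lam * smoothed_tv_grad(z, eps))
    return z


rng = np.random.default_rng(0)
ramp = np.tile(np.linspace(0.2, 0.8, 32), (32, 1))          # smooth test image
ramp_noisy = ramp + rng.normal(0.0, 0.05, size=ramp.shape)  # stand-in perturbation
ramp_restored = tv_defense(ramp_noisy, rng=rng)
print(smoothed_tv(ramp_noisy, 1e-3), smoothed_tv(ramp_restored, 1e-3))
```

Because the dropout mask is resampled on every call, two runs on the same input give different outputs, which is the randomness the paper credits for this defense's strength. Colour images would be processed one channel at a time.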

Image Quilting (Efros & Freeman, 2001): Image quilting is a non-parametric technique that synthesizes images by piecing together small patches taken from a database of image patches. The algorithm places appropriate patches from the database at a predefined set of grid points and computes minimum graph cuts in all overlapping boundary regions to remove edge artifacts. Image quilting can be used to remove adversarial perturbations by constructing a patch database that contains only patches from "clean" images (without adversarial perturbations). The patches used to create the synthesized image are selected by finding the K nearest neighbors (in pixel space) of the corresponding patch from the adversarial image in the patch database, and picking one of these neighbors uniformly at random. The motivation for this defense is that the resulting image consists only of pixels that were not modified by the adversary, because the database of real patches is unlikely to contain the structures that appear in adversarial images.
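The patch-replacement step can be sketched as follows for grayscale images. This is a deliberate simplification: patches are laid on a non-overlapping grid, so the minimum-graph-cut blending of overlapping boundaries is skipped, nearest neighbours are found by brute force, and the patch size, `k`, and the tiny database are hypothetical:

```python
import numpy as np


def build_patch_db(clean_images, patch):
    """Collect every non-overlapping patch x patch block of the clean
    grayscale images into an array of shape (num_patches, patch * patch)."""
    rows = []
    for img in clean_images:
        h, w = img.shape
        for r in range(0, h - patch + 1, patch):
            for c in range(0, w - patch + 1, patch):
                rows.append(img[r:r + patch, c:c + patch].ravel())
    return np.array(rows, dtype=np.float64)


def quilt(img, db, patch, k, rng):
    """Rebuild img from clean patches: for each grid cell, find the k nearest
    database patches in pixel space and paste one of them chosen uniformly
    at random. Border pixels not covered by the grid are left unchanged."""
    src = np.asarray(img, dtype=np.float64)
    out = src.copy()
    h, w = src.shape
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            query = src[r:r + patch, c:c + patch].ravel()
            dists = np.sum((db - query) ** 2, axis=1)  # squared L2 to each patch
            nearest = np.argsort(dists)[:k]            # k nearest neighbours
            pick = rng.choice(nearest)                 # uniform among the k
            out[r:r + patch, c:c + patch] = db[pick].reshape(patch, patch)
    return out


rng = np.random.default_rng(0)
clean_imgs = [rng.random((8, 8)) for _ in range(3)]
db = build_patch_db(clean_imgs, patch=4)                   # 12 clean 4x4 patches
adv = clean_imgs[0] + rng.normal(0.0, 0.01, size=(8, 8))  # perturbed copy of image 0
quilted = quilt(adv, db, patch=4, k=1, rng=rng)
print(np.abs(quilted - clean_imgs[0]).max())  # every patch snaps back to a clean one
```

With `k=1` the defense is deterministic; a larger `k` adds the randomness the defense relies on, at the cost of sometimes pasting a less similar clean patch.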

# Experiments

Five experiments were performed to test the efficacy of the defenses.

Setup: Experiments are performed on the ImageNet image classification dataset, which comprises 1.2 million training images and 50,000 test images, each corresponding to one of 1,000 classes. The adversarial images are produced by attacking a ResNet-50 model. The strength of an adversary is measured in terms of its normalized L2-dissimilarity, which is controlled for each attack as follows:

- FGSM. Increasing the step size $\epsilon$ increases the normalized L2-dissimilarity.

- I-FGSM. We fix M=10, and increase $\epsilon$ to increase the normalized L2-dissimilarity.

- DeepFool. We fix M=5, and increase $\epsilon$ to increase the normalized L2-dissimilarity.

- CW-L2. We fix $\kappa=0$ and $\lambda_{f}=10$, and multiply the resulting perturbation by $\epsilon$ to increase the normalized L2-dissimilarity.
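As a worked check on how $\epsilon$ controls the dissimilarity for FGSM, which perturbs $x$ by $\epsilon \cdot \operatorname{sign}(\nabla_x \ell(x, y))$: assuming no clipping to the valid pixel range and no exactly-zero gradient entries, every coordinate of the perturbation has magnitude $\epsilon$, so for a single image $x \in \mathbb{R}^d$

$$\frac{\|x' - x\|_2}{\|x\|_2} = \frac{\epsilon\,\|\operatorname{sign}(\nabla_x \ell(x, y))\|_2}{\|x\|_2} = \frac{\epsilon\sqrt{d}}{\|x\|_2}.$$

The normalized L2-dissimilarity therefore grows linearly in $\epsilon$. For I-FGSM with $M$ steps of size $\epsilon$, the triangle inequality gives only the upper bound

$$\frac{\|x' - x\|_2}{\|x\|_2} \le \frac{M\epsilon\sqrt{d}}{\|x\|_2},$$

since successive steps can point in partially opposite directions and cancel.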

The hyperparameters of the defenses are fixed in all experiments. Specifically, the pixel dropout probability is set to $p=0.5$ and the regularization parameter of the total variation minimizer to $\lambda_{TV}=0.03$.

## Gray-Box: Image Transformations at Test Time

This experiment applies a transformation to adversarial images at test time before feeding them to a ResNet-50 that was trained to classify clean images. The corresponding figure shows the top-1 accuracy for each of the five transformations. A few interesting observations from the plot: the crop ensemble gives the best accuracy, around 40-60%, and the accuracy of the image quilting defense hardly deteriorates as the strength of the adversary increases.

## Black-Box: Image Transformations at Training and Test Time

A ResNet-50 model was trained on the ImageNet training images. Before the images were fed to the network for training, standard data augmentation (He et al.) was applied along with bit depth reduction, JPEG compression, TV minimization, or image quilting. The classification accuracy on adversarial images is shown in the figure below.


## Black-Box: Ensembling

Four networks, ResNet-50, ResNet-101, DenseNet-169, and Inception-v4, were studied, along with ensembles of defenses, as shown in the table below. The adversarial images are produced by attacking a ResNet-50 model. The results in the table show that the best ensemble of defenses achieves an accuracy of about 71% against all the attacks.
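The summary does not spell out how the members of an ensemble are combined. A common choice, assumed here, is to average the softmax probabilities of each (network, defense) member and take the arg-max; the sketch uses precomputed logits in place of real networks, and `ensemble_predict` is a hypothetical name:

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)


def ensemble_predict(logits_per_member):
    """Average the softmax probabilities of several (network, defense)
    members and return the arg-max class for each image.

    logits_per_member has shape (num_members, N, num_classes)."""
    probs = softmax(np.asarray(logits_per_member, dtype=np.float64))
    return probs.mean(axis=0).argmax(axis=-1)


# Two members, one image, three classes.
member_logits = [[[2.0, 0.0, 0.0]],   # confident in class 0
                 [[0.0, 1.0, 0.0]]]   # less confident in class 1
print(ensemble_predict(member_logits))
```

Here the confident first member outweighs the less confident second one, so the averaged probabilities favour class 0.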

## Gray-Box: Image Transformations at Training and Test Time

In this experiment, the adversary has access to the network and its parameters, but not to the input transformations applied at test time. Using the networks trained in the black-box experiment with image transformations at training and test time, new adversarial images were generated with the four attack methods. The results for this experiment are shown in the figure below. Networks using these defenses classify up to 50% of images correctly.

## Comparison With Prior Work

The results of the experiments are compared with the state-of-the-art ensemble adversarial training approach proposed by Tramèr et al. [2] (2017). The table below provides the results of

# Discussion

The results of the study suggest that there exists a range of image transformations that have the potential to remove adversarial perturbations while preserving the visual content of the image; one merely has to train the convolutional network on images that were transformed in the same way. A strong input defense should be non-differentiable and randomized, a strategy previously shown to be effective by Wang et al. [4]. Two of the defenses show

# Critiques

1. The terminology of black-box, white-box, and gray-box attacks is not precisely defined.

2. White-box attacks, in which the adversary has full access to the model as well as the preprocessing techniques, could have been considered.

3. Although the authors did considerable work in showing the effect of four attacks on the ImageNet dataset, much stronger attacks (Madry et al. [9]) could have been evaluated.

4. The authors claim that the success rate is generally measured as a function of the magnitude of the perturbations performed by the attack, using the L2-dissimilarity, but this claim is not supported by any references.

# References

1. Chuan Guo, Mayank Rana, Moustapha Cissé, and Laurens van der Maaten. Countering Adversarial Images Using Input Transformations.

2.