Countering Adversarial Images Using Input Transformations

From statwiki
Revision as of 23:37, 14 November 2018 by Skoundin (talk | contribs) (Discussion)
Jump to: navigation, search


As the use of machine intelligence has increased , robustness has become a critical feature to guarantee the reliability of deployed machine-learning systems. However, recent research has shown that existing models are not robust to small , adversarial designed perturbations of the input. Adversarial examples are inputs to Machine Learning models that an attacker has intentionally designed to cause the model to make a mistake.The adversarial examples are not specific to Images , but also Malware, Text Understanding ,Speech. Below example (Goodfellow et. al), a small perturbation when applied to original image of panda, the prediction is changed to gibbon.


Hence an urgent need for approaches/defenses that increase the robustness of learning systems to such adversarial examples.


The paper studies strategies that defend against adversarial-example attacks on image-classification systems by transforming the images before feeding them to a Convolutional Network Classifier. Generally, defenses against adversarial examples fall into two main categories -

1. Model-Specific – They enforce model properties such as smoothness and in-variance via the learning algorithm.

2. Model-Agnostic – They try to remove adversarial perturbations from the input.

This paper focuses on increasing the effectiveness of Model Agnostic defense strategies. Below image transformations techniques have been studied:

1. Image Cropping and Re-scaling ( Graese et al, 2016).

2. Bit Depth Reduction (Xu et. al, 2017)

3. JPEG Compression (Dziugaite et al, 2016)

4. Total Variance Minimization(RUdin at al , 1992)

5. Image Quilting (Efros & Freeman , 2001).

These image transformations have been studied against Adversarial attacks such as - fast gradient sign method(Kurakin et al., 2016a), Deepfool (Moosavi-Dezfooli et al., 2016), and the Carlini & Wagner (2017) attack. From the experiments the strongest defenses are based on Total Variance Minimization and Image Quilting: as these defenses are non-differentiable and inherently random which makes difficult for an adversary to get around them.

Problem Definition/Terminology

Gray Box Attack : Model Architecture and parameters are Public

Black Box Attack : Adversary does not have access to the model.

Non Targeted Adversarial Attack : Goal of the attack is to modify source image in a way that image will be classified incorrectly by Machine Learning Classifier

Targeted Adversarial Attack : Goal of the attack is to modify source image in way that image will be classified as a specific target by Machine Learning Classifier.

The paper discusses non- targeted adversarial example for image recognition systems. Given image space X , and a classifier h(.) , and a source image x ∈ X , a non targeted adversarial example of x is a perturbed image x'∈ X , such that h(x) ≠ h(x'). Given a set of N images {x1, …xn} , a target classifier h(.) , an adversarial attack aims to generate { x{1},…..x'{n}}, such that (x'n)is an adversary of xn.

Success rate of an attack is given as:


which is the proportions of predictions that were altered by an attack.

Success Rate is generally measured as a function of the magnitude of perturbations performed by the attack , using normalized L2-dissimilarity. : diss.png

A strong adversarial attack has a high rate , while its normalized L2-dissimilarity given by the above equation is less.

Adversarial Attacks

For the experimental purposes, below 4 attacks have been studied.

1. Fast Gradient Sign Method (FGSM; Goodfellow et al. (2015)): Given a source input x, and true label y, and let l be the differentiable loss function used to train the classifier h(.). Then the corresponding adversarial example is given by:


2. Iterative FGSM ((I-FGSM; Kurakin et al. (2016b)):iteratively applies the FGSM update, where M is the number of iterations. IFGSM.PNG

3. DeepFool ((Moosavi-Dezfooliet al., 2016) projects x onto a linearization of the decision boundary defined by h(.) for M iterations DeepFool.PNG

4. Carlini-Wagner's L2 attack:(CW-L2; Carlini & Wagner (2017)) is an optimization-based attack that combines a differentiable surrogate for the model’s classification accuracy with an L2-penalty term.Let Z(x) be the operation that computes the logit vector (i.e., the output before the softmax layer) for an input x, and Z(x)k be the logit value corresponding to class k. The untargeted variant of CW-L2 finds a solution to the unconstrained optimization problem: Carlini.PNG

Below figure shows adversarial images and corresponding perturbations at five levels of normalized L2-dissimilarity for all four attacks. Strength.PNG


Five image transformations that alter the structure of these perturbations have been studied: 1. Image Cropping and Rescaling 2.Bit Depth Reduction 3. JPEG Compression 4. Total Variance Minimization 5.Image Quilting

Image Cropping- Rescaling(Graese et al.,2016) , Bit Depth Reduction( Xu et. al), JPEG Compression and Decompression (Dziugaite etal.,2016):

Image cropping - rescaling has the effect of altering the spatial positioning of the adversarial perturbation, which has the effect of altering the spatial positioning of the adversarial perturbation. In this study images are cropped and rescaled during training time.

Bit Depth Reduction( Xu et. al) , performs a simple type of quantization that can remove small (adversarial) variations in pixel values from an image. Images are reduced to 3 bits in the experiment.

JPEG Compression and Decompression (Dziugaite etal.,2016) , removes small pertubations by performing simple quantizations.

Total Variance Minimization :

Image Quilting(Efros & Freeman, 2001) Image Quilting is a non-parametric technique that synthesizes images by piecing together small patches that are taken from a database of image patches. The algorithm places appropriate patches in the database for a predefined set of grid points , and computes minimum graph cuts in all overlapping boundary regions to remove edge artifacts. Image Quilting can be used to remove adversarial perturbations by constructing a patch database that only contain patches from "clean" images ( without adversarial perturbations); the patches used to create the synthesized image are selected by finding the K nearest neighbors ( in pixel space) of the corresponding patch from the adversarial image in the patch database, and picking one of these neighbors uniformly at random. The motivation for this defense is that resulting image only contains of pixels that were not modified by the adversary - the database of real patches is unlikely to contain the structures that appear in adversarial images.


Set up:

1. Gray Box : Image Transformation at Test Time:

  This experiment applies transformation on adversarial images at test time before feeding them to a ResNet -50 which was trained to 
  classify clean images. Figure  , shows the results for five different transformations applied and their corresponding Top-1 
  accuracy.Few of the interesting observations from the plot are: Crop ensemble gives the best accuracy around 40-60 percent. Accuracy of 
  Image Quilting Defense hardly deteriorates as the strength of the adversary increases.

2. BlackBox : Image Transformation at Training and Test Time:

  ResNet-50 model was trained on ImageNet Training images. Before feeding the images to the network for training, standard data 
  augmentation along with bit depth reduction, JPEG Compression, TV Minimzation, or Image Quilting were applied on the images. The 
  classification accuracy on adversarial images is shown in the Figure.

3. Blackbox : Ensembling :

  Four networks ResNet-50, resNet-10, DenseNet-169, and Inception-v4 , were studied , along with ensemble of defenses. The results in the 
  table conclude that best ensemble of defenses give an accuracy of about 71% against all the attacks.

4. GrayBox : Image Transformation at Training and Test Time

5. Comparison With Prior Work:

  Defenses were compared with the state of the art ensemble adversarial training approach proposed by Tramer et al. 2017. Below table 
  provides the results of 


Results from the study suggest , there exists a range of image transformations that have the potential to remove adversarial perturbations while preserving the visual content of the image: one merely has to train the convolutional network on images that were transformed in the same way.