'''F18-STAT841-Proposal'''
<hr />
'''Use this format (Don’t remove Project 0)'''<br />
<br />
'''Project # 0'''<br />
Group members:<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
'''Title:''' Making a String Telephone<br />
<br />
'''Description:''' In this science project, we use paper cups to make a string telephone and talk with friends while learning about sound waves. (Explain your project in one or two paragraphs.)<br />
<br />
--------------------------------------------------------------------<br />
<br />
'''Project # 1'''<br />
Group members:<br />
<br />
Weng, Jiacheng<br />
<br />
Li, Keqi<br />
<br />
Qian, Yi<br />
<br />
Liu, Bomeng<br />
<br />
'''Title:''' RSNA Pneumonia Detection Challenge<br />
<br />
'''Description:''' <br />
<br />
Our team’s project is the RSNA Pneumonia Detection Challenge, a Kaggle competition. The primary goal of this project is to develop a machine learning tool to detect patients with pneumonia based on their chest radiographs (CXR). <br />
<br />
Pneumonia is an infection that inflames the air sacs in the lungs, with symptoms such as chest pain, cough, and fever [1]. Pneumonia can be very dangerous, especially to infants and the elderly. In 2015, 920,000 children under the age of 5 died from this disease [2]. Because of its fatality to children, diagnosing pneumonia is a high priority. A common diagnostic method is to obtain a patient’s chest radiograph (CXR), a gray-scale x-ray image of the patient’s chest. A region infected by pneumonia usually appears as an area or areas of increased opacity [3] on the CXR. However, many other factors can also increase opacity on a CXR, which makes the diagnosis very challenging. Diagnosis also requires highly skilled clinicians and a great deal of CXR screening time. The Radiological Society of North America (RSNA®) sees an opportunity to use machine learning to accelerate the initial CXR screening process. <br />
<br />
For the scope of this project, our team plans to contribute to solving this problem by applying our machine learning knowledge in image processing and classification. Team members are going to apply techniques that include, but are not limited to, logistic regression, random forests, SVMs, kNN, and CNNs, in order to detect CXRs with pneumonia.<br />
<br />
<br />
[1] (Accessed 2018, Oct. 4). Pneumonia [Online]. MAYO CLINIC. Available from: https://www.mayoclinic.org/diseases-conditions/pneumonia/symptoms-causes/syc-20354204<br />
[2] (Accessed 2018, Oct. 4). RSNA Pneumonia Detection Challenge [Online]. Kaggle. Available from: https://www.kaggle.com/c/rsna-pneumonia-detection-challenge<br />
[3] Franquet T. Imaging of community-acquired pneumonia. J Thorac Imaging 2018 (epub ahead of print). PMID 30036297<br />
<br />
--------------------------------------------------------------------<br />
<br />
'''Project # 2'''<br />
Group members:<br />
<br />
Hou, Zhaoran<br />
<br />
Zhang, Chi<br />
<br />
'''Title:''' <br />
<br />
'''Description:'''<br />
<br />
--------------------------------------------------------------------<br />
<br />
'''Project # 3'''<br />
Group members:<br />
<br />
Hanzhen Yang<br />
<br />
Jing Pu Sun<br />
<br />
Ganyuan Xuan<br />
<br />
Yu Su<br />
<br />
'''Title:''' Kaggle Challenge: Quick, Draw! Doodle Recognition Challenge<br />
<br />
'''Description:'''<br />
<br />
Our team chose the [https://www.kaggle.com/c/quickdraw-doodle-recognition Quick, Draw! Doodle Recognition Challenge] from the Kaggle Competition. The goal of the competition is to build an image recognition tool that can classify hand-drawn doodles into one of the 340 categories.<br />
<br />
The main challenge of the project lies in the training set being very noisy. Hand-drawn artwork may deviate substantially from the actual object, and almost certainly differs from person to person. Mislabeled images also present a problem, since they create outliers when we train our models. <br />
<br />
We plan on learning more about some of the currently mature image recognition algorithms to inspire and develop our own model.<br />
<br />
--------------------------------------------------------------------<br />
<br />
'''Project # 4'''<br />
Group members:<br />
<br />
Snaith, Mitchell<br />
<br />
'''Title:''' Reproducibility report: ''Fixing Variational Bayes: Deterministic Variational Inference for Bayesian Neural Networks''<br />
<br />
'''Description:''' <br />
<br />
The paper ''Fixing Variational Bayes: Deterministic Variational Inference for Bayesian Neural Networks'' [1] has been submitted to ICLR 2019. It aims to "fix" variational Bayes and turn it into a robust inference tool through two innovations. <br />
<br />
Goals are to: <br />
<br />
* reproduce the deterministic variational inference scheme as described in the paper without referencing the original authors' code, providing a third-party implementation<br />
<br />
* reproduce the experimental results with our own implementation, using the same NN framework for reference implementations of the compared methods described in the paper<br />
<br />
* reproduce the experimental results with the authors' own implementation<br />
<br />
* explore other possible applications of variational Bayes besides heteroscedastic regression<br />
<br />
[1] OpenReview location: https://openreview.net/forum?id=B1l08oAct7<br />
<br />
--------------------------------------------------------------------<br />
<br />
'''Project # 5'''<br />
Group members:<br />
<br />
Rebecca, Chen<br />
<br />
Susan,<br />
<br />
Mike, Li<br />
<br />
Ted, Wang<br />
<br />
'''Title:''' Kaggle Challenge: Quick, Draw! Doodle Recognition Challenge<br />
<br />
'''Description:''' <br />
<br />
Classification has become an increasingly prominent problem, especially with the rise of machine learning in recent years. Our team is particularly interested in machine learning algorithms that are optimized for specific types of image classification. <br />
<br />
In this project, we will dig into the base classifiers we learned in class and try to combine them to find an optimal solution for a particular type of image dataset. Currently, we are looking into a dataset from Kaggle: the Quick, Draw! Doodle Recognition Challenge. The dataset in this competition contains 50M drawings across 340 categories and is a subset of the world’s largest doodling dataset, which is continually updated by real players of the drawing game. Anyone can contribute by joining in (quickdraw.withgoogle.com).<br />
<br />
For us, as machine learning students, the goal is to arrive at a better classification method. By “better”, we mean one that balances simplicity and accuracy. We will start with neural networks using different activation functions in each layer, and we will also combine base classifiers with bagging, random forests, and boosting for ensemble learning. We will also regularize our parameters to avoid overfitting the training dataset. Finally, we will summarize the features of this type of image dataset, formulate our solutions, and standardize our steps for solving this kind of problem. <br />
<br />
Hopefully, we can not only finish our project successfully, but also make a small contribution to the machine learning research field.<br />
<br />
--------------------------------------------------------------------<br />
<br />
'''Project # 6'''<br />
Group members:<br />
<br />
Ngo, Jameson<br />
<br />
Xu, Amy<br />
<br />
'''Title:''' Kaggle Challenge: [https://www.kaggle.com/c/PLAsTiCC-2018 PLAsTiCC Astronomical Classification ]<br />
<br />
'''Description:''' <br />
<br />
We will participate in the PLAsTiCC Astronomical Classification competition featured on Kaggle. We will explore how well astronomical objects can be classified based on various factors such as brightness.<br />
<br />
These objects vary in time and size, and some are unknown! There are over 100 classes that these objects may belong to, and our job is to predict, for each object, the probability that it belongs to each class.<br />
<br />
--------------------------------------------------------------------<br />
'''Project # 7'''<br />
Group members:<br />
<br />
Qianying Zhao<br />
<br />
Hui Huang<br />
<br />
Meiyu Zhou<br />
<br />
Gezhou Zhang<br />
<br />
'''Title:''' Quora Insincere Questions Classification<br />
<br />
'''Description:''' <br />
Our group will participate in the featured Kaggle competition Quora Insincere Questions Classification. For this competition, we must predict whether a question asked on Quora is sincere or not. An insincere question is intended as a statement rather than a genuine request for useful answers, and is labelled (target = 1). <br />
We will analyze the Quora question text to extract the characteristics of questions and determine whether they are sincere or insincere, using RStudio. Our presentation report will include not only the conclusions we reach by classifying and analyzing the provided data with appropriate models, but also how we performed in the contest.<br />
<br />
--------------------------------------------------------------------<br />
'''Project # 8'''<br />
Group members:<br />
<br />
Jiayue Zhang<br />
<br />
Lingyun Yi<br />
<br />
Rongrong Su<br />
<br />
Siao Chen<br />
<br />
<br />
'''Title:''' Kaggle--Two Sigma: Using News to Predict Stock Movements<br />
<br />
<br />
'''Description:''' <br />
Stock prices are affected by news to some extent. What is the influence of news on stock prices, and what is the predictive power of news? <br />
We are going to use the content of news articles to predict the direction of stock prices. We will mine the data to find the useful information hidden in it, and as a result we will predict how stock prices perform when the market is faced with news.<br />
<br />
<br />
--------------------------------------------------------------------<br />
'''Project # 9'''<br />
Group members:<br />
<br />
Hassan, Ahmad Nayar<br />
<br />
McLellan, Isaac<br />
<br />
Brewster, Kristi<br />
<br />
Melek, Marina Medhat Rassmi <br />
<br />
<br />
'''Title:''' Quick, Draw! Doodle Recognition<br />
<br />
'''Description:''' <br />
<br />
'''Background'''<br />
<br />
Google’s Quick, Draw! is an online game where a user is prompted to draw an image depicting a certain category in under 20 seconds. As the drawing is being completed, the game uses a model that attempts to correctly identify the image being drawn. With the aim of improving the underlying pattern recognition model this game uses, Google is hosting a Kaggle competition asking the public to build a model to correctly identify a given drawing. The model should classify the drawing into one of the 340 label categories within the Quick, Draw! game in three guesses or fewer.<br />
<br />
'''Proposed Approach'''<br />
<br />
Each image/doodle (input) is treated as a matrix of pixel values. To classify images, we apply convolution to each image’s matrix of pixel values. This significantly reduces the dimensionality of the input, which in turn reduces the number of parameters of any proposed recognition model. Using filters, pooling layers, and further convolutions, a final layer, called the fully connected layer, is used to correlate images with categories, assigning probabilities (weights) and hence classifying images. <br />
<br />
This approach to image classification is called a convolutional neural network (CNN), and we propose using it to classify the doodles within the Quick, Draw! dataset.<br />
<br />
To control overfitting and underfitting of our proposed model and to minimize error, we will try different architectures consisting of different types and dimensions of pooling layers and input filters.<br />
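<br />
As a rough illustration of the kind of architecture we have in mind, here is a minimal sketch of a small convolutional network in Python/PyTorch. The layer sizes, the 28x28 grayscale input, and the class name are placeholder assumptions for illustration, not our final design.<br />
<br />
<pre>
import torch
import torch.nn as nn

class DoodleCNN(nn.Module):
    """Minimal CNN sketch: convolution and pooling layers, then a fully connected layer."""
    def __init__(self, num_classes=340):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),   # 1x28x28 -> 32x28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 32x14x14
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # -> 64x14x14
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 64x7x7
        )
        self.classifier = nn.Linear(64 * 7 * 7, num_classes)  # fully connected layer

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = DoodleCNN()
logits = model(torch.randn(8, 1, 28, 28))  # a batch of 8 dummy doodles
print(logits.shape)                        # torch.Size([8, 340])
</pre>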
<br />
'''Challenges'''<br />
<br />
This project presents a number of interesting challenges:<br />
* The data given for training is noisy in that it contains drawings that are incomplete or simply poorly drawn. Dealing with this noise will be a significant part of our work. <br />
* There are 340 label categories within the Quick, Draw! dataset; this means the model must be able to classify drawings from a large pool of information while making effective use of powerful computational resources.<br />
<br />
'''Tools & Resources'''<br />
<br />
* We will use Python & MATLAB.<br />
* We will use the Quick, Draw! Dataset available on the Kaggle competition website. <https://www.kaggle.com/c/quickdraw-doodle-recognition/data><br />
<br />
--------------------------------------------------------------------<br />
'''Project # 10'''<br />
Group members:<br />
<br />
Lam, Amanda<br />
<br />
Huang, Xiaoran<br />
<br />
Chu, Qi<br />
<br />
Sang, Di<br />
<br />
'''Title:''' Kaggle Competition: Human Protein Atlas Image Classification<br />
<br />
'''Description:'''<br />
<br />
--------------------------------------------------------------------<br />
<br />
'''Project # 11'''<br />
Group members:<br />
<br />
Bobichon, Philomene<br />
<br />
Maheshwari, Aditya<br />
<br />
An, Zepeng<br />
<br />
Stranc, Colin<br />
<br />
'''Title:''' Kaggle Challenge: Quick, Draw! Doodle Recognition Challenge<br />
<br />
'''Description:''' <br />
<br />
--------------------------------------------------------------------<br />
<br />
'''Project # 12'''<br />
Group members:<br />
<br />
Huo, Qingxi<br />
<br />
Yang, Yanmin<br />
<br />
Cai, Yuanjing<br />
<br />
Wang, Jiaqi<br />
<br />
'''Title:''' <br />
<br />
'''Description:''' <br />
<br />
--------------------------------------------------------------------<br />
<br />
'''Project # 13'''<br />
Group members:<br />
<br />
Ross, Brendan<br />
<br />
Barenboim, Jon<br />
<br />
Lin, Junqiao<br />
<br />
Bootsma, James<br />
<br />
'''Title:''' Expanding Neural Network<br />
<br />
'''Description:''' The goal of our project is to create an expanding neural network algorithm that starts by training a small neural network and then expands it into a larger one. We hypothesize that, with the proper expansion method, we could decrease training time and prevent overfitting. The method we wish to explore is to link together input dimensions based on covariance. Then, when the neural network reaches convergence, we create a larger neural network without the links between dimensions, using starting values from the smaller neural network. <br />
<br />
--------------------------------------------------------------------<br />
<br />
'''Project # 14'''<br />
Group members:<br />
<br />
Schneider, Jason <br />
<br />
Walton, Jordyn <br />
<br />
Abbas, Zahraa<br />
<br />
Na, Andrew<br />
<br />
'''Title:''' Application of ML Classification to Cancer Identification<br />
<br />
'''Description:''' The application of machine learning to cancer classification based on gene expression is a topic of great interest to physicians and biostatisticians alike. We would like to work on this for our final project to encourage the application of proven ML techniques to improve the accuracy of cancer classification and diagnosis. In this project, we will use the dataset from Golub et al. [1], which contains gene-expression data from tumour biopsies, to train a model and classify healthy individuals and individuals who have cancer.<br />
<br />
One challenge we may face pertains to the way the data was collected. Some parts of the dataset have thousands of features (each representing a quantitative measure of the expression of a certain gene) but as few as twenty samples. We propose some ways to mitigate the impact of this, including the use of PCA, leave-one-out cross-validation, and regularization, as sketched below. <br />
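<br />
As a hedged sketch of one such mitigation (the component count, regularization strength, and synthetic data are placeholder assumptions, not tuned choices), the Python/scikit-learn pipeline below combines PCA with an L2-regularized logistic regression and scores it with leave-one-out cross-validation:<br />
<br />
<pre>
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline

# Placeholder data: 20 samples with 5000 gene-expression features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5000))
y = rng.integers(0, 2, size=20)  # 0 = healthy, 1 = cancer

# Reduce dimensionality first, then fit a regularized classifier.
clf = make_pipeline(PCA(n_components=10), LogisticRegression(C=1.0, max_iter=1000))
scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
print("Leave-one-out accuracy:", scores.mean())
</pre>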
<br />
----------------------------------------------------------------------<br />
<br />
'''Project # 15'''<br />
Group members:<br />
<br />
Praneeth, Sai<br />
<br />
Peng, Xudong <br />
<br />
Li, Alice<br />
<br />
Vajargah, Shahrzad<br />
<br />
'''Title:''' Google Analytics Customer Revenue Prediction [1] - A Kaggle Competition<br />
<br />
'''Description:''' Guess which cabin class in airlines is the most profitable? One might guess economy, but in reality it is the premium classes that show higher returns. According to research conducted by Wendover Productions [2], despite having fewer than 50 seats and taking up more space than the economy class, premium classes end up driving more revenue than other classes.<br />
<br />
In fact, just like airlines, many companies adopt the business model where the vast majority of revenue is derived from a minority group of customers. As a result, data-intensive promotional strategies are getting more and more attention nowadays from marketing teams to further improve company returns.<br />
<br />
In this Kaggle competition, we are challenged to analyze a Google Merchandise Store customer dataset to predict revenue per customer. We will implement a series of data analytics methods including pre-processing, data augmentation, and parameter tuning. Different classification algorithms will be compared and optimized in order to achieve the best results.<br />
<br />
'''Reference:'''<br />
<br />
[1] Kaggle. (2018, Sep 18). Google Analytics Customer Revenue Prediction. Retrieved from https://www.kaggle.com/c/ga-customer-revenue-prediction<br />
<br />
[2] Kottke, J (2017, Mar 17). The economics of airline classes. Retrieved from https://kottke.org/17/03/the-economics-of-airline-classes<br />
----------------------------------------------------------------------<br />
<br />
'''Project # 16'''<br />
Group members:<br />
<br />
Wang, Yu Hao<br />
<br />
Grant, Aden <br />
<br />
McMurray, Andrew<br />
<br />
Song, Baizhi<br />
<br />
'''Title:''' Google Analytics Customer Revenue Prediction - A Kaggle Competition<br />
<br />
The 80/20 rule has proven true for many businesses: only a small percentage of customers produce most of the revenue. As such, marketing teams are challenged to make appropriate investments in promotional strategies.<br />
<br />
RStudio, the developer of free and open tools for R and enterprise-ready products for teams to scale and share work, has partnered with Google Cloud and Kaggle to demonstrate the business impact that thorough data analysis can have.<br />
<br />
In this competition, you’re challenged to analyze a Google Merchandise Store (also known as GStore, where Google swag is sold) customer dataset to predict revenue per customer. Hopefully, the outcome will be more actionable operational changes and a better use of marketing budgets for those companies who choose to use data analysis on top of GA data.<br />
<br />
We will test a variety of classification algorithms to determine an appropriate model.<br />
<br />
----------------------------------------------------------------------<br />
<br />
'''Project # 17'''<br />
Group Members:<br />
<br />
Jiang, Ya Fan<br />
<br />
Zhang, Yuan<br />
<br />
Hu, Jerry Jie<br />
<br />
'''Title:''' Kaggle Competition: Quick, Draw! Doodle Recognition Challenge<br />
<br />
'''Description:''' Construction of a classifier that can learn from noisy training data and generalize to a clean test set. The training data comes from the Google game "Quick, Draw!".<br />
<br />
----------------------------------------------------------------------<br />
<br />
'''Project # 18'''<br />
Group Members:<br />
<br />
Zhang, Ben<br />
<br />
'''Title:''' Two Sigma: Using News to Predict Stock Movements<br />
<br />
'''Description:''' Use news analytics to predict stock price performance. This is subject to change.<br />
<br />
----------------------------------------------------------------------<br />
'''Project # 19'''<br />
Group Members:<br />
<br />
Yan Yu Chen<br />
<br />
Qisi Deng<br />
<br />
Hengxin Li<br />
<br />
Bochao Zhang<br />
<br />
Our team currently has two topics of interest at hand, and we have summarized the objective of each topic below. Please note that we will narrow down our choices after further discussions with the instructor.<br />
<br />
'''Description 1:''' With 14 percent of Americans claiming that social media is their most dominant news source, fake news shared on Facebook and Twitter is invading people’s information-gathering experience. Concomitantly, the quality and nature of online news have been gradually diluted by fake news that is sometimes imperceptible. With the aim of creating an unalloyed Internet surfing experience, we seek to develop a tool that performs fake news detection and classification. <br />
<br />
'''Description 2:''' Statistics Canada has recently reported an increasing trend in Toronto’s violent crime score. Though the Royal Canadian Mounted Police has put great effort into tracking crimes, the ambiguous snapshots captured by outdated cameras often hamper investigations. Motivated by this circumstance, our second interest focuses on accurate numeral and letter identification within variable-resolution images.<br />
<br />
----------------------------------------------------------------------<br />
'''Project # 20'''<br />
Group Members:<br />
<br />
Dong, Yongqi (Michael)<br />
<br />
Kingston, Stephen<br />
<br />
'''Title:''' Kaggle--Two Sigma: Using News to Predict Stock Movements <br />
<br />
'''Description:''' The movement in price of a tradeable security, or stock, on any given day is an aggregation of each individual market participant’s appraisal of the intrinsic value of the underlying company or assets. These values are primarily driven by investors’ expectations of the company’s ability to generate future free cash flow. A steady stream of information on the state of the macro- and micro-economic variables that affect a company’s operations informs these market actors, primarily through news articles and alerts. We would like to take a universe of news headlines and parse the information into features that allow us to classify the direction and ‘intensity’ of a stock’s price move on any given day. Strategies may include various classification methods to determine the most effective solution.<br />
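<br />
As a hedged sketch of the feature-extraction step (the headlines, labels, and the choice of a TF-IDF bag-of-words representation are placeholder assumptions for illustration), the Python/scikit-learn pipeline below turns headlines into sparse feature vectors and classifies the direction of the price move:<br />
<br />
<pre>
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder headlines with next-day direction labels (1 = up, 0 = down).
headlines = [
    "Company beats earnings expectations",
    "Regulator opens probe into accounting practices",
    "Firm announces record quarterly revenue",
    "CEO resigns amid fraud allegations",
]
direction = [1, 0, 1, 0]

# TF-IDF turns each headline into a sparse feature vector; a linear
# classifier then predicts the direction of the stock's move.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(headlines, direction)
print(model.predict(["Company posts record profit"]))
</pre>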
<br />
----------------------------------------------------------------------<br />
<br />
'''Project # 21'''<br />
Group members:<br />
<br />
Xiao, Alex<br />
<br />
Zhang, Richard<br />
<br />
Ash, Hudson<br />
<br />
Zhu, Ziqiu<br />
<br />
'''Title:''' Kaggle Challenge: Quick, Draw! Doodle Recognition Challenge [Subject to Change]<br />
<br />
'''Description:''' <br />
<br />
"Quick, Draw! was released as an experimental game to educate the public in a playful way about how AI works. The game prompts users to draw an image depicting a certain category, such as ”banana,” “table,” etc. The game generated more than 1B drawings, of which a subset was publicly released as the basis for this competition’s training set. That subset contains 50M drawings encompassing 340 label categories."<br />
<br />
Our goal as students is to build a classification tool that will classify hand-drawn doodles into one of the 340 label categories.<br />
<br />
----------------------------------------------------------------------<br />
<br />
'''Project # 22'''<br />
Group Members:<br />
<br />
Lee, Yu Xuan<br />
<br />
Heng, Tsen Yee<br />
<br />
'''Title:''' Two Sigma: Using News to Predict Stock Movements<br />
<br />
'''Description:''' Use news analytics to predict stock price performance. This is subject to change.<br />
<br />
<br />
-------------------------------------------------------------------------<br />
<br />
'''Project # 23'''<br />
Group Members:<br />
<br />
Bayati, Mahdiyeh<br />
<br />
Malek Mohammadi, Saber<br />
<br />
Luong, Vincent<br />
<br />
<br />
'''Title:''' Human Protein Atlas Image Classification<br />
<br />
<br />
'''Description:''' The Human Protein Atlas is a Sweden-based initiative aimed at mapping all human proteins in cells, tissues and organs.<br />
<br />
-------------------------------------------------------------------------<br />
<br />
'''Project # 24'''<br />
Group Members:<br />
<br />
Wu, Yutong<br />
<br />
Wang, Shuyue<br />
<br />
Jiao, Yan<br />
<br />
'''Title:''' <br />
<br />
'''Description:'''
<hr />
This is a summary of the paper "Towards Deep Learning Models Resistant to Adversarial Attacks" by Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. <sup>[[#References|[1]]]</sup><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurray<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|'''Figure 1.''' Before and after an input image of a panda was subjected to perturbations.<sup>[https://arxiv.org/abs/1412.6572 Source]</sup>]]<br />
<br />
Any classifier can be tricked into giving an incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. Such an attack can be carried out by adding a set of perturbations to the input. These attacks are a prominent challenge for classifiers used in image processing and security systems, because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks. As such, resistance to adversarial attacks has become an increasingly important property for classifiers to have.<br />
<br />
While there have been many approaches to defending against adversarial attacks, we can never be certain that these defenses will be robust against a broad range of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore the adversarial robustness of neural networks. The authors conduct an experimental study of the saddle point formulation used for adversarial training. The authors propose that we can reliably solve the saddle point optimization problem using first-order methods, particularly projected gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based on a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is to solve:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math></div><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around <math>x</math>, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math></div><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data <math>x</math> that results in the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that even the worst-case perturbation results in the smallest adversarial loss possible. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by computing the gradients <math>\nabla_\theta \rho(\theta)</math> at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of the inner loss <math>L(\theta,x+\delta,y)</math> are obtained using projected gradient descent (PGD). Since the inner maximization is non-concave, PGD is not guaranteed to find the global maximizer (instead, PGD converges to local maxima). However, the paper’s findings show that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \subseteq S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> and tend to have similar loss values for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|'''Figure 2.''' Adversarial loss of MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
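<br />
A minimal sketch of this inner maximization in Python/PyTorch, for an <math>\ell_\infty</math>-ball of radius <math>\varepsilon</math> (the function name, step size, and step count are illustrative assumptions, and clipping to the valid pixel range is omitted):<br />
<br />
<pre>
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.3, alpha=0.01, steps=40):
    """Projected gradient descent on the loss, constrained to the
    l-infinity ball of radius eps around the natural input x."""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)  # random start
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()              # ascent step on the loss
        x_adv = x + torch.clamp(x_adv - x, -eps, eps)    # project back onto the ball
    return x_adv.detach()
</pre>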
<br />
=== Outer minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<math>L</math> is assumed to be continuously differentiable in <math>\theta</math>, and Danskin’s theorem is used to obtain the direction of descent. Discontinuity when using ReLU activations is assumed not to be an issue, as the set of discontinuities has measure zero and thus these points of discontinuity will almost never be encountered. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust.<br />
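<br />
Putting the two pieces together, the adversarial training loop is, in sketch form (reusing the hypothetical <code>pgd_attack</code> above; <code>model</code> and <code>train_loader</code> are assumed to be an existing network and DataLoader):<br />
<br />
<pre>
# By Danskin's theorem, a descent direction for the saddle point problem
# is the gradient of the loss evaluated at the inner maximizer.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
for x, y in train_loader:
    x_adv = pgd_attack(model, x, y)         # inner maximization (attack)
    loss = F.cross_entropy(model(x_adv), y)
    optimizer.zero_grad()
    loss.backward()                         # gradient at the maximizer
    optimizer.step()                        # outer minimization (SGD) step
</pre>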
<br />
== 4. Experiments ==<br />
<br />
===Training===<br />
<br />
We put into practice our robust classification framework with the MNIST and CIFAR10 datasets, with a projected gradient descent (PGD) training adversary, chosen because it effectively produces almost-maximal loss. We start PGD at a randomly chosen perturbation of the natural data. For both datasets, we see a sustained decrease in training loss when networks are trained against the PGD adversary, which suggests that adversarial loss is effectively being minimized.<br />
<br />
[[file:4_trainingloss.png|centre|frame|'''Figure 3.''' Trajectory of cross-entropy loss when training with a projected gradient descent (PGD) adversary.]]<br />
<br />
===Testing===<br />
<br />
The trained models are then tested based on a variety of adversaries.<br />
<br />
The source networks for the adversarial attacks are:<br />
* the network itself, denoted by source A (white-box attacks).<br />
* an independently trained version of the original network, denoted by source A' (black-box attacks).<br />
* a version of the original network trained on natural data only, denoted by A<sub>nat</sub> (black-box attacks).<br />
* a network with a different convolutional architecture, as described by Tramèr et al. (2017), denoted by B (black-box attacks).<br />
<br />
The attack methods used are:<br />
* Fast Gradient Sign Method (denoted FGSM)<br />
* Projected gradient descent (denoted PGD) with various steps and restarts<br />
* Attacks based on literature by Carlini & Wagner (2016b), denoted by CW; a version of this attack with a high confidence parameter is denoted by CW+<br />
<br />
===MNIST: Network structure and results===<br />
For the MNIST model, the network is composed of two convolutional layers (with 32 and 64 filters) and a fully connected 1024-unit layer. Each convolutional layer is followed by a <math>2 \times 2</math> max-pooling layer. The training adversary used is PGD with 40 steps and a step size of 0.01 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=0.3</math>. The adversarially trained model was evaluated to be highly robust; testing results are given in Table 1.<br />
<br />
[[file:4_MNIST.png|centre|frame|'''Table 1.''' MNIST dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=0.3</math>; most successful attacks are bolded for each attack model.]]<br />
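<br />
For concreteness, the network described above corresponds roughly to the following Python/PyTorch sketch (the <math>5 \times 5</math> kernel size and padding are our assumptions; the authors' original implementation may differ in such details):<br />
<br />
<pre>
import torch.nn as nn

mnist_net = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=5, padding=2), nn.ReLU(),   # 32 filters
    nn.MaxPool2d(2),                                         # 2x2 max-pooling
    nn.Conv2d(32, 64, kernel_size=5, padding=2), nn.ReLU(),  # 64 filters
    nn.MaxPool2d(2),                                         # 28x28 input -> 7x7 feature maps
    nn.Flatten(),
    nn.Linear(64 * 7 * 7, 1024), nn.ReLU(),                  # fully connected 1024-unit layer
    nn.Linear(1024, 10),                                     # 10 digit classes
)
</pre>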
<br />
===CIFAR10: Network structure and results===<br />
For the CIFAR10 model, the network architectures used are the Resnet model (as per He et al (2016); TFM (2017)) and a version of Resnet with <math>10 \times</math> wider layers. The training adversary is PGD with 7 steps and a step size of 2 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=8</math> total. <br />
The most challenging testing adversary (white-box PGD) used the same architecture as above with 20 steps. The adversarially trained model performed well given the strength of the testing adversaries, but struggled in some cases, with test accuracy close to or lower than 50%. Testing results are given in Table 2.<br />
<br />
[[file:4_CIFAR.png|centre|frame|'''Table 2.''' CIFAR10 dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=8</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===Further experiments: <math>\varepsilon</math> values and <math>l_2</math>-bounded attacks===<br />
<br />
The MNIST model adversarially trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 0.3</math> and the CIFAR10 model trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 8</math> are further evaluated against attacks of varying <math>\varepsilon</math> values, for both <math>l_\infty</math>-bounded and <math>l_2</math>-bounded attacks. The MNIST network performed especially well against <math>l_2</math>-bounded adversaries.<br />
<br />
[[file:4_parameters.png|centre|frame|'''Figure 4.''' Adversarially-trained MNIST and CIFAR10 network performance tested against PGD adversaries of varying <math>\varepsilon</math> values, for <math>l_\infty</math> and <math>l_2</math> norms. <math>\varepsilon</math> value used for training is indicated in red.]]<br />
<br />
===Training accuracy===<br />
<br />
The MNIST network and wide-layer version of the CIFAR10 network had 100% accuracy on the training sets; thus, the optimization problem is tractable.<br />
<br />
===Running time===<br />
<br />
As PGD is iterative, training against a <math>k</math>-step PGD adversary requires <math>k</math> forward and backward passes per training batch, rather than the single forward and backward pass of standard network training. This poses a computational challenge, as the running time is greater by a factor of <math>k+1</math>; for example, training against the 40-step MNIST adversary is roughly 41 times more expensive per batch than standard training.<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima found by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will find a local maximum with a significantly higher final loss value than the PGD adversary. This is seen experimentally: adversaries with significantly different loss values are not found even after a large number of restarts.<br />
<br />
The authors also experimentally find that a network trained to be robust against PGD adversaries also becomes robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggests that using the saddle point formulation to adversarially train networks encompasses all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although these attacks can only evaluate the classifier and not its gradient, the gradient can be estimated with finite differences. Thus, first-order attacks remain relevant, and networks are trained against them to defend against transfer attacks. The paper finds that increasing network capacity and the strength of the adversary the network is trained against improves resistance to transfer attacks. The paper also finds that models resistant to first-order attacks are more resistant to transfer attacks.<br />
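<br />
A hedged sketch of how such an attacker might estimate the gradient with central finite differences (<code>loss_fn</code> is an assumed black-box function returning the scalar loss for an input):<br />
<br />
<pre>
import numpy as np

def estimate_gradient(loss_fn, x, h=1e-4):
    """Central finite-difference estimate of d(loss)/dx using only
    black-box evaluations of loss_fn; costs 2 queries per coordinate."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e.flat[i] = h
        grad.flat[i] = (loss_fn(x + e) - loss_fn(x - e)) / (2 * h)
    return grad
</pre>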
<br />
== 6. Conclusions ==<br />
Neural networks can be made resistant to adversarial attacks. We can design reliable adversarial training methods through a surprisingly regularly structured saddle point optimization problem. Even though the overall problem involves maximizing a non-concave function with many local maxima, the loss values of these maxima are highly concentrated. Truly robust and accurate adversarial models may be possible after further exploration.<br />
<br />
== References ==<br />
* Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards Deep Learning Models Resistant to Adversarial Attacks. In Proceedings of the ICLR 2018 Conference, Vancouver, BC. https://openreview.net/forum?id=rJzIBfZAb<br />
* Nicholas Carlini and David Wagner. Defensive distillation is not robust to adversarial examples. arXiv preprint arXiv:1607.04311, 2016a<br />
* Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. arXiv preprint arXiv:1608.04644, 2016b.<br />
* Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.<br />
* Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick D. McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.<br />
* Tensor flow models repository. https://github.com/tensorflow/models/tree/ master/resnet, 2017.</div>J23ngohttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Towards_Deep_Learning_Models_Resistant_to_Adversarial_Attacks&diff=41145Towards Deep Learning Models Resistant to Adversarial Attacks2018-11-23T07:01:13Z<p>J23ngo: Undo revision 41141 by Y89dong (talk)</p>
<hr />
<div>This is a summary of the paper "Towards Deep Learning Models Resistant to Adversarial Attacks" by Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. <sup>[[#References|[1]]]</sup><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurray<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|'''Figure 1.''' Before and after an input image of a panda was subjected to perturbations.<sup>[https://arxiv.org/abs/1412.6572 Source]</sup>]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is to solve:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math></div><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around <math>x</math>, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math></div><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data <math>x</math> that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that all perturbations result in smallest adversarial loss possible based on the attack. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of <math>\rho(\delta)</math> are obtained by using projected gradient descent (PGD). Since the inner maximization is non-concave, the global maximizer cannot be found using PGD (instead, PGD will converge to local maxima). However, the paper’s findings conclude that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \in S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> and tens to have similar loss values for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|'''Figure 2.''' Adversarial loss of MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<math>L</math> is assumed to be continuously differentiable for <math>\theta</math>, and Danskin’s theorem is used to obtained the direction of descent. Discontinuity when using ReLU activations is assumed to not be an issue as the set of discontinuities has a measure of zero, and thus these points of discontinuity will never be encountered. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust.<br />
<br />
== 4. Experiments ==<br />
<br />
===Training===<br />
<br />
We put into practice our robust classification framework with the MNIST and CIFAR10 datasets, with a projected gradient descent (PGD) training adversary, chosen because it effectively produces almost-maximal loss. We start PGD at a randomly chosen perturbation of the natural data. For both datasets, we see a sustained decrease in training loss when networks are trained against the PGD adversary, which suggests that adversarial loss is effectively being minimized.<br />
<br />
[[file:4_trainingloss.png|centre|frame|'''Figure 3.''' Trajectory of cross-entropy loss when training with a projected gradient descent (PGD) adversary.]]<br />
<br />
===Testing===<br />
<br />
The trained models are then tested based on a variety of adversaries.<br />
<br />
The source networks for the adversarial attacks are:<br />
* the network itself, denoted by source A (white-box attacks).<br />
* an independently trained version of the original network, denoted by source A' (black-box attacks).<br />
* a version of the original network trained on natural data only, denoted by Anat (black-box attacks).<br />
* a network with a different convolutional architecture, as described by Tramèr et al. (2017), denoted by B (black-box attacks).<br />
<br />
The attack methods used are:<br />
* Fast Gradient Sign Method (denoted FGSM)<br />
* Projected gradient descent (denoted PGD) with various steps and restarts<br />
* Attacks based on literature by Carlini & Wagner (2016b), denoted by CW; a version of this attack with a high confidence parameter is denoted by CW+<br />
<br />
===MNIST: Network structure and results===<br />
For the MNIST model, the network is composed of two convolutional layers (with 32 and 64 filters) and a fully connected 1024-unit layer. Each convolutional layer is followed by <math>2 \times 2</math> max-pooling layers. The training adversary used is PGD with 40 steps and a step size of 0.01 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=0.3</math>. The adversarially trained model was evaluated to be highly robust; testing results are given in Table 1.<br />
<br />
[[file:4_MNIST.png|centre|frame|'''Table 1.''' MNIST dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=0.3</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===CIFAR10: Network structure and results===<br />
For the CIFAR10 model, the network architectures used are the Resnet model (as per He et al (2016); TFM (2017)) and a version of Resnet with <math>10 \times</math> wider layers. The training adversary is PGD with 7 steps and a step size of 2 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=8</math> total. <br />
The most challenging testing adversary (white-box PGD) used the same architecture as above with 20 steps. Th adversarially trained model performed well given the strength of the testing adversaries but struggled in some cases, with test accuracy close to or lower than 50%. Testing results are given in Table 2.<br />
<br />
[[file:4_CIFAR.png|centre|frame|'''Table 2.''' CIFAR10 dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=8</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===Further experiments: <math>\varepsilon</math> values and <math>l_2</math>-bounded attacks===<br />
<br />
The MNIST model adversarially trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 0.3</math> and the CIFAR10 model trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 8</math> are further evaluated against attacks of varying <math>\varepsilon</math> values, for both <math>l_\infty</math>-bounded and <math>l_2</math>-bounded attacks. The MNIST network performed especially well against <math>l_2</math>-bounded adversaries.<br />
<br />
[[file:4_parameters.png|centre|frame|'''Figure 4.''' Adversarially-trained MNIST and CIFAR10 network performance tested against PGD adversaries of varying <math>\varepsilon</math> values, for <math>l_\infty</math> and <math>l_2</math> norms. <math>\varepsilon</math> value used for training is indicated in red.]]<br />
<br />
===Training accuracy===<br />
<br />
The MNIST network and wide-layer version of the CIFAR10 network had 100% accuracy on the training sets; thus, the optimization problem is tractable.<br />
<br />
===Running time===<br />
<br />
As PGD is iterative, training against a <math>k</math>-step PGD adversary requires <math>k</math> forward and backward passes per training batch rather than the single forward and backward pass for standard network training. This poses a challenge for computation as the the running time is greater by a factor of <math>k+1</math>.<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima founded by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will yield a local maxima with a significantly lower final loss value than the PGD adversary. This is seen experimentally: other adversaries with significantly lower loss values are not found even after a large amount of restarts.<br />
<br />
The authors also experimentally find that a network trained to be robust against PGD adversaries also become robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggest that using the saddle point formulation to adversarially train networks will encompass all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although these attacks evaluate the classifier without the gradient, it can be estimated with finite differences. Thus, first-order attacks are relevant and are use to train against these attacks. The paper finds that increasing network capacity and the strength of the adversary that the network is trained against improves resistance to transfer attacks. The paper also finds that the models that are resistant to first-order attacks are more resistant to transfer attacks.<br />
<br />
== 6. Conclusions ==<br />
Neural Networks can be made resistant to adversarial attacks. We can design reliable adversarial training methods through a surprisingly regularly structured saddle point optimization problem. Even though the overall problem relates to a maximization of a non-concave function with varying maxima, these maxia are highly concentrated. Truly robust accurate adversarial models are possible after further exploration.<br />
<br />
== References ==<br />
<ul><br />
<li><sup>[https://openreview.net/forum?id=rJzIBfZAb [1]]</sup> Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards Deep Learning Models Resistant to Adversarial Attacks. In Proceedings of the ICLR 2018 Conference, Vancouver, BC.<br />
<li><sup>[https://arxiv.org/pdf/1607.04311.pdf [2]]</sup> Carlini, N., Wagner, D. (2016). Defensive distillation is not robust to adversarial examples. arXiv preprint arXiv:1607.04311.<br />
<br />
<br />
* Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. arXiv preprint arXiv:1608.04644, 2016b.<br />
* Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.<br />
* Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick D. McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.<br />
* Tensor flow models repository. https://github.com/tensorflow/models/tree/ master/resnet, 2017.</div>J23ngohttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Towards_Deep_Learning_Models_Resistant_to_Adversarial_Attacks&diff=41144Towards Deep Learning Models Resistant to Adversarial Attacks2018-11-23T07:00:07Z<p>J23ngo: Undo revision 41128 by Y89dong (talk)</p>
<hr />
<div>This is a summary of the paper "Towards Deep Learning Models Resistant to Adversarial Attacks" by Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. <sup>[[#References|[1]]]</sup><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurray<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|'''Figure 1.''' Before and after an input image of a panda was subjected to perturbations.<sup>[https://arxiv.org/abs/1412.6572 Source]</sup>]]<br />
<br />
Any classifier can be tricked into giving an incorrect result by an input specifically designed to do so; such an input is called an adversarial attack. This can be done by injecting a set of small perturbations into the input. These attacks are a prominent challenge for classifiers used in image processing and security systems, because changes to the input that are imperceptible to the human eye can easily fool even high-performing neural networks. As such, resistance to adversarial attacks has become an increasingly important property for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be robust against a broad class of adversaries. Furthermore, such defenses can often be evaded by stronger, adaptive adversaries.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore the adversarial robustness of neural networks. The authors conduct an experimental study of the saddle-point formulation used for adversarial training. They propose that the saddle-point optimization problem can be solved reliably using first-order methods, in particular projected gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity to resist strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in \{1,...,k\}</math>, assumed to have underlying distribution <math>D</math>. Based on a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is to solve:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math></div><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around <math>x</math>, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math></div><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of the input data <math>x</math> that results in the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
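<br />
As a concrete special case (our illustration; this derivation is standard but not stated above): if the loss were linear in the input, the inner maximization over the <math>l_{\infty}</math>-ball of radius <math>\varepsilon</math> would be solved exactly by<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\delta^* = \varepsilon \cdot \operatorname{sign}(\nabla_x L(\theta,x,y)),</math></div><br />
<br />
which is precisely the Fast Gradient Sign Method (FGSM) perturbation used as a baseline attack in the experiments below; PGD can be viewed as iterating this step with a projection back onto <math>S</math>.<br />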
<br />
== 3. Adversarially Robust Networks ==<br />
Solving the outer minimization of the saddle point formulation <math>\min_{\theta} \rho (\theta)</math> guarantees that even the worst-case perturbation yields the smallest adversarial loss possible. Stochastic gradient descent (SGD) is used for the outer minimization, with the gradients <math>\nabla_\theta \rho(\theta)</math> computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of the inner loss <math>L(\theta,x+\delta,y)</math> are obtained using projected gradient descent (PGD); a sketch is given below. Since the inner maximization is non-concave, PGD cannot be guaranteed to find the global maximizer (instead, it converges to local maxima). However, the paper's findings show that PGD can nonetheless be used in practice to solve the inner maximization problem using only first-order information. <br />
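<br />
To make this concrete, the following is a minimal sketch of <math>l_\infty</math> PGD (our illustration, not the authors' code); the model is assumed to be a differentiable PyTorch classifier, and the default parameters match the MNIST training adversary of Section 4 (40 steps of size 0.01 with <math>\varepsilon=0.3</math>).<br />
<pre>
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.3, alpha=0.01, k=40):
    """Projected gradient descent on the l_inf ball of radius eps around x:
    take a signed gradient-ascent step on the loss, then project back."""
    # Random start inside the l_inf ball, as used for adversarial training.
    delta = (2 * torch.rand_like(x) - 1) * eps
    for _ in range(k):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        with torch.no_grad():
            delta = (delta + alpha * grad.sign()).clamp(-eps, eps)  # ascend, then project
    return (x + delta).detach()  # clipping to the valid pixel range omitted for brevity
</pre>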
<br />
Applying Danskin’s theorem on a subset <math>S' \subseteq S</math> in which the local maximum is the global maximum gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> and tend to have similar loss values, for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|'''Figure 2.''' Adversarial loss of MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer minimization ===<br />
Since the distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are estimated from sampled input points; for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<math>L</math> is assumed to be continuously differentiable in <math>\theta</math>, and Danskin’s theorem is used to obtain the direction of descent. Discontinuity arising from ReLU activations is assumed not to be an issue: the set of discontinuities has measure zero, so these points are encountered with probability zero. Using the gradients <math>\nabla_\theta \rho(\theta)</math> with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved, and classifiers can be trained to be adversarially robust.<br />
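<br />
A minimal adversarial training loop consistent with this procedure might look as follows (our sketch; <code>pgd_attack</code> is the function above, and the optimizer settings are assumptions):<br />
<pre>
import torch
import torch.nn.functional as F

def adversarial_training(model, loader, epochs=1, lr=1e-2):
    """SGD on the saddle-point objective: each parameter update uses the
    gradient evaluated at the approximate inner maximizer (Danskin's theorem)."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            x_adv = pgd_attack(model, x, y)          # inner maximization
            loss = F.cross_entropy(model(x_adv), y)  # adversarial loss rho(theta)
            opt.zero_grad()
            loss.backward()                          # gradient at the maximizer
            opt.step()                               # outer minimization step
    return model
</pre>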
<br />
== 4. Experiments ==<br />
<br />
===Training===<br />
<br />
We put the robust classification framework into practice on the MNIST and CIFAR10 datasets, using a projected gradient descent (PGD) training adversary, chosen because it effectively produces near-maximal loss. We start PGD at a randomly chosen perturbation of the natural data. For both datasets, we see a sustained decrease in training loss when networks are trained against the PGD adversary, which suggests that the adversarial loss is effectively being minimized.<br />
<br />
[[file:4_trainingloss.png|centre|frame|'''Figure 3.''' Trajectory of cross-entropy loss when training with a projected gradient descent (PGD) adversary.]]<br />
<br />
===Testing===<br />
<br />
The trained models are then tested against a variety of adversaries.<br />
<br />
The source networks for the adversarial attacks are:<br />
* the network itself, denoted by source A (white-box attacks).<br />
* an independently trained version of the original network, denoted by source A' (black-box attacks).<br />
* a version of the original network trained on natural data only, denoted by A<sub>nat</sub> (black-box attacks).<br />
* a network with a different convolutional architecture, as described by Tramèr et al. (2017), denoted by B (black-box attacks).<br />
<br />
The attack methods used are:<br />
* Fast Gradient Sign Method (denoted FGSM); a one-step sketch is given after this list<br />
* Projected gradient descent (denoted PGD) with various steps and restarts<br />
* The optimization-based attack of Carlini & Wagner (2016b), denoted by CW; a version of this attack with a high confidence parameter is denoted by CW+<br />
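<br />
For reference, FGSM is a single gradient-sign step of size <math>\varepsilon</math>; a minimal sketch in the same hypothetical setup as the PGD code above:<br />
<pre>
def fgsm_attack(model, loss_fn, x, y, eps=0.3):
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    # One ascent step of size eps in the sign of the input gradient.
    return (x + eps * grad.sign()).clamp(0, 1).detach()
</pre>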
<br />
===MNIST: Network structure and results===<br />
For the MNIST model, the network is composed of two convolutional layers (with 32 and 64 filters) followed by a fully connected 1024-unit layer, with each convolutional layer followed by a <math>2 \times 2</math> max-pooling layer. The training adversary used is PGD with 40 steps and a step size of 0.01 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=0.3</math>. The adversarially trained model was evaluated to be highly robust; testing results are given in Table 1.<br />
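<br />
A PyTorch reconstruction of this architecture is sketched below; the summary does not state the convolution kernel sizes, so the <math>5 \times 5</math> kernels with padding are an assumption.<br />
<pre>
import torch.nn as nn

mnist_model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=5, padding=2), nn.ReLU(),  # assumed 5x5 kernels
    nn.MaxPool2d(2),                                        # 28x28 -> 14x14
    nn.Conv2d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(2),                                        # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(64 * 7 * 7, 1024), nn.ReLU(),                 # 1024-unit fully connected layer
    nn.Linear(1024, 10),                                    # 10 digit classes
)
</pre>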
<br />
[[file:4_MNIST.png|centre|frame|'''Table 1.''' MNIST dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=0.3</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===CIFAR10: Network structure and results===<br />
For the CIFAR10 model, the network architectures used are the ResNet model (as per He et al. (2016); TFM (2017)) and a version of ResNet with <math>10 \times</math> wider layers. The training adversary is PGD with 7 steps and a step size of 2 in the <math>l_\infty</math>-norm, with total perturbation size <math>\varepsilon=8</math>; both values are measured in pixel intensity units on the 0–255 scale.<br />
The most challenging testing adversary (white-box PGD) used the same settings as the training adversary but with 20 steps. The adversarially trained model performed well given the strength of the testing adversaries but struggled in some cases, with test accuracy close to or below 50%. Testing results are given in Table 2.<br />
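<br />
Expressed with the hypothetical <code>pgd_attack</code> sketch above (assuming inputs normalized to <math>[0,1]</math>, so the pixel-scale parameters are divided by 255):<br />
<pre>
# CIFAR10 training adversary: 7 steps, step size 2/255, radius 8/255.
x_adv = pgd_attack(model, loss_fn, x, y, eps=8/255, alpha=2/255, steps=7)
</pre>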
<br />
[[file:4_CIFAR.png|centre|frame|'''Table 2.''' CIFAR10 dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=8</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===Further experiments: <math>\varepsilon</math> values and <math>l_2</math>-bounded attacks===<br />
<br />
The MNIST model adversarially trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 0.3</math> and the CIFAR10 model trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 8</math> are further evaluated against attacks of varying <math>\varepsilon</math> values, for both <math>l_\infty</math>-bounded and <math>l_2</math>-bounded attacks. The MNIST network performed especially well against <math>l_2</math>-bounded adversaries.<br />
<br />
[[file:4_parameters.png|centre|frame|'''Figure 4.''' Adversarially-trained MNIST and CIFAR10 network performance tested against PGD adversaries of varying <math>\varepsilon</math> values, for <math>l_\infty</math> and <math>l_2</math> norms. <math>\varepsilon</math> value used for training is indicated in red.]]<br />
<br />
===Training accuracy===<br />
<br />
The MNIST network and the wide-layer version of the CIFAR10 network both achieved 100% accuracy on their training sets, indicating that the saddle point optimization problem is tractable.<br />
<br />
===Running time===<br />
<br />
As PGD is iterative, training against a <math>k</math>-step PGD adversary requires <math>k</math> forward and backward passes per training batch rather than the single forward and backward pass of standard network training. This poses a computational challenge: counting the <math>k</math> passes of the attack plus the one pass of the parameter update, the running time grows by a factor of roughly <math>k+1</math> (about <math>41 \times</math> for the 40-step MNIST adversary).<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima found by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will find a local maximum with a significantly higher final loss value than the PGD adversary. This is seen experimentally: adversaries achieving significantly higher loss values are not found even after a large number of restarts.<br />
<br />
The authors also find experimentally that a network trained to be robust against PGD adversaries becomes robust against various other first-order adversaries as well. This result, together with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggests that adversarially training networks with the saddle point formulation encompasses all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks mounted without direct access to the classifier. Although such attacks cannot evaluate the classifier’s gradient directly, the gradient can be estimated with finite differences, so first-order attacks remain relevant and training against them also provides a defense here. The paper finds that increasing the network capacity and the strength of the adversary that the network is trained against improves resistance to transfer attacks, and that models resistant to first-order attacks are also more resistant to transfer attacks.<br />
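<br />
To make the finite-difference idea concrete, here is an illustrative sketch of a zeroth-order estimate of the input gradient, where <code>loss</code> is the scalar black-box loss an attacker can query (all names are ours):<br />
<pre>
import numpy as np

def estimate_gradient(loss, x, h=1e-4):
    # Central-difference estimate of the gradient of a black-box loss at x,
    # one coordinate at a time (2 * x.size loss queries in total).
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e.flat[i] = h
        grad.flat[i] = (loss(x + e) - loss(x - e)) / (2 * h)
    return grad
</pre>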
<br />
== 6. Conclusions ==<br />
<br />
The paper casts adversarial robustness as a saddle point problem and argues that, despite its non-convexity and non-concavity, the problem can be solved reliably with first-order methods: PGD for the inner maximization and SGD on the loss evaluated at the inner maximizer for the outer minimization. Networks adversarially trained in this way on MNIST and CIFAR10 are considerably more robust against a broad range of white-box and black-box attacks, with higher-capacity networks achieving greater robustness.<br />
<br />
== References ==<br />
* Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards Deep Learning Models Resistant to Adversarial Attacks. In Proceedings of the ICLR 2018 Conference, Vancouver, BC. https://openreview.net/forum?id=rJzIBfZAb<br />
* Nicholas Carlini and David Wagner. Defensive distillation is not robust to adversarial examples. arXiv preprint arXiv:1607.04311, 2016a.<br />
* Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. arXiv preprint arXiv:1608.04644, 2016b.<br />
* Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.<br />
* Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick D. McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.<br />
* TensorFlow models repository. https://github.com/tensorflow/models/tree/master/resnet, 2017.</div>J23ngo
<hr />
<div>This is a summary of the paper "Towards Deep Learning Models Resistant to Adversarial Attacks" by Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. <sup>[[#References|[1]]]</sup><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurray<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
'''TO DO:'''<br />
;section 6<br />
;proofread all sections<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|'''Figure 1.''' Before and after an input image of a panda was subjected to perturbations.<sup>[https://arxiv.org/abs/1412.6572 Source]</sup>]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is to solve:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math></div><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around <math>x</math>, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math></div><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data <math>x</math> that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that all perturbations result in smallest adversarial loss possible based on the attack. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of <math>\rho(\delta)</math> are obtained by using projected gradient descent (PGD). Since the inner maximization is non-concave, the global maximizer cannot be found using PGD (instead, PGD will converge to local maxima). However, the paper’s findings conclude that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \in S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> and tens to have similar loss values for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|'''Figure 2.''' Adversarial loss of MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<math>L</math> is assumed to be continuously differentiable for <math>\theta</math>, and Danskin’s theorem is used to obtained the direction of descent. Discontinuity when using ReLU activations is assumed to not be an issue as the set of discontinuities has a measure of zero, and thus these points of discontinuity will never be encountered. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust.<br />
<br />
== 4. Experiments ==<br />
<br />
===Training===<br />
<br />
We put into practice our robust classification framework with the MNIST and CIFAR10 datasets, with a projected gradient descent (PGD) training adversary, chosen because it effectively produces almost-maximal loss. We start PGD at a randomly chosen perturbation of the natural data. For both datasets, we see a sustained decrease in training loss when networks are trained against the PGD adversary, which suggests that adversarial loss is effectively being minimized.<br />
<br />
[[file:4_trainingloss.png|centre|frame|'''Figure 3.''' Trajectory of cross-entropy loss when training with a projected gradient descent (PGD) adversary.]]<br />
<br />
===Testing===<br />
<br />
The trained models are then tested based on a variety of adversaries.<br />
<br />
The source networks for the adversarial attacks are:<br />
* the network itself, denoted by source A (white-box attacks).<br />
* an independently trained version of the original network, denoted by source A' (black-box attacks).<br />
* a version of the original network trained on natural data only, denoted by Anat (black-box attacks).<br />
* a network with a different convolutional architecture, as described by Tramèr et al. (2017), denoted by B (black-box attacks).<br />
<br />
The attack methods used are:<br />
* Fast Gradient Sign Method (denoted FGSM)<br />
* Projected gradient descent (denoted PGD) with various steps and restarts<br />
* Attacks based on literature by Carlini & Wagner (2016b), denoted by CW; a version of this attack with a high confidence parameter is denoted by CW+<br />
<br />
===MNIST: Network structure and results===<br />
For the MNIST model, the network is composed of two convolutional layers (with 32 and 64 filters) and a fully connected 1024-unit layer. Each convolutional layer is followed by <math>2 \times 2</math> max-pooling layers. The training adversary used is PGD with 40 steps and a step size of 0.01 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=0.3</math>. The adversarially trained model was evaluated to be highly robust; testing results are given in Table 1.<br />
<br />
[[file:4_MNIST.png|centre|frame|'''Table 1.''' MNIST dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=0.3</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===CIFAR10: Network structure and results===<br />
For the CIFAR10 model, the network architectures used are the Resnet model (as per He et al (2016); TFM (2017)) and a version of Resnet with <math>10 \times</math> wider layers. The training adversary is PGD with 7 steps and a step size of 2 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=8</math> total. <br />
The most challenging testing adversary (white-box PGD) used the same architecture as above with 20 steps. Th adversarially trained model performed well given the strength of the testing adversaries but struggled in some cases, with test accuracy close to or lower than 50%. Testing results are given in Table 2.<br />
<br />
[[file:4_CIFAR.png|centre|frame|'''Table 2.''' CIFAR10 dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=8</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===Further experiments: <math>\varepsilon</math> values and <math>l_2</math>-bounded attacks===<br />
<br />
The MNIST model adversarially trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 0.3</math> and the CIFAR10 model trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 8</math> are further evaluated against attacks of varying <math>\varepsilon</math> values, for both <math>l_\infty</math>-bounded and <math>l_2</math>-bounded attacks. The MNIST network performed especially well against <math>l_2</math>-bounded adversaries.<br />
<br />
[[file:4_parameters.png|centre|frame|'''Figure 4.''' Adversarially-trained MNIST and CIFAR10 network performance tested against PGD adversaries of varying <math>\varepsilon</math> values, for <math>l_\infty</math> and <math>l_2</math> norms. <math>\varepsilon</math> value used for training is indicated in red.]]<br />
<br />
===Training accuracy===<br />
<br />
The MNIST network and wide-layer version of the CIFAR10 network had 100% accuracy on the training sets; thus, the optimization problem is tractable.<br />
<br />
===Running time===<br />
<br />
As PGD is iterative, training against a <math>k</math>-step PGD adversary requires <math>k</math> forward and backward passes per training batch rather than the single forward and backward pass for standard network training. This poses a challenge for computation as the the running time is greater by a factor of <math>k+1</math>.<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima founded by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will yield a local maxima with a significantly lower final loss value than the PGD adversary. This is seen experimentally: other adversaries with significantly lower loss values are not found even after a large amount of restarts.<br />
<br />
The authors also experimentally find that a network trained to be robust against PGD adversaries also become robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggest that using the saddle point formulation to adversarially train networks will encompass all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although these attacks evaluate the classifier without the gradient, it can be estimated with finite differences. Thus, first-order attacks are relevant and are use to train against these attacks. The paper finds that increasing network capacity and the strength of the adversary that the network is trained against improves resistance to transfer attacks. The paper also finds that the models that are resistant to first-order attacks are more resistant to transfer attacks.<br />
<br />
== 6. Conclusions ==<br />
<br />
== References ==<br />
* Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards Deep Learning Models Resistant to Adversarial Attacks. In Proceedings of the ICLR 2018 Conference, Vancouver, BC. https://openreview.net/forum?id=rJzIBfZAb<br />
* Nicholas Carlini and David Wagner. Defensive distillation is not robust to adversarial examples. arXiv preprint arXiv:1607.04311, 2016a<br />
* Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. arXiv preprint arXiv:1608.04644, 2016b.<br />
* Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.<br />
* Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick D. McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.<br />
* Tensor flow models repository. https://github.com/tensorflow/models/tree/ master/resnet, 2017.</div>J23ngohttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Towards_Deep_Learning_Models_Resistant_to_Adversarial_Attacks&diff=41094Towards Deep Learning Models Resistant to Adversarial Attacks2018-11-23T03:19:22Z<p>J23ngo: /* Presented by */</p>
<hr />
<div>This is a summary of the paper "Towards Deep Learning Models Resistant to Adversarial Attacks" by Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. <sup>[[#References|[1]]]</sup><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurray<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
'''TO DO:'''<br />
;section 6<br />
;proofread all sections<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|'''Figure 1.''' Before and after an input image of a panda was subjected to perturbations.<sup>[https://arxiv.org/abs/1412.6572 Source]</sup>]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is to solve:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math></div><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around <math>x</math>, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math></div><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data <math>x</math> that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that all perturbations result in smallest adversarial loss possible based on the attack. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of <math>\rho(\delta)</math> are obtained by using projected gradient descent (PGD). Since the inner maximization is non-concave, the global maximizer cannot be found using PGD (instead, PGD will converge to local maxima). However, the paper’s findings conclude that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \in S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> and tens to have similar loss values for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|'''Figure 2.''' Adversarial loss of MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<math>L</math> is assumed to be continuously differentiable for <math>\theta</math>, and Danskin’s theorem is used to obtained the direction of descent. Discontinuity when using ReLU activations is assumed to not be an issue as the set of discontinuities has a measure of zero, and thus these points of discontinuity will never be encountered. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust.<br />
<br />
== 4. Experiments ==<br />
<br />
===Training===<br />
<br />
We put into practice our robust classification framework with the MNIST and CIFAR10 datasets, with a projected gradient descent (PGD) training adversary, chosen because it effectively produces almost-maximal loss. We start PGD at a randomly chosen perturbation of the natural data. For both datasets, we see a sustained decrease in training loss when networks are trained against the PGD adversary, which suggests that adversarial loss is effectively being minimized.<br />
<br />
[[file:4_trainingloss.png|centre|frame|'''Figure 3.''' Trajectory of cross-entropy loss when training with a projected gradient descent (PGD) adversary.]]<br />
<br />
===Testing===<br />
<br />
The trained models are then tested based on a variety of adversaries.<br />
<br />
The source networks for the adversarial attacks are:<br />
* the network itself, denoted by source A (white-box attacks).<br />
* an independently trained version of the original network, denoted by source A' (black-box attacks).<br />
* a version of the original network trained on natural data only, denoted by Anat (black-box attacks).<br />
* a network with a different convolutional architecture, as described by Tramèr et al. (2017), denoted by B (black-box attacks).<br />
<br />
The attack methods used are:<br />
* Fast Gradient Sign Method (denoted FGSM)<br />
* Projected gradient descent (denoted PGD) with various steps and restarts<br />
* Attacks based on literature by Carlini & Wagner (2016b), denoted by CW; a version of this attack with a high confidence parameter is denoted by CW+<br />
<br />
===MNIST: Network structure and results===<br />
For the MNIST model, the network is composed of two convolutional layers (with 32 and 64 filters) and a fully connected 1024-unit layer. Each convolutional layer is followed by <math>2 \times 2</math> max-pooling layers. The training adversary used is PGD with 40 steps and a step size of 0.01 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=0.3</math>. The adversarially trained model was evaluated to be highly robust; testing results are given in Table 1.<br />
<br />
[[file:4_MNIST.png|centre|frame|'''Table 1.''' MNIST dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=0.3</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===CIFAR10: Network structure and results===<br />
For the CIFAR10 model, the network architectures used are the Resnet model (as per He et al (2016); TFM (2017)) and a version of Resnet with <math>10 \times</math> wider layers. The training adversary is PGD with 7 steps and a step size of 2 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=8</math> total. <br />
The most challenging testing adversary (white-box PGD) used the same architecture as above with 20 steps. Th adversarially trained model performed well given the strength of the testing adversaries but struggled in some cases, with test accuracy close to or lower than 50%. Testing results are given in Table 2.<br />
<br />
[[file:4_CIFAR.png|centre|frame|'''Table 2.''' CIFAR10 dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=8</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===Further experiments: <math>\varepsilon</math> values and <math>l_2</math>-bounded attacks===<br />
<br />
The MNIST model adversarially trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 0.3</math> and the CIFAR10 model trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 8</math> are further evaluated against attacks of varying <math>\varepsilon</math> values, for both <math>l_\infty</math>-bounded and <math>l_2</math>-bounded attacks. The MNIST network performed especially well against <math>l_2</math>-bounded adversaries.<br />
<br />
[[file:4_parameters.png|centre|frame|'''Figure 4.''' Adversarially-trained MNIST and CIFAR10 network performance tested against PGD adversaries of varying <math>\varepsilon</math> values, for <math>l_\infty</math> and <math>l_2</math> norms. <math>\varepsilon</math> value used for training is indicated in red.]]<br />
<br />
===Training accuracy===<br />
<br />
The MNIST network and wide-layer version of the CIFAR10 network had 100% accuracy on the training sets; thus, the optimization problem is tractable.<br />
<br />
===Running time===<br />
<br />
As PGD is iterative, training against a <math>k</math>-step PGD adversary requires <math>k</math> forward and backward passes per training batch rather than the single forward and backward pass for standard network training. This poses a challenge for computation as the the running time is greater by a factor of <math>k+1</math>.<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima founded by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will yield a local maxima with a significantly lower final loss value than the PGD adversary. This is seen experimentally: other adversaries with significantly lower loss values are not found even after a large amount of restarts.<br />
<br />
The authors also experimentally find that a network trained to be robust against PGD adversaries also become robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggest that using the saddle point formulation to adversarially train networks will encompass all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although these attacks evaluate the classifier without the gradient, it can be estimated with finite differences. Thus, first-order attacks are relevant and are use to train against these attacks. The paper finds that increasing network capacity and the strength of the adversary that the network is trained against improves resistance to transfer attacks. The paper also finds that the models that are resistant to first-order attacks are more resistant to transfer attacks.<br />
<br />
== 6. Conclusions ==<br />
<br />
== References ==<br />
* Nicholas Carlini and David Wagner. Defensive distillation is not robust to adversarial examples. arXiv preprint arXiv:1607.04311, 2016a<br />
* Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. arXiv preprint arXiv:1608.04644, 2016b.<br />
* Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.<br />
* Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick D. McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.<br />
* Tensor flow models repository. https://github.com/tensorflow/models/tree/ master/resnet, 2017.<br />
* Madry et al. Towards Deep Learning Models Resistant to Adversarial Attacks. In Proceedings of the ICLR 2018 Conference, Vancouver, BC. https://openreview.net/forum?id=rJzIBfZAb</div>J23ngohttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Towards_Deep_Learning_Models_Resistant_to_Adversarial_Attacks&diff=41093Towards Deep Learning Models Resistant to Adversarial Attacks2018-11-23T03:19:00Z<p>J23ngo: /* Outer minimization */</p>
<hr />
<div>This is a summary of the paper "Towards Deep Learning Models Resistant to Adversarial Attacks" by Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. <sup>[[#References|[1]]]</sup><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurray<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
'''TO DO:'''<br />
;section 6<br />
;proofread all sections<br />
;citations (APA)<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|'''Figure 1.''' Before and after an input image of a panda was subjected to perturbations.<sup>[https://arxiv.org/abs/1412.6572 Source]</sup>]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is to solve:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math></div><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around <math>x</math>, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math></div><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data <math>x</math> that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that all perturbations result in smallest adversarial loss possible based on the attack. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of <math>\rho(\delta)</math> are obtained by using projected gradient descent (PGD). Since the inner maximization is non-concave, the global maximizer cannot be found using PGD (instead, PGD will converge to local maxima). However, the paper’s findings conclude that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \in S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> and tens to have similar loss values for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|'''Figure 2.''' Adversarial loss of MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<math>L</math> is assumed to be continuously differentiable for <math>\theta</math>, and Danskin’s theorem is used to obtained the direction of descent. Discontinuity when using ReLU activations is assumed to not be an issue as the set of discontinuities has a measure of zero, and thus these points of discontinuity will never be encountered. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust.<br />
<br />
== 4. Experiments ==<br />
<br />
===Training===<br />
<br />
We put into practice our robust classification framework with the MNIST and CIFAR10 datasets, with a projected gradient descent (PGD) training adversary, chosen because it effectively produces almost-maximal loss. We start PGD at a randomly chosen perturbation of the natural data. For both datasets, we see a sustained decrease in training loss when networks are trained against the PGD adversary, which suggests that adversarial loss is effectively being minimized.<br />
<br />
[[file:4_trainingloss.png|centre|frame|'''Figure 3.''' Trajectory of cross-entropy loss when training with a projected gradient descent (PGD) adversary.]]<br />
<br />
===Testing===<br />
<br />
The trained models are then tested based on a variety of adversaries.<br />
<br />
The source networks for the adversarial attacks are:<br />
* the network itself, denoted by source A (white-box attacks).<br />
* an independently trained version of the original network, denoted by source A' (black-box attacks).<br />
* a version of the original network trained on natural data only, denoted by Anat (black-box attacks).<br />
* a network with a different convolutional architecture, as described by Tramèr et al. (2017), denoted by B (black-box attacks).<br />
<br />
The attack methods used are:<br />
* Fast Gradient Sign Method (denoted FGSM)<br />
* Projected gradient descent (denoted PGD) with various steps and restarts<br />
* Attacks based on literature by Carlini & Wagner (2016b), denoted by CW; a version of this attack with a high confidence parameter is denoted by CW+<br />
<br />
===MNIST: Network structure and results===<br />
For the MNIST model, the network is composed of two convolutional layers (with 32 and 64 filters) and a fully connected 1024-unit layer. Each convolutional layer is followed by <math>2 \times 2</math> max-pooling layers. The training adversary used is PGD with 40 steps and a step size of 0.01 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=0.3</math>. The adversarially trained model was evaluated to be highly robust; testing results are given in Table 1.<br />
<br />
[[file:4_MNIST.png|centre|frame|'''Table 1.''' MNIST dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=0.3</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===CIFAR10: Network structure and results===<br />
For the CIFAR10 model, the network architectures used are the Resnet model (as per He et al (2016); TFM (2017)) and a version of Resnet with <math>10 \times</math> wider layers. The training adversary is PGD with 7 steps and a step size of 2 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=8</math> total. <br />
The most challenging testing adversary (white-box PGD) used the same architecture as above with 20 steps. Th adversarially trained model performed well given the strength of the testing adversaries but struggled in some cases, with test accuracy close to or lower than 50%. Testing results are given in Table 2.<br />
<br />
[[file:4_CIFAR.png|centre|frame|'''Table 2.''' CIFAR10 dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=8</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===Further experiments: <math>\varepsilon</math> values and <math>l_2</math>-bounded attacks===<br />
<br />
The MNIST model adversarially trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 0.3</math> and the CIFAR10 model trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 8</math> are further evaluated against attacks of varying <math>\varepsilon</math> values, for both <math>l_\infty</math>-bounded and <math>l_2</math>-bounded attacks. The MNIST network performed especially well against <math>l_2</math>-bounded adversaries.<br />
<br />
[[file:4_parameters.png|centre|frame|'''Figure 4.''' Adversarially-trained MNIST and CIFAR10 network performance tested against PGD adversaries of varying <math>\varepsilon</math> values, for <math>l_\infty</math> and <math>l_2</math> norms. <math>\varepsilon</math> value used for training is indicated in red.]]<br />
<br />
===Training accuracy===<br />
<br />
The MNIST network and wide-layer version of the CIFAR10 network had 100% accuracy on the training sets; thus, the optimization problem is tractable.<br />
<br />
===Running time===<br />
<br />
As PGD is iterative, training against a <math>k</math>-step PGD adversary requires <math>k</math> forward and backward passes per training batch rather than the single forward and backward pass for standard network training. This poses a challenge for computation as the the running time is greater by a factor of <math>k+1</math>.<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima founded by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will yield a local maxima with a significantly lower final loss value than the PGD adversary. This is seen experimentally: other adversaries with significantly lower loss values are not found even after a large amount of restarts.<br />
<br />
The authors also experimentally find that a network trained to be robust against PGD adversaries also become robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggest that using the saddle point formulation to adversarially train networks will encompass all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although these attacks evaluate the classifier without the gradient, it can be estimated with finite differences. Thus, first-order attacks are relevant and are use to train against these attacks. The paper finds that increasing network capacity and the strength of the adversary that the network is trained against improves resistance to transfer attacks. The paper also finds that the models that are resistant to first-order attacks are more resistant to transfer attacks.<br />
<br />
== 6. Conclusions ==<br />
<br />
== References ==<br />
* Nicholas Carlini and David Wagner. Defensive distillation is not robust to adversarial examples. arXiv preprint arXiv:1607.04311, 2016a<br />
* Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. arXiv preprint arXiv:1608.04644, 2016b.<br />
* Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.<br />
* Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick D. McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.<br />
* Tensor flow models repository. https://github.com/tensorflow/models/tree/ master/resnet, 2017.<br />
* Madry et al. Towards Deep Learning Models Resistant to Adversarial Attacks. In Proceedings of the ICLR 2018 Conference, Vancouver, BC. https://openreview.net/forum?id=rJzIBfZAb</div>J23ngohttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Towards_Deep_Learning_Models_Resistant_to_Adversarial_Attacks&diff=41092Towards Deep Learning Models Resistant to Adversarial Attacks2018-11-23T03:18:41Z<p>J23ngo: /* 1. Introduction */</p>
<hr />
<div>This is a summary of the paper "Towards Deep Learning Models Resistant to Adversarial Attacks" by Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. <sup>[[#References|[1]]]</sup><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurray<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
'''TO DO:'''<br />
;section 6<br />
;proofread all sections<br />
;citations (APA)<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|'''Figure 1.''' Before and after an input image of a panda was subjected to perturbations.<sup>[https://arxiv.org/abs/1412.6572 Source]</sup>]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is to solve:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math></div><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around <math>x</math>, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math></div><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of the input <math>x</math> that results in the highest loss, which is equivalent to constructing an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
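<br />
To make the two components concrete, consider a toy case in which the inner maximization can be solved exactly: binary logistic regression under an <math>l_{\infty}</math>-bounded adversary. For a label <math>y \in \{-1,+1\}</math>, the worst-case perturbation lowers the margin by exactly <math>\varepsilon \lVert\theta\rVert_1</math>, so the adversarial loss has a closed form. The following is a minimal numpy sketch for illustration only; the model, data, and function names are ours, not the paper's:<br />
<syntaxhighlight lang="python">
import numpy as np

def logistic_loss(margin):
    # L(theta, x, y) = log(1 + exp(-y * theta^T x)), written in terms of the margin m = y * theta^T x
    return np.log1p(np.exp(-margin))

def adversarial_loss_linear(theta, x, y, eps):
    # Exact inner maximization for a linear model and an l_inf ball:
    # the worst perturbation is delta = -y * eps * sign(theta),
    # which lowers the margin by eps * ||theta||_1.
    clean_margin = y * (theta @ x)
    worst_margin = clean_margin - eps * np.abs(theta).sum()
    return logistic_loss(worst_margin)

theta = np.array([1.0, -2.0, 0.5])
x = np.array([0.2, -0.1, 0.4])
y = 1.0

print(adversarial_loss_linear(theta, x, y, eps=0.0))  # standard loss on the natural input
print(adversarial_loss_linear(theta, x, y, eps=0.1))  # worst-case (adversarial) loss
</syntaxhighlight>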
<br />
== 3. Adversarially Robust Networks ==<br />
Solving the outer minimization of the saddle point formulation <math>\min_{\theta} \rho (\theta)</math> yields model parameters for which the worst-case adversarial loss, i.e. the loss under the strongest allowed perturbation, is as small as possible. Stochastic gradient descent (SGD) is used to solve the outer minimization problem via the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at a maximizer of the inner maximization problem.<br />
<br />
Solutions to the inner maximization serve as concrete adversarial attacks on the neural network.<br />
<br />
=== Inner maximization ===<br />
Based on prior research, maximizers <math>\delta \in S</math> of the inner loss <math>L(\theta, x+\delta, y)</math> are obtained using projected gradient descent (PGD). Since the inner maximization is non-concave, PGD is not guaranteed to find the global maximizer; instead, it converges to a local maximum. However, the paper’s findings indicate that PGD can nevertheless be used in practice to solve the inner maximization problem with purely first-order information. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \subseteq S</math> in which the local maximum is the global maximum of the region <math>S'</math> gives a gradient corresponding to a descent direction for the saddle point problem when the adversary is constrained to <math>S'</math>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> and tend to have similar loss values, for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|'''Figure 2.''' Adversarial loss of MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
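<br />
A minimal sketch of the PGD attack for the <math>l_{\infty}</math> ball follows, written for the binary logistic model from the earlier sketch so that the input gradient is available in closed form. The random start inside the ball mirrors the restarted runs in Figure 2; the function names are illustrative assumptions, not the authors' implementation:<br />
<syntaxhighlight lang="python">
import numpy as np

def grad_x_logistic(theta, x, y):
    # Gradient of log(1 + exp(-y * theta^T x)) with respect to x:
    # -y * theta / (1 + exp(y * theta^T x))
    m = y * (theta @ x)
    return -y * theta / (1.0 + np.exp(m))

def pgd_attack(theta, x, y, eps, alpha, steps, rng=None):
    # Projected gradient ascent on the loss within the l_inf ball of radius eps around x.
    rng = np.random.default_rng(0) if rng is None else rng
    delta = rng.uniform(-eps, eps, size=x.shape)   # random start inside the ball
    for _ in range(steps):
        g = grad_x_logistic(theta, x + delta, y)
        delta = delta + alpha * np.sign(g)         # steepest-ascent step in the l_inf geometry
        delta = np.clip(delta, -eps, eps)          # project back onto the ball
    return x + delta
</syntaxhighlight>
For this toy model the gradient sign is constant, so after enough steps the iteration recovers the closed-form maximizer <math>\delta = -y \, \varepsilon \, \operatorname{sign}(\theta)</math> from the earlier sketch; for a deep network it instead settles into one of the local maxima discussed above.<br />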
<br />
=== Outer minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<math>L</math> is assumed to be continuously differentiable in <math>\theta</math>, and Danskin’s theorem is used to obtain the direction of descent. The discontinuities introduced by ReLU activations are assumed not to be an issue, as the set of discontinuities has measure zero and these points are therefore almost never encountered<sup>[[#References|[2]]]</sup>. Using the gradients <math>\nabla_\theta \rho(\theta)</math> in SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved, and classifiers can be trained to be adversarially robust.<br />
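<br />
Putting the two pieces together, adversarial training alternates a PGD inner step with an SGD outer step on <math>\theta</math>, evaluated at the maximizer as Danskin’s theorem prescribes. A sketch reusing the toy logistic model and the <code>pgd_attack</code> function from above; the learning-rate choice is our own assumption:<br />
<syntaxhighlight lang="python">
def grad_theta_logistic(theta, x, y):
    # Gradient of log(1 + exp(-y * theta^T x)) with respect to theta:
    # -y * x / (1 + exp(y * theta^T x))
    m = y * (theta @ x)
    return -y * x / (1.0 + np.exp(m))

def adversarial_training_step(theta, x, y, eps, alpha, steps, lr=0.1):
    x_adv = pgd_attack(theta, x, y, eps, alpha, steps)        # inner maximization: build the attack
    return theta - lr * grad_theta_logistic(theta, x_adv, y)  # outer minimization: SGD step at the maximizer
</syntaxhighlight>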
<br />
== 4. Experiments ==<br />
<br />
===Training===<br />
<br />
We apply the robust classification framework to the MNIST and CIFAR10 datasets, using a projected gradient descent (PGD) training adversary, chosen because it effectively produces almost-maximal loss. PGD is started at a randomly chosen perturbation of the natural data. For both datasets, we see a sustained decrease in training loss when networks are trained against the PGD adversary, which suggests that the adversarial loss is effectively being minimized.<br />
<br />
[[file:4_trainingloss.png|centre|frame|'''Figure 3.''' Trajectory of cross-entropy loss when training with a projected gradient descent (PGD) adversary.]]<br />
<br />
===Testing===<br />
<br />
The trained models are then tested against a variety of adversaries.<br />
<br />
The source networks for the adversarial attacks are:<br />
* the network itself, denoted by source A (white-box attacks).<br />
* an independently trained version of the original network, denoted by source A' (black-box attacks).<br />
* a version of the original network trained on natural data only, denoted by A<sub>nat</sub> (black-box attacks).<br />
* a network with a different convolutional architecture, as described by Tramèr et al. (2017), denoted by B (black-box attacks).<br />
<br />
The attack methods used are:<br />
* Fast Gradient Sign Method (denoted FGSM)<br />
* Projected gradient descent (denoted PGD) with various steps and restarts<br />
* The attack of Carlini & Wagner (2016b), denoted by CW; a version of this attack with a high confidence parameter is denoted by CW+<br />
<br />
===MNIST: Network structure and results===<br />
For the MNIST model, the network is composed of two convolutional layers (with 32 and 64 filters respectively), each followed by <math>2 \times 2</math> max-pooling, and a fully connected layer with 1024 units. The training adversary is PGD with 40 steps and a step size of 0.01 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=0.3</math>. The adversarially trained model was evaluated to be highly robust; testing results are given in Table 1.<br />
<br />
[[file:4_MNIST.png|centre|frame|'''Table 1.''' MNIST dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=0.3</math>; most successful attacks are bolded for each attack model.]]<br />
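<br />
In terms of the <code>pgd_attack</code> sketch above, this MNIST training adversary corresponds to the following hyperparameters (with pixel values scaled to <math>[0,1]</math>); this is only the parameter mapping, not the authors' code:<br />
<syntaxhighlight lang="python">
# 40-step l_inf PGD with step size 0.01 and radius 0.3, as used to train the MNIST model
mnist_pgd = dict(eps=0.3, alpha=0.01, steps=40)
# x_adv = pgd_attack(theta, x, y, **mnist_pgd)
</syntaxhighlight>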
<br />
===CIFAR10: Network structure and results===<br />
For the CIFAR10 model, the network architectures used are the ResNet model (as per He et al. (2016) and the TensorFlow models repository (2017)) and a version of ResNet with <math>10 \times</math> wider layers. The training adversary is PGD with 7 steps and a step size of 2 in the <math>l_\infty</math>-norm, with total perturbation size <math>\varepsilon=8</math> (on the 0-255 pixel scale). <br />
The most challenging testing adversary (white-box PGD) used the same architecture as above with 20 steps. The adversarially trained model performed well given the strength of the testing adversaries but struggled in some cases, with test accuracy close to or below 50%. Testing results are given in Table 2.<br />
<br />
[[file:4_CIFAR.png|centre|frame|'''Table 2.''' CIFAR10 dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=8</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===Further experiments: <math>\varepsilon</math> values and <math>l_2</math>-bounded attacks===<br />
<br />
The MNIST model adversarially trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 0.3</math> and the CIFAR10 model trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 8</math> are further evaluated against attacks of varying <math>\varepsilon</math> values, for both <math>l_\infty</math>-bounded and <math>l_2</math>-bounded attacks. The MNIST network performed especially well against <math>l_2</math>-bounded adversaries.<br />
<br />
[[file:4_parameters.png|centre|frame|'''Figure 4.''' Adversarially-trained MNIST and CIFAR10 network performance tested against PGD adversaries of varying <math>\varepsilon</math> values, for <math>l_\infty</math> and <math>l_2</math> norms. <math>\varepsilon</math> value used for training is indicated in red.]]<br />
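<br />
For the <math>l_2</math>-bounded attacks evaluated above, the PGD update differs from the <math>l_{\infty}</math> sketch in two places: the ascent step follows the normalized gradient rather than its sign, and the projection rescales <math>\delta</math> back onto the <math>l_2</math> ball. A sketch of this common variant, written against an arbitrary input-gradient function (the interface is our assumption, not the authors' code):<br />
<syntaxhighlight lang="python">
import numpy as np

def pgd_attack_l2(grad_fn, x, eps, alpha, steps, rng=None):
    # PGD in the l_2 ball: normalized-gradient ascent steps with a rescaling projection.
    rng = np.random.default_rng(0) if rng is None else rng
    delta = rng.normal(size=x.shape)
    delta = delta * eps / (np.linalg.norm(delta) + 1e-12)     # random start on the sphere
    for _ in range(steps):
        g = grad_fn(x + delta)                                # gradient of the loss w.r.t. the input
        delta = delta + alpha * g / (np.linalg.norm(g) + 1e-12)
        norm = np.linalg.norm(delta)
        if norm > eps:
            delta = delta * eps / norm                        # project back onto the l_2 ball
    return x + delta
</syntaxhighlight>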
<br />
===Training accuracy===<br />
<br />
The MNIST network and the wide version of the CIFAR10 network achieved 100% accuracy on their training sets, which indicates that the saddle point optimization problem is tractable.<br />
<br />
===Running time===<br />
<br />
As PGD is iterative, training against a <math>k</math>-step PGD adversary requires <math>k</math> forward and backward passes to construct the attack, plus one more for the parameter update, per training batch, rather than the single forward and backward pass of standard network training. This poses a computational challenge, as the running time grows by a factor of roughly <math>k+1</math>; for the 40-step MNIST adversary, for instance, each batch is about 41 times as expensive.<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima found by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will find a local maximum with a significantly higher loss value than the PGD adversary. This is seen experimentally: maxima with significantly different loss values are not found even after a large number of restarts.<br />
<br />
The authors also find experimentally that a network trained to be robust against PGD adversaries also becomes robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggests that using the saddle point formulation to adversarially train networks encompasses all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks constructed without direct access to the classifier. Although such attacks cannot compute the classifier's gradient directly, the gradient can be estimated with finite differences, so first-order attacks remain relevant and are used to train against these attacks. The paper finds that increasing the network's capacity, and the strength of the adversary it is trained against, improves resistance to transfer attacks. It also finds that models resistant to first-order attacks are more resistant to transfer attacks.<br />
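<br />
The gradient estimate mentioned above needs only loss queries, not gradient access: each coordinate is approximated by a symmetric difference quotient, at a cost of <math>2d</math> queries for a <math>d</math>-dimensional input. A minimal sketch; the <code>loss_fn</code> interface is our assumption:<br />
<syntaxhighlight lang="python">
import numpy as np

def estimate_gradient(loss_fn, x, h=1e-4):
    # Black-box estimate of the gradient of loss_fn at x via central differences.
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e.flat[i] = h
        g.flat[i] = (loss_fn(x + e) - loss_fn(x - e)) / (2.0 * h)
    return g
</syntaxhighlight>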
<br />
== 6. Conclusions ==<br />
<br />
The saddle point formulation gives a unified view of adversarial robustness: the inner maximization corresponds to attacking the network, and the outer minimization to training the network to resist such attacks. The experiments above suggest that, despite the problem's non-convexity and non-concavity, it can be solved reliably with first-order methods: PGD serves as a strong inner adversary, and networks of sufficient capacity trained against it achieve substantial robustness on MNIST and CIFAR10 against a wide range of first-order and transfer attacks.<br />
<br />
== References ==<br />
* Nicholas Carlini and David Wagner. Defensive distillation is not robust to adversarial examples. arXiv preprint arXiv:1607.04311, 2016a.<br />
* Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. arXiv preprint arXiv:1608.04644, 2016b.<br />
* Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.<br />
* Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick D. McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.<br />
* TensorFlow models repository. https://github.com/tensorflow/models/tree/master/resnet, 2017.<br />
* Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards Deep Learning Models Resistant to Adversarial Attacks. In Proceedings of the ICLR 2018 Conference, Vancouver, BC, 2018. https://openreview.net/forum?id=rJzIBfZAb</div>
<hr />
<div>This is a summary of the paper "Towards Deep Learning Models Resistant to Adversarial Attacks" by Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. <sup>[[#References|[1]]]</sup><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurray<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
'''TO DO:'''<br />
;section 6<br />
;proofread all sections<br />
;citations (APA)<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|'''Figure 1.''' Before and after an input image of a panda was subjected to perturbations.<sup>[https://arxiv.org/abs/1412.6572 Source]</sup>]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks<sup>[[#References|[2]]]</sup>. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries<sup>[[#References|[1]]]</sup>.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks<sup>[[#References|[2]]]</sup>. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is to solve:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math></div><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around <math>x</math>, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math></div><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data <math>x</math> that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that all perturbations result in smallest adversarial loss possible based on the attack. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of <math>\rho(\delta)</math> are obtained by using projected gradient descent (PGD). Since the inner maximization is non-concave, the global maximizer cannot be found using PGD (instead, PGD will converge to local maxima). However, the paper’s findings conclude that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \in S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> and tens to have similar loss values for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|'''Figure 2.''' Adversarial loss of MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<math>L</math> is assumed to be continuously differentiable for <math>\theta</math>, and Danskin’s theorem is used to obtained the direction of descent. Discontinuity when using ReLU activations is assumed to not be an issue as the set of discontinuities has a measure of zero, and thus these points of discontinuity will never be encountered<sup>[[#References|[2]]]</sup>. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust.<br />
<br />
== 4. Experiments ==<br />
<br />
===Training===<br />
<br />
We put into practice our robust classification framework with the MNIST and CIFAR10 datasets, with a projected gradient descent (PGD) training adversary, chosen because it effectively produces almost-maximal loss. We start PGD at a randomly chosen perturbation of the natural data. For both datasets, we see a sustained decrease in training loss when networks are trained against the PGD adversary, which suggests that adversarial loss is effectively being minimized.<br />
<br />
[[file:4_trainingloss.png|centre|frame|'''Figure 3.''' Trajectory of cross-entropy loss when training with a projected gradient descent (PGD) adversary.]]<br />
<br />
===Testing===<br />
<br />
The trained models are then tested based on a variety of adversaries.<br />
<br />
The source networks for the adversarial attacks are:<br />
* the network itself, denoted by source A (white-box attacks).<br />
* an independently trained version of the original network, denoted by source A' (black-box attacks).<br />
* a version of the original network trained on natural data only, denoted by Anat (black-box attacks).<br />
* a network with a different convolutional architecture, as described by Tramèr et al. (2017), denoted by B (black-box attacks).<br />
<br />
The attack methods used are:<br />
* Fast Gradient Sign Method (denoted FGSM)<br />
* Projected gradient descent (denoted PGD) with various steps and restarts<br />
* Attacks based on literature by Carlini & Wagner (2016b), denoted by CW; a version of this attack with a high confidence parameter is denoted by CW+<br />
<br />
===MNIST: Network structure and results===<br />
For the MNIST model, the network is composed of two convolutional layers (with 32 and 64 filters) and a fully connected 1024-unit layer. Each convolutional layer is followed by <math>2 \times 2</math> max-pooling layers. The training adversary used is PGD with 40 steps and a step size of 0.01 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=0.3</math>. The adversarially trained model was evaluated to be highly robust; testing results are given in Table 1.<br />
<br />
[[file:4_MNIST.png|centre|frame|'''Table 1.''' MNIST dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=0.3</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===CIFAR10: Network structure and results===<br />
For the CIFAR10 model, the network architectures used are the Resnet model (as per He et al (2016); TFM (2017)) and a version of Resnet with <math>10 \times</math> wider layers. The training adversary is PGD with 7 steps and a step size of 2 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=8</math> total. <br />
The most challenging testing adversary (white-box PGD) used the same architecture as above with 20 steps. Th adversarially trained model performed well given the strength of the testing adversaries but struggled in some cases, with test accuracy close to or lower than 50%. Testing results are given in Table 2.<br />
<br />
[[file:4_CIFAR.png|centre|frame|'''Table 2.''' CIFAR10 dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=8</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===Further experiments: <math>\varepsilon</math> values and <math>l_2</math>-bounded attacks===<br />
<br />
The MNIST model adversarially trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 0.3</math> and the CIFAR10 model trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 8</math> are further evaluated against attacks of varying <math>\varepsilon</math> values, for both <math>l_\infty</math>-bounded and <math>l_2</math>-bounded attacks. The MNIST network performed especially well against <math>l_2</math>-bounded adversaries.<br />
<br />
[[file:4_parameters.png|centre|frame|'''Figure 4.''' Adversarially-trained MNIST and CIFAR10 network performance tested against PGD adversaries of varying <math>\varepsilon</math> values, for <math>l_\infty</math> and <math>l_2</math> norms. <math>\varepsilon</math> value used for training is indicated in red.]]<br />
<br />
===Training accuracy===<br />
<br />
The MNIST network and wide-layer version of the CIFAR10 network had 100% accuracy on the training sets; thus, the optimization problem is tractable.<br />
<br />
===Running time===<br />
<br />
As PGD is iterative, training against a <math>k</math>-step PGD adversary requires <math>k</math> forward and backward passes per training batch rather than the single forward and backward pass for standard network training. This poses a challenge for computation as the the running time is greater by a factor of <math>k+1</math>.<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima founded by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will yield a local maxima with a significantly lower final loss value than the PGD adversary. This is seen experimentally: other adversaries with significantly lower loss values are not found even after a large amount of restarts.<br />
<br />
The authors also experimentally find that a network trained to be robust against PGD adversaries also become robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggest that using the saddle point formulation to adversarially train networks will encompass all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although these attacks evaluate the classifier without the gradient, it can be estimated with finite differences. Thus, first-order attacks are relevant and are use to train against these attacks. The paper finds that increasing network capacity and the strength of the adversary that the network is trained against improves resistance to transfer attacks. The paper also finds that the models that are resistant to first-order attacks are more resistant to transfer attacks.<br />
<br />
== 6. Conclusions ==<br />
<br />
== References ==<br />
* Nicholas Carlini and David Wagner. Defensive distillation is not robust to adversarial examples. arXiv preprint arXiv:1607.04311, 2016a<br />
* Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. arXiv preprint arXiv:1608.04644, 2016b.<br />
* Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.<br />
* Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick D. McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.<br />
* Tensor flow models repository. https://github.com/tensorflow/models/tree/ master/resnet, 2017.<br />
* Madry et al. Towards Deep Learning Models Resistant to Adversarial Attacks. In Proceedings of the ICLR 2018 Conference, Vancouver, BC. https://openreview.net/forum?id=rJzIBfZAb</div>J23ngohttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Towards_Deep_Learning_Models_Resistant_to_Adversarial_Attacks&diff=41090Towards Deep Learning Models Resistant to Adversarial Attacks2018-11-23T03:18:10Z<p>J23ngo: /* Inner maximization */</p>
<hr />
<div>This is a summary of the paper "Towards Deep Learning Models Resistant to Adversarial Attacks" by Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. <sup>[[#References|[1]]]</sup><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurry<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
'''TO DO:'''<br />
;section 6<br />
;proofread all sections<br />
;citations (APA)<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|'''Figure 1.''' Before and after an input image of a panda was subjected to perturbations.<sup>[https://arxiv.org/abs/1412.6572 Source]</sup>]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks<sup>[[#References|[2]]]</sup>. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries<sup>[[#References|[1]]]</sup>.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks<sup>[[#References|[2]]]</sup>. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is to solve:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math></div><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around <math>x</math>, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math></div><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data <math>x</math> that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that all perturbations result in smallest adversarial loss possible based on the attack. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of <math>\rho(\delta)</math> are obtained by using projected gradient descent (PGD). Since the inner maximization is non-concave, the global maximizer cannot be found using PGD (instead, PGD will converge to local maxima). However, the paper’s findings conclude that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \in S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> and tens to have similar loss values for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|'''Figure 2.''' Adversarial loss of MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<math>L</math> is assumed to be continuously differentiable for <math>\theta</math>, and Danskin’s theorem is used to obtained the direction of descent. Discontinuity when using ReLU activations is assumed to not be an issue as the set of discontinuities has a measure of zero, and thus these points of discontinuity will never be encountered<sup>[[#References|[2]]]</sup>. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust.<br />
<br />
== 4. Experiments ==<br />
<br />
===Training===<br />
<br />
We put into practice our robust classification framework with the MNIST and CIFAR10 datasets, with a projected gradient descent (PGD) training adversary, chosen because it effectively produces almost-maximal loss. We start PGD at a randomly chosen perturbation of the natural data. For both datasets, we see a sustained decrease in training loss when networks are trained against the PGD adversary, which suggests that adversarial loss is effectively being minimized.<br />
<br />
[[file:4_trainingloss.png|centre|frame|'''Figure 3.''' Trajectory of cross-entropy loss when training with a projected gradient descent (PGD) adversary.]]<br />
<br />
===Testing===<br />
<br />
The trained models are then tested based on a variety of adversaries.<br />
<br />
The source networks for the adversarial attacks are:<br />
* the network itself, denoted by source A (white-box attacks).<br />
* an independently trained version of the original network, denoted by source A' (black-box attacks).<br />
* a version of the original network trained on natural data only, denoted by Anat (black-box attacks).<br />
* a network with a different convolutional architecture, as described by Tramèr et al. (2017), denoted by B (black-box attacks).<br />
<br />
The attack methods used are:<br />
* Fast Gradient Sign Method (denoted FGSM)<br />
* Projected gradient descent (denoted PGD) with various steps and restarts<br />
* Attacks based on literature by Carlini & Wagner (2016b), denoted by CW; a version of this attack with a high confidence parameter is denoted by CW+<br />
<br />
===MNIST: Network structure and results===<br />
For the MNIST model, the network is composed of two convolutional layers (with 32 and 64 filters) and a fully connected 1024-unit layer. Each convolutional layer is followed by <math>2 \times 2</math> max-pooling layers. The training adversary used is PGD with 40 steps and a step size of 0.01 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=0.3</math>. The adversarially trained model was evaluated to be highly robust; testing results are given in Table 1.<br />
<br />
[[file:4_MNIST.png|centre|frame|'''Table 1.''' MNIST dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=0.3</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===CIFAR10: Network structure and results===<br />
For the CIFAR10 model, the network architectures used are the Resnet model (as per He et al (2016); TFM (2017)) and a version of Resnet with <math>10 \times</math> wider layers. The training adversary is PGD with 7 steps and a step size of 2 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=8</math> total. <br />
The most challenging testing adversary (white-box PGD) used the same architecture as above with 20 steps. Th adversarially trained model performed well given the strength of the testing adversaries but struggled in some cases, with test accuracy close to or lower than 50%. Testing results are given in Table 2.<br />
<br />
[[file:4_CIFAR.png|centre|frame|'''Table 2.''' CIFAR10 dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=8</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===Further experiments: <math>\varepsilon</math> values and <math>l_2</math>-bounded attacks===<br />
<br />
The MNIST model adversarially trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 0.3</math> and the CIFAR10 model trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 8</math> are further evaluated against attacks of varying <math>\varepsilon</math> values, for both <math>l_\infty</math>-bounded and <math>l_2</math>-bounded attacks. The MNIST network performed especially well against <math>l_2</math>-bounded adversaries.<br />
<br />
[[file:4_parameters.png|centre|frame|'''Figure 4.''' Adversarially-trained MNIST and CIFAR10 network performance tested against PGD adversaries of varying <math>\varepsilon</math> values, for <math>l_\infty</math> and <math>l_2</math> norms. <math>\varepsilon</math> value used for training is indicated in red.]]<br />
<br />
===Training accuracy===<br />
<br />
The MNIST network and wide-layer version of the CIFAR10 network had 100% accuracy on the training sets; thus, the optimization problem is tractable.<br />
<br />
===Running time===<br />
<br />
As PGD is iterative, training against a <math>k</math>-step PGD adversary requires <math>k</math> forward and backward passes per training batch rather than the single forward and backward pass for standard network training. This poses a challenge for computation as the the running time is greater by a factor of <math>k+1</math>.<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima founded by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will yield a local maxima with a significantly lower final loss value than the PGD adversary. This is seen experimentally: other adversaries with significantly lower loss values are not found even after a large amount of restarts.<br />
<br />
The authors also experimentally find that a network trained to be robust against PGD adversaries also become robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggest that using the saddle point formulation to adversarially train networks will encompass all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although these attacks evaluate the classifier without the gradient, it can be estimated with finite differences. Thus, first-order attacks are relevant and are use to train against these attacks. The paper finds that increasing network capacity and the strength of the adversary that the network is trained against improves resistance to transfer attacks. The paper also finds that the models that are resistant to first-order attacks are more resistant to transfer attacks.<br />
<br />
== 6. Conclusions ==<br />
<br />
== References ==<br />
* Nicholas Carlini and David Wagner. Defensive distillation is not robust to adversarial examples. arXiv preprint arXiv:1607.04311, 2016a<br />
* Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. arXiv preprint arXiv:1608.04644, 2016b.<br />
* Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.<br />
* Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick D. McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.<br />
* Tensor flow models repository. https://github.com/tensorflow/models/tree/ master/resnet, 2017.<br />
* Madry et al. Towards Deep Learning Models Resistant to Adversarial Attacks. In Proceedings of the ICLR 2018 Conference, Vancouver, BC. https://openreview.net/forum?id=rJzIBfZAb</div>J23ngohttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Towards_Deep_Learning_Models_Resistant_to_Adversarial_Attacks&diff=41089Towards Deep Learning Models Resistant to Adversarial Attacks2018-11-23T03:17:55Z<p>J23ngo: /* 3. Adversarially Robust Networks */</p>
<hr />
<div>This is a summary of the paper "Towards Deep Learning Models Resistant to Adversarial Attacks" by Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. <sup>[[#References|[1]]]</sup><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurry<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
'''TO DO:'''<br />
;section 6<br />
;proofread all sections<br />
;citations (APA)<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|'''Figure 1.''' Before and after an input image of a panda was subjected to perturbations.<sup>[https://arxiv.org/abs/1412.6572 Source]</sup>]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks<sup>[[#References|[2]]]</sup>. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries<sup>[[#References|[1]]]</sup>.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks<sup>[[#References|[2]]]</sup>. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is to solve:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math></div><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around <math>x</math>, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math></div><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data <math>x</math> that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that all perturbations result in smallest adversarial loss possible based on the attack. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of <math>\rho(\delta)</math> are obtained by using projected gradient descent (PGD). Since the inner maximization is non-concave, the global maximizer cannot be found using PGD (instead, PGD will converge to local maxima)<sup>[[#References|[2]]]</sup>. However, the paper’s findings conclude that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \in S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math><sup>[[#References|[2]]]</sup>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> and tens to have similar loss values for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|'''Figure 2.''' Adversarial loss of MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<math>L</math> is assumed to be continuously differentiable for <math>\theta</math>, and Danskin’s theorem is used to obtained the direction of descent. Discontinuity when using ReLU activations is assumed to not be an issue as the set of discontinuities has a measure of zero, and thus these points of discontinuity will never be encountered<sup>[[#References|[2]]]</sup>. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust.<br />
<br />
== 4. Experiments ==<br />
<br />
===Training===<br />
<br />
We put into practice our robust classification framework with the MNIST and CIFAR10 datasets, with a projected gradient descent (PGD) training adversary, chosen because it effectively produces almost-maximal loss. We start PGD at a randomly chosen perturbation of the natural data. For both datasets, we see a sustained decrease in training loss when networks are trained against the PGD adversary, which suggests that adversarial loss is effectively being minimized.<br />
<br />
[[file:4_trainingloss.png|centre|frame|'''Figure 3.''' Trajectory of cross-entropy loss when training with a projected gradient descent (PGD) adversary.]]<br />
<br />
===Testing===<br />
<br />
The trained models are then tested based on a variety of adversaries.<br />
<br />
The source networks for the adversarial attacks are:<br />
* the network itself, denoted by source A (white-box attacks).<br />
* an independently trained version of the original network, denoted by source A' (black-box attacks).<br />
* a version of the original network trained on natural data only, denoted by Anat (black-box attacks).<br />
* a network with a different convolutional architecture, as described by Tramèr et al. (2017), denoted by B (black-box attacks).<br />
<br />
The attack methods used are:<br />
* Fast Gradient Sign Method (denoted FGSM)<br />
* Projected gradient descent (denoted PGD) with various steps and restarts<br />
* Attacks based on literature by Carlini & Wagner (2016b), denoted by CW; a version of this attack with a high confidence parameter is denoted by CW+<br />
<br />
===MNIST: Network structure and results===<br />
For the MNIST model, the network is composed of two convolutional layers (with 32 and 64 filters) and a fully connected 1024-unit layer. Each convolutional layer is followed by <math>2 \times 2</math> max-pooling layers. The training adversary used is PGD with 40 steps and a step size of 0.01 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=0.3</math>. The adversarially trained model was evaluated to be highly robust; testing results are given in Table 1.<br />
<br />
[[file:4_MNIST.png|centre|frame|'''Table 1.''' MNIST dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=0.3</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===CIFAR10: Network structure and results===<br />
For the CIFAR10 model, the network architectures used are the Resnet model (as per He et al (2016); TFM (2017)) and a version of Resnet with <math>10 \times</math> wider layers. The training adversary is PGD with 7 steps and a step size of 2 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=8</math> total. <br />
The most challenging testing adversary (white-box PGD) used the same architecture as above with 20 steps. Th adversarially trained model performed well given the strength of the testing adversaries but struggled in some cases, with test accuracy close to or lower than 50%. Testing results are given in Table 2.<br />
<br />
[[file:4_CIFAR.png|centre|frame|'''Table 2.''' CIFAR10 dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=8</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===Further experiments: <math>\varepsilon</math> values and <math>l_2</math>-bounded attacks===<br />
<br />
The MNIST model adversarially trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 0.3</math> and the CIFAR10 model trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 8</math> are further evaluated against attacks of varying <math>\varepsilon</math> values, for both <math>l_\infty</math>-bounded and <math>l_2</math>-bounded attacks. The MNIST network performed especially well against <math>l_2</math>-bounded adversaries.<br />
<br />
[[file:4_parameters.png|centre|frame|'''Figure 4.''' Adversarially-trained MNIST and CIFAR10 network performance tested against PGD adversaries of varying <math>\varepsilon</math> values, for <math>l_\infty</math> and <math>l_2</math> norms. <math>\varepsilon</math> value used for training is indicated in red.]]<br />
<br />
===Training accuracy===<br />
<br />
The MNIST network and wide-layer version of the CIFAR10 network had 100% accuracy on the training sets; thus, the optimization problem is tractable.<br />
<br />
===Running time===<br />
<br />
As PGD is iterative, training against a <math>k</math>-step PGD adversary requires <math>k</math> forward and backward passes per training batch rather than the single forward and backward pass for standard network training. This poses a challenge for computation as the the running time is greater by a factor of <math>k+1</math>.<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima founded by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will yield a local maxima with a significantly lower final loss value than the PGD adversary. This is seen experimentally: other adversaries with significantly lower loss values are not found even after a large amount of restarts.<br />
<br />
The authors also experimentally find that a network trained to be robust against PGD adversaries also become robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggest that using the saddle point formulation to adversarially train networks will encompass all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although these attacks evaluate the classifier without the gradient, it can be estimated with finite differences. Thus, first-order attacks are relevant and are use to train against these attacks. The paper finds that increasing network capacity and the strength of the adversary that the network is trained against improves resistance to transfer attacks. The paper also finds that the models that are resistant to first-order attacks are more resistant to transfer attacks.<br />
<br />
== 6. Conclusions ==<br />
<br />
== References ==<br />
* Nicholas Carlini and David Wagner. Defensive distillation is not robust to adversarial examples. arXiv preprint arXiv:1607.04311, 2016a.<br />
* Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. arXiv preprint arXiv:1608.04644, 2016b.<br />
* Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.<br />
* Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick D. McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.<br />
* TensorFlow models repository. https://github.com/tensorflow/models/tree/master/resnet, 2017.<br />
* Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In Proceedings of the ICLR 2018 Conference, Vancouver, BC. https://openreview.net/forum?id=rJzIBfZAb</div>J23ngo
<hr />
<div>This is a summary of the paper "Towards Deep Learning Models Resistant to Adversarial Attacks" by Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. <sup>[[#References|[1]]]</sup><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurry<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
'''TO DO:'''<br />
;section 6<br />
;proofread all sections<br />
;citations (APA)<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|'''Figure 1.''' Before and after an input image of a panda was subjected to perturbations.<sup>[https://arxiv.org/abs/1412.6572 Source]</sup>]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks<sup>[[#References|[2]]]</sup>. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries<sup>[[#References|[1]]]</sup>.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks<sup>[[#References|[2]]]</sup>. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is to solve:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math></div><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around <math>x</math>, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math></div><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data <math>x</math> that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that all perturbations result in smallest adversarial loss possible based on the attack <sup>[[#References|[2]]]</sup>. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of <math>\rho(\delta)</math> are obtained by using projected gradient descent (PGD)<sup>[[#References|[2]]]</sup>. Since the inner maximization is non-concave, the global maximizer cannot be found using PGD (instead, PGD will converge to local maxima)<sup>[[#References|[2]]]</sup>. However, the paper’s findings conclude that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \in S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math><sup>[[#References|[2]]]</sup>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> and tens to have similar loss values for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|'''Figure 2.''' Adversarial loss of MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<math>L</math> is assumed to be continuously differentiable for <math>\theta</math>, and Danskin’s theorem is used to obtained the direction of descent. Discontinuity when using ReLU activations is assumed to not be an issue as the set of discontinuities has a measure of zero, and thus these points of discontinuity will never be encountered<sup>[[#References|[2]]]</sup>. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust<sup>[[#References|[2]]]</sup>.<br />
<br />
== 4. Experiments ==<br />
<br />
===Training===<br />
<br />
We put into practice our robust classification framework with the MNIST and CIFAR10 datasets, with a projected gradient descent (PGD) training adversary, chosen because it effectively produces almost-maximal loss. We start PGD at a randomly chosen perturbation of the natural data. For both datasets, we see a sustained decrease in training loss when networks are trained against the PGD adversary, which suggests that adversarial loss is effectively being minimized.<br />
<br />
[[file:4_trainingloss.png|centre|frame|'''Figure 3.''' Trajectory of cross-entropy loss when training with a projected gradient descent (PGD) adversary.]]<br />
<br />
===Testing===<br />
<br />
The trained models are then tested based on a variety of adversaries.<br />
<br />
The source networks for the adversarial attacks are:<br />
* the network itself, denoted by source A (white-box attacks).<br />
* an independently trained version of the original network, denoted by source A' (black-box attacks).<br />
* a version of the original network trained on natural data only, denoted by Anat (black-box attacks).<br />
* a network with a different convolutional architecture, as described by Tramèr et al. (2017), denoted by B (black-box attacks).<br />
<br />
The attack methods used are:<br />
* Fast Gradient Sign Method (denoted FGSM)<br />
* Projected gradient descent (denoted PGD) with various steps and restarts<br />
* Attacks based on literature by Carlini & Wagner (2016b), denoted by CW; a version of this attack with a high confidence parameter is denoted by CW+<br />
<br />
===MNIST: Network structure and results===<br />
For the MNIST model, the network is composed of two convolutional layers (with 32 and 64 filters) and a fully connected 1024-unit layer. Each convolutional layer is followed by <math>2 \times 2</math> max-pooling layers. The training adversary used is PGD with 40 steps and a step size of 0.01 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=0.3</math>. The adversarially trained model was evaluated to be highly robust; testing results are given in Table 1.<br />
<br />
[[file:4_MNIST.png|centre|frame|'''Table 1.''' MNIST dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=0.3</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===CIFAR10: Network structure and results===<br />
For the CIFAR10 model, the network architectures used are the Resnet model (as per He et al (2016); TFM (2017)) and a version of Resnet with <math>10 \times</math> wider layers. The training adversary is PGD with 7 steps and a step size of 2 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=8</math> total. <br />
The most challenging testing adversary (white-box PGD) used the same architecture as above with 20 steps. Th adversarially trained model performed well given the strength of the testing adversaries but struggled in some cases, with test accuracy close to or lower than 50%. Testing results are given in Table 2.<br />
<br />
[[file:4_CIFAR.png|centre|frame|'''Table 2.''' CIFAR10 dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=8</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===Further experiments: <math>\varepsilon</math> values and <math>l_2</math>-bounded attacks===<br />
<br />
The MNIST model adversarially trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 0.3</math> and the CIFAR10 model trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 8</math> are further evaluated against attacks of varying <math>\varepsilon</math> values, for both <math>l_\infty</math>-bounded and <math>l_2</math>-bounded attacks. The MNIST network performed especially well against <math>l_2</math>-bounded adversaries.<br />
<br />
[[file:4_parameters.png|centre|frame|'''Figure 4.''' Adversarially-trained MNIST and CIFAR10 network performance tested against PGD adversaries of varying <math>\varepsilon</math> values, for <math>l_\infty</math> and <math>l_2</math> norms. <math>\varepsilon</math> value used for training is indicated in red.]]<br />
<br />
===Training accuracy===<br />
<br />
The MNIST network and wide-layer version of the CIFAR10 network had 100% accuracy on the training sets; thus, the optimization problem is tractable.<br />
<br />
===Running time===<br />
<br />
As PGD is iterative, training against a <math>k</math>-step PGD adversary requires <math>k</math> forward and backward passes per training batch rather than the single forward and backward pass for standard network training. This poses a challenge for computation as the the running time is greater by a factor of <math>k+1</math>.<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima founded by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will yield a local maxima with a significantly lower final loss value than the PGD adversary. This is seen experimentally: other adversaries with significantly lower loss values are not found even after a large amount of restarts.<br />
<br />
The authors also experimentally find that a network trained to be robust against PGD adversaries also become robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggest that using the saddle point formulation to adversarially train networks will encompass all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although these attacks evaluate the classifier without the gradient, it can be estimated with finite differences. Thus, first-order attacks are relevant and are use to train against these attacks. The paper finds that increasing network capacity and the strength of the adversary that the network is trained against improves resistance to transfer attacks. The paper also finds that the models that are resistant to first-order attacks are more resistant to transfer attacks.<br />
<br />
== 6. Conclusions ==<br />
<br />
== References ==<br />
* Nicholas Carlini and David Wagner. Defensive distillation is not robust to adversarial examples. arXiv preprint arXiv:1607.04311, 2016a<br />
* Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. arXiv preprint arXiv:1608.04644, 2016b.<br />
* Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.<br />
* Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick D. McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.<br />
* Tensor flow models repository. https://github.com/tensorflow/models/tree/ master/resnet, 2017.<br />
* Madry et al. Towards Deep Learning Models Resistant to Adversarial Attacks. In Proceedings of the ICLR 2018 Conference, Vancouver, BC. https://openreview.net/forum?id=rJzIBfZAb</div>J23ngohttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Towards_Deep_Learning_Models_Resistant_to_Adversarial_Attacks&diff=41087Towards Deep Learning Models Resistant to Adversarial Attacks2018-11-23T03:17:09Z<p>J23ngo: /* References */</p>
<hr />
<div>This is a summary of the paper "Towards Deep Learning Models Resistant to Adversarial Attacks" by Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. <sup>[[#References|[1]]]</sup><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurry<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
'''TO DO:'''<br />
;section 6<br />
;proofread all sections<br />
;citations (APA)<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|'''Figure 1.''' Before and after an input image of a panda was subjected to perturbations.<sup>[https://arxiv.org/abs/1412.6572 Source]</sup>]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks<sup>[[#References|[2]]]</sup>. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries<sup>[[#References|[1]]]</sup>.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks<sup>[[#References|[2]]]</sup>. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is to solve:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math></div><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around <math>x</math>, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math></div><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data <math>x</math> that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that all perturbations result in smallest adversarial loss possible based on the attack <sup>[[#References|[2]]]</sup>. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of <math>\rho(\delta)</math> are obtained by using projected gradient descent (PGD)<sup>[[#References|[2]]]</sup>. Since the inner maximization is non-concave, the global maximizer cannot be found using PGD (instead, PGD will converge to local maxima)<sup>[[#References|[2]]]</sup>. However, the paper’s findings conclude that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \in S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math><sup>[[#References|[2]]]</sup>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> and tens to have similar loss values for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|'''Figure 2.''' Adversarial loss of MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<math>L</math> is assumed to be continuously differentiable for <math>\theta</math>, and Danskin’s theorem is used to obtained the direction of descent. Discontinuity when using ReLU activations is assumed to not be an issue as the set of discontinuities has a measure of zero, and thus these points of discontinuity will never be encountered<sup>[[#References|[2]]]</sup>. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust<sup>[[#References|[2]]]</sup>.<br />
<br />
== 4. Experiments ==<br />
<br />
===Training===<br />
<br />
We put into practice our robust classification framework with the MNIST and CIFAR10 datasets, with a projected gradient descent (PGD) training adversary, chosen because it effectively produces almost-maximal loss. We start PGD at a randomly chosen perturbation of the natural data. For both datasets, we see a sustained decrease in training loss when networks are trained against the PGD adversary, which suggests that adversarial loss is effectively being minimized.<br />
<br />
[[file:4_trainingloss.png|centre|frame|'''Figure 3.''' Trajectory of cross-entropy loss when training with a projected gradient descent (PGD) adversary.]]<br />
<br />
===Testing===<br />
<br />
The trained models are then tested based on a variety of adversaries.<br />
<br />
The source networks for the adversarial attacks are:<br />
* the network itself, denoted by source A (white-box attacks).<br />
* an independently trained version of the original network, denoted by source A' (black-box attacks).<br />
* a version of the original network trained on natural data only, denoted by Anat (black-box attacks).<br />
* a network with a different convolutional architecture, as described by Tramèr et al. (2017), denoted by B (black-box attacks).<br />
<br />
The attack methods used are:<br />
* Fast Gradient Sign Method (denoted FGSM)<br />
* Projected gradient descent (denoted PGD) with various steps and restarts<br />
* Attacks based on literature by Carlini & Wagner (2016b), denoted by CW; a version of this attack with a high confidence parameter is denoted by CW+<br />
<br />
===MNIST: Network structure and results===<br />
For the MNIST model, the network is composed of two convolutional layers (with 32 and 64 filters) and a fully connected 1024-unit layer. Each convolutional layer is followed by <math>2 \times 2</math> max-pooling layers. The training adversary used is PGD with 40 steps and a step size of 0.01 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=0.3</math>. The adversarially trained model was evaluated to be highly robust; testing results are given in Table 1.<br />
<br />
[[file:4_MNIST.png|centre|frame|'''Table 1.''' MNIST dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=0.3</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===CIFAR10: Network structure and results===<br />
For the CIFAR10 model, the network architectures used are the Resnet model (as per He et al (2016); TFM (2017)) and a version of Resnet with <math>10 \times</math> wider layers. The training adversary is PGD with 7 steps and a step size of 2 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=8</math> total. <br />
The most challenging testing adversary (white-box PGD) used the same architecture as above with 20 steps. Th adversarially trained model performed well given the strength of the testing adversaries but struggled in some cases, with test accuracy close to or lower than 50%. Testing results are given in Table 2.<br />
<br />
[[file:4_CIFAR.png|centre|frame|'''Table 2.''' CIFAR10 dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=8</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===Further experiments: <math>\varepsilon</math> values and <math>l_2</math>-bounded attacks===<br />
<br />
The MNIST model adversarially trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 0.3</math> and the CIFAR10 model trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 8</math> are further evaluated against attacks of varying <math>\varepsilon</math> values, for both <math>l_\infty</math>-bounded and <math>l_2</math>-bounded attacks. The MNIST network performed especially well against <math>l_2</math>-bounded adversaries.<br />
<br />
[[file:4_parameters.png|centre|frame|'''Figure 4.''' Adversarially-trained MNIST and CIFAR10 network performance tested against PGD adversaries of varying <math>\varepsilon</math> values, for <math>l_\infty</math> and <math>l_2</math> norms. <math>\varepsilon</math> value used for training is indicated in red.]]<br />
<br />
===Training accuracy===<br />
<br />
The MNIST network and wide-layer version of the CIFAR10 network had 100% accuracy on the training sets; thus, the optimization problem is tractable.<br />
<br />
===Running time===<br />
<br />
As PGD is iterative, training against a <math>k</math>-step PGD adversary requires <math>k</math> forward and backward passes per training batch rather than the single forward and backward pass for standard network training. This poses a challenge for computation as the the running time is greater by a factor of <math>k+1</math>.<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima founded by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will yield a local maxima with a significantly lower final loss value than the PGD adversary<sup>[[#References|[2]]]</sup>. This is seen experimentally: other adversaries with significantly lower loss values are not found even after a large amount of restarts.<br />
<br />
The authors also experimentally find that a network trained to be robust against PGD adversaries also become robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggest that using the saddle point formulation to adversarially train networks will encompass all current approaches that attempt to achieve adversarial robustness<sup>[[#References|[2]]]</sup>.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although these attacks evaluate the classifier without the gradient, it can be estimated with finite differences. Thus, first-order attacks are relevant and are use to train against these attacks. The paper finds that increasing network capacity and the strength of the adversary that the network is trained against improves resistance to transfer attacks. The paper also finds that the models that are resistant to first-order attacks are more resistant to transfer attacks.<br />
<br />
== 6. Conclusions ==<br />
<br />
== References ==<br />
* Nicholas Carlini and David Wagner. Defensive distillation is not robust to adversarial examples. arXiv preprint arXiv:1607.04311, 2016a<br />
* Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. arXiv preprint arXiv:1608.04644, 2016b.<br />
* Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.<br />
* Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick D. McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.<br />
* Tensor flow models repository. https://github.com/tensorflow/models/tree/ master/resnet, 2017.<br />
* Madry et al. Towards Deep Learning Models Resistant to Adversarial Attacks. In Proceedings of the ICLR 2018 Conference, Vancouver, BC. https://openreview.net/forum?id=rJzIBfZAb</div>J23ngohttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Towards_Deep_Learning_Models_Resistant_to_Adversarial_Attacks&diff=41086Towards Deep Learning Models Resistant to Adversarial Attacks2018-11-23T03:16:52Z<p>J23ngo: /* Transfer attacks */</p>
<hr />
<div>This is a summary of the paper "Towards Deep Learning Models Resistant to Adversarial Attacks" by Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. <sup>[[#References|[1]]]</sup><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurry<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
'''TO DO:'''<br />
;section 6<br />
;proofread all sections<br />
;citations (APA)<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|'''Figure 1.''' Before and after an input image of a panda was subjected to perturbations.<sup>[https://arxiv.org/abs/1412.6572 Source]</sup>]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks<sup>[[#References|[2]]]</sup>. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries<sup>[[#References|[1]]]</sup>.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks<sup>[[#References|[2]]]</sup>. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is to solve:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math></div><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around <math>x</math>, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math></div><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data <math>x</math> that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that all perturbations result in smallest adversarial loss possible based on the attack <sup>[[#References|[2]]]</sup>. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of <math>\rho(\delta)</math> are obtained by using projected gradient descent (PGD)<sup>[[#References|[2]]]</sup>. Since the inner maximization is non-concave, the global maximizer cannot be found using PGD (instead, PGD will converge to local maxima)<sup>[[#References|[2]]]</sup>. However, the paper’s findings conclude that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \in S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math><sup>[[#References|[2]]]</sup>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> and tens to have similar loss values for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|'''Figure 2.''' Adversarial loss of MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<math>L</math> is assumed to be continuously differentiable for <math>\theta</math>, and Danskin’s theorem is used to obtained the direction of descent. Discontinuity when using ReLU activations is assumed to not be an issue as the set of discontinuities has a measure of zero, and thus these points of discontinuity will never be encountered<sup>[[#References|[2]]]</sup>. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust<sup>[[#References|[2]]]</sup>.<br />
<br />
== 4. Experiments ==<br />
<br />
===Training===<br />
<br />
We put into practice our robust classification framework with the MNIST and CIFAR10 datasets, with a projected gradient descent (PGD) training adversary, chosen because it effectively produces almost-maximal loss. We start PGD at a randomly chosen perturbation of the natural data. For both datasets, we see a sustained decrease in training loss when networks are trained against the PGD adversary, which suggests that adversarial loss is effectively being minimized.<br />
<br />
[[file:4_trainingloss.png|centre|frame|'''Figure 3.''' Trajectory of cross-entropy loss when training with a projected gradient descent (PGD) adversary.]]<br />
<br />
===Testing===<br />
<br />
The trained models are then tested based on a variety of adversaries.<br />
<br />
The source networks for the adversarial attacks are:<br />
* the network itself, denoted by source A (white-box attacks).<br />
* an independently trained version of the original network, denoted by source A' (black-box attacks).<br />
* a version of the original network trained on natural data only, denoted by Anat (black-box attacks).<br />
* a network with a different convolutional architecture, as described by Tramèr et al. (2017), denoted by B (black-box attacks).<br />
<br />
The attack methods used are:<br />
* Fast Gradient Sign Method (denoted FGSM)<br />
* Projected gradient descent (denoted PGD) with various steps and restarts<br />
* Attacks based on literature by Carlini & Wagner (2016b), denoted by CW; a version of this attack with a high confidence parameter is denoted by CW+<br />
<br />
===MNIST: Network structure and results===<br />
For the MNIST model, the network is composed of two convolutional layers (with 32 and 64 filters) and a fully connected 1024-unit layer. Each convolutional layer is followed by <math>2 \times 2</math> max-pooling layers. The training adversary used is PGD with 40 steps and a step size of 0.01 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=0.3</math>. The adversarially trained model was evaluated to be highly robust; testing results are given in Table 1.<br />
<br />
[[file:4_MNIST.png|centre|frame|'''Table 1.''' MNIST dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=0.3</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===CIFAR10: Network structure and results===<br />
For the CIFAR10 model, the network architectures used are the Resnet model (as per He et al (2016); TFM (2017)) and a version of Resnet with <math>10 \times</math> wider layers. The training adversary is PGD with 7 steps and a step size of 2 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=8</math> total. <br />
The most challenging testing adversary (white-box PGD) used the same architecture as above with 20 steps. Th adversarially trained model performed well given the strength of the testing adversaries but struggled in some cases, with test accuracy close to or lower than 50%. Testing results are given in Table 2.<br />
<br />
[[file:4_CIFAR.png|centre|frame|'''Table 2.''' CIFAR10 dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=8</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===Further experiments: <math>\varepsilon</math> values and <math>l_2</math>-bounded attacks===<br />
<br />
The MNIST model adversarially trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 0.3</math> and the CIFAR10 model trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 8</math> are further evaluated against attacks of varying <math>\varepsilon</math> values, for both <math>l_\infty</math>-bounded and <math>l_2</math>-bounded attacks. The MNIST network performed especially well against <math>l_2</math>-bounded adversaries.<br />
<br />
[[file:4_parameters.png|centre|frame|'''Figure 4.''' Adversarially-trained MNIST and CIFAR10 network performance tested against PGD adversaries of varying <math>\varepsilon</math> values, for <math>l_\infty</math> and <math>l_2</math> norms. <math>\varepsilon</math> value used for training is indicated in red.]]<br />
<br />
===Training accuracy===<br />
<br />
The MNIST network and wide-layer version of the CIFAR10 network had 100% accuracy on the training sets; thus, the optimization problem is tractable.<br />
<br />
===Running time===<br />
<br />
As PGD is iterative, training against a <math>k</math>-step PGD adversary requires <math>k</math> forward and backward passes per training batch rather than the single forward and backward pass for standard network training. This poses a challenge for computation as the the running time is greater by a factor of <math>k+1</math>.<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima founded by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will yield a local maxima with a significantly lower final loss value than the PGD adversary<sup>[[#References|[2]]]</sup>. This is seen experimentally: other adversaries with significantly lower loss values are not found even after a large amount of restarts.<br />
<br />
The authors also experimentally find that a network trained to be robust against PGD adversaries also become robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggest that using the saddle point formulation to adversarially train networks will encompass all current approaches that attempt to achieve adversarial robustness<sup>[[#References|[2]]]</sup>.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although these attacks evaluate the classifier without the gradient, it can be estimated with finite differences. Thus, first-order attacks are relevant and are use to train against these attacks. The paper finds that increasing network capacity and the strength of the adversary that the network is trained against improves resistance to transfer attacks. The paper also finds that the models that are resistant to first-order attacks are more resistant to transfer attacks.<br />
<br />
== 6. Conclusions ==<br />
<br />
== References ==<br />
* <sup>[1]</sup> Nicholas Carlini and David Wagner. Defensive distillation is not robust to adversarial examples. arXiv preprint arXiv:1607.04311, 2016a<br />
* Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. arXiv preprint arXiv:1608.04644, 2016b.<br />
* Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.<br />
* Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick D. McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.<br />
* Tensor flow models repository. https://github.com/tensorflow/models/tree/ master/resnet, 2017.<br />
* <sup>[https://openreview.net/forum?id=rJzIBfZAb[2]]</sup> Madry et al. Towards Deep Learning Models Resistant to Adversarial Attacks. In Proceedings of the ICLR 2018 Conference, Vancouver, BC.</div>J23ngohttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Towards_Deep_Learning_Models_Resistant_to_Adversarial_Attacks&diff=41085Towards Deep Learning Models Resistant to Adversarial Attacks2018-11-23T03:16:27Z<p>J23ngo: /* 5. First-Order Adversaries */</p>
<hr />
<div>This is a summary of the paper "Towards Deep Learning Models Resistant to Adversarial Attacks" by Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. <sup>[[#References|[1]]]</sup><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurry<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
'''TO DO:'''<br />
;section 6<br />
;proofread all sections<br />
;citations (APA)<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|'''Figure 1.''' Before and after an input image of a panda was subjected to perturbations.<sup>[https://arxiv.org/abs/1412.6572 Source]</sup>]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks<sup>[[#References|[2]]]</sup>. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries<sup>[[#References|[1]]]</sup>.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks<sup>[[#References|[2]]]</sup>. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is to solve:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math></div><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around <math>x</math>, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math></div><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data <math>x</math> that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that all perturbations result in smallest adversarial loss possible based on the attack <sup>[[#References|[2]]]</sup>. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of <math>\rho(\delta)</math> are obtained by using projected gradient descent (PGD)<sup>[[#References|[2]]]</sup>. Since the inner maximization is non-concave, the global maximizer cannot be found using PGD (instead, PGD will converge to local maxima)<sup>[[#References|[2]]]</sup>. However, the paper’s findings conclude that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \in S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math><sup>[[#References|[2]]]</sup>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> and tens to have similar loss values for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|'''Figure 2.''' Adversarial loss of MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<math>L</math> is assumed to be continuously differentiable for <math>\theta</math>, and Danskin’s theorem is used to obtained the direction of descent. Discontinuity when using ReLU activations is assumed to not be an issue as the set of discontinuities has a measure of zero, and thus these points of discontinuity will never be encountered<sup>[[#References|[2]]]</sup>. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust<sup>[[#References|[2]]]</sup>.<br />
<br />
== 4. Experiments ==<br />
<br />
===Training===<br />
<br />
We put into practice our robust classification framework with the MNIST and CIFAR10 datasets, with a projected gradient descent (PGD) training adversary, chosen because it effectively produces almost-maximal loss. We start PGD at a randomly chosen perturbation of the natural data. For both datasets, we see a sustained decrease in training loss when networks are trained against the PGD adversary, which suggests that adversarial loss is effectively being minimized.<br />
<br />
[[file:4_trainingloss.png|centre|frame|'''Figure 3.''' Trajectory of cross-entropy loss when training with a projected gradient descent (PGD) adversary.]]<br />
<br />
===Testing===<br />
<br />
The trained models are then tested against a variety of adversaries.<br />
<br />
The source networks for the adversarial attacks are:<br />
* the network itself, denoted by source A (white-box attacks).<br />
* an independently trained version of the original network, denoted by source A' (black-box attacks).<br />
* a version of the original network trained on natural data only, denoted by A<sub>nat</sub> (black-box attacks).<br />
* a network with a different convolutional architecture, as described by Tramèr et al. (2017), denoted by B (black-box attacks).<br />
<br />
The attack methods used are:<br />
* Fast Gradient Sign Method (denoted FGSM); a sketch is given after this list.<br />
* Projected gradient descent (denoted PGD) with various steps and restarts<br />
* Attacks based on literature by Carlini & Wagner (2016b), denoted by CW; a version of this attack with a high confidence parameter is denoted by CW+<br />
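<br />
As an illustration of the simplest of these attacks, a minimal FGSM sketch follows (PyTorch, our own illustration): a single step of size <math>\varepsilon</math> along the sign of the input gradient.<br />
<pre>
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps):
    """FGSM: one step of size eps along the sign of the input gradient."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()   # populates x.grad
    return (x + eps * x.grad.sign()).detach().clamp(0, 1)
</pre>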
<br />
===MNIST: Network structure and results===<br />
For the MNIST model, the network is composed of two convolutional layers (with 32 and 64 filters) and a fully connected 1024-unit layer. Each convolutional layer is followed by a <math>2 \times 2</math> max-pooling layer. The training adversary is PGD with 40 steps and a step size of 0.01 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=0.3</math>. The adversarially trained model was evaluated to be highly robust; testing results are given in Table 1.<br />
<br />
[[file:4_MNIST.png|centre|frame|'''Table 1.''' MNIST dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=0.3</math>; most successful attacks are bolded for each attack model.]]<br />
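<br />
For concreteness, the MNIST architecture described above could be written as follows. This is a PyTorch sketch of our own; the <math>5 \times 5</math> kernel size and padding are assumptions, as the summary does not specify them.<br />
<pre>
import torch.nn as nn

# Two conv layers (32 and 64 filters), each followed by 2x2 max-pooling,
# then a 1024-unit fully connected layer; 5x5 kernels are assumed here.
mnist_net = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(2),                    # 28x28 -> 14x14
    nn.Conv2d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(2),                    # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(64 * 7 * 7, 1024), nn.ReLU(),
    nn.Linear(1024, 10),                # logits for the ten digit classes
)
</pre>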
<br />
===CIFAR10: Network structure and results===<br />
For the CIFAR10 model, the network architectures used are the ResNet model (as per He et al. (2016); TFM (2017)) and a version of ResNet with <math>10 \times</math> wider layers. The training adversary is PGD with 7 steps and a step size of 2 in the <math>l_\infty</math>-norm, with total perturbation size <math>\varepsilon=8</math>. <br />
The most challenging testing adversary (white-box PGD) used the same architecture as above with 20 steps. The adversarially trained model performed well given the strength of the testing adversaries, but struggled in some cases, with test accuracy close to or lower than 50%. Testing results are given in Table 2.<br />
<br />
[[file:4_CIFAR.png|centre|frame|'''Table 2.''' CIFAR10 dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=8</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===Further experiments: <math>\varepsilon</math> values and <math>l_2</math>-bounded attacks===<br />
<br />
The MNIST model adversarially trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 0.3</math> and the CIFAR10 model trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 8</math> are further evaluated against attacks of varying <math>\varepsilon</math> values, for both <math>l_\infty</math>-bounded and <math>l_2</math>-bounded attacks. The MNIST network performed especially well against <math>l_2</math>-bounded adversaries.<br />
<br />
[[file:4_parameters.png|centre|frame|'''Figure 4.''' Adversarially-trained MNIST and CIFAR10 network performance tested against PGD adversaries of varying <math>\varepsilon</math> values, for <math>l_\infty</math> and <math>l_2</math> norms. <math>\varepsilon</math> value used for training is indicated in red.]]<br />
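<br />
For the <math>l_2</math>-bounded adversary, one natural implementation of PGD takes steps along the normalized gradient and projects back onto the <math>l_2</math>-ball; a minimal sketch follows (our own illustration, with illustrative parameter names, not necessarily the paper's exact procedure).<br />
<pre>
import torch
import torch.nn.functional as F

def pgd_l2_attack(model, x, y, eps, alpha, steps):
    """PGD in the l2 ball: ascend along the normalized gradient and
    rescale delta back onto the l2 ball of radius eps after each step."""
    delta = torch.zeros_like(x)
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        # Step along the gradient, normalized per example.
        grad_norm = grad.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
        delta = delta.detach() + alpha * grad / grad_norm
        # Projection: shrink delta if it left the l2 ball.
        delta_norm = delta.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
        delta = delta * (eps / delta_norm).clamp(max=1.0)
    return (x + delta).detach().clamp(0, 1)
</pre>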
<br />
===Training accuracy===<br />
<br />
The MNIST network and the wide-layer version of the CIFAR10 network reached 100% accuracy on their training sets, indicating that the saddle point optimization problem is tractable in practice.<br />
<br />
===Running time===<br />
<br />
As PGD is iterative, training against a <math>k</math>-step PGD adversary requires <math>k</math> forward and backward passes to construct each attack, plus one more for the parameter update, rather than the single forward and backward pass of standard network training. This poses a computational challenge, as the running time per batch grows by a factor of roughly <math>k+1</math> (about <math>41\times</math> for the 40-step MNIST training adversary).<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima found by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will find a local maximum with a significantly higher loss value than the one found by the PGD adversary<sup>[[#References|[2]]]</sup>. This is seen experimentally: adversaries achieving significantly better (higher) loss values are not found even after a large number of restarts.<br />
<br />
The authors also find experimentally that a network trained to be robust against PGD adversaries becomes robust against a variety of other first-order adversaries as well. This result, together with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggests that adversarially training networks with the saddle point formulation covers all current approaches to achieving adversarial robustness<sup>[[#References|[2]]]</sup>.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier<sup>[[#References|[2]]]</sup>. Although such attacks cannot evaluate the classifier's gradient directly, the gradient can be estimated with finite differences<sup>[[#References|[2]]]</sup>. First-order attacks therefore remain relevant, and training against them also serves as a defense against transfer attacks<sup>[[#References|[2]]]</sup>. The paper finds that increasing the network capacity, and the strength of the adversary trained against, improves resistance to transfer attacks, and that models resistant to first-order attacks are also more resistant to transfer attacks.<br />
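<br />
As an illustration of gradient estimation by finite differences, a generic sketch follows (our own, not the paper's method): a black-box attacker approximates <math>\nabla_x L</math> by probing a random subset of input coordinates with central differences, using only loss evaluations.<br />
<pre>
import torch
import torch.nn.functional as F

def estimate_input_gradient(model, x, y, h=1e-3, n_coords=100):
    """Black-box gradient estimate: central finite differences on a
    random subset of input coordinates; no backpropagation needed."""
    grad = torch.zeros_like(x).view(-1)
    coords = torch.randperm(x.numel())[:n_coords]  # subsample coordinates
    with torch.no_grad():
        for i in coords:
            e = torch.zeros_like(x).view(-1)
            e[i] = h
            e = e.view_as(x)
            loss_hi = F.cross_entropy(model(x + e), y)
            loss_lo = F.cross_entropy(model(x - e), y)
            grad[i] = (loss_hi - loss_lo) / (2 * h)
    return grad.view_as(x)
</pre>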
<br />
== 6. Conclusions ==<br />
<br />
== References ==<br />
* <sup>[1]</sup> Nicholas Carlini and David Wagner. Defensive distillation is not robust to adversarial examples. arXiv preprint arXiv:1607.04311, 2016a.<br />
* Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. arXiv preprint arXiv:1608.04644, 2016b.<br />
* Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.<br />
* Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick D. McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.<br />
* TensorFlow models repository. https://github.com/tensorflow/models/tree/master/resnet, 2017.<br />
* <sup>[2]</sup> Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In Proceedings of the ICLR 2018 Conference, Vancouver, BC, 2018. https://openreview.net/forum?id=rJzIBfZAb</div>
<hr />
<div>This is a summary of the paper "Towards Deep Learning Models Resistant to Adversarial Attacks" by Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. <sup>[[#References|[1]]]</sup><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurry<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
'''TO DO:'''<br />
;sections 4, 6, 7<br />
;proofread all sections<br />
;figure numbers/citations for images<br />
;APA citations for the paper, any other works referred to, and images<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|'''Figure 1.''' Before and after an input image of a panda was subjected to perturbations.<sup>[https://arxiv.org/abs/1412.6572 Source]</sup>]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks<sup>[[#References|[2]]]</sup>. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries<sup>[[#References|[1]]]</sup>.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks<sup>[[#References|[2]]]</sup>. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is to solve:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math></div><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around <math>x</math>, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math></div><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data <math>x</math> that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that all perturbations result in smallest adversarial loss possible based on the attack <sup>[[#References|[2]]]</sup>. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of <math>\rho(\delta)</math> are obtained by using projected gradient descent (PGD)<sup>[[#References|[2]]]</sup>. Since the inner maximization is non-concave, the global maximizer cannot be found using PGD (instead, PGD will converge to local maxima)<sup>[[#References|[2]]]</sup>. However, the paper’s findings conclude that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \in S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math><sup>[[#References|[2]]]</sup>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> and tens to have similar loss values for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|'''Figure 2.''' Adversarial loss of MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<math>L</math> is assumed to be continuously differentiable for <math>\theta</math>, and Danskin’s theorem is used to obtained the direction of descent. Discontinuity when using ReLU activations is assumed to not be an issue as the set of discontinuities has a measure of zero, and thus these points of discontinuity will never be encountered<sup>[[#References|[2]]]</sup>. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust<sup>[[#References|[2]]]</sup>.<br />
<br />
== 4. Experiments ==<br />
<br />
===Training===<br />
<br />
We put into practice our robust classification framework with the MNIST and CIFAR10 datasets, with a projected gradient descent (PGD) training adversary, chosen because it effectively produces almost-maximal loss. We start PGD at a randomly chosen perturbation of the natural data. For both datasets, we see a sustained decrease in training loss when networks are trained against the PGD adversary, which suggests that adversarial loss is effectively being minimized.<br />
<br />
[[file:4_trainingloss.png|centre|frame|'''Figure 3.''' Trajectory of cross-entropy loss when training with a projected gradient descent (PGD) adversary.]]<br />
<br />
===Testing===<br />
<br />
The trained models are then tested based on a variety of adversaries.<br />
<br />
The source networks for the adversarial attacks are:<br />
* the network itself, denoted by source A (white-box attacks).<br />
* an independently trained version of the original network, denoted by source A' (black-box attacks).<br />
* a version of the original network trained on natural data only, denoted by Anat (black-box attacks).<br />
* a network with a different convolutional architecture, as described by Tramer et al. (2017a), denoted by B (black-box attacks).<br />
<br />
The attack methods used are:<br />
* Fast Gradient Sign Method (denoted FGSM)<br />
* Projected gradient descent (denoted PGD) with various steps and restarts<br />
* Attacks based on literature by Carlini & Wagner (2016b), denoted by CW; a version of this attack with a high confidence parameter is denoted by CW+<br />
<br />
===MNIST: Network structure and results===<br />
For the MNIST model, the network is composed of two convolutional layers (with 32 and 64 filters) and a fully connected 1024-unit layer. Each convolutional layer is followed by <math>2 \times 2</math> max-pooling layers. The training adversary used is PGD with 40 steps and a step size of 0.01 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=0.3</math>. The adversarially trained model was evaluated to be highly robust; testing results are given in Table 1.<br />
<br />
[[file:4_MNIST.png|centre|frame|'''Table 1.''' MNIST dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=0.3</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===CIFAR10: Network structure and results===<br />
For the CIFAR10 model, the network architectures used are the Resnet model (as per He et al (2016); TFM (2017)) and a version of Resnet with <math>10 \times</math> wider layers. The training adversary is PGD with 7 steps and a step size of 2 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=8</math> total. <br />
The most challenging testing adversary (white-box PGD) used the same architecture as above with 20 steps. Th adversarially trained model performed well given the strength of the testing adversaries but struggled in some cases, with test accuracy close to or lower than 50%. Testing results are given in Table 2.<br />
<br />
[[file:4_CIFAR.png|centre|frame|'''Table 2.''' CIFAR10 dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=8</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===Further experiments: <math>\varepsilon</math> values and <math>l_2</math>-bounded attacks===<br />
<br />
The MNIST model adversarially trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 0.3</math> and the CIFAR10 model trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 8</math> are further evaluated against attacks of varying <math>\varepsilon</math> values, for both <math>l_\infty</math>-bounded and <math>l_2</math>-bounded attacks. The MNIST network performed especially well against <math>l_2</math>-bounded adversaries.<br />
<br />
[[file:4_parameters.png|centre|frame|'''Figure 4.''' Adversarially-trained MNIST and CIFAR10 network performance tested against PGD adversaries of varying <math>\varepsilon</math> values, for <math>l_\infty</math> and <math>l_2</math> norms. <math>\varepsilon</math> value used for training is indicated in red.]]<br />
<br />
===Training accuracy===<br />
<br />
The MNIST network and wide-layer version of the CIFAR10 network had 100% accuracy on the training sets; thus, the optimization problem is tractable.<br />
<br />
===Running time===<br />
<br />
As PGD is iterative, training against a <math>k</math>-step PGD adversary requires <math>k</math> forward and backward passes per training batch rather than the single forward and backward pass for standard network training. This poses a challenge for computation as the the running time is greater by a factor of <math>k+1</math>.<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima founded by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will yield a local maxima with a significantly lower final loss value than the PGD adversary. This is seen experimentally: other adversaries with significantly lower loss values are not found even after a large amount of restarts.<br />
<br />
The authors also experimentally find that a network trained to be robust against PGD adversaries also become robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggest that using the saddle point formulation to adversarially train networks will encompass all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although these attacks evaluate the classifier without the gradient, it can be estimated with finite differences. Thus, first-order attacks are relevant and are use to train against these attacks. The paper finds that increasing network capacity and the strength of the adversary that the network is trained against improves resistance to transfer attacks. The paper also finds that the models that are resistant to first-order attacks are more resistant to transfer attacks.<br />
<br />
== 6. Conclusions ==<br />
<br />
== References ==<br />
* <sup>[1]</sup> Nicholas Carlini and David Wagner. Defensive distillation is not robust to adversarial examples. arXiv preprint arXiv:1607.04311, 2016a<br />
* Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. arXiv preprint arXiv:1608.04644, 2016b.<br />
* Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.<br />
* Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick D. McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017a.<br />
* Tensor flow models repository. https://github.com/tensorflow/models/tree/ master/resnet, 2017.<br />
* <sup>[https://openreview.net/forum?id=rJzIBfZAb[2]]</sup> Madry et al. Towards Deep Learning Models Resistant to Adversarial Attacks. In Proceedings of the ICLR 2018 Conference, Vancouver, BC.</div>J23ngohttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Towards_Deep_Learning_Models_Resistant_to_Adversarial_Attacks&diff=41079Towards Deep Learning Models Resistant to Adversarial Attacks2018-11-23T03:12:59Z<p>J23ngo: /* Inner maximization */</p>
<hr />
<div>This is a summary of the paper "Towards Deep Learning Models Resistant to Adversarial Attacks" by Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. <sup>[[#References|[1]]]</sup><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurry<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
'''TO DO:'''<br />
;sections 4, 6, 7<br />
;proofread all sections<br />
;figure numbers/citations for images<br />
;APA citations for the paper, any other works referred to, and images<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|'''Figure 1.''' Before and after an input image of a panda was subjected to perturbations.<sup>[https://arxiv.org/abs/1412.6572 Source]</sup>]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks<sup>[[#References|[2]]]</sup>. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries<sup>[[#References|[1]]]</sup>.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks<sup>[[#References|[2]]]</sup>. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is to solve:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math></div><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around <math>x</math>, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math></div><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data <math>x</math> that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that all perturbations result in smallest adversarial loss possible based on the attack <sup>[[#References|[2]]]</sup>. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of <math>\rho(\delta)</math> are obtained by using projected gradient descent (PGD)<sup>[[#References|[2]]]</sup>. Since the inner maximization is non-concave, the global maximizer cannot be found using PGD (instead, PGD will converge to local maxima)<sup>[[#References|[2]]]</sup>. However, the paper’s findings conclude that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \in S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math><sup>[[#References|[2]]]</sup>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> and tens to have similar loss values for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|'''Figure 2.''' Adversarial loss of MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<math>L</math> is assumed to be continuously differentiable for <math>\theta</math>, and Danskin’s theorem is used to obtained the direction of descent. Discontinuity when using ReLU activations is assumed to not be an issue as the set of discontinuities has a measure of zero, and thus these points of discontinuity will never be encountered. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust<sup>[[#References|[2]]]</sup>.<br />
<br />
== 4. Experiments ==<br />
<br />
===Training===<br />
<br />
We put into practice our robust classification framework with the MNIST and CIFAR10 datasets, with a projected gradient descent (PGD) training adversary, chosen because it effectively produces almost-maximal loss. We start PGD at a randomly chosen perturbation of the natural data. For both datasets, we see a sustained decrease in training loss when networks are trained against the PGD adversary, which suggests that adversarial loss is effectively being minimized.<br />
<br />
[[file:4_trainingloss.png|centre|frame|'''Figure 3.''' Trajectory of cross-entropy loss when training with a projected gradient descent (PGD) adversary.]]<br />
<br />
===Testing===<br />
<br />
The trained models are then tested based on a variety of adversaries.<br />
<br />
The source networks for the adversarial attacks are:<br />
* the network itself, denoted by source A (white-box attacks).<br />
* an independently trained version of the original network, denoted by source A' (black-box attacks).<br />
* a version of the original network trained on natural data only, denoted by Anat (black-box attacks).<br />
* a network with a different convolutional architecture, as described by Tramer et al. (2017a), denoted by B (black-box attacks).<br />
<br />
The attack methods used are:<br />
* Fast Gradient Sign Method (denoted FGSM)<br />
* Projected gradient descent (denoted PGD) with various steps and restarts<br />
* Attacks based on literature by Carlini & Wagner (2016b), denoted by CW; a version of this attack with a high confidence parameter is denoted by CW+<br />
<br />
===MNIST: Network structure and results===<br />
For the MNIST model, the network is composed of two convolutional layers (with 32 and 64 filters) and a fully connected 1024-unit layer. Each convolutional layer is followed by <math>2 \times 2</math> max-pooling layers. The training adversary used is PGD with 40 steps and a step size of 0.01 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=0.3</math>. The adversarially trained model was evaluated to be highly robust; testing results are given in Table 1.<br />
<br />
[[file:4_MNIST.png|centre|frame|'''Table 1.''' MNIST dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=0.3</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===CIFAR10: Network structure and results===<br />
For the CIFAR10 model, the network architectures used are the Resnet model (as per He et al (2016); TFM (2017)) and a version of Resnet with <math>10 \times</math> wider layers. The training adversary is PGD with 7 steps and a step size of 2 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=8</math> total. <br />
The most challenging testing adversary (white-box PGD) used the same architecture as above with 20 steps. Th adversarially trained model performed well given the strength of the testing adversaries but struggled in some cases, with test accuracy close to or lower than 50%. Testing results are given in Table 2.<br />
<br />
[[file:4_CIFAR.png|centre|frame|'''Table 2.''' CIFAR10 dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=8</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===Further experiments: <math>\varepsilon</math> values and <math>l_2</math>-bounded attacks===<br />
<br />
The MNIST model adversarially trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 0.3</math> and the CIFAR10 model trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 8</math> are further evaluated against attacks of varying <math>\varepsilon</math> values, for both <math>l_\infty</math>-bounded and <math>l_2</math>-bounded attacks. The MNIST network performed especially well against <math>l_2</math>-bounded adversaries.<br />
<br />
[[file:4_parameters.png|centre|frame|'''Figure 4.''' Adversarially-trained MNIST and CIFAR10 network performance tested against PGD adversaries of varying <math>\varepsilon</math> values, for <math>l_\infty</math> and <math>l_2</math> norms. <math>\varepsilon</math> value used for training is indicated in red.]]<br />
<br />
===Training accuracy===<br />
<br />
The MNIST network and wide-layer version of the CIFAR10 network had 100% accuracy on the training sets; thus, the optimization problem is tractable.<br />
<br />
===Running time===<br />
<br />
As PGD is iterative, training against a <math>k</math>-step PGD adversary requires <math>k</math> forward and backward passes per training batch rather than the single forward and backward pass for standard network training. This poses a challenge for computation as the the running time is greater by a factor of <math>k+1</math>.<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima founded by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will yield a local maxima with a significantly lower final loss value than the PGD adversary. This is seen experimentally: other adversaries with significantly lower loss values are not found even after a large amount of restarts.<br />
<br />
The authors also experimentally find that a network trained to be robust against PGD adversaries also become robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggest that using the saddle point formulation to adversarially train networks will encompass all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although these attacks evaluate the classifier without the gradient, it can be estimated with finite differences. Thus, first-order attacks are relevant and are use to train against these attacks. The paper finds that increasing network capacity and the strength of the adversary that the network is trained against improves resistance to transfer attacks. The paper also finds that the models that are resistant to first-order attacks are more resistant to transfer attacks.<br />
<br />
== 6. Conclusions ==<br />
<br />
== References ==<br />
* <sup>[1]</sup> Nicholas Carlini and David Wagner. Defensive distillation is not robust to adversarial examples. arXiv preprint arXiv:1607.04311, 2016a<br />
* <sup>[https://openreview.net/forum?id=rJzIBfZAb[2]]</sup> Madry et al. Towards Deep Learning Models Resistant to Adversarial Attacks. In Proceedings of the ICLR 2018 Conference, Vancouver, BC.</div>J23ngohttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Towards_Deep_Learning_Models_Resistant_to_Adversarial_Attacks&diff=41078Towards Deep Learning Models Resistant to Adversarial Attacks2018-11-23T03:12:42Z<p>J23ngo: /* Outer minimization */</p>
<hr />
<div>This is a summary of the paper "Towards Deep Learning Models Resistant to Adversarial Attacks" by Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. <sup>[[#References|[1]]]</sup><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurry<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
'''TO DO:'''<br />
;sections 4, 6, 7<br />
;proofread all sections<br />
;figure numbers/citations for images<br />
;APA citations for the paper, any other works referred to, and images<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|'''Figure 1.''' Before and after an input image of a panda was subjected to perturbations.<sup>[https://arxiv.org/abs/1412.6572 Source]</sup>]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks<sup>[[#References|[2]]]</sup>. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries<sup>[[#References|[1]]]</sup>.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks<sup>[[#References|[2]]]</sup>. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is to solve:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math></div><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around <math>x</math>, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math></div><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data <math>x</math> that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that all perturbations result in smallest adversarial loss possible based on the attack <sup>[[#References|[2]]]</sup>. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of <math>\rho(\delta)</math> are obtained by using projected gradient descent (PGD)<sup>[[#References|[2]]]</sup>. Since the inner maximization is non-concave, the global maximizer cannot be found using PGD (instead, PGD will converge to local maxima)<sup>[[#References|[2]]]</sup>. However, the paper’s findings conclude that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \in S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math><sup>[[#References|[2]]]</sup>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> and tens to have similar loss values for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|'''Figure 3.''' Adversarial loss of MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<math>L</math> is assumed to be continuously differentiable for <math>\theta</math>, and Danskin’s theorem is used to obtained the direction of descent. Discontinuity when using ReLU activations is assumed to not be an issue as the set of discontinuities has a measure of zero, and thus these points of discontinuity will never be encountered. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust<sup>[[#References|[2]]]</sup>.<br />
<br />
== 4. Experiments ==<br />
<br />
===Training===<br />
<br />
We put into practice our robust classification framework with the MNIST and CIFAR10 datasets, with a projected gradient descent (PGD) training adversary, chosen because it effectively produces almost-maximal loss. We start PGD at a randomly chosen perturbation of the natural data. For both datasets, we see a sustained decrease in training loss when networks are trained against the PGD adversary, which suggests that adversarial loss is effectively being minimized.<br />
<br />
[[file:4_trainingloss.png|centre|frame|'''Figure 3.''' Trajectory of cross-entropy loss when training with a projected gradient descent (PGD) adversary.]]<br />
<br />
===Testing===<br />
<br />
The trained models are then tested based on a variety of adversaries.<br />
<br />
The source networks for the adversarial attacks are:<br />
* the network itself, denoted by source A (white-box attacks).<br />
* an independently trained version of the original network, denoted by source A' (black-box attacks).<br />
* a version of the original network trained on natural data only, denoted by Anat (black-box attacks).<br />
* a network with a different convolutional architecture, as described by Tramer et al. (2017a), denoted by B (black-box attacks).<br />
<br />
The attack methods used are:<br />
* Fast Gradient Sign Method (denoted FGSM)<br />
* Projected gradient descent (denoted PGD) with various steps and restarts<br />
* Attacks based on literature by Carlini & Wagner (2016b), denoted by CW; a version of this attack with a high confidence parameter is denoted by CW+<br />
<br />
===MNIST: Network structure and results===<br />
For the MNIST model, the network is composed of two convolutional layers (with 32 and 64 filters) and a fully connected 1024-unit layer. Each convolutional layer is followed by <math>2 \times 2</math> max-pooling layers. The training adversary used is PGD with 40 steps and a step size of 0.01 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=0.3</math>. The adversarially trained model was evaluated to be highly robust; testing results are given in Table 1.<br />
<br />
[[file:4_MNIST.png|centre|frame|'''Table 1.''' MNIST dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=0.3</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===CIFAR10: Network structure and results===<br />
For the CIFAR10 model, the network architectures used are the Resnet model (as per He et al (2016); TFM (2017)) and a version of Resnet with <math>10 \times</math> wider layers. The training adversary is PGD with 7 steps and a step size of 2 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=8</math> total. <br />
The most challenging testing adversary (white-box PGD) used the same architecture as above with 20 steps. Th adversarially trained model performed well given the strength of the testing adversaries but struggled in some cases, with test accuracy close to or lower than 50%. Testing results are given in Table 2.<br />
<br />
[[file:4_CIFAR.png|centre|frame|'''Table 2.''' CIFAR10 dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=8</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===Further experiments: <math>\varepsilon</math> values and <math>l_2</math>-bounded attacks===<br />
<br />
The MNIST model adversarially trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 0.3</math> and the CIFAR10 model trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 8</math> are further evaluated against attacks of varying <math>\varepsilon</math> values, for both <math>l_\infty</math>-bounded and <math>l_2</math>-bounded attacks. The MNIST network performed especially well against <math>l_2</math>-bounded adversaries.<br />
<br />
[[file:4_parameters.png|centre|frame|'''Figure 4.''' Adversarially-trained MNIST and CIFAR10 network performance tested against PGD adversaries of varying <math>\varepsilon</math> values, for <math>l_\infty</math> and <math>l_2</math> norms. <math>\varepsilon</math> value used for training is indicated in red.]]<br />
<br />
===Training accuracy===<br />
<br />
The MNIST network and wide-layer version of the CIFAR10 network had 100% accuracy on the training sets; thus, the optimization problem is tractable.<br />
<br />
===Running time===<br />
<br />
As PGD is iterative, training against a <math>k</math>-step PGD adversary requires <math>k</math> forward and backward passes per training batch rather than the single forward and backward pass for standard network training. This poses a challenge for computation as the the running time is greater by a factor of <math>k+1</math>.<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima founded by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will yield a local maxima with a significantly lower final loss value than the PGD adversary. This is seen experimentally: other adversaries with significantly lower loss values are not found even after a large amount of restarts.<br />
<br />
The authors also experimentally find that a network trained to be robust against PGD adversaries also become robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggest that using the saddle point formulation to adversarially train networks will encompass all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although these attacks evaluate the classifier without the gradient, it can be estimated with finite differences. Thus, first-order attacks are relevant and are use to train against these attacks. The paper finds that increasing network capacity and the strength of the adversary that the network is trained against improves resistance to transfer attacks. The paper also finds that the models that are resistant to first-order attacks are more resistant to transfer attacks.<br />
<br />
== 6. Conclusions ==<br />
<br />
== References ==<br />
* <sup>[1]</sup> Nicholas Carlini and David Wagner. Defensive distillation is not robust to adversarial examples. arXiv preprint arXiv:1607.04311, 2016a<br />
* <sup>[https://openreview.net/forum?id=rJzIBfZAb[2]]</sup> Madry et al. Towards Deep Learning Models Resistant to Adversarial Attacks. In Proceedings of the ICLR 2018 Conference, Vancouver, BC.</div>J23ngohttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Towards_Deep_Learning_Models_Resistant_to_Adversarial_Attacks&diff=41077Towards Deep Learning Models Resistant to Adversarial Attacks2018-11-23T03:12:21Z<p>J23ngo: /* 3. Adversarially Robust Networks */</p>
<hr />
<div>This is a summary of the paper "Towards Deep Learning Models Resistant to Adversarial Attacks" by Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. <sup>[[#References|[1]]]</sup><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurry<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
'''TO DO:'''<br />
;sections 4, 6, 7<br />
;proofread all sections<br />
;figure numbers/citations for images<br />
;APA citations for the paper, any other works referred to, and images<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|'''Figure 1.''' Before and after an input image of a panda was subjected to perturbations.<sup>[https://arxiv.org/abs/1412.6572 Source]</sup>]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks<sup>[[#References|[2]]]</sup>. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries<sup>[[#References|[1]]]</sup>.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks<sup>[[#References|[2]]]</sup>. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is to solve:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math></div><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around <math>x</math>, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math></div><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data <math>x</math> that results in the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Solving the outer minimization of the saddle point formulation <math>\min_{\theta} \rho (\theta)</math> guarantees that the worst-case perturbation attains the smallest adversarial loss possible<sup>[[#References|[2]]]</sup>. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions to the inner maximization also serve as concrete adversarial attacks on the neural network. <br />
<br />
=== Inner maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of the inner loss <math>L(\theta,x+\delta,y)</math> are obtained using projected gradient descent (PGD)<sup>[[#References|[2]]]</sup>. Since the inner maximization is non-concave, PGD cannot be guaranteed to find the global maximizer and will instead converge to local maxima<sup>[[#References|[2]]]</sup>. However, the paper finds that PGD can nonetheless be used in practice to solve the inner maximization problem with first-order methods. <br />
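<br />
The following is a minimal sketch (not taken from the paper's code) of how such a PGD adversary can be implemented, assuming a PyTorch classifier and inputs scaled to <math>[0, 1]</math>; the names <code>model</code>, <code>eps</code>, <code>alpha</code>, and <code>steps</code> are illustrative.<br />
<pre>
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, alpha, steps):
    """Approximately solve max over ||delta||_inf <= eps of L(theta, x + delta, y).

    A minimal sketch: random start in the l_inf ball, then iterated
    signed-gradient ascent steps, each projected back onto the ball.
    """
    # Random start: a uniformly chosen perturbation inside the l_inf ball.
    delta = torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        # Ascent step along the gradient sign (steepest l_inf direction),
        # then projection back onto the l_inf ball of radius eps.
        delta = (delta.detach() + alpha * grad.sign()).clamp(-eps, eps)
        # Keep the perturbed image inside the valid pixel range [0, 1].
        delta = (x + delta).clamp(0.0, 1.0) - x
    return (x + delta).detach()
</pre>
The random start inside the <math>l_\infty</math>-ball matches the experimental setup described in Section 4.<br />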
<br />
Applying Danskin’s theorem on a subset <math>S' \subseteq S</math> whose local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math><sup>[[#References|[2]]]</sup>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> and tend to have similar loss values for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|'''Figure 2.''' Adversarial loss of MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><sup>[[#References|[2]]]</sup><br />
<br />
<math>L</math> is assumed to be continuously differentiable in <math>\theta</math>, and Danskin’s theorem is used to obtain the direction of descent. Discontinuity when using ReLU activations is assumed not to be an issue: the set of discontinuities has measure zero, so these points are almost never encountered. Using the gradients <math>\nabla_\theta \rho(\theta)</math> with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust<sup>[[#References|[2]]]</sup>.<br />
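<br />
Putting the two components together, a minimal adversarial training loop might look as follows. This is a sketch under the same PyTorch assumptions as the PGD example above (reusing that hypothetical <code>pgd_attack</code>), not the paper's actual implementation.<br />
<pre>
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, eps, alpha, steps):
    """One epoch of adversarial training: SGD on the loss at PGD maximizers."""
    model.train()
    for x, y in loader:
        # Inner maximization: construct an (approximate) worst-case perturbation.
        x_adv = pgd_attack(model, x, y, eps, alpha, steps)
        # Outer minimization: per Danskin's theorem, the gradient of the
        # adversarial loss is the loss gradient evaluated at the inner maximizer.
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
</pre>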
<br />
== 4. Experiments ==<br />
<br />
===Training===<br />
<br />
We put into practice our robust classification framework with the MNIST and CIFAR10 datasets, with a projected gradient descent (PGD) training adversary, chosen because it effectively produces almost-maximal loss. We start PGD at a randomly chosen perturbation of the natural data. For both datasets, we see a sustained decrease in training loss when networks are trained against the PGD adversary, which suggests that adversarial loss is effectively being minimized.<br />
<br />
[[file:4_trainingloss.png|centre|frame|'''Figure 3.''' Trajectory of cross-entropy loss when training with a projected gradient descent (PGD) adversary.]]<br />
<br />
===Testing===<br />
<br />
The trained models are then tested against a variety of adversaries.<br />
<br />
The source networks for the adversarial attacks are:<br />
* the network itself, denoted by source A (white-box attacks).<br />
* an independently trained version of the original network, denoted by source A' (black-box attacks).<br />
* a version of the original network trained on natural data only, denoted by A<sub>nat</sub> (black-box attacks).<br />
* a network with a different convolutional architecture, as described by Tramer et al. (2017a), denoted by B (black-box attacks).<br />
<br />
The attack methods used are:<br />
* Fast Gradient Sign Method (denoted FGSM); a one-step sketch is given after this list<br />
* Projected gradient descent (denoted PGD) with various steps and restarts<br />
* Attacks based on literature by Carlini & Wagner (2016b), denoted by CW; a version of this attack with a high confidence parameter is denoted by CW+<br />
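<br />
For comparison with PGD, FGSM can be viewed as a single signed-gradient step of size <math>\varepsilon</math>. A minimal sketch under the same PyTorch assumptions as the earlier examples:<br />
<pre>
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps):
    """Fast Gradient Sign Method: one l_inf ascent step of size eps."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    # Single signed-gradient step, clipped back to the valid pixel range.
    return (x + eps * grad.sign()).clamp(0.0, 1.0).detach()
</pre>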
<br />
===MNIST: Network structure and results===<br />
For the MNIST model, the network is composed of two convolutional layers (with 32 and 64 filters) and a fully connected 1024-unit layer. Each convolutional layer is followed by a <math>2 \times 2</math> max-pooling layer. The training adversary used is PGD with 40 steps and a step size of 0.01 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=0.3</math>. The adversarially trained model was evaluated to be highly robust; testing results are given in Table 1.<br />
<br />
[[file:4_MNIST.png|centre|frame|'''Table 1.''' MNIST dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=0.3</math>; most successful attacks are bolded for each attack model.]]<br />
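<br />
A network matching the MNIST architecture described above can be sketched as follows (in PyTorch). The <math>5 \times 5</math> kernel size, the padding, and the 10-way output layer are assumptions made for illustration; only the filter counts, pooling, and 1024-unit layer come from the description above.<br />
<pre>
import torch.nn as nn

class MNISTNet(nn.Module):
    """Two conv layers (32 and 64 filters), each followed by 2x2 max-pooling,
    then a fully connected 1024-unit layer; kernel size and padding assumed."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),  # 28x28 -> 14x14
            nn.Conv2d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),  # 14x14 -> 7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 1024), nn.ReLU(),
            nn.Linear(1024, 10),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
</pre>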
<br />
===CIFAR10: Network structure and results===<br />
For the CIFAR10 model, the network architectures used are the Resnet model (as per He et al (2016); TFM (2017)) and a version of Resnet with <math>10 \times</math> wider layers. The training adversary is PGD with 7 steps and a step size of 2 in the <math>l_\infty</math>-norm, with total perturbation size <math>\varepsilon=8</math>. <br />
The most challenging testing adversary (white-box PGD) used the same architecture as above with 20 steps. The adversarially trained model performed well given the strength of the testing adversaries but struggled in some cases, with test accuracy close to or lower than 50%. Testing results are given in Table 2.<br />
<br />
[[file:4_CIFAR.png|centre|frame|'''Table 2.''' CIFAR10 dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=8</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===Further experiments: <math>\varepsilon</math> values and <math>l_2</math>-bounded attacks===<br />
<br />
The MNIST model adversarially trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 0.3</math> and the CIFAR10 model trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 8</math> are further evaluated against attacks of varying <math>\varepsilon</math> values, for both <math>l_\infty</math>-bounded and <math>l_2</math>-bounded attacks. The MNIST network performed especially well against <math>l_2</math>-bounded adversaries.<br />
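<br />
For <math>l_2</math>-bounded attacks, the PGD sketch above changes only in its projection step: instead of clamping each coordinate, a perturbation that leaves the <math>l_2</math>-ball is rescaled back onto it. A minimal sketch of that projection, assuming PyTorch tensors with the batch dimension first:<br />
<pre>
def project_l2(delta, eps):
    """Project a batch of perturbations onto the l_2 ball of radius eps."""
    flat = delta.view(delta.size(0), -1)
    norms = flat.norm(p=2, dim=1, keepdim=True).clamp(min=1e-12)
    # Rescale only those perturbations whose norm exceeds eps.
    factor = (eps / norms).clamp(max=1.0)
    return (flat * factor).view_as(delta)
</pre>
For the ascent step itself, an <math>l_2</math> adversary typically moves along the normalized gradient rather than its sign.<br />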
<br />
[[file:4_parameters.png|centre|frame|'''Figure 4.''' Adversarially-trained MNIST and CIFAR10 network performance tested against PGD adversaries of varying <math>\varepsilon</math> values, for <math>l_\infty</math> and <math>l_2</math> norms. <math>\varepsilon</math> value used for training is indicated in red.]]<br />
<br />
===Training accuracy===<br />
<br />
The MNIST network and wide-layer version of the CIFAR10 network had 100% accuracy on the training sets; thus, the optimization problem is tractable.<br />
<br />
===Running time===<br />
<br />
As PGD is iterative, training against a <math>k</math>-step PGD adversary requires <math>k</math> forward and backward passes per training batch on top of the single forward and backward pass used for the parameter update in standard network training. This poses a computational challenge, as the running time grows by a factor of roughly <math>k+1</math>; for example, the 40-step MNIST adversary makes each training batch about 41 times more expensive than standard training.<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima found by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will find a local maximum with a significantly higher final loss value than the PGD adversary. This is seen experimentally: adversaries with significantly better loss values are not found even after a large number of restarts.<br />
<br />
The authors also experimentally find that a network trained to be robust against PGD adversaries also becomes robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggests that using the saddle point formulation to adversarially train networks encompasses all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although these attacks cannot query the classifier's gradient directly, the gradient can be estimated with finite differences. Thus, first-order attacks remain relevant, and networks are trained against them to defend against transfer attacks. The paper finds that increasing network capacity, and increasing the strength of the adversary the network is trained against, improves resistance to transfer attacks. The paper also finds that models resistant to first-order attacks are more resistant to transfer attacks.<br />
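<br />
The finite-difference estimate mentioned above can be sketched as follows. This coordinate-wise version is only an illustration (it costs two model queries per input coordinate) and assumes a hypothetical <code>loss_fn</code> mapping an input tensor to a scalar loss.<br />
<pre>
import torch

def fd_gradient(loss_fn, x, h=1e-4):
    """Estimate the input gradient of loss_fn at x by central finite differences.

    Uses only function evaluations (no access to model gradients), at a cost
    of two queries per coordinate of x.
    """
    grad = torch.zeros_like(x)
    flat_x = x.reshape(-1)
    flat_g = grad.view(-1)  # shares storage with grad
    for i in range(flat_x.numel()):
        e = torch.zeros_like(flat_x)
        e[i] = h
        plus = loss_fn((flat_x + e).reshape(x.shape))
        minus = loss_fn((flat_x - e).reshape(x.shape))
        flat_g[i] = (plus - minus) / (2.0 * h)
    return grad
</pre>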
<br />
== 6. Conclusions ==<br />
<br />
== References ==<br />
* <sup>[1]</sup> Carlini, N., & Wagner, D. (2016). Defensive distillation is not robust to adversarial examples. arXiv preprint arXiv:1607.04311.<br />
* <sup>[2]</sup> Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2018). Towards deep learning models resistant to adversarial attacks. In Proceedings of the ICLR 2018 Conference, Vancouver, BC. Available at https://openreview.net/forum?id=rJzIBfZAb</div>J23ngo
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data <math>x</math> that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that all perturbations result in smallest adversarial loss possible based on the attack. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of <math>\rho(\delta)</math> are obtained by using projected gradient descent (PGD). Since the inner maximization is non-concave, the global maximizer cannot be found using PGD (instead, PGD will converge to local maxima). However, the paper’s findings conclude that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \in S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> and tens to have similar loss values for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|Adversarial loss of MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<math>L</math> is assumed to be continuously differentiable for <math>\theta</math>, and Danskin’s theorem is used to obtained the direction of descent. Discontinuity when using ReLU activations is assumed to not be an issue as the set of discontinuities has a measure of zero, and thus these points of discontinuity will never be encountered. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust.<br />
<br />
== 4. Experiments ==<br />
<br />
===Training===<br />
<br />
We put into practice our robust classification framework with the MNIST and CIFAR10 datasets, with a projected gradient descent (PGD) training adversary, chosen because it effectively produces almost-maximal loss. We start PGD at a randomly chosen perturbation of the natural data. For both datasets, we see a sustained decrease in training loss when networks are trained against the PGD adversary, which suggests that adversarial loss is effectively being minimized.<br />
<br />
[[file:4_trainingloss.png|centre|frame|'''Figure 3.''' Trajectory of cross-entropy loss when training with a projected gradient descent (PGD) adversary.]]<br />
<br />
===Testing===<br />
<br />
The trained models are then tested based on a variety of adversaries.<br />
<br />
The source networks for the adversarial attacks are:<br />
* the network itself, denoted by source A (white-box attacks).<br />
* an independently trained version of the original network, denoted by source A' (black-box attacks).<br />
* a version of the original network trained on natural data only, denoted by Anat (black-box attacks).<br />
* a network with a different convolutional architecture, as described by Tramer et al. (2017a), denoted by B (black-box attacks).<br />
<br />
The attack methods used are:<br />
* Fast Gradient Sign Method (denoted FGSM)<br />
* Projected gradient descent (denoted PGD) with various steps and restarts<br />
* Attacks based on literature by Carlini & Wagner (2016b), denoted by CW; a version of this attack with a high confidence parameter is denoted by CW+<br />
<br />
===MNIST: Network structure and results===<br />
For the MNIST model, the network is composed of two convolutional layers (with 32 and 64 filters) and a fully connected 1024-unit layer. Each convolutional layer is followed by <math>2 \times 2</math> max-pooling layers. The training adversary used is PGD with 40 steps and a step size of 0.01 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=0.3</math>. The adversarially trained model was evaluated to be highly robust; testing results are given in Table 1.<br />
<br />
[[file:4_MNIST.png|centre|frame|'''Table 1.''' MNIST dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=0.3</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===CIFAR10: Network structure and results===<br />
For the CIFAR10 model, the network architectures used are the Resnet model (as per He et al (2016); TFM (2017)) and a version of Resnet with <math>10 \times</math> wider layers. The training adversary is PGD with 7 steps and a step size of 2 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=8</math> total. <br />
The most challenging testing adversary (white-box PGD) used the same architecture as above with 20 steps. Th adversarially trained model performed well given the strength of the testing adversaries but struggled in some cases, with test accuracy close to or lower than 50%. Testing results are given in Table 2.<br />
<br />
[[file:4_CIFAR.png|centre|frame|'''Table 2.''' CIFAR10 dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=8</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===Further experiments: <math>\varepsilon</math> values and <math>l_2</math>-bounded attacks===<br />
<br />
The MNIST model adversarially trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 0.3</math> and the CIFAR10 model trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 8</math> are further evaluated against attacks of varying <math>\varepsilon</math> values, for both <math>l_\infty</math>-bounded and <math>l_2</math>-bounded attacks. The MNIST network performed especially well against <math>l_2</math>-bounded adversaries.<br />
<br />
[[file:4_parameters.png|centre|frame|'''Figure 4.''' Adversarially-trained MNIST and CIFAR10 network performance tested against PGD adversaries of varying <math>\varepsilon</math> values, for <math>l_\infty</math> and <math>l_2</math> norms. <math>\varepsilon</math> value used for training is indicated in red.]]<br />
<br />
===Training accuracy===<br />
<br />
The MNIST network and wide-layer version of the CIFAR10 network had 100% accuracy on the training sets; thus, the optimization problem is tractable.<br />
<br />
===Running time===<br />
<br />
As PGD is iterative, training against a <math>k</math>-step PGD adversary requires <math>k</math> forward and backward passes per training batch rather than the single forward and backward pass for standard network training. This poses a challenge for computation as the the running time is greater by a factor of <math>k+1</math>.<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima founded by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will yield a local maxima with a significantly lower final loss value than the PGD adversary. This is seen experimentally: other adversaries with significantly lower loss values are not found even after a large amount of restarts.<br />
<br />
The authors also experimentally find that a network trained to be robust against PGD adversaries also become robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggest that using the saddle point formulation to adversarially train networks will encompass all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although these attacks evaluate the classifier without the gradient, it can be estimated with finite differences. Thus, first-order attacks are relevant and are use to train against these attacks. The paper finds that increasing network capacity and the strength of the adversary that the network is trained against improves resistance to transfer attacks. The paper also finds that the models that are resistant to first-order attacks are more resistant to transfer attacks.<br />
<br />
== 6. Conclusions ==<br />
<br />
== References ==<br />
* <sup>[1]</sup> Carlini, N. and Wagner, D. (2016). Defensive distillation is not robust to adversarial examples. arXiv preprint arXiv:1607.04311, 2016a<br />
* <sup>[https://openreview.net/forum?id=rJzIBfZAb[2]]</sup> Madry et al. (2018). Towards Deep Learning Models Resistant to Adversarial Attacks. In Proceedings of the ICLR 2018 Conference, Vancouver, BC.</div>J23ngohttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Towards_Deep_Learning_Models_Resistant_to_Adversarial_Attacks&diff=41070Towards Deep Learning Models Resistant to Adversarial Attacks2018-11-23T03:01:40Z<p>J23ngo: /* References */</p>
<hr />
<div>This is a summary of the paper "Towards Deep Learning Models Resistant to Adversarial Attacks" by Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. <sup>[[#References|[1]]]</sup><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurry<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
'''TO DO:'''<br />
;sections 4, 6, 7<br />
;proofread all sections<br />
;figure numbers/citations for images<br />
;APA citations for the paper, any other works referred to, and images<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|'''Figure 1.''' Before and after an input image of a panda was subjected to perturbations.<sup>[https://arxiv.org/abs/1412.6572 Source]</sup>]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is to solve:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math></div><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around <math>x</math>, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math></div><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data <math>x</math> that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that all perturbations result in smallest adversarial loss possible based on the attack. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of <math>\rho(\delta)</math> are obtained by using projected gradient descent (PGD). Since the inner maximization is non-concave, the global maximizer cannot be found using PGD (instead, PGD will converge to local maxima). However, the paper’s findings conclude that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \in S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> and tens to have similar loss values for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|Adversarial loss of MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<math>L</math> is assumed to be continuously differentiable for <math>\theta</math>, and Danskin’s theorem is used to obtained the direction of descent. Discontinuity when using ReLU activations is assumed to not be an issue as the set of discontinuities has a measure of zero, and thus these points of discontinuity will never be encountered. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust.<br />
<br />
== 4. Experiments ==<br />
<br />
===Training===<br />
<br />
We put into practice our robust classification framework with the MNIST and CIFAR10 datasets, with a projected gradient descent (PGD) training adversary, chosen because it effectively produces almost-maximal loss. We start PGD at a randomly chosen perturbation of the natural data. For both datasets, we see a sustained decrease in training loss when networks are trained against the PGD adversary, which suggests that adversarial loss is effectively being minimized.<br />
<br />
[[file:4_trainingloss.png|centre|frame|'''Figure 3.''' Trajectory of cross-entropy loss when training with a projected gradient descent (PGD) adversary.]]<br />
<br />
===Testing===<br />
<br />
The trained models are then tested based on a variety of adversaries.<br />
<br />
The source networks for the adversarial attacks are:<br />
* the network itself, denoted by source A (white-box attacks).<br />
* an independently trained version of the original network, denoted by source A' (black-box attacks).<br />
* a version of the original network trained on natural data only, denoted by Anat (black-box attacks).<br />
* a network with a different convolutional architecture, as described by Tramer et al. (2017a), denoted by B (black-box attacks).<br />
<br />
The attack methods used are:<br />
* Fast Gradient Sign Method (denoted FGSM)<br />
* Projected gradient descent (denoted PGD) with various steps and restarts<br />
* Attacks based on literature by Carlini & Wagner (2016b), denoted by CW; a version of this attack with a high confidence parameter is denoted by CW+<br />
<br />
===MNIST: Network structure and results===<br />
For the MNIST model, the network is composed of two convolutional layers (with 32 and 64 filters) and a fully connected 1024-unit layer. Each convolutional layer is followed by <math>2 \times 2</math> max-pooling layers. The training adversary used is PGD with 40 steps and a step size of 0.01 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=0.3</math>. The adversarially trained model was evaluated to be highly robust; testing results are given in Table 1.<br />
<br />
[[file:4_MNIST.png|centre|frame|'''Table 1.''' MNIST dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=0.3</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===CIFAR10: Network structure and results===<br />
For the CIFAR10 model, the network architectures used are the Resnet model (as per He et al (2016); TFM (2017)) and a version of Resnet with <math>10 \times</math> wider layers. The training adversary is PGD with 7 steps and a step size of 2 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=8</math> total. <br />
The most challenging testing adversary (white-box PGD) used the same architecture as above with 20 steps. Th adversarially trained model performed well given the strength of the testing adversaries but struggled in some cases, with test accuracy close to or lower than 50%. Testing results are given in Table 2.<br />
<br />
[[file:4_CIFAR.png|centre|frame|'''Table 2.''' CIFAR10 dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=8</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===Further experiments: <math>\varepsilon</math> values and <math>l_2</math>-bounded attacks===<br />
<br />
The MNIST model adversarially trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 0.3</math> and the CIFAR10 model trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 8</math> are further evaluated against attacks of varying <math>\varepsilon</math> values, for both <math>l_\infty</math>-bounded and <math>l_2</math>-bounded attacks. The MNIST network performed especially well against <math>l_2</math>-bounded adversaries.<br />
<br />
[[file:4_parameters.png|centre|frame|'''Figure 4.''' Adversarially-trained MNIST and CIFAR10 network performance tested against PGD adversaries of varying <math>\varepsilon</math> values, for <math>l_\infty</math> and <math>l_2</math> norms. <math>\varepsilon</math> value used for training is indicated in red.]]<br />
<br />
===Training accuracy===<br />
<br />
The MNIST network and wide-layer version of the CIFAR10 network had 100% accuracy on the training sets; thus, the optimization problem is tractable.<br />
<br />
===Running time===<br />
<br />
As PGD is iterative, training against a <math>k</math>-step PGD adversary requires <math>k</math> forward and backward passes per training batch rather than the single forward and backward pass for standard network training. This poses a challenge for computation as the the running time is greater by a factor of <math>k+1</math>.<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima founded by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will yield a local maxima with a significantly lower final loss value than the PGD adversary. This is seen experimentally: other adversaries with significantly lower loss values are not found even after a large amount of restarts.<br />
<br />
The authors also experimentally find that a network trained to be robust against PGD adversaries also become robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggest that using the saddle point formulation to adversarially train networks will encompass all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although these attacks evaluate the classifier without the gradient, it can be estimated with finite differences. Thus, first-order attacks are relevant and are use to train against these attacks. The paper finds that increasing network capacity and the strength of the adversary that the network is trained against improves resistance to transfer attacks. The paper also finds that the models that are resistant to first-order attacks are more resistant to transfer attacks.<br />
<br />
== 6. Related Work ==<br />
<br />
== 7. Conclusions ==<br />
<br />
== References ==<br />
* <sup>[https://openreview.net/forum?id=rJzIBfZAb[1]]</sup>Madry et al. (2018). Towards Deep Learning Models Resistant to Adversarial Attacks. In Proceedings of the ICLR 2018 Conference, Vancouver, BC.<br />
* <sup>[2]</sup>Carlini, N. and Wagner, D. (2016). Defensive distillation is not robust to adversarial examples. arXiv preprint arXiv:1607.04311, 2016a</div>J23ngohttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Towards_Deep_Learning_Models_Resistant_to_Adversarial_Attacks&diff=41064Towards Deep Learning Models Resistant to Adversarial Attacks2018-11-23T02:57:17Z<p>J23ngo: /* 1. Introduction */</p>
<hr />
<div>This is a summary of the paper "Towards Deep Learning Models Resistant to Adversarial Attacks" by Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. <sup>[[#References|[1]]]</sup><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurry<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
'''TO DO:'''<br />
;sections 4, 6, 7<br />
;proofread all sections<br />
;figure numbers/citations for images<br />
;APA citations for the paper, any other works referred to, and images<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|'''Figure 1.''' Before and after an input image of a panda was subjected to perturbations.<sup>[https://arxiv.org/abs/1412.6572 Source]</sup>]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is to solve:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math></div><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around <math>x</math>, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math></div><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data <math>x</math> that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that all perturbations result in smallest adversarial loss possible based on the attack. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of <math>\rho(\delta)</math> are obtained by using projected gradient descent (PGD). Since the inner maximization is non-concave, the global maximizer cannot be found using PGD (instead, PGD will converge to local maxima). However, the paper’s findings conclude that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \in S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> and tens to have similar loss values for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|Adversarial loss of MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<math>L</math> is assumed to be continuously differentiable for <math>\theta</math>, and Danskin’s theorem is used to obtained the direction of descent. Discontinuity when using ReLU activations is assumed to not be an issue as the set of discontinuities has a measure of zero, and thus these points of discontinuity will never be encountered. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust.<br />
<br />
== 4. Experiments ==<br />
<br />
===Training===<br />
<br />
We put into practice our robust classification framework with the MNIST and CIFAR10 datasets, with a projected gradient descent (PGD) training adversary, chosen because it effectively produces almost-maximal loss. We start PGD at a randomly chosen perturbation of the natural data. For both datasets, we see a sustained decrease in training loss when networks are trained against the PGD adversary, which suggests that adversarial loss is effectively being minimized.<br />
<br />
[[file:4_trainingloss.png|centre|frame|Trajectory of cross-entropy loss when training with a projected gradient descent (PGD) adversary.]]<br />
<br />
===Testing===<br />
<br />
The trained models are then tested based on a variety of adversaries.<br />
<br />
The source networks for the adversarial attacks are:<br />
* the network itself, denoted by source A (white-box attacks).<br />
* an independently trained version of the original network, denoted by source A' (black-box attacks).<br />
* a version of the original network trained on natural data only, denoted by Anat (black-box attacks).<br />
* a network with a different convolutional architecture, as described by Tramer et al. (2017a), denoted by B (black-box attacks).<br />
<br />
The attack methods used are:<br />
* Fast Gradient Sign Method (denoted FGSM)<br />
* Projected gradient descent (denoted PGD) with various steps and restarts<br />
* Attacks based on literature by Carlini & Wagner (2016b), denoted by CW; a version of this attack with a high confidence parameter is denoted by CW+<br />
* Targeted????<br />
<br />
===MNIST: Network structure and results===<br />
For the MNIST model, the network is composed of two convolutional layers (with 32 and 64 filters) and a fully connected 1024-unit layer. Each convolutional layer is followed by <math>2 \times 2</math> max-pooling layers. The training adversary used is PGD with 40 steps and a step size of 0.01 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=0.3</math>. The adversarially trained model was evaluated to be highly robust; testing results are given in Table 1.<br />
<br />
[[file:4_MNIST.png|centre|frame|Table 1. MNIST dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=0.3</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===CIFAR10: Network structure and results===<br />
For the CIFAR10 model, the network architectures used are the Resnet model (as per He et al (2016); TFM (2017)) and a version of Resnet with <math>10 \times</math> wider layers. The training adversary is PGD with 7 steps and a step size of 2 in the <math>l_\infty</math>-norm, with perturbation size <math>\varepsilon=8</math> total. <br />
The most challenging testing adversary (white-box PGD) used the same architecture as above with 20 steps. Th adversarially trained model performed well given the strength of the testing adversaries but struggled in some cases, with test accuracy close to or lower than 50%. Testing results are given in Table 2.<br />
<br />
[[file:4_CIFAR.png|centre|frame|Table 2. CIFAR10 dataset: Adversarially-trained network performance against various adversaries with <math>\varepsilon=8</math>; most successful attacks are bolded for each attack model.]]<br />
<br />
===Further experiments: <math>\varepsilon</math> values and <math>l_2</math>-bounded attacks===<br />
<br />
The MNIST model adversarially trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 0.3</math> and the CIFAR10 model trained against <math>l_\infty</math>-bounded attacks with <math>\varepsilon = 8</math> are further evaluated against attacks of varying <math>\varepsilon</math> values, for both <math>l_\infty</math>-bounded and <math>l_2</math>-bounded attacks. The MNIST network performed especially well against <math>l_2</math>-bounded adversaries.<br />
<br />
[[file:4_parameters.png|centre|frame|Adversarially-trained MNIST and CIFAR10 network performance tested against PGD adversaries of varying <math>\varepsilon</math> values, for <math>l_\infty</math> and <math>l_2</math> norms. <math>\varepsilon</math> value used for training is indicated in red.]]<br />
<br />
===Training accuracy===<br />
<br />
The MNIST network and wide-layer version of the CIFAR10 network had 100% accuracy on the training sets; thus, the optimization problem is tractable.<br />
<br />
===Running time===<br />
<br />
As PGD is iterative, training against a <math>k</math>-step PGD adversary requires <math>k</math> forward and backward passes per training batch rather than the single forward and backward pass for standard network training. This poses a challenge for computation as the the running time is greater by a factor of <math>k+1</math>.<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima founded by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will yield a local maxima with a significantly lower final loss value than the PGD adversary. This is seen experimentally: other adversaries with significantly lower loss values are not found even after a large amount of restarts.<br />
<br />
The authors also experimentally find that a network trained to be robust against PGD adversaries also become robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggest that using the saddle point formulation to adversarially train networks will encompass all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although these attacks evaluate the classifier without the gradient, it can be estimated with finite differences. Thus, first-order attacks are relevant and are use to train against these attacks. The paper finds that increasing network capacity and the strength of the adversary that the network is trained against improves resistance to transfer attacks. The paper also finds that the models that are resistant to first-order attacks are more resistant to transfer attacks.<br />
<br />
== 6. Related Work ==<br />
<br />
== 7. Conclusions ==<br />
<br />
== References ==<br />
* <sup>[https://openreview.net/forum?id=rJzIBfZAb[1]]</sup>Madry et al. (2018). Towards Deep Learning Models Resistant to Adversarial Attacks. In Proceedings of the ICLR 2018 Conference, Vancouver, BC.</div>J23ngohttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Towards_Deep_Learning_Models_Resistant_to_Adversarial_Attacks&diff=41046Towards Deep Learning Models Resistant to Adversarial Attacks2018-11-23T00:53:26Z<p>J23ngo: /* 1. Introduction */</p>
<hr />
<div>This is a summary of the paper "Towards Deep Learning Models Resistant to Adversarial Attacks" by Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. <sup>[[#References|[1]]]</sup><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurry<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
'''TO DO:'''<br />
;sections 4, 6, 7<br />
;proofread all sections<br />
;figure numbers/citations for images<br />
;APA citations for the paper, any other works referred to, and images<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|Before and after an input image of a panda was subjected to perturbations.<sup>[https://arxiv.org/abs/1412.6572 Source]</sup>]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is to solve:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math></div><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around <math>x</math>, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math></div><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data <math>x</math> that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that all perturbations result in smallest adversarial loss possible based on the attack. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner Maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of <math>\rho(\delta)</math> are obtained by using projected gradient descent (PGD). Since the inner maximization is non-concave, the global maximizer cannot be found using PGD (instead, PGD will converge to local maxima). However, the paper’s findings conclude that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \in S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> and tens to have similar loss values for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|Adversarial loss of MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer Minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<br />
<math>L</math> is assumed to be continuously differentiable for <math>\theta</math>, and Danskin’s theorem is used to obtained the direction of descent. Discontinuity when using ReLU activations is assumed to not be an issue as the set of discontinuities has a measure of zero, and thus these points of discontinuity will never be encountered. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust.<br />
<br />
== 4. Experiments ==<br />
<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima founded by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will yield a local maxima with a significantly lower final loss value than the PGD adversary. This is seen experimentally: other adversaries with significantly lower loss values are not found even after a large amount of restarts.<br />
<br />
The authors also experimentally find that a network trained to be robust against PGD adversaries also become robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggest that using the saddle point formulation to adversarially train networks will encompass all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although these attacks evaluate the classifier without the gradient, it can be estimated with finite differences. Thus, first-order attacks are relevant and are use to train against these attacks. The paper finds that increasing network capacity and the strength of the adversary that the network is trained against improves resistance to transfer attacks. The paper also finds that the models that are resistant to first-order attacks are more resistant to transfer attacks.<br />
<br />
== 6. Related Work ==<br />
<br />
== 7. Conclusions ==<br />
<br />
== References ==<br />
* <sup>[https://openreview.net/forum?id=rJzIBfZAb[1]]</sup>Madry et al. (2018). Towards Deep Learning Models Resistant to Adversarial Attacks. In Proceedings of the ICLR 2018 Conference, Vancouver, BC.</div>J23ngohttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Towards_Deep_Learning_Models_Resistant_to_Adversarial_Attacks&diff=41045Towards Deep Learning Models Resistant to Adversarial Attacks2018-11-23T00:52:33Z<p>J23ngo: /* 1. Introduction */</p>
<hr />
<div>This is a summary of the paper "Towards Deep Learning Models Resistant to Adversarial Attacks" by Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. <sup>[[#References|[1]]]</sup><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurry<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
'''TO DO:'''<br />
;sections 4, 6, 7<br />
;proofread all sections<br />
;figure numbers/citations for images<br />
;APA citations for the paper, any other works referred to, and images<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|Before and after an input image of a panda was subjected to perturbations.<sup>[https://arxiv.org/abs/1412.6572|Source]</sup>]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is to solve:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math></div><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around <math>x</math>, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math></div><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of the input data <math>x</math> that results in the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Solving the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that even the worst-case perturbation results in the smallest adversarial loss possible. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by computing the gradients <math>\nabla_\theta \rho(\theta)</math> at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner Maximization ===<br />
Based on prior research, maximizers <math>\delta \in S</math> of the inner loss <math>L(\theta,x+\delta,y)</math> are obtained using projected gradient descent (PGD). Since the inner maximization problem is non-concave, PGD is not guaranteed to find the global maximizer (instead, it converges to a local maximum). However, the paper's findings indicate that PGD can be used in practice to solve the inner maximization problem using only first-order information. <br />
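<br />
A minimal sketch of PGD on the <math>\ell_\infty</math>-ball is given below, assuming a differentiable loss; the step size <math>\alpha</math>, radius <math>\epsilon</math>, and number of steps are hyperparameters, and the random starting point matches the restart experiments described below.<br />
<pre>
import torch

def pgd_attack(model, loss_fn, x, y, epsilon, alpha, num_steps):
    # Start from a random point inside the l_infinity ball around x.
    x_adv = x + torch.empty_like(x).uniform_(-epsilon, epsilon)
    for _ in range(num_steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        loss.backward()
        # Ascent step on the loss, then projection back onto the ball.
        x_adv = x_adv + alpha * x_adv.grad.sign()
        x_adv = torch.max(torch.min(x_adv, x + epsilon), x - epsilon).detach()
    return x_adv
</pre>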
<br />
Applying Danskin’s theorem on a subset <math>S' \subseteq S</math> in which the local maximum is the global maximum over <math>S'</math> gives the gradient corresponding to a descent direction for the saddle point problem when the adversary is constrained to <math>S'</math>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> and tend to have similar loss values, for both normally and adversarially trained networks.<br />
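<br />
Concretely, if <math>\delta^* = \arg\max_{\delta \in S'} L(\theta,x+\delta,y)</math> is the inner maximizer, Danskin's theorem states (assuming <math>L</math> is continuously differentiable in <math>\theta</math> and the maximizer is unique) that the gradient of the inner maximum is simply the gradient of the loss evaluated at that maximizer:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\nabla_\theta \max_{\delta \in S'} L(\theta,x+\delta,y) = \nabla_\theta L(\theta,x+\delta^*,y)</math></div><br />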
<br />
[[file:adversarial loss of MNST.png|centre|frame|Adversarial loss of MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer Minimization ===<br />
Since the distribution <math>D</math> is unknown, <math>\rho(\theta)</math> and its gradients are estimated from sampled input points; i.e., for the case of a single sampled example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<br />
<math>L</math> is assumed to be continuously differentiable with respect to <math>\theta</math>, and Danskin’s theorem is used to obtain the direction of descent. The discontinuities introduced by ReLU activations are assumed not to be an issue: the set of discontinuities has measure zero, so these points are almost never encountered. Using the gradients <math>\nabla_\theta \rho(\theta)</math> with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust.<br />
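<br />
Putting the two pieces together, one adversarial training step first solves the inner maximization approximately with PGD and then takes an SGD step at the resulting maximizer. A minimal sketch, reusing the hypothetical <code>pgd_attack</code> sketch above (all other names are placeholders):<br />
<pre>
def adversarial_training_step(model, loss_fn, optimizer, x, y,
                              epsilon, alpha, num_steps):
    # Inner maximization: approximate the worst-case perturbation with PGD.
    x_adv = pgd_attack(model, loss_fn, x, y, epsilon, alpha, num_steps)
    # Outer minimization: by Danskin's theorem, the loss gradient at the
    # inner maximizer is a descent direction for the adversarial loss.
    optimizer.zero_grad()
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
</pre>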
<br />
== 4. Experiments ==<br />
<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima found by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will find a local maximum with a significantly higher final loss value than the PGD adversary. This is seen experimentally: adversaries achieving significantly higher loss values are not found even after a large number of restarts.<br />
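<br />
The restart experiment can be sketched as follows: run PGD from many independent random starting points and compare the final loss values; similar values across restarts are the evidence for the claim above. This again reuses the hypothetical <code>pgd_attack</code> sketch from Section 3.<br />
<pre>
import torch

def pgd_restart_losses(model, loss_fn, x, y, epsilon, alpha,
                       num_steps, restarts=20):
    # Each call to pgd_attack draws a fresh random start in the
    # l_infinity ball, so each run may reach a different local maximum.
    losses = []
    for _ in range(restarts):
        x_adv = pgd_attack(model, loss_fn, x, y, epsilon, alpha, num_steps)
        with torch.no_grad():
            losses.append(loss_fn(model(x_adv), y).item())
    return losses
</pre>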
<br />
The authors also experimentally find that a network trained to be robust against PGD adversaries becomes robust against various other first-order adversaries as well. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggests that using the saddle point formulation to adversarially train networks encompasses all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although such attacks cannot evaluate the classifier's gradient directly, the gradient can be estimated with finite differences. First-order attacks are therefore still relevant, and are used to train against these attacks. The paper finds that increasing the network capacity and the strength of the adversary that the network is trained against both improve resistance to transfer attacks. The paper also finds that models resistant to first-order attacks are more resistant to transfer attacks.<br />
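<br />
As a rough illustration of the finite-difference estimate, the sketch below approximates the input gradient coordinate-by-coordinate using only black-box loss queries; <code>query_loss</code> is a hypothetical function returning the scalar loss from the attacked classifier. For a <math>d</math>-dimensional input this costs <math>2d</math> queries, which is why such attacks are expensive in practice.<br />
<pre>
import torch

def finite_difference_gradient(query_loss, x, h=1e-3):
    # Central-difference estimate of dL/dx, one coordinate at a time.
    grad = torch.zeros_like(x)
    grad_flat = grad.view(-1)
    for i in range(x.numel()):
        e = torch.zeros_like(x).view(-1)
        e[i] = h
        e = e.view_as(x)
        grad_flat[i] = (query_loss(x + e) - query_loss(x - e)) / (2 * h)
    return grad
</pre>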
<br />
== 6. Related Work ==<br />
<br />
== 7. Conclusions ==<br />
<br />
== References ==<br />
* [1] Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2018). Towards deep learning models resistant to adversarial attacks. In ''International Conference on Learning Representations (ICLR 2018)'', Vancouver, BC. https://openreview.net/forum?id=rJzIBfZAb</div>
<hr />
<div>This is a summary of the paper "Towards Deep Learning Models Resistant to Adversarial Attacks" by Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. <sup>[[#References|[1]]]</sup><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurry<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
'''TO DO:'''<br />
;sections 4, 6, 7<br />
;proofread all sections<br />
;figure numbers/citations for images<br />
;APA citations for the paper, any other works referred to, and images<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|Before and after an input image of a panda was subjected to perturbations.]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is to solve:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math></div><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around <math>x</math>, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math></div><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data <math>x</math> that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that all perturbations result in smallest adversarial loss possible based on the attack. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner Maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of <math>\rho(\delta)</math> are obtained by using projected gradient descent (PGD). Since the inner maximization is non-concave, the global maximizer cannot be found using PGD (instead, PGD will converge to local maxima). However, the paper’s findings conclude that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \in S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> and tens to have similar loss values for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|Adversarial loss of MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer Minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<br />
<math>L</math> is assumed to be continuously differentiable for <math>\theta</math>, and Danskin’s theorem is used to obtained the direction of descent. Discontinuity when using ReLU activations is assumed to not be an issue as the set of discontinuities has a measure of zero, and thus these points of discontinuity will never be encountered. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust.<br />
<br />
== 4. Experiments ==<br />
<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima founded by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will yield a local maxima with a significantly lower final loss value than the PGD adversary. This is seen experimentally: other adversaries with significantly lower loss values are not found even after a large amount of restarts.<br />
<br />
The authors also experimentally find that a network trained to be robust against PGD adversaries also become robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggest that using the saddle point formulation to adversarially train networks will encompass all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although these attacks evaluate the classifier without the gradient, it can be estimated with finite differences. Thus, first-order attacks are relevant and are use to train against these attacks. The paper finds that increasing network capacity and the strength of the adversary that the network is trained against improves resistance to transfer attacks. The paper also finds that the models that are resistant to first-order attacks are more resistant to transfer attacks.<br />
<br />
== 6. Related Work ==<br />
<br />
== 7. Conclusions ==<br />
<br />
== References ==<br />
* <sup>[https://openreview.net/forum?id=rJzIBfZAb[1]]</sup>Madry et al. (2018). Towards Deep Learning Models Resistant to Adversarial Attacks. In Proceedings of the ICLR 2018 Conference, Vancouver, BC.</div>J23ngohttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Towards_Deep_Learning_Models_Resistant_to_Adversarial_Attacks&diff=41041Towards Deep Learning Models Resistant to Adversarial Attacks2018-11-23T00:45:14Z<p>J23ngo: </p>
<hr />
<div>This is a summary of the paper "Towards Deep Learning Models Resistant to Adversarial Attacks" by Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu<br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurry<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
'''TO DO:'''<br />
;sections 4, 6, 7<br />
;proofread all sections<br />
;figure numbers/citations for images<br />
;APA citations for the paper, any other works referred to, and images<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|Before and after an input image of a panda was subjected to perturbations.]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is to solve:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math></div><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around <math>x</math>, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math></div><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data <math>x</math> that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that all perturbations result in smallest adversarial loss possible based on the attack. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner Maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of <math>\rho(\delta)</math> are obtained by using projected gradient descent (PGD). Since the inner maximization is non-concave, the global maximizer cannot be found using PGD (instead, PGD will converge to local maxima). However, the paper’s findings conclude that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \in S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> and tens to have similar loss values for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|Adversarial loss of MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer Minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<br />
<math>L</math> is assumed to be continuously differentiable for <math>\theta</math>, and Danskin’s theorem is used to obtained the direction of descent. Discontinuity when using ReLU activations is assumed to not be an issue as the set of discontinuities has a measure of zero, and thus these points of discontinuity will never be encountered. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust.<br />
<br />
== 4. Experiments ==<br />
<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima founded by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will yield a local maxima with a significantly lower final loss value than the PGD adversary. This is seen experimentally: other adversaries with significantly lower loss values are not found even after a large amount of restarts.<br />
<br />
The authors also experimentally find that a network trained to be robust against PGD adversaries also become robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggest that using the saddle point formulation to adversarially train networks will encompass all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although these attacks evaluate the classifier without the gradient, it can be estimated with finite differences. Thus, first-order attacks are relevant and are use to train against these attacks. The paper finds that increasing network capacity and the strength of the adversary that the network is trained against improves resistance to transfer attacks. The paper also finds that the models that are resistant to first-order attacks are more resistant to transfer attacks.<br />
<br />
== 6. Related Work ==<br />
<br />
== 7. Conclusions ==<br />
<br />
== References ==<br />
* <sup>[https://openreview.net/forum?id=rJzIBfZAb[1]]</sup>Madry et al. (2018). Towards Deep Learning Models Resistant to Adversarial Attacks. In Proceedings of the ICLR 2018 Conference, Vancouver, BC.</div>J23ngohttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Towards_Deep_Learning_Models_Resistant_to_Adversarial_Attacks&diff=40854Towards Deep Learning Models Resistant to Adversarial Attacks2018-11-22T08:08:38Z<p>J23ngo: /* Transfer attacks */</p>
<hr />
<div><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurry<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|Before and after an input image of a panda was subjected to perturbations.]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is:<br />
<br />
<math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around x, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data x that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that all perturbations result in smallest adversarial loss possible based on the attack. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner Maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of <math>\rho(\delta)</math> are obtained by using projected gradient descent (PGD). Since the inner maximization is non-concave, the global maximizer cannot be found using PGD (instead, PGD will converge to local maxima). However, the paper’s findings conclude that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \in S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> and tens to have similar loss values for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|Adversarial loss of MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer Minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<br />
<math>L</math> is assumed to be continuously differentiable for <math>\theta</math>, and Danskin’s theorem is used to obtained the direction of descent. Discontinuity when using ReLU activations is assumed to not be an issue as the set of discontinuities has a measure of zero, and thus these points of discontinuity will never be encountered. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust.<br />
<br />
== 4. Experiments ==<br />
Amy<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima founded by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will yield a local maxima with a significantly lower final loss value than the PGD adversary. This is seen experimentally: other adversaries with significantly lower loss values are not found even after a large amount of restarts.<br />
<br />
The authors also experimentally find that a network trained to be robust against PGD adversaries also become robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggest that using the saddle point formulation to adversarially train networks will encompass all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although these attacks evaluate the classifier without the gradient, it can be estimated with finite differences. Thus, first-order attacks are relevant and are use to train against these attacks. The paper finds that increasing network capacity and the strength of the adversary that the network is trained against improves resistance to transfer attacks. The paper also finds that the models that are resistant to first-order attacks are more resistant to transfer attacks.<br />
<br />
== 6. Related Work ==<br />
Amy</div>J23ngohttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Towards_Deep_Learning_Models_Resistant_to_Adversarial_Attacks&diff=40853Towards Deep Learning Models Resistant to Adversarial Attacks2018-11-22T08:07:40Z<p>J23ngo: /* Transfer attacks */</p>
<hr />
<div><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurry<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|Before and after an input image of a panda was subjected to perturbations.]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is:<br />
<br />
<math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around x, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data x that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that all perturbations result in smallest adversarial loss possible based on the attack. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner Maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of <math>\rho(\delta)</math> are obtained by using projected gradient descent (PGD). Since the inner maximization is non-concave, the global maximizer cannot be found using PGD (instead, PGD will converge to local maxima). However, the paper’s findings conclude that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \in S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> and tens to have similar loss values for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|Adversarial loss of MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer Minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<br />
<math>L</math> is assumed to be continuously differentiable for <math>\theta</math>, and Danskin’s theorem is used to obtained the direction of descent. Discontinuity when using ReLU activations is assumed to not be an issue as the set of discontinuities has a measure of zero, and thus these points of discontinuity will never be encountered. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust.<br />
<br />
== 4. Experiments ==<br />
Amy<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima founded by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will yield a local maxima with a significantly lower final loss value than the PGD adversary. This is seen experimentally: other adversaries with significantly lower loss values are not found even after a large amount of restarts.<br />
<br />
The authors also experimentally find that a network trained to be robust against PGD adversaries also become robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggest that using the saddle point formulation to adversarially train networks will encompass all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although these attacks evaluate the classifier without the gradient, it can be estimated with finite differences. Thus, first-order attacks are relevant to use in order to train against these attacks. The paper finds that increasing network capacity and the strength of the adversary that the network is trained against improves resistance to transfer attacks. The paper also finds that the models that are resistant to first-order attacks are more resistant to transfer attacks.<br />
<br />
== 6. Related Work ==<br />
Amy</div>J23ngohttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Towards_Deep_Learning_Models_Resistant_to_Adversarial_Attacks&diff=40851Towards Deep Learning Models Resistant to Adversarial Attacks2018-11-22T08:07:02Z<p>J23ngo: /* Transfer attacks */</p>
<hr />
<div><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurry<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|Before and after an input image of a panda was subjected to perturbations.]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is:<br />
<br />
<math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around x, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data x that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that all perturbations result in smallest adversarial loss possible based on the attack. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner Maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of <math>\rho(\delta)</math> are obtained by using projected gradient descent (PGD). Since the inner maximization is non-concave, the global maximizer cannot be found using PGD (instead, PGD will converge to local maxima). However, the paper’s findings conclude that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \in S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> and tens to have similar loss values for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|Adversarial loss of MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer Minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<br />
<math>L</math> is assumed to be continuously differentiable for <math>\theta</math>, and Danskin’s theorem is used to obtained the direction of descent. Discontinuity when using ReLU activations is assumed to not be an issue as the set of discontinuities has a measure of zero, and thus these points of discontinuity will never be encountered. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust.<br />
<br />
== 4. Experiments ==<br />
Amy<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima founded by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will yield a local maxima with a significantly lower final loss value than the PGD adversary. This is seen experimentally: other adversaries with significantly lower loss values are not found even after a large amount of restarts.<br />
<br />
The authors also experimentally find that a network trained to be robust against PGD adversaries also become robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggest that using the saddle point formulation to adversarially train networks will encompass all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although these attacks evaluate the classifier without its gradient, it can be estimated with finite differences. Thus, first-order attacks are relevant to use in order to train against these attacks. The paper finds that increasing network capacity and the strength of the adversary that the network is trained against improves resistance to transfer attacks. The paper also finds that the models that are resistant to first-order attacks are more resistant to transfer attacks.<br />
<br />
== 6. Related Work ==<br />
Amy</div>J23ngohttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Towards_Deep_Learning_Models_Resistant_to_Adversarial_Attacks&diff=40850Towards Deep Learning Models Resistant to Adversarial Attacks2018-11-22T08:05:26Z<p>J23ngo: /* Inner Maximization */</p>
<hr />
<div><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurry<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|Before and after an input image of a panda was subjected to perturbations.]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is:<br />
<br />
<math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around x, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data x that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Solving the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
yields model parameters for which even the worst-case perturbation in <math>S</math> incurs the smallest adversarial loss possible. Stochastic gradient descent (SGD) is used to solve the outer minimization problem via the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at a maximizer of the inner maximization problem.<br />
<br />
Solutions to the inner maximization also serve as concrete adversarial attacks on the neural network. <br />
<br />
=== Inner Maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of the inner loss <math>L(\theta,x+\delta,y)</math> are obtained using projected gradient descent (PGD). Since the inner maximization is non-concave, PGD is not guaranteed to find the global maximizer; instead, it will generally converge to a local maximum. However, the paper’s findings suggest that PGD, a first-order method, solves the inner maximization problem well enough in practice. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \subseteq S</math> in which the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math>, yet tend to have similar loss values for both normally and adversarially trained networks.<br />
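Concretely, Danskin’s theorem is what licenses training directly on the attack: under suitable regularity conditions (e.g. a unique maximizer and <math>L</math> differentiable in <math>\theta</math>), the gradient of the adversarial loss is simply the loss gradient evaluated at an inner maximizer,<br />
<br />
<math>\nabla_\theta \rho(\theta) = \nabla_\theta L(\theta, x+\delta^*, y), \qquad \delta^* \in \arg\max_{\delta \in S} L(\theta, x+\delta, y),</math><br />
<br />
so no differentiation through the maximization itself is needed. (The precise conditions are stated in the paper; this display is a simplified single-example form.)<br />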
<br />
[[file:adversarial loss of MNST.png|centre|frame|Adversarial loss on MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
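To make the attack concrete, below is a minimal PGD sketch for the <math>\ell_\infty</math>-ball of radius <math>\epsilon</math>, written in PyTorch; the step size, iteration count, and clamping to <math>[0,1]</math> are illustrative assumptions rather than the paper’s exact settings.<br />
<pre>
import torch

def pgd_attack(model, loss_fn, x, y, eps=0.3, alpha=0.01, steps=40):
    """Approximately solve max_{||delta||_inf <= eps} L(theta, x + delta, y)
    by projected gradient ascent (a sketch; hyperparameters are illustrative)."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend along the sign of the gradient, then project back onto the
        # l_inf ball around x (and the valid pixel range, assumed [0, 1]).
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()
</pre>
<br />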
<br />
=== Outer Minimization ===<br />
Since the distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are estimated from sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [\max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<math>L</math> is assumed to be continuously differentiable in <math>\theta</math>, and Danskin’s theorem is used to obtain the direction of descent. The discontinuities introduced by ReLU activations are assumed not to be an issue, since the set of discontinuities has measure zero and such points are therefore essentially never encountered in practice. Using the gradients <math>\nabla_\theta \rho(\theta)</math> in SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved, and classifiers can be trained to be adversarially robust.<br />
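As a sketch of how the inner and outer problems compose in practice, one adversarial training step might look as follows; it reuses the hypothetical <code>pgd_attack</code> function from the previous snippet, and the model, batch, and learning rate are again placeholders.<br />
<pre>
import torch
import torch.nn as nn

# Assumes the pgd_attack sketch above; model, batch, and learning rate are
# placeholders, not the architectures or settings used in the paper.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.rand(32, 1, 28, 28)
y = torch.randint(0, 10, (32,))

# One adversarial training step: approximate the inner maximizer with PGD,
# then take an SGD step on the loss evaluated there (as Danskin's theorem
# suggests for the gradient of the saddle point objective).
x_adv = pgd_attack(model, loss_fn, x, y)
optimizer.zero_grad()
loss = loss_fn(model(x_adv), y)
loss.backward()
optimizer.step()
</pre>
<br />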
<br />
== 4. Experiments ==<br />
Amy<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima found by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will find a local maximum with a significantly lower final loss value than the PGD adversary. This is seen experimentally: adversaries with significantly lower loss values are not found even after a large number of restarts.<br />
<br />
The authors also find experimentally that a network trained to be robust against PGD adversaries becomes robust against various other first-order adversaries as well. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggests that using the saddle point formulation to adversarially train networks encompasses all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks crafted without direct access to the target classifier, typically by attacking a substitute model. Although such attacks cannot query the network's gradient directly, the gradient can be estimated with finite differences, so first-order attacks remain the relevant class to train against. The paper finds that increasing network capacity, and increasing the strength of the adversary that the network is trained against, both improve resistance to transfer attacks. The paper also finds that models resistant to first-order attacks are more resistant to transfer attacks.<br />
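For intuition, a coordinate-wise finite-difference estimate of the loss gradient, of the kind a gradient-free adversary could compute from queries alone, is sketched below; querying every coordinate like this is far too expensive for real images, so practical attacks use much cheaper estimators, but it shows why such adversaries are still effectively first-order.<br />
<pre>
import torch

def fd_gradient(loss_of, x, h=1e-3):
    """Estimate the gradient of a scalar loss at x by central finite
    differences, using only function evaluations (no gradient access)."""
    grad = torch.zeros_like(x)
    flat = x.flatten()
    for i in range(flat.numel()):
        e = torch.zeros_like(flat)
        e[i] = h
        plus = loss_of((flat + e).view_as(x))
        minus = loss_of((flat - e).view_as(x))
        grad.view(-1)[i] = (plus - minus) / (2 * h)
    return grad

# Toy check against a loss with a known gradient; a real transfer attack
# would wrap queries to the target classifier instead.
x0 = torch.rand(4)
print(fd_gradient(lambda z: (z ** 2).sum(), x0))  # approximately 2 * x0
</pre>
<br />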
<br />
== 6. Related Work ==<br />
Amy</div>J23ngohttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Towards_Deep_Learning_Models_Resistant_to_Adversarial_Attacks&diff=40818Towards Deep Learning Models Resistant to Adversarial Attacks2018-11-22T06:09:10Z<p>J23ngo: /* Inner Maximization */</p>
<hr />
<div><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurry<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|Before and after an input image of a panda was subjected to perturbations.]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is:<br />
<br />
<math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around x, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data x that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that all perturbations result in smallest adversarial loss possible based on the attack. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner Maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of <math>\rho(\delta)</math> are obtained by using projected gradient descent (PGD). Since the inner maximization is non-concave, the global maximizer cannot be found using PGD (instead, PGD will converge to local maxima). However, the paper’s findings conclude that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \in S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> but tended to have similar loss values for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|Adversarial loss of MNIST and CIFAR10 from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer Minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<br />
<math>L</math> is assumed to be continuously differentiable for <math>\theta</math>, and Danskin’s theorem is used to obtained the direction of descent. Discontinuity when using ReLU activations is assumed to not be an issue as the set of discontinuities has a measure of zero, and thus these points of discontinuity will never be encountered. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust.<br />
<br />
== 4. Experiments ==<br />
Amy<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima founded by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will yield a local maxima with a significantly lower final loss value than the PGD adversary. This is seen experimentally: other adversaries with significantly lower loss values are not found even after a large amount of restarts.<br />
<br />
The authors also experimentally find that a network trained to be robust against PGD adversaries also become robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggest that using the saddle point formulation to adversarially train networks will encompass all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although these attacks evaluate the classifier without the gradient of the network, it can be estimated with finite differences. Thus, first-order attacks are relevant to use in order to train against these attacks. The paper finds that increasing network capacity and the strength of the adversary that the network is trained against improves resistance to transfer attacks. The paper also finds that the models that are resistant to first-order attacks are more resistant to transfer attacks.<br />
<br />
== 6. Related Work ==<br />
Amy</div>J23ngohttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Towards_Deep_Learning_Models_Resistant_to_Adversarial_Attacks&diff=40816Towards Deep Learning Models Resistant to Adversarial Attacks2018-11-22T06:07:54Z<p>J23ngo: /* 5. First-Order Adversaries */</p>
<hr />
<div><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurry<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|Before and after an input image of a panda was subjected to perturbations.]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is:<br />
<br />
<math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around x, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data x that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that all perturbations result in smallest adversarial loss possible based on the attack. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner Maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of <math>\rho(\delta)</math> are obtained by using projected gradient descent (PGD). Since the inner maximization is non-concave, the global maximizer cannot be found using PGD (instead, PGD will converge to local maxima). However, the paper’s findings conclude that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \in S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> but tended to have similar loss values for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|The plots show the adversarial loss of the MNIST and CIFAR10 evaluation from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer Minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<br />
<math>L</math> is assumed to be continuously differentiable for <math>\theta</math>, and Danskin’s theorem is used to obtained the direction of descent. Discontinuity when using ReLU activations is assumed to not be an issue as the set of discontinuities has a measure of zero, and thus these points of discontinuity will never be encountered. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust.<br />
<br />
== 4. Experiments ==<br />
Amy<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima founded by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will yield a local maxima with a significantly lower final loss value than the PGD adversary. This is seen experimentally: other adversaries with significantly lower loss values are not found even after a large amount of restarts.<br />
<br />
The authors also experimentally find that a network trained to be robust against PGD adversaries also become robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggest that using the saddle point formulation to adversarially train networks will encompass all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although these attacks evaluate the classifier without the gradient of the network, it can be estimated with finite differences. Thus, first-order attacks are relevant to use in order to train against these attacks. The paper finds that increasing network capacity and the strength of the adversary that the network is trained against improves resistance to transfer attacks. The paper also finds that the models that are resistant to first-order attacks are more resistant to transfer attacks.<br />
<br />
== 6. Related Work ==<br />
Amy</div>J23ngohttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Towards_Deep_Learning_Models_Resistant_to_Adversarial_Attacks&diff=40813Towards Deep Learning Models Resistant to Adversarial Attacks2018-11-22T06:05:56Z<p>J23ngo: /* 3. Adversarially Robust Networks */</p>
<hr />
<div><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurry<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|Before and after an input image of a panda was subjected to perturbations.]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is:<br />
<br />
<math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around x, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data x that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that all perturbations result in smallest adversarial loss possible based on the attack. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner Maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of <math>\rho(\delta)</math> are obtained by using projected gradient descent (PGD). Since the inner maximization is non-concave, the global maximizer cannot be found using PGD (instead, PGD will converge to local maxima). However, the paper’s findings conclude that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \in S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> but tended to have similar loss values for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|The plots show the adversarial loss of the MNIST and CIFAR10 evaluation from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer Minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<br />
<math>L</math> is assumed to be continuously differentiable for <math>\theta</math>, and Danskin’s theorem is used to obtained the direction of descent. Discontinuity when using ReLU activations is assumed to not be an issue as the set of discontinuities has a measure of zero, and thus these points of discontinuity will never be encountered. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust.<br />
<br />
== 4. Experiments ==<br />
Amy<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima founded by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will yield a local maxima with a significantly lower final loss value than the PGD adversary. This is seen experimentally: other adversaries with significantly lower loss values were not found even after a large amount of restarts.<br />
<br />
The authors also experimentally found that a network trained to be robust against PGD adversaries also became robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggest that using the saddle point formulation to adversarially train networks will encompass all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although these attacks evaluate the classifier without the gradient of the network, it can be estimated with finite differences. Thus, first-order attacks are relevant to use in order to train against these attacks. The paper found that increasing network capacity and the adversary strength that the network is trained against improved resistance to transfer attacks. The paper also found that the models that were resistant to first-order attacks were more resistant to transfer attacks.<br />
<br />
== 6. Related Work ==<br />
Amy</div>J23ngohttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Towards_Deep_Learning_Models_Resistant_to_Adversarial_Attacks&diff=40809Towards Deep Learning Models Resistant to Adversarial Attacks2018-11-22T06:03:37Z<p>J23ngo: /* Inner Maximization */</p>
<hr />
<div><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurry<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|Before and after an input image of a panda was subjected to perturbations.]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is:<br />
<br />
<math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around x, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data x that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that all perturbations result in smallest adversarial loss possible based on the attack. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner Maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of <math>\rho(\delta)</math> are obtained by using projected gradient descent (PGD). Since the inner maximization is non-concave, the global maximizer cannot be found using PGD (instead, PGD will converge to local maxima). However, the paper’s findings conclude that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \in S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> but tended to have similar loss values for both normally and adversarially trained networks.<br />
<br />
[[file:adversarial loss of MNST.png|centre|frame|The plots show the adversarial loss of the MNIST and CIFAR10 evaluation from 20 runs of PGD starting at random points in the <math>\ell_\infty</math>-ball around the same benign example.]]<br />
<br />
=== Outer Minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<br />
<math>L</math> is assumed to be continuously differentiable for <math>\theta</math>, and Danskin’s theorem is used to obtained the direction of descent. Discontinuity when using ReLU activations was assumed to not be an issue as the set of discontinuities had a measure of zero, and thus these points of discontinuity would never be encountered. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust.<br />
<br />
== 4. Experiments ==<br />
Amy<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima founded by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will yield a local maxima with a significantly lower final loss value than the PGD adversary. This is seen experimentally: other adversaries with significantly lower loss values were not found even after a large amount of restarts.<br />
<br />
The authors also experimentally found that a network trained to be robust against PGD adversaries also became robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggest that using the saddle point formulation to adversarially train networks will encompass all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although these attacks evaluate the classifier without the gradient of the network, it can be estimated with finite differences. Thus, first-order attacks are relevant to use in order to train against these attacks. The paper found that increasing network capacity and the adversary strength that the network is trained against improved resistance to transfer attacks. The paper also found that the models that were resistant to first-order attacks were more resistant to transfer attacks.<br />
<br />
== 6. Related Work ==<br />
Amy</div>J23ngohttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:adversarial_loss_of_MNST.png&diff=40805File:adversarial loss of MNST.png2018-11-22T05:55:34Z<p>J23ngo: </p>
<hr />
<div></div>J23ngohttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Towards_Deep_Learning_Models_Resistant_to_Adversarial_Attacks&diff=40804Towards Deep Learning Models Resistant to Adversarial Attacks2018-11-22T05:51:25Z<p>J23ngo: /* Outer Minimization */</p>
<hr />
<div><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurry<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|Before and after an input image of a panda was subjected to perturbations.]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is:<br />
<br />
<math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around x, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data x that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that all perturbations result in smallest adversarial loss possible based on the attack. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner Maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of <math>\rho(\delta)</math> are obtained by using projected gradient descent (PGD). Since the inner maximization is non-concave, the global maximizer cannot be found using PGD (instead, PGD will converge to local maxima). However, the paper’s findings conclude that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \in S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> but tended to have similar loss values for both normally and adversarially trained networks.<br />
<br />
=== Outer Minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, the problem becomes:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<br />
<math>L</math> is assumed to be continuously differentiable for <math>\theta</math>, and Danskin’s theorem is used to obtained the direction of descent. Discontinuity when using ReLU activations was assumed to not be an issue as the set of discontinuities had a measure of zero, and thus these points of discontinuity would never be encountered. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust.<br />
<br />
== 4. Experiments ==<br />
Amy<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima founded by PGD tended to have similar loss values, it is unlikely that any other first-order adversary will yield a local maxima with a significantly lower final loss value than the PGD adversary. This is seen experimentally: other adversaries with significantly lower loss values were not found even after a large amount of restarts.<br />
<br />
The authors also experimentally found that a network trained to be robust against PGD adversaries also became robust against various other first-order adversaries. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggest that using the saddle point formulation to adversarially train networks will encompass all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks that do not have direct access to the classifier. Although these attacks evaluate the classifier without the gradient of the network, it can be estimated with finite differences. Thus, first-order attacks are relevant to use in order to train against these attacks. The paper found that increasing network capacity and the adversary strength that the network is trained against improved resistance to transfer attacks. The paper also found that the models that were resistant to first-order attacks were more resistant to transfer attacks.<br />
<br />
== 6. Related Work ==<br />
Amy</div>J23ngohttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Towards_Deep_Learning_Models_Resistant_to_Adversarial_Attacks&diff=40803Towards Deep Learning Models Resistant to Adversarial Attacks2018-11-22T05:50:12Z<p>J23ngo: /* 5. First-Order Adversaries */</p>
<hr />
<div><br />
== Presented by ==<br />
* Yongqi Dong<br />
* Aden Grant<br />
* Andrew McMurry<br />
* Jameson Ngo<br />
* Baizhi Song<br />
* Yu Hao Wang<br />
* Amy Xu<br />
<br />
== 1. Introduction ==<br />
<br />
[[File:adversarial attack example.png|center|frame|Before and after an input image of a panda was subjected to perturbations.]]<br />
<br />
Any classifier can be tricked into giving the incorrect result. When an input is specifically designed to do this, it is called an adversarial attack. This can be done by injecting a set of perturbations to the input. These attacks are a prominent challenge for classifiers that are used for image processing and security systems because small changes to the input values that are imperceptible to the human eye can easily fool high-level neural networks. As such, resistance to adversarial attacks has become increasingly important for classifiers to have.<br />
<br />
While there have been many approaches to defend against adversarial attacks, we can never be certain that these defenses will be able to robust against broad types of adversaries. Furthermore, these defenses can be evaded by stronger and adaptive adversaries.<br />
<br />
=== Contributions ===<br />
<br />
This paper uses robust optimization to explore adversarial robustness of neural networks. The authors conduct an experimental study of the saddle-point formulation which is used for adversarially training. The authors propose that we can reliably solve the saddle-point optimization problem using first-order methods, particularly project gradient descent and stochastic gradient descent, despite the problem's non-convexity and non-concavity. They explore how network capacity affects adversarial robustness and conclude that networks need larger capacity in order to be resistant to strong adversaries. Lastly, they train adversarially robust networks on MNIST and CIFAR10 using the saddle point formulation.<br />
<br />
== 2. An Optimization View on Adversarial Robustness ==<br />
<br />
Consider a standard classification task on data <math>x \in \mathbb{R}^d</math> and labels <math>y \in [1,...,k]</math>, assumed to have underlying distribution <math>D</math>. Based a given loss function <math>L(\theta,x,y)</math> (e.g. the cross-entropy loss), the goal is to find the model parameters <math>\theta \in \mathbb{R}^p</math> that minimize the empirical risk of misclassification. In other words, the problem is:<br />
<br />
<math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [L(\theta,x,y)]</math><br />
<br />
However, empirical risk minimization may not result in models that are robust against adversaries. To produce robust models, we instead define conditions that the model must satisfy and train the model according to those conditions. We describe the set of attacks that our model must be robust against by considering a set of perturbations <math>S \subseteq \mathbb{R}^d</math> for each data point <math>x</math>. This gives a set of data points with similar properties to <math>x</math>, for example, the <math>l_{\infty}</math>-ball around x, which we focus on in this paper.<br />
<br />
We then define an augmented classification algorithm based on perturbed input data rather than the natural data. Our goal then becomes finding the model parameters <math>\theta \in \mathbb{R}^p</math> which minimize the expected loss calculated on perturbed data. This results in a saddle point problem:<br />
<br />
<math>\min_{\theta} \mathbb{E}_{(x,y) \sim D} [\max_{\delta \in S} L(\theta,x+\delta,y)]</math><br />
<br />
We rewrite this as <math>\min_{\theta} \rho (\theta)</math> where <math>\rho (\theta)</math> is the ''adversarial loss'' of the network; this problem thus becomes the primary topic of this paper. The saddle point problem has two components:<br />
# Inner maximization: <math>\qquad \max_{\delta \in S} L(\theta,x+\delta,y)</math> <br /> This involves finding the adversarial version of input data x that results the highest loss, which is equivalent to building an attack against the network.<br />
# Outer minimization: <math>\qquad \min_{\theta} \rho (\theta)</math> <br /> This involves finding model parameters that minimize the above adversarial loss, i.e. training the network to be adversarially robust.<br />
<br />
== 3. Adversarially Robust Networks ==<br />
Obtaining the outer minimization of the saddle point formulation <br />
<math>\min_{\theta} \rho (\theta)</math><br />
guarantees that all perturbations result in smallest adversarial loss possible based on the attack. Stochastic gradient descent (SGD) is used to solve the outer minimization problem by obtaining the gradients <math>\nabla_\theta \rho(\theta)</math>, which are computed at the maximizer of the inner maximization problem.<br />
<br />
Solutions for the inner maximization are used as known adversarial attacks on the neural network. <br />
<br />
=== Inner Maximization ===<br />
Based on prior research, the maximizers <math>\delta \in S</math> of <math>\rho(\delta)</math> are obtained by using projected gradient descent (PGD). Since the inner maximization is non-concave, the global maximizer cannot be found using PGD (instead, PGD will converge to local maxima). However, the paper’s findings conclude that PGD can be used in practice to solve the inner maximization problem using first-order methods. <br />
<br />
Applying Danskin’s theorem on a subset <math>S' \in S</math> where the local maximum is the global maximum of the region <math>S'</math> gives the gradient corresponding to the descent direction for the saddle point problem when the adversary is constrained to <math>S'</math>. The authors also find that the local maxima are widely spread apart within <math>x_i+S</math> but tended to have similar loss values for both normally and adversarially trained networks.<br />
<br />
=== Outer Minimization ===<br />
Since distribution <math>D</math> is unknown, the gradients and <math>\rho(\theta)</math> are obtained using sampled input points, i.e. for the case of a single random example <math>x</math> with label <math>y</math>, we have the problem:<br />
<br />
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math>\min_\theta [max_{\delta \in S} L(\theta, x+\delta,y)]</math></div><br />
<br />
<br />
<math>L</math> is assumed to be continuously differentiable for <math>\theta</math>, and Danskin’s theorem is used to obtained the direction of descent. Discontinuity when using ReLU activations was assumed to not be an issue as the set of discontinuities had a measure of zero, and thus these points of discontinuity would never be encountered. Using the gradients <math>\nabla_\theta \rho(\theta)</math> obtained with SGD, the loss of the saddle point problem is reduced during training. Thus, the saddle point optimization problem can be solved and classifiers can be trained to be adversarially robust.<br />
<br />
== 4. Experiments ==<br />
Amy<br />
<br />
== 5. First-Order Adversaries ==<br />
Since all local maxima found by PGD tend to have similar loss values, it is unlikely that any other first-order adversary will find a local maximum with a significantly higher loss value than those found by the PGD adversary. This is seen experimentally: adversaries achieving significantly higher loss values were not found even after a large number of restarts.<br />
<br />
The authors also experimentally found that a network trained to be robust against PGD adversaries becomes robust against various other first-order adversaries as well. This result, along with the fact that the vast majority of optimization problems in machine learning are solved using first-order methods, suggests that adversarially training networks with the saddle point formulation encompasses all current approaches that attempt to achieve adversarial robustness.<br />
<br />
===Transfer attacks===<br />
Transfer attacks are adversarial attacks crafted without direct access to the classifier. Although such attacks cannot query the gradient of the network directly, the gradient can be estimated with finite differences, so training against first-order attacks remains relevant for defending against them (a sketch of this estimation follows below). The paper found that increasing the network capacity and the strength of the adversary the network is trained against both improved resistance to transfer attacks. It also found that models resistant to first-order attacks were more resistant to transfer attacks.<br />
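<br />
As a sketch of why direct gradient access is not essential, the gradient of a black-box loss can be approximated coordinate-wise with central finite differences. Here loss_at is a hypothetical callable returning the scalar loss at a given input:<br />
<pre>
import torch

def fd_gradient(loss_at, x, h=1e-3):
    """Estimate the gradient of a black-box scalar loss at x using central
    finite differences (two loss queries per coordinate)."""
    grad = torch.zeros_like(x).view(-1)
    x_flat = x.reshape(-1)
    for i in range(x_flat.numel()):
        e = torch.zeros_like(x_flat)
        e[i] = h
        grad[i] = (loss_at((x_flat + e).view_as(x))
                   - loss_at((x_flat - e).view_as(x))) / (2 * h)
    return grad.view_as(x)
</pre>
This is query-expensive, but it shows that an adversary without direct gradient access is still, in effect, a first-order adversary.<br />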
<br />
== 6. Related Work ==<br />
Amy</div>J23ngo