Difference between revisions of "STAT946F17/ Learning Important Features Through Propagating Activation Differences"

Revision as of 00:35, 27 October 2017

This is a summary of the ICML 2017 paper [1].

Introduction

Deep neural networks are notorious for their "black box" nature, which is a barrier to adoption in applications where interpretability is essential. The same opacity also makes it difficult to analyze and improve the structure of a model. The paper summarized here presents DeepLIFT, a method that decomposes the output of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input. This is a form of sensitivity analysis and helps us understand the model better.

Sensitivity Analysis

Sensitivity analysis is a concept in risk management and actuarial science. According to [Investopedia], a sensitivity analysis is a technique used to determine how changes in an independent variable influence a particular dependent variable under a given set of assumptions. The technique is applied within specific boundaries that depend on one or more input variables; a typical example is the effect that changes in interest rates have on bond prices.
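To make the bond example concrete, here is a minimal sketch in Python. It assumes a plain fixed-coupon bond priced by discounting annual cash flows at a flat rate; the function name `bond_price` and all numbers (face value, coupon, maturity, rates) are hypothetical and chosen only for illustration.

```python
def bond_price(rate, face=100.0, coupon=5.0, years=10):
    """Present value of annual coupons plus the face value at a flat annual rate."""
    coupons = sum(coupon / (1 + rate) ** t for t in range(1, years + 1))
    return coupons + face / (1 + rate) ** years

# Independent variable: the interest rate. Dependent variable: the price.
base = bond_price(0.05)    # rate equal to the coupon rate (illustrative)
bumped = bond_price(0.06)  # the same bond after a one-point rate increase
change = bumped - base     # sensitivity of the price to this rate move

print(f"price at 5%: {base:.2f}, price at 6%: {bumped:.2f}, change: {change:.2f}")
```

Everything except the rate is held fixed, which is the "given set of assumptions" in the definition above.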

In our setting, we have a well-trained deep neural network, two high-dimensional input vectors $x_0, x_1$, and the corresponding outputs $y_0=f(x_0), y_1=f(x_1)$. Given that $x_1$ is a perturbation of $x_0$, we want to know which element of $x_1 - x_0$ contributes the most to $y_1 - y_0$.

As one can imagine, if $\left| x_1 - x_0 \right|$ is small, the most "crude" approximation is to calculate the gradient

$\left . \frac{\partial y}{\partial x} \right|_{x = x_0} $

and take its largest element in absolute value. This is entirely feasible because back-propagation lets us compute the derivatives layer by layer. However, this method does not always work well.
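The gradient approach can be sketched on a toy network. The example below is only an illustration under stated assumptions: a hypothetical two-layer ReLU network with made-up weights (`W1`, `b1`, `w2`, `b2`), written in plain Python rather than any particular deep learning library. It back-propagates $\partial y / \partial x$ at $x_0$, picks the input with the largest absolute gradient, and compares the first-order estimate $\nabla f(x_0) \cdot (x_1 - x_0)$ with the actual change $y_1 - y_0$.

```python
# Hypothetical toy network: y = w2 . relu(W1 x + b1) + b2 (weights are made up).
W1 = [[1.0, -2.0, 0.5],
      [0.5, 1.0, -1.0]]
b1 = [0.0, 0.1]
w2 = [2.0, -1.0]
b2 = 0.0

def forward(x):
    """Return the scalar output y and the hidden pre-activations z."""
    z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    h = [max(0.0, zj) for zj in z]
    y = sum(w * hj for w, hj in zip(w2, h)) + b2
    return y, z

def gradient(x):
    """Back-propagate dy/dx layer by layer."""
    _, z = forward(x)
    # Output layer to hidden pre-activations: dy/dz_j = w2_j * relu'(z_j)
    dz = [w * (1.0 if zj > 0 else 0.0) for w, zj in zip(w2, z)]
    # Hidden pre-activations to inputs: dy/dx_i = sum_j dz_j * W1[j][i]
    return [sum(dz[j] * W1[j][i] for j in range(len(W1))) for i in range(len(x))]

x0 = [1.0, 0.2, 0.3]
x1 = [1.01, 0.19, 0.32]  # a small perturbation of x0

grad = gradient(x0)
most_sensitive = max(range(len(grad)), key=lambda i: abs(grad[i]))

# First-order contribution of each input to y1 - y0
contributions = [g * (a - b) for g, a, b in zip(grad, x1, x0)]
first_order = sum(contributions)

y0, _ = forward(x0)
y1, _ = forward(x1)
print("gradient at x0:", grad)
print("input with largest |gradient|:", most_sensitive)
print("actual change:", y1 - y0, "first-order estimate:", first_order)
```

In this example the perturbation is small enough that no ReLU changes state, so the first-order estimate matches the actual change up to rounding. The estimate is only local, though: once a perturbation flips a ReLU on or off, or when $x_1$ is far from $x_0$, the gradient at $x_0$ no longer describes how $y$ changes.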

Failure of traditional methods

to be done

DeepLIFT scheme

to be done

Numerical results

to be done

References

[1] Shrikumar, A., Greenside, P., and Kundaje, A. Learning Important Features Through Propagating Activation Differences. arXiv:1704.02685