A Game Theoretic Approach to Class-wise Selective Rationalization
Presented by
Yushan Chen, Yuying Huang, Ankitha Anugu
Introduction
Selecting input features, such as relevant pieces of text, has become a common way of explaining how complex neural predictors operate. The selection can be optimized post hoc for a trained model or incorporated directly into the method itself. However, an overall selection does not properly capture the multi-faceted nature of useful rationales, such as pros and cons for a decision. The paper therefore introduces a game theoretic approach to class-dependent rationalization, in which the method is specifically trained to highlight evidence supporting alternative conclusions. Each class involves three players, set up competitively to find evidence for both factual and counterfactual scenarios. The authors show theoretically, in a simplified setting, how the game drives the solution towards meaningful class-dependent rationales.
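To make the game concrete, here is a minimal sketch of a single class's three-player game, assuming soft sigmoid masks in place of the paper's discrete token selections, toy mean-pooled encoders, and illustrative names (`Generator`, `Discriminator`, `game_step`); the sparsity/continuity penalties and the machinery for discrete sampling are omitted.

```python
# Sketch, not the authors' code: one class's game with soft rationale masks.
import torch
import torch.nn as nn

EMB, VOCAB = 64, 1000

class Generator(nn.Module):
    """Scores each token; a sigmoid gives a soft inclusion mask."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.score = nn.Linear(EMB, 1)

    def forward(self, tokens):                  # tokens: (batch, seq)
        e = self.emb(tokens)                    # (batch, seq, EMB)
        mask = torch.sigmoid(self.score(e))     # (batch, seq, 1)
        return e * mask                         # masked embeddings = rationale

class Discriminator(nn.Module):
    """Guesses whether a rationale came from a genuine class-t text."""
    def __init__(self):
        super().__init__()
        self.clf = nn.Sequential(nn.Linear(EMB, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, masked_emb):
        return self.clf(masked_emb.mean(dim=1)).squeeze(-1)  # "factual" logit

g_fact, g_counter, disc = Generator(), Generator(), Discriminator()
opt_coop = torch.optim.Adam(list(g_fact.parameters()) + list(disc.parameters()))
opt_adv = torch.optim.Adam(g_counter.parameters())
bce = nn.BCEWithLogitsLoss()

def game_step(x_fact, x_counter):
    # x_fact: token ids of texts labeled class t; x_counter: other labels.
    # Cooperative move: factual generator and discriminator jointly try to
    # separate factual rationales (target 1) from counterfactual ones (0).
    d_loss = (bce(disc(g_fact(x_fact)), torch.ones(x_fact.size(0)))
              + bce(disc(g_counter(x_counter).detach()),
                    torch.zeros(x_counter.size(0))))
    opt_coop.zero_grad(); d_loss.backward(); opt_coop.step()
    # Adversarial move: the counterfactual generator selects evidence in
    # other-class texts that makes the discriminator answer "factual".
    a_loss = bce(disc(g_counter(x_counter)), torch.ones(x_counter.size(0)))
    opt_adv.zero_grad(); a_loss.backward(); opt_adv.step()

# One toy step with random token ids (batch of 8, sequences of length 20).
game_step(torch.randint(0, VOCAB, (8, 20)), torch.randint(0, VOCAB, (8, 20)))
```

In the full method, one such game is instantiated for every class, and the discrete selections require gradient estimators (e.g., policy gradients) rather than the soft masks used in this sketch.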
Previous Work
There are two directions of research on generating interpretable features of neural networks. The first incorporates the interpretations directly into the models, often known as self-explaining models. The alternative is to generate interpretations in a post-hoc manner, after the model is trained. A few works also attempt to increase the fidelity of post-hoc explanations by incorporating the explanation mechanism into the training procedure. Although none of these works can perform class-wise rationalization as such, gradient-based methods can be adapted for this purpose in an intuitive way: an explanation for a specific class is obtained by probing feature importance with respect to that class's logit.
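As an illustration of that gradient-based adaptation, the sketch below scores tokens for an arbitrary class by back-propagating that class's logit to the input embeddings; the toy model and all names are our assumptions, not the paper's.

```python
# Hypothetical class-wise gradient saliency: back-propagate the logit of a
# chosen class to the input embeddings and rank tokens by gradient norm.
import torch
import torch.nn as nn

VOCAB, EMB, HID, CLASSES = 1000, 64, 64, 2
emb = nn.Embedding(VOCAB, EMB)
enc = nn.Sequential(nn.Linear(EMB, HID), nn.ReLU())  # per-token features
clf = nn.Linear(HID, CLASSES)                        # stand-in trained predictor

def class_saliency(tokens, target_class):
    e = emb(tokens).detach().requires_grad_(True)    # (seq, EMB) leaf tensor
    logits = clf(enc(e).mean(dim=0))                 # toy mean-pooled encoder
    grads, = torch.autograd.grad(logits[target_class], e)
    return grads.norm(dim=-1)                        # (seq,) per-token scores

tokens = torch.randint(0, VOCAB, (12,))
print(class_saliency(tokens, target_class=0))        # evidence for class 0
print(class_saliency(tokens, target_class=1))        # evidence for class 1
```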
Motivation
Extending how rationales are defined and computed is one of the primary questions motivating this research. To date, the typical approach has been to choose a single overall feature subset that best explains the output decision. The maximum mutual information (MMI) criterion, for example, selects the subset of features that maximizes the mutual information between the subset and the target output, or equivalently minimizes the entropy of the target output conditioned on that subset. Rationales, however, can be multi-faceted, involving support for a variety of outcomes to varying degrees. Existing rationalization algorithms share one key limitation: they only look for rationales that support the label class. The authors therefore propose the CAR (class-wise adversarial rationalization) algorithm, which finds rationales for any given class.
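Written out in generic notation (ours, not the paper's), with input features $X$, a subset $S$ of at most $k$ of them, and target $Y$, the MMI criterion and its conditional-entropy form are equivalent:

```latex
% Maximum mutual information (MMI) criterion for overall rationale selection
\max_{S : |S| \le k} I(X_S; Y)
\;\;\Longleftrightarrow\;\;
\min_{S : |S| \le k} H(Y \mid X_S)
```

The equivalence holds because $I(X_S; Y) = H(Y) - H(Y \mid X_S)$ and $H(Y)$ does not depend on the choice of $S$.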
Model Architecture
[WIP]
Experiments
The authors evaluated factual and counterfactual rationale generation on both single- and multi-aspect classification tasks. The method was tested on three binary classification datasets: Amazon reviews (single-aspect) [1], beer reviews (multi-aspect) [2], and hotel reviews (multi-aspect) [3]. Note that the Amazon reviews contain both positive and negative sentiments, whereas the beer and hotel reviews carry human annotations for factual rationales only.
The CAR model is compared with two existing methods, RNP and Post-exp.
- RNP: a framework for rationalizing neural predictions proposed by Lei et al. [4]. It combines two modular components, a generator and a predictor, and is only able to generate factual rationales.
- Post-exp: a post-explanation approach with two generators and one predictor. Given the pre-trained predictor, one generator is trained to generate positive rationales, and the other is trained to generate negative rationales.
To make the comparison among the three methods fair, the predictors share the same architecture, as do the generators, and the sparsity and continuity constraints of all three methods take the same form, shown below.
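For reference, these constraints typically take the form introduced with RNP by Lei et al. [4]: for a binary selection mask $z = (z_1, \dots, z_T)$ over $T$ tokens,

```latex
% Sparsity term: select few tokens. Continuity term: prefer contiguous spans.
\Omega(z) = \lambda_1 \sum_{t=1}^{T} z_t + \lambda_2 \sum_{t=2}^{T} |z_t - z_{t-1}|
```

where the first term controls how much text is selected (sparsity) and the second penalizes entering and leaving the selection (continuity), encouraging phrase-like rationales.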
Two types of experiments are set up: an objective evaluation and a subjective evaluation.
In the objective evaluation, rationales generated by the three models are compared with human annotations, and precision, recall, and F1 score are reported. (Note that the algorithms are conditioned on a similar actual sparsity level in the factual rationales.)
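A sketch of how such token-level agreement metrics are computed is below; the convention assumed here (binary 0/1 masks of equal length, human annotation as ground truth) is ours, not necessarily the authors' exact evaluation code.

```python
# Token-level precision/recall/F1 of a predicted rationale mask against a
# human-annotated mask; both are 0/1 sequences of equal length.
def rationale_prf(pred, gold):
    tp = sum(p and g for p, g in zip(pred, gold))
    precision = tp / max(sum(pred), 1)  # selected tokens that are annotated
    recall = tp / max(sum(gold), 1)     # annotated tokens that are selected
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return precision, recall, f1

# Example: the model highlights tokens 1-3, the human annotated tokens 2-4.
print(rationale_prf([0, 1, 1, 1, 0], [0, 0, 1, 1, 1]))  # ≈ (0.67, 0.67, 0.67)
```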
In the subjective evaluation, human readers were asked to choose a sentiment (positive or negative) based only on the given rationales. In the single-aspect case, a success is credited when the reader correctly guesses the ground-truth sentiment from a factual rationale, or is convinced to choose the sentiment opposite to the ground truth from a counterfactual rationale. In the multi-aspect case, the reader must also identify the aspect correctly.
Conclusion
Critiques
-
References
[1] Xiang Zhang, Junbo Zhao, and Yann LeCun. Character-level convolutional networks for text classification. In NeurIPS, 2015.
[2] Julian McAuley, Jure Leskovec, and Dan Jurafsky. Learning attitudes and attributes from multi-aspect reviews. In ICDM, 2012.
[3] Hongning Wang, Yue Lu, and Chengxiang Zhai. Latent aspect rating analysis on review text data: A rating regression approach. In KDD, 2010.
[4] Tao Lei, Regina Barzilay, and Tommi Jaakkola. Rationalizing neural predictions. In EMNLP, 2016.