# Introduction

The recent burgeon on the use of Deep Neural Networks (DNNs) have resulted in giant leaps of accuracy in prediction. They are also being used to solve a variety of complex tasks which earlier methodologies have struggled to excel in.

While it is all good to see incredibly high accuracy as a result of the use of DNN, we must begin to question why they perform so well. It has become an interesting field of study to actually represent the features/feature maps or interpret the meaning of the learnt values in a DNN's hidden layers. Currently we treat models of DNNs as black boxes which we practically tune the tweakable parameters like number of layers, number of units in each layer, number & size of feature maps(in case of CNN) etc. The opacity created by the lack of an intuitive representation of the internal learnt parameters of DNNs hinders both basic research as well as its application to real world problems.

Recent pushes have aimed to better understand DNNs: tailor-made loss functions and architectures produce more interpretable features (Higgins et al., 2016; Raposo et al., 2017) while output-behavior analyses unveil previously opaque operations of these networks (Karpathy et al., 2015). Parallel to this work, neuroscience-inspired methods such as activation visualization (Li et al., 2015), ablation analysis (Zeiler & Fergus, 2014) and activation maximization (Yosinski et al., 2015) have also been applied

This paper aims to provide another methodology to attempt to decipher & better understand how DNNs solve a particular task. This methodology was inspired by psychological concepts to test whether the DNN's were able to make accurate predictions with biases similar to that the human mind makes.

Research in developmental psychology shows that when learning new words, humans tend to assign the same name to similarly shaped items rather than to items with similar color, texture, or size. This bias/knowledge tend to be forged into the brains of humans and humans then take this forward to easily associate these shapes with new objects they have not seen before.

The authors of this paper try to simulate if DNNs behave similarly in one-shot learning applications. They attempt to prove that when the models of state-of-the-art DNNs are used to learn objects from images, they exhibit a stronger shape bias than a color bias. To emulate the human brain, they use the parameters of pre-trained DNN models and use this to perform one-shot learning on a new data set with different labels.

# Background

## One Shot Learning

One-shot learning is an object categorization problem in computer vision. Whereas most machine learning based object categorization algorithms require training on hundreds or thousands of images and very large datasets, one-shot learning aims to learn information about object categories from one, or only a few, training images.

The one-shot word learning task is to label a novel data example $\hat{x}$ (e.g. a novel probe image) with a novel class label $\hat{y}$ (e.g. a new word) after only a single example.

More specifically, given a support set $S = {(x_i, y_i) , i \in [1, k]}$, of images $x_i$, and their associated labels $y_i$, and an unlabeled probe image $\hat{x}$, the one-shot learning task is to identify the true label of the probe image, $\hat{y}$, from the support set labels ${y_i , i \in [1, k]}$:

$\displaystyle \hat{y} = arg \max_{y}$ $P(y | \hat{x}, S)$

We assume that the image labels $y_i$ are represented using a one-hot encoding and that $P(y|\hat{x}, S)$ is parameterised by a DNN, allowing us to leverage the ability of deep networks to learn powerful representations.

## Inception Networks

A probe image $\hat{x}$ is given the label of the nearest neighbour from the support set:

$\hat{y} = y$

$(x, y) = \displaystyle arg \min_{(x_i,y_i) \in S} d(h(x_i), h(\hat{x}))$

where d is a distance function.

The function h is parameterized by Inception – one of the best performing ImageNet classification models. Specifically, h returns features from the last layer (the softmax input) of a pre-trained Inception classifier. With these features as input and cosine distance as the distance function, the classifier in achieves 87.6% accuracy on one-shot classification on the ImageNet dataset (Vinyals et al., 2016). We call the Inception classifier together with the nearest-neighbor component the Inception Baseline (IB) model.

## Matching Networks

MNs (Vinyals et al.,2016) are neural network architectures with state-of-the-art one shot learning performance on ImageNet (93.2% one-shot labelling accuracy). MNs are trained to assign label $\hat{y}$ to probe image $\hat{x}$ using an attention mechanism a acting on image embeddings stored in the support set S:

where d is a cosine distance and where f and g provide context-dependent embeddings of $\hat{x}$ and $x_i$ (with contextS). The embedding $g(x_i, S)$ is a bi-directional LSTM (Hochreiter & Schmidhuber, 1997) with the support set S provided as an input sequence. The embedding $f(\hat{x}, S)$ is an LSTM with a read-attention mechanism operating over the entire embedded support set. The input to the LSTM is given by the penultimate layer features of a pre-trained deep convolutional network, specifically Inception.

To train MNs we proceed as follows:

### Training MN

• Step 1: At each step of training, the model is given a small support set of images and associated labels. In addition to the support set, the model is fed an unlabeled probe image $\hat{x}$
• Step 2: The model parameters are then updated to improve classification accuracy of the probe image $\hat{x}$ given the support set. Parameters are updated using stochastic gradient descent with a learning rate of 0.1
• Step 3: After each update, the labels ${(y_i, i \in [1, k]}$ in the training set are randomly re-assigned to new image classes (the label indices are randomly permuted,

but the image labels are not changed). This is a critical step. It prevents MNs from learning a consistent mapping between a category and a label. Usually, in classification, this is what we want, but in one-shot learning we want to train our model for classification after viewing a single in-class example from the support set.

The objective function used is:

where T is the set of all possible labelings of our classes, S is a support set sampled with a class labeling C ~ T and B is a batch of probe images and labels, also with the same randomly chosen class labeling as the support set.

# Methodology

## Inductive Biases & Probe Data

Inductive biases are those criteria which are artificially selected or learnt by the network as a classifying/distinguishing property. It has been observed that the biases that DNNs learnt are complex composite features. We, as researchers can take advantage of the fact that DNNs learnt complex distinguishing features by constructing probe data sets which particularly target on exposing a particular bias that a DNN might have. We first take a known composite feature which we suspect the DNNs are biased against, train the model with data, and use the pre-trained model with a new data set which is specifically contain data which are on the opposite sides of the bias we suspect the DNN shows.

## Data Sets Used

• Training Set: ImageNet
• Test Set:
• The Cognitive Psychology Probe Data (CogPsyc data) that is used consists of 150 images of objects. The images are arranged in triples consisting of a probe image, a shape-match image (that matches the probe in colour but not shape), and a color-match image (that matches the probe in shape but not colour). In the dataset there are 10 triples, each shown on 5 different backgrounds, giving a total of 50 triples.
• A real-world dataset consisting of 90 images of objects (30 triples) collected using Google Image Search. The images are arranged in triples consisting of a probe, a shape-match and a colour-match.

# Experiments

## Evaluation Criteria

• For a given probe image $\hat{x}$, we loaded the shape-match image $x_s$ and corresponding label $y_s$, along with the colour-match image $x_c$ and corresponding label $y_c$ into memory, as the support set $S = \{(x_s, ys), (x_c, y_c)\}$
• Calculate $\hat{y}$
• The model assigns either $y_c$ or $y_s$ to the probe image.
• To estimate the shape bias Bs, calculate the proportion of shape labels assigned to the probe: $B_s = E(\delta(\hat{y} - y_s))$

where E is an expectation across probe images and $\delta$ is the Dirac delta function.

## Experiment 1 & Results: Inception Baseline:

• Shape bias of IB to be $B_s = 0.68$. Similarly, the shape bias of IB using our real-world dataset was $B_s = 0.97$. Together, these results strongly suggest that IB trained on ImageNet has a stronger bias towards shape than colour

## Experiment 2 & Results: Matching Network:

• They found that MNs have a shape of bias $B_s = 0.7$ using the CogPsyc dataset and a bias of $Bs = 1$ using the real-world dataset. Once again, these results suggest that MNs trained seeding from Inception using ImageNet has a stronger bias towards shape than colour.