# Introduction

The recent burgeon on the use of Deep Neural Networks (DNNs) have resulted in giant leaps of accuracy in prediction. They are also being used to solve a variety of complex tasks which earlier methodologies have struggled to excel in.

While it is all good to see incredibly high accuracy as a result of the use of DNN, we must begin to question why they perform so well. It has become an interesting field of study to actually represent the features/feature maps or interpret the meaning of the learnt values in a DNN's hidden layers. Currently we treat models of DNNs as black boxes which we practically tune the tweakable parameters like number of layers, number of units in each layer, number & size of feature maps(in case of CNN) etc. The opacity created by the lack of an intuitive representation of the internal learnt parameters of DNNs hinders both basic research as well as its application to real world problems.

Recent pushes have aimed to better understand DNNs: tailor-made loss functions and architectures produce more interpretable features (Higgins et al., 2016; Raposo et al., 2017) while output-behavior analyses unveil previously opaque operations of these networks (Karpathy et al., 2015). Parallel to this work, neuroscience-inspired methods such as activation visualization (Li et al., 2015), ablation analysis (Zeiler & Fergus, 2014) and activation maximization (Yosinski et al., 2015) have also been applied

This paper aims to provide another methodology to attempt to decipher & better understand how DNNs solve a particular task. This methodology was inspired by psychological concepts to test whether the DNN's were able to make accurate predictions with biases similar to that the human mind makes.

Research in developmental psychology shows that when learning new words, humans tend to assign the same name to similarly shaped items rather than to items with similar color, texture, or size. This bias/knowledge tend to be forged into the brains of humans and humans then take this forward to easily associate these shapes with new objects they have not seen before.

The authors of this paper try to simulate if DNNs behave similarly in one-shot learning applications. They attempt to prove that when the models of state-of-the-art DNNs are used to learn objects from images, they exhibit a stronger shape bias than a color bias. To emulate the human brain, they use the parameters of pre-trained DNN models and use this to perform one-shot learning on a new data set with different labels.

# Background

## One Shot Learning

One-shot learning is an object categorization problem in computer vision. Whereas most machine learning based object categorization algorithms require training on hundreds or thousands of images and very large datasets, one-shot learning aims to learn information about object categories from one, or only a few, training images.

The one-shot word learning task is to label a novel data example $\hat{x}$ (e.g. a novel probe image) with a novel class label $\hat{y}$ (e.g. a new word) after only a single example.

More specifically, given a support set $S = {(x_i, y_i) , i \in [1, k]}$, of images $x_i$, and their associated labels $y_i$, and an unlabeled probe image $\hat{x}$, the one-shot learning task is to identify the true label of the probe image, $\hat{y}$, from the support set labels ${y_i , i \in [1, k]}$:

$\displaystyle \hat{y} = arg \max_{y}$ $P(y | \hat{x}, S)$

We assume that the image labels $y_i$ are represented using a one-hot encoding and that $P(y|\hat{x}, S)$ is parameterised by a DNN, allowing us to leverage the ability of deep networks to learn powerful representations.

## Inception Networks

A probe image $\hat{x}$ is given the label of the nearest neighbour from the support set:

$\hat{y} = y$

$(x, y) = \displaystyle arg \min_{(x_i,y_i) \in S} d(h(x_i), h(\hat{x}))$

where d is a distance function.

The function h is parameterized by Inception – one of the best performing ImageNet classification models. Specifically, h returns features from the last layer (the softmax input) of a pre-trained Inception classifier. With these features as input and cosine distance as the distance function, the classifier in achieves 87.6% accuracy on one-shot classification on the ImageNet dataset (Vinyals et al., 2016). We call the Inception classifier together with the nearest-neighbor component the Inception Baseline (IB) model.

## Matching Networks

MNs (Vinyals et al.,2016) are neural network architectures with state-of-the-art one shot learning performance on ImageNet (93.2% one-shot labelling accuracy). MNs are trained to assign label $\hat{y}$ to probe image $\hat{x}$ using an attention mechanism a acting on image embeddings stored in the support set S:

where d is a cosine distance and where f and g provide context-dependent embeddings of $\hat{x}$ and $x_i$ (with contextS). The embedding $g(x_i, S)$ is a bi-directional LSTM (Hochreiter & Schmidhuber, 1997) with the support set S provided as an input sequence. The embedding $f(\hat{x}, S)$ is an LSTM with a read-attention mechanism operating over the entire embedded support set. The input to the LSTM is given by the penultimate layer features of a pre-trained deep convolutional network, specifically Inception.

The training procedure for the one-shot learning task is critical if we want MNs to classify a probe image $\hat{x} after viewing only a single example of this new image class in its support set (Hochreiter et al., 2001; Santoro et al., 2016). To train MNs we proceed as follows: ### Training MN (1) At each step of training, the model is given a small support set of images and associated labels. In addition to the support set, the model is fed an unlabeled probe image$\hat{x}$(2) The model parameters are then updated to improve classification accuracy of the probe image$\hat{x}$given the support set. Parameters are updated using stochastic gradient descent with a learning rate of 0.1 (3) After each update, the labels${(y_i, i \in [1, k]}\$ in the training set are randomly re-assigned to new image classes (the label indices are randomly permuted, but the image labels are not changed). This is a critical step. It prevents MNs from learning a consistent mapping between a category and a label. Usually, in classification, this is what we want, but in one-shot learning we want to train our model for classification after viewing a single in-class example from the support set.

The objective function used is:

where T is the set of all possible labelings of our classes, S is a support set sampled with a class labeling C ~ T and B is a batch of probe images and labels, also with the same randomly chosen class labeling as the support set.