memory Networks

From statwiki
Revision as of 22:41, 12 November 2015 by Pblouw (talk | contribs) (Added Introduction)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search
  • Content in progress*

Introduction

Most supervised machine learning models are designed to approximate a function that maps input data to a desirable output (e.g. a class label for an image or a translation of a sentence from one language to another). In this sense, such models perform inference using a 'fixed' memory in the form of a set of parameters learned during training. For example, the memory of a recurrent neural network is constituted largely by the weights on the recurrent connections to its hidden layer (along with the layer's activities). As is well known, this form of memory is inherently limited given the fixed dimensionality of the weights in question. It is largely for this reason that recurrent nets have difficulty learning long-range dependencies in sequential data. Learning such dependencies, note, requires *remembering* the items in a sequence for a large number of time steps.

For an interesting class of problems, it is essential for a model to be able to learn long-term dependencies, and to more generally be able to learn to perform inferences using an arbitrarily large memory. Question-answering tasks are paradigmatic of this class of problems, since performing well on such tasks requires remembering all of the information that constitutes a possible answer to the questions being posed. In principle, a recurrent network such as an LSTM could learn to perform QA tasks, but in practice, the amount of information that can be retained by the weights and the hidden states in the LSTM is simply insufficient.

Given this need for a model architecture the combines inference and memory in a sophisticated manner, the authors of this paper propose what they refer to as a "Memory Network". In brief, a memory network is a model that learns to read and write data to an arbitrarily large long-term memory, while also using the data in this memory to perform inferences. The rest of this summary describes the components of a memory network in greater detail.

Model Architecture

A Basic Implementation

The Training Procedure

= Some Extensions to the Basic Implementation

Related work

Experimental Results

Discussion

Resources

References