stat946F18/differentiableplasticity
Differentiable Plasticity
Presented by
1. Ganapathi Subramanian, Sriram
Motivation
1. Neural networks, which form the basis of modern artificial intelligence techniques, are static in terms of architecture. Once a neural network is trained, its architectural components (e.g., network connections) cannot be changed, so learning effectively stops at the training step. If a different task needs to be considered, the agent must be retrained from scratch.
2. Plasticity is a characteristic of biological systems, such as humans, that allows network connections to change over time. It enables lifelong learning, letting biological systems adapt to dynamic changes in the environment with great sample efficiency. This is called synaptic plasticity and is based on Hebb's rule: if a neuron repeatedly takes part in making another neuron fire, the connection between them is strengthened.
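Hebb's rule is often formalized as a running trace that grows when pre- and post-synaptic neurons are co-active and decays otherwise. The following is a minimal illustrative sketch (the function name and the exponential-moving-average form are one common choice, not taken from the paper's code):

```python
def hebbian_update(trace, pre, post, eta=0.1):
    """Hebb's rule as an exponential moving average: the trace between a
    pre- and post-synaptic neuron grows when they fire together (pre * post
    large) and decays toward zero otherwise. eta is the plasticity rate."""
    return eta * pre * post + (1.0 - eta) * trace

trace = 0.0
# Repeated co-activation strengthens the connection trace...
for _ in range(10):
    trace = hebbian_update(trace, pre=1.0, post=1.0)
strong = trace
# ...while inactivity (pre = 0) lets it decay again.
for _ in range(10):
    trace = hebbian_update(trace, pre=0.0, post=1.0)
print(strong, trace)
```

After ten co-activations the trace approaches 1 (here 1 − 0.9¹⁰ ≈ 0.65); ten silent steps then shrink it by a factor of 0.9¹⁰.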
3. Differentiable plasticity is a step in this direction: the behavior of the plastic connections is itself trained using gradient descent, so that previously trained networks can adapt to changing conditions.
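In the paper, each connection combines a fixed weight with a plastic component: the effective weight from neuron i to j is w_ij + α_ij · Hebb_ij, where the Hebbian trace evolves within an episode and w and α are optimized by backpropagation. Below is a minimal NumPy sketch of this forward pass on a toy fully-recurrent layer (the gradient-descent training of w and α is omitted; sizes and initializations are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # toy number of neurons

# Parameters that gradient descent would optimize (here just random):
w = rng.standard_normal((n, n)) * 0.1      # fixed (baseline) weights
alpha = rng.standard_normal((n, n)) * 0.1  # per-connection plasticity coefficients
eta = 0.1                                  # plasticity (trace) learning rate

hebb = np.zeros((n, n))  # plastic component; evolves within an episode
x = np.zeros(n)          # current activations

def step(x_in, x, hebb):
    # Effective weight of each connection = fixed part + plastic part.
    x_new = np.tanh(x_in + (w + alpha * hebb).T @ x)
    # Hebbian trace update: co-active pre/post pairs are strengthened.
    hebb_new = eta * np.outer(x, x_new) + (1 - eta) * hebb
    return x_new, hebb_new

for t in range(5):
    x, hebb = step(rng.standard_normal(n), x, hebb)

print(x.shape, hebb.shape)
```

Because the forward pass is differentiable in w, α, and η, standard backpropagation can be used to train how plastic each connection should be.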
Example: Using the current state of the art supervised learning examples, we can train Neural Networks to recognize specific letters that it has seen during training. Using lifelong learning the agent can know about any alphabet, including those that it has never been exposed to during training.
Objectives
The paper has the following objectives:
1. To tackle the problem of meta-learning (learning to learn).
2. To design neural networks with plastic connections, with a special emphasis on making them trainable by gradient descent through backpropagation.
3. To use backpropagation to optimize both the base weights and the amount of plasticity in each connection.
4. To demonstrate the performance of such networks on three complex and different domains, namely complex pattern memorization, one-shot classification, and reinforcement learning.
Related Work
Previous approaches to this problem:
1) Train standard recurrent neural networks (RNNs) to incorporate past experience into their future responses within each episode. To support learning, the RNN is attached to an external content-addressable memory bank; an attention mechanism within the controller network performs reads and writes to the memory bank, enabling fast memorization.
2) Augment each weight with a plastic component that automatically grows and decays as a function of inputs and outputs. All connections share the same non-trainable plasticity, and only the corresponding base weights are trained. Recent approaches have explored fast weights, which augment recurrent networks with fast-changing Hebbian weights and compute activations at each step; such networks are strongly biased toward recently seen patterns.
3) Optimize the learning rule itself instead of the connections. A parametrized learning rule is used, and the structure of the network is fixed beforehand.
4) Have all weight updates computed on the fly, either by the network itself or by a separate network, at each time step. The pro is flexibility; the con is the large learning burden placed on the network.
5) Perform gradient descent via backpropagation during the episode. Meta-learning here involves training the base network so that it can be fine-tuned using additional gradient descent.
6) For classification tasks, train a separate embedding to discriminate between classes. Classification is then a comparison between the embeddings of the test and example instances.