Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

From statwiki
Jump to navigation Jump to search

Introduction & Background

Learning quickly is a hallmark of human intelligence, whether it involves recognizing objects from a few examples or quickly learning new skills after just minutes of experience. In this work, we propose a meta-learning algorithm that is general and model-agnostic, in the sense that it can be directly applied to any learning problem and model that is trained with a gradient descent procedure. Our focus is on deep neural network models, but we illustrate how our approach can easily handle different architectures and different problem settings, including classification, regression, and policy gradient reinforcement learning, with minimal modification. Unlike prior meta-learning methods that learn an update function or learning rule (Schmidhuber, 1987; Bengio et al., 1992; Andrychowicz et al., 2016; Ravi & Larochelle, 2017), this algorithm does not expand the number of learned parameters nor place constraints on the model architecture (e.g. by requiring a recurrent model (Santoro et al., 2016) or a Siamese network (Koch, 2015)), and it can be readily combined with fully connected, convolutional, or recurrent neural networks. It can also be used with a variety of loss functions, including differentiable supervised losses and nondifferentiable reinforcement learning objectives.

The primary contribution of this work is a simple model and task-agnostic algorithm for meta-learning that trains a model’s parameters such that a small number of gradient updates will lead to fast learning on a new task. The paper shows the effectiveness of the proposed algorithm in different domains, including classification, regression, and reinforcement learning problems.

Model-Agnostic Meta Learning (MAML)

The goal of the proposed model is rapid adaptation. This setting is usually formalized as few-shot learning.

Problem set-up

The goal of few-shot meta-learning is to train a model that can quickly adapt to a new task using only a few datapoints and training iterations. To do so. the model is trained during a meta-learning phase on a set of tasks, such that it can then be adapted to a new task using only a small number of parameter updates. In effect, the meta-learning problem treats entire tasks as training examples.

Let us consider a model denoted by $f$, that maps the observation $\mathbf{x}$ into the output variable $a$. During meta-learning, the model is trained to be able to adapt to a large or infinite number of tasks.

Let us consider a generic notion of task as below. Each task $\mathcal{T} = \{\mathcal{L}(\mathbf{x}_1.a_1,\mathbf{x}_2,a_2,..., \mathbf{x}_H,a_H), q(\mathbf{x}_1),q(\mathbf{x}_{t+1}|(\mathbf{x}_t),H \}$, consists of a loss function $\mathcal{L}$, a distribution over initial observations $q(\mathbf{x}_1)$, a transition distribution $q(xt+1jxt; at)$, and an episode lengthH. In i.i.d. supervised learning problems, the length H =1. The model may generate samples of length H by choosing an output at at each time t.

Different Types of MAML

Experiments

Conclusion & Critique

References

  1. Schmidhuber, J¨urgen. Learning to control fast-weight memories: An alternative to dynamic recurrent networks. Neural Computation, 1992.