Gradient Episodic Memory for Continual Learning

From statwiki

Revision as of 23:19, 17 November 2018

Presented by

  • Yu Xuan Lee
  • Tsen Yee Heng

Background and Introduction

Supervised learning works with a training set [math]\displaystyle{ D_{tr}=\{(x_i,y_i)\}^n_{i=1} }[/math], where [math]\displaystyle{ x_i \in \mathcal{X} }[/math] and [math]\displaystyle{ y_i \in \mathcal{Y} }[/math]. Empirical Risk Minimization (ERM) is a common supervised learning principle: a predictor [math]\displaystyle{ f }[/math] is chosen by minimizing the average loss over the training set, typically with multiple passes over the data:

[math]\displaystyle{ \frac{1}{|D_{tr}|}\textstyle \sum_{(x_i,y_i) \in D_{tr}} \ell (f(x_i),y_i) }[/math]


where [math]\displaystyle{ \ell :\mathcal {Y} \times \mathcal {Y} \to [0, \infty) }[/math] is a loss function that penalizes prediction errors.
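
As a minimal sketch of the quantity that ERM minimizes, the snippet below evaluates the average loss of a fixed predictor on a tiny training set. The squared loss, the toy data, and the names `squared_loss`, `empirical_risk`, `f_good`, and `f_bad` are illustrative assumptions, not part of the original presentation.

```python
from typing import Callable, Sequence, Tuple


def squared_loss(y_pred: float, y_true: float) -> float:
    """One possible choice of loss ell: Y x Y -> [0, inf) (an assumption)."""
    return (y_pred - y_true) ** 2


def empirical_risk(
    f: Callable[[float], float],
    data: Sequence[Tuple[float, float]],
    loss: Callable[[float, float], float] = squared_loss,
) -> float:
    """Average loss of predictor f over the training set D_tr."""
    if not data:
        raise ValueError("training set must not be empty")
    return sum(loss(f(x), y) for x, y in data) / len(data)


# Toy training set generated by y = 2x + 1 (hypothetical data).
D_tr = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]

f_good = lambda x: 2.0 * x + 1.0  # matches the data exactly
f_bad = lambda x: x               # a poorer predictor

risk_good = empirical_risk(f_good, D_tr)
risk_bad = empirical_risk(f_bad, D_tr)
print(risk_good, risk_bad)
```

ERM would prefer `f_good` here because its empirical risk is lower; in practice the minimization runs over a parametric family of predictors rather than two hand-picked candidates.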

Unlike a standard machine learning system, a learning human observes data sequentially, may encounter the same concepts repeatedly, and can store only a limited amount of what is seen. Under these conditions the iid assumption behind ERM no longer holds. Applying ERM directly in this setting leads to "catastrophic forgetting", the failure to recall past knowledge after acquiring new knowledge. To overcome this problem, Gradient Episodic Memory (GEM) is introduced; it alleviates forgetting of previously acquired knowledge while solving new problems more efficiently.

Framework for Continual Learning

The feature vector [math]\displaystyle{ x_i \in \mathcal{X}_t }[/math], the task descriptor [math]\displaystyle{ t_i \in \mathcal{T} }[/math], and the target vector [math]\displaystyle{ y_i \in \mathcal{Y}_t }[/math] are the three components of each example in a continuum of data. The continuum is locally iid, meaning that every triplet [math]\displaystyle{ (x_i, t_i, y_i) }[/math] satisfies

[math]\displaystyle{ (x_i,y_i) \overset{iid}{\sim} P_{t_i}(X,Y) }[/math]
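
The locally iid property can be illustrated with a small simulated continuum. In the sketch below, each task has its own distribution, and every example is drawn independently from the distribution of its task. The linear-plus-noise distributions, the task-by-task ordering, and the names `TASK_SLOPES`, `sample_from_task`, and `continuum` are assumptions made purely for illustration.

```python
import random
from typing import Iterator, Tuple

Triplet = Tuple[float, int, float]  # (x_i, t_i, y_i)

# Hypothetical per-task distributions P_t(X, Y): y = slope_t * x + noise.
TASK_SLOPES = {1: 2.0, 2: -1.0, 3: 0.5}


def sample_from_task(t: int, rng: random.Random) -> Tuple[float, float]:
    """Draw one (x, y) pair independently from the assumed P_t(X, Y)."""
    x = rng.uniform(-1.0, 1.0)
    y = TASK_SLOPES[t] * x + rng.gauss(0.0, 0.1)
    return x, y


def continuum(examples_per_task: int, seed: int = 0) -> Iterator[Triplet]:
    """Yield triplets task by task (this ordering is an assumption)."""
    rng = random.Random(seed)
    for t in sorted(TASK_SLOPES):
        for _ in range(examples_per_task):
            x, y = sample_from_task(t, rng)
            yield x, t, y


stream = list(continuum(examples_per_task=4))
task_order = [t for _, t, _ in stream]
print(task_order)
```

The stream as a whole is not iid, because the distribution changes whenever the task descriptor changes; only the examples sharing the same descriptor are iid with respect to one another.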

The main goal of continual learning is to obtain a predictor [math]\displaystyle{ f: \mathcal{X} \times \mathcal{T} \to \mathcal{Y} }[/math] that can be queried with a test pair [math]\displaystyle{ (x,t) }[/math] to predict the associated target vector [math]\displaystyle{ y }[/math].
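
To make the role of the test pair concrete, here is a toy sketch of a predictor that is queried with both a feature vector and a task descriptor. The class `TaskConditionedPredictor`, its one-linear-head-per-task design, and the weights are hypothetical and are not the model used by GEM.

```python
from typing import Dict, List, Sequence


class TaskConditionedPredictor:
    """Toy predictor queried with a pair (x, t); one linear head per task."""

    def __init__(self, weights: Dict[int, List[float]]) -> None:
        # weights[t] holds the (assumed) linear weights used for task t.
        self.weights = weights

    def __call__(self, x: Sequence[float], t: int) -> float:
        w = self.weights[t]
        if len(w) != len(x):
            raise ValueError("feature vector and weights differ in length")
        return sum(wi * xi for wi, xi in zip(w, x))


# Two hypothetical tasks that read different coordinates of x.
f = TaskConditionedPredictor({1: [1.0, 0.0], 2: [0.0, 2.0]})

x = [3.0, 4.0]
y_task1 = f(x, 1)  # uses the head for task 1
y_task2 = f(x, 2)  # same x, different task descriptor
print(y_task1, y_task2)
```

The same input `x` yields different targets depending on `t`, which is why the task descriptor is part of the predictor's input.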

Task Descriptor

In the simplest case, task descriptors are integers [math]\displaystyle{ t_i=i \in \mathbb{Z} }[/math] that enumerate the tasks appearing in the continuum, with [math]\displaystyle{ t_1,...,t_n \in \mathcal{T} }[/math]. More generally, [math]\displaystyle{ t_i }[/math] can be a structured object that describes how to solve the [math]\displaystyle{ i }[/math]-th task. Because such richer descriptors carry more information, they make zero-shot learning possible: the relation between tasks can be inferred from a new task descriptor alone.

Training Protocol