Gradient Episodic Memory for Continual Learning
Group Members
Yu Xuan Lee, Tsen Yee Heng
Background and Introduction
Supervised learning consists of a training set [math]\displaystyle{ D_{tr}=\{(x_i,y_i)\}_{i=1}^n }[/math], where [math]\displaystyle{ x_i \in \mathcal{X} }[/math] and [math]\displaystyle{ y_i \in \mathcal{Y} }[/math]. Empirical Risk Minimization (ERM) is one of the most common supervised learning methods; it minimizes the average loss over the training set by making multiple passes over the data:
[math]\displaystyle{ \frac{1}{|D_{tr}|} \sum_{(x_i,y_i) \in D_{tr}} \ell (f(x_i),y_i) }[/math]
where [math]\displaystyle{ \ell :\mathcal {Y} \times \mathcal {Y} \to [0, \infty) }[/math] is the loss function and [math]\displaystyle{ f : \mathcal{X} \to \mathcal{Y} }[/math] is the predictor being learned.
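To make the setup concrete, the following is a minimal sketch of ERM with multiple passes (epochs) over the training set. The linear model, squared loss, and learning rate are illustrative assumptions, not choices made in the paper.

<pre>
# Minimal ERM sketch. Assumptions (not from the paper): a linear model
# f(x) = w . x, squared loss, and a fixed learning rate.
import numpy as np

rng = np.random.default_rng(0)

# Toy training set D_tr = {(x_i, y_i)}_{i=1}^n
n, d = 100, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)   # parameters of the predictor f
lr = 0.01         # illustrative learning rate

# ERM: repeatedly take gradient steps on the empirical risk,
# i.e. the average loss over D_tr (multiple passes over the data).
for epoch in range(100):
    residual = X @ w - y               # f(x_i) - y_i for every example
    grad = 2.0 * X.T @ residual / n    # gradient of the empirical risk
    w -= lr * grad

print("final empirical risk:", np.mean((X @ w - y) ** 2))
</pre>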
However, the downside of ERM is "catastrophic forgetting": upon acquiring new knowledge, the model tends to lose the ability to recall previously learned knowledge.
Gradient Episodic Memory (GEM) is a continual learning model that alleviates forgetting of previously acquired knowledge while still learning to solve new problems efficiently.
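As a preview of the mechanism, the sketch below illustrates GEM's core idea: before each parameter update, check whether the proposed gradient would (to first order) increase the loss on examples kept in an episodic memory of past tasks, and if so, project it so that it does not. This is a simplified, hypothetical single-task version; the paper enforces the constraint for every previous task's memory by solving a quadratic program.

<pre>
# Hedged sketch of GEM's gradient constraint. Simplification: only one
# past-task memory gradient is handled, with a closed-form projection;
# the paper solves a quadratic program over all past tasks' gradients.
import numpy as np

def gem_step_direction(g_current, g_memory):
    """Return an update direction that, to first order, does not
    increase the loss on the episodic memory of a past task.

    g_current: gradient of the loss on the current task's data
    g_memory:  gradient of the loss on the stored memory examples
    """
    inner = g_current @ g_memory
    if inner >= 0:
        # No conflict: this step also decreases the memory loss.
        return g_current
    # Conflict: project onto the half-space {g : <g, g_memory> >= 0}.
    return g_current - (inner / (g_memory @ g_memory)) * g_memory

# Toy usage: a current gradient that conflicts with the memory gradient.
g = np.array([1.0, -2.0])
g_mem = np.array([1.0, 1.0])
g_proj = gem_step_direction(g, g_mem)
print(g_proj, g_proj @ g_mem)   # inner product with g_mem is now 0
</pre>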