On The Convergence Of ADAM And Beyond

= Introduction =

Somewhat different from the presentation I gave in class, this paper focuses strictly on the pitfalls in convergence of the ADAM training algorithm for neural networks from a theoretical standpoint, and proposes a novel improvement to ADAM called AMSGrad. The result essentially introduces the idea that it is possible for ADAM to get "stuck" in its weighted-average gradient history, which must somehow be prevented.
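
To make the "stuck" behaviour concrete (a sketch in the paper's standard notation, with [math]\displaystyle{ g_t }[/math] the gradient and [math]\displaystyle{ \beta_2 }[/math] the second-moment decay rate): ADAM scales its step by an exponential moving average [math]\displaystyle{ v_t = \beta_2 v_{t-1} + (1-\beta_2) g_t^2 }[/math], which can shrink again once a large, informative gradient leaves the average, so the effective learning rate [math]\displaystyle{ \alpha_t / \sqrt{v_t} }[/math] may increase. AMSGrad instead normalizes by the running maximum [math]\displaystyle{ \hat{v}_t = \max(\hat{v}_{t-1}, v_t) }[/math], which keeps the effective learning rate non-increasing.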

== Notation ==

The paper presents the following framework, which generalizes training algorithms so that a specific variant such as AMSGrad or SGD can be defined entirely within it:

[[File:training_algo_framework.png]]
[math]\displaystyle{ \beta }[/math]
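
Purely as an illustration of how this template captures specific algorithms (a minimal sketch, not from the paper; the names generic_method, grad_fn, phi, psi, and alpha are placeholders), the generic update [math]\displaystyle{ x_{t+1} = x_t - \alpha_t m_t / \sqrt{V_t} }[/math] can be written as:

<syntaxhighlight lang="python">
import numpy as np

def generic_method(grad_fn, x0, alpha, phi, psi, num_steps):
    """Generic adaptive-method template: the averaging functions phi and psi
    map the gradient history to m_t and to the diagonal of V_t, and the step is
    x_{t+1} = x_t - alpha(t) * m_t / sqrt(V_t)."""
    x = np.asarray(x0, dtype=float).copy()
    grads = []
    for t in range(1, num_steps + 1):
        g = grad_fn(x)
        grads.append(g)
        m = phi(grads)                                # first-moment estimate m_t
        v = psi(grads)                                # diagonal second-moment estimate V_t
        x = x - alpha(t) * m / (np.sqrt(v) + 1e-8)    # small constant only for numerical safety
    return x

# SGD is recovered by choosing m_t = g_t and V_t = I.
sgd_phi = lambda grads: grads[-1]
sgd_psi = lambda grads: np.ones_like(grads[-1])

# Toy usage: minimize f(x) = ||x||^2 with step size alpha_t = 0.5 / sqrt(t).
x_final = generic_method(grad_fn=lambda x: 2.0 * x,
                         x0=np.array([1.0, -2.0]),
                         alpha=lambda t: 0.5 / np.sqrt(t),
                         phi=sgd_phi,
                         psi=sgd_psi,
                         num_steps=200)
</syntaxhighlight>

ADAM corresponds to exponential moving averages for both phi and psi, and AMSGrad additionally takes a running maximum of the second-moment estimate before normalizing.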