On The Convergence Of ADAM And Beyond

= Introduction =

Somewhat different from the presentation I gave in class, this paper focuses strictly on the pitfalls in convergence of the ADAM training algorithm for neural networks from a theoretical standpoint, and proposes a novel improvement to ADAM called AMSGrad. The result essentially introduces the idea that it is possible for ADAM to get "stuck" in its weighted-average gradient history, which must somehow be prevented.
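
To make the "stuck" behaviour concrete (a sketch in the paper's standard notation, with [math]\displaystyle{ g_t }[/math] the gradient and [math]\displaystyle{ \beta_2 }[/math] the second-moment decay rate): ADAM scales its step by an exponential moving average [math]\displaystyle{ v_t = \beta_2 v_{t-1} + (1-\beta_2) g_t^2 }[/math], which can shrink again once a large, informative gradient leaves the average, so the effective learning rate [math]\displaystyle{ \alpha_t / \sqrt{v_t} }[/math] may increase. AMSGrad instead normalizes by the running maximum [math]\displaystyle{ \hat{v}_t = \max(\hat{v}_{t-1}, v_t) }[/math], which keeps the effective learning rate non-increasing.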

== Notation ==

The paper presents the following framework, which generalizes training algorithms so that a specific variant such as AMSGrad or SGD can be defined entirely within it:

[[File:training_algo_framework.png]]
[math]\displaystyle{ \beta }[/math]
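
Purely as an illustration of how this template captures specific algorithms (a minimal sketch, not from the paper; the names generic_method, grad_fn, phi, psi, and alpha are placeholders), the generic update [math]\displaystyle{ x_{t+1} = x_t - \alpha_t m_t / \sqrt{V_t} }[/math] can be written as:

<syntaxhighlight lang="python">
import numpy as np

def generic_method(grad_fn, x0, alpha, phi, psi, num_steps):
    """Generic adaptive-method template: the averaging functions phi and psi
    map the gradient history to m_t and to the diagonal of V_t, and the step is
    x_{t+1} = x_t - alpha(t) * m_t / sqrt(V_t)."""
    x = np.asarray(x0, dtype=float).copy()
    grads = []
    for t in range(1, num_steps + 1):
        g = grad_fn(x)
        grads.append(g)
        m = phi(grads)                                # first-moment estimate m_t
        v = psi(grads)                                # diagonal second-moment estimate V_t
        x = x - alpha(t) * m / (np.sqrt(v) + 1e-8)    # small constant only for numerical safety
    return x

# SGD is recovered by choosing m_t = g_t and V_t = I.
sgd_phi = lambda grads: grads[-1]
sgd_psi = lambda grads: np.ones_like(grads[-1])

# Toy usage: minimize f(x) = ||x||^2 with step size alpha_t = 0.5 / sqrt(t).
x_final = generic_method(grad_fn=lambda x: 2.0 * x,
                         x0=np.array([1.0, -2.0]),
                         alpha=lambda t: 0.5 / np.sqrt(t),
                         phi=sgd_phi,
                         psi=sgd_psi,
                         num_steps=200)
</syntaxhighlight>

ADAM corresponds to exponential moving averages for both phi and psi, and AMSGrad additionally takes a running maximum of the second-moment estimate before normalizing.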