On The Convergence Of ADAM And Beyond
Introduction
Somewhat differently from the presentation I gave in class, this paper focuses strictly on the convergence pitfalls of the ADAM training algorithm for neural networks from a theoretical standpoint, and it proposes a novel improvement to ADAM called AMSGrad. The key result is that ADAM can get itself "stuck" in its exponentially weighted average of past gradients, which must somehow be prevented: the short memory of this average can quickly erase the influence of rare but informative large gradients, so ADAM's effective step size can grow again at exactly the wrong times.
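To make the "stuck in its history" issue concrete, here is a small numerical sketch (the gradient sequence and the value of the decay rate are illustrative assumptions, not taken from the paper, and bias correction is omitted) contrasting ADAM's exponential moving average of squared gradients with AMSGrad's running maximum, which keeps a long-term memory of the large gradient.

<pre>
# Illustrative sketch (assumed values, not from the paper; bias correction
# omitted): ADAM's exponential moving average v_t largely forgets a rare,
# informative large gradient, while AMSGrad's running maximum retains it.

beta2 = 0.99          # second-moment decay rate (a typical choice)
v_adam = 0.0          # ADAM: exponential moving average of squared gradients
v_hat_amsgrad = 0.0   # AMSGrad: running maximum of v_t

# One large gradient followed by many small ones.
grads = [10.0] + [0.1] * 500

for g in grads:
    v_adam = beta2 * v_adam + (1 - beta2) * g ** 2
    v_hat_amsgrad = max(v_hat_amsgrad, v_adam)

# ADAM's effective step ~ alpha / sqrt(v_adam) grows again as the large
# gradient is forgotten; AMSGrad's ~ alpha / sqrt(v_hat_amsgrad) stays small.
print(f"ADAM    v_t     = {v_adam:.4f}")
print(f"AMSGrad v_hat_t = {v_hat_amsgrad:.4f}")
</pre>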
Notation
The paper presents the following framework, which generalizes gradient-based training algorithms so that a specific variant such as SGD, ADAM, or AMSGrad can be defined entirely within it:
For [math]\displaystyle{ t = 1, \ldots, T }[/math]:

[math]\displaystyle{ g_t = \nabla f_t(x_t) }[/math]

[math]\displaystyle{ m_t = \phi_t(g_1, \ldots, g_t), \quad V_t = \psi_t(g_1, \ldots, g_t) }[/math]

[math]\displaystyle{ \hat{x}_{t+1} = x_t - \alpha_t \, m_t / \sqrt{V_t} }[/math]

[math]\displaystyle{ x_{t+1} = \Pi_{\mathcal{F}, \sqrt{V_t}}\left(\hat{x}_{t+1}\right) }[/math]

Here [math]\displaystyle{ x_t }[/math] is the parameter vector at step [math]\displaystyle{ t }[/math], [math]\displaystyle{ g_t }[/math] is the gradient of the loss [math]\displaystyle{ f_t }[/math], [math]\displaystyle{ \alpha_t }[/math] is the step size, [math]\displaystyle{ \phi_t }[/math] and [math]\displaystyle{ \psi_t }[/math] are "averaging" functions of the past gradients, and [math]\displaystyle{ \Pi_{\mathcal{F}, \sqrt{V_t}} }[/math] denotes projection onto the feasible set [math]\displaystyle{ \mathcal{F} }[/math]. Choosing [math]\displaystyle{ \phi_t(g_1, \ldots, g_t) = g_t }[/math] and [math]\displaystyle{ V_t = \mathbb{I} }[/math] recovers SGD, while exponential moving averages with decay rates [math]\displaystyle{ \beta_1, \beta_2 }[/math] recover ADAM; AMSGrad additionally takes a running maximum of the [math]\displaystyle{ V_t }[/math] estimates.
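As a simplified, hypothetical illustration of how specific optimizers drop out of this framework, the sketch below plugs in particular averaging functions: returning the current gradient with a constant second moment gives SGD, while exponential moving averages plus a running maximum give AMSGrad. The function names and the toy gradient stream are my own choices, and the projection step is omitted (unconstrained case).

<pre>
import numpy as np

# Minimal sketch (assumed, simplified; projection step omitted) of the generic
# update x_{t+1} = x_t - alpha * m_t / sqrt(V_t), with m_t = phi_t(g_1..g_t)
# and V_t = psi_t(g_1..g_t).

def run(grads, x0, alpha, phi, psi):
    """Run the generic adaptive update with averaging functions phi and psi."""
    x, state = np.asarray(x0, dtype=float), {}
    for t, g in enumerate(grads, start=1):
        m = phi(state, g, t)            # first-moment estimate m_t
        V = psi(state, g, t)            # (diagonal) second-moment estimate V_t
        x = x - alpha * m / np.sqrt(V)
    return x

# SGD: phi_t returns the current gradient, psi_t is identically one.
sgd_phi = lambda s, g, t: g
sgd_psi = lambda s, g, t: np.ones_like(g)

# AMSGrad: exponential moving averages plus a running maximum on V_t.
beta1, beta2 = 0.9, 0.99                # decay rates (a typical choice)

def ams_phi(s, g, t):
    s["m"] = beta1 * s.get("m", 0.0) + (1 - beta1) * g
    return s["m"]

def ams_psi(s, g, t):
    s["v"] = beta2 * s.get("v", 0.0) + (1 - beta2) * g ** 2
    s["vhat"] = np.maximum(s.get("vhat", 0.0), s["v"])  # the AMSGrad max step
    return s["vhat"]

grads = [np.array([0.5, -0.2]) for _ in range(10)]       # toy gradient stream
print(run(grads, [1.0, 1.0], alpha=0.1, phi=sgd_phi, psi=sgd_psi))
print(run(grads, [1.0, 1.0], alpha=0.1, phi=ams_phi, psi=ams_psi))
</pre>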