On The Convergence Of ADAM And Beyond


Introduction

Somewhat differently from the presentation I gave in class, this paper focuses strictly on pitfalls in the convergence of the ADAM training algorithm for neural networks from a theoretical standpoint, and it proposes an improvement to ADAM called AMSGrad. The key observation is that ADAM can get "stuck" in its exponentially weighted average of past (squared) gradients: large, informative gradients are forgotten too quickly, the effective step sizes can grow as a result, and convergence can fail unless this is prevented.
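To make the "stuck history" issue concrete, here is a minimal numerical sketch (my own illustration, not code from the paper) contrasting ADAM's exponentially decayed second-moment estimate with AMSGrad's running maximum of it; the variable names and gradient values are made up for the example.

<syntaxhighlight lang="python">
# Illustrative sketch (not from the paper): how a single large gradient's
# influence on ADAM's second-moment estimate v_t decays away, while
# AMSGrad's running maximum vhat_t retains it.
beta2 = 0.99
v = 0.0      # ADAM: exponential moving average of squared gradients
vhat = 0.0   # AMSGrad: maximum of all v_t seen so far

# One large, informative gradient followed by many small ones.
grads = [10.0] + [0.1] * 500

for t, g in enumerate(grads, start=1):
    v = beta2 * v + (1 - beta2) * g * g
    vhat = max(vhat, v)
    if t in (1, 100, 500):
        print(f"t={t:3d}  v_t={v:.4f}  vhat_t={vhat:.4f}")

# v_t decays back toward 0.01, so ADAM's effective step size ~ alpha/sqrt(v_t)
# grows again; vhat_t stays near 1.0, so AMSGrad's step size stays small.
</syntaxhighlight>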

Notation

The paper presents the following framework, which generalizes gradient-based training algorithms so that a specific variant such as SGD, ADAM, or AMSGrad can be defined entirely within it (a rough code sketch of the idea is given below):




[math]\displaystyle{ \beta }[/math]
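As a rough sketch (my own illustration, not the paper's presentation) of how this generic framework captures different optimizers, the snippet below writes the update as [math]\displaystyle{ x_{t+1} = x_t - \alpha_t m_t / \sqrt{V_t} }[/math], where the first-moment term and second-moment term are arbitrary functions of the gradient history; the helper names phi, psi, and ema and the chosen constants are hypothetical, and the projection step used in the paper is omitted.

<syntaxhighlight lang="python">
# Illustrative sketch of the generic adaptive-method framework (scalar case,
# projection step omitted): an optimizer is specified by how it turns the
# gradient history into a first-moment term m_t and a second-moment term V_t.
import math

def run(grads, phi, psi, lr=0.01, eps=1e-8):
    """Apply x_{t+1} = x_t - lr * m_t / sqrt(V_t), starting from x = 0."""
    x, history = 0.0, []
    for g in grads:
        history.append(g)
        m_t = phi(history)   # first-moment term built from gradient history
        v_t = psi(history)   # second-moment term built from gradient history
        x -= lr * m_t / (math.sqrt(v_t) + eps)
    return x

def ema(values, beta):
    """Exponential moving average with decay rate beta."""
    acc = 0.0
    for v in values:
        acc = beta * acc + (1 - beta) * v
    return acc

grads = [0.5, -0.2, 0.1]

# SGD: m_t is just the current gradient and V_t is constant.
x_sgd = run(grads, phi=lambda h: h[-1], psi=lambda h: 1.0)

# An ADAM-like choice (bias correction omitted): exponential moving averages
# of the gradients and of the squared gradients.
x_adam = run(grads,
             phi=lambda h: ema(h, beta=0.9),
             psi=lambda h: ema([g * g for g in h], beta=0.999))
</syntaxhighlight>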