Neural ODEs
Introduction
Reverse-mode Automatic Differentiation of ODE Solutions
Replacing Residual Networks with ODEs for Supervised Learning
Continuous Normalizing Flows
A Generative Latent Function Time-Series Model
Scope and Limitations
Section 6 mainly discusses the scope and limitations of the paper. Firstly, while "batching" the training data is a standard step when training neural networks, it is less straightforward here. It can still be applied by concatenating the states of the batch elements into one combined ODE, but the authors found that controlling the error of this combined system may, in principle, require more function evaluations. In practice, however, the number of evaluations did not increase significantly.
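As a rough illustration of this idea (not the authors' code), the sketch below flattens a batch of K states of dimension D into a single D*K-dimensional state and integrates it with one solver call; the toy dynamics f and the use of scipy.integrate.solve_ivp are assumptions made for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp

D, K = 4, 8  # state dimension and batch size (illustrative values)

def f(t, z):
    # Toy dynamics for a single state z of shape (D,);
    # a stand-in for the learned network f(z(t), t, theta) in the paper.
    return np.tanh(z)

def batched_f(t, z_flat):
    # Apply f to every element of the combined D*K-dimensional state.
    z = z_flat.reshape(K, D)
    dz = np.stack([f(t, zi) for zi in z])
    return dz.ravel()

z0 = np.random.randn(K, D)
# One solver call integrates the whole batch. The adaptive solver now
# controls error on all K trajectories jointly, which is why it *could*
# need more function evaluations than solving each element separately.
sol = solve_ivp(batched_f, (0.0, 1.0), z0.ravel(), rtol=1e-3, atol=1e-3)
z1 = sol.y[:, -1].reshape(K, D)  # batch of final states
```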
As long as the model proposed in this paper uses finite weights and Lipschitz nonlinearities such as tanh, Picard's existence theorem (Coddington and Levinson, 1955) applies, guaranteeing that the solution to the initial value problem (IVP) exists and is unique.
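For reference, the IVP in question in the paper's notation, together with the Lipschitz condition the theorem needs (the constant L is introduced here only to state it):

```latex
\frac{\mathrm{d}z(t)}{\mathrm{d}t} = f\big(z(t), t, \theta\big), \qquad z(t_0) = z_0,
% Picard-Lindelof: if f is continuous in t and Lipschitz in z, i.e.
\big\| f(z_1, t, \theta) - f(z_2, t, \theta) \big\| \le L \,\| z_1 - z_2 \|,
% then a unique solution z(t) exists on an interval around t_0.
```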
In controlling the amount of numerical error, the solver's tolerance lets the user trade precision for speed. The authors were only able to loosen the tolerances to approximately 10^-3 and 10^-5 in classification and density estimation respectively without degrading model performance.
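To make this trade-off concrete, the following sketch (again illustrative, not the paper's code) compares how many function evaluations an adaptive solver spends at loose versus tight tolerances:

```python
import numpy as np
from scipy.integrate import solve_ivp

def f(t, z):
    # Toy dynamics; in the paper f would be the learned network.
    return np.tanh(z)

z0 = np.random.randn(10)
for tol in (1e-3, 1e-5, 1e-8):
    sol = solve_ivp(f, (0.0, 1.0), z0, rtol=tol, atol=tol)
    # nfev grows as the tolerance tightens: precision is paid for in compute.
    print(f"tol={tol:.0e}  function evaluations={sol.nfev}")
```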
The authors note that reconstructing state trajectories by running the dynamics backwards can introduce extra numerical error if the reconstructed trajectory diverges from the forward one. They suggest a possible solution: checkpointing certain time steps by storing intermediate values of z on the forward pass, then reconstructing each segment of the trajectory separately, starting from the nearest checkpoint. The authors acknowledged that they only checked the validity of this method informally, since they did not find the issue to be a problem in practice.
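A minimal sketch of this checkpointing idea, under the same illustrative assumptions as above: store z at a few checkpoint times on the forward pass, then integrate backwards only between adjacent checkpoints, restarting from the stored value each time so error cannot accumulate across segments.

```python
import numpy as np
from scipy.integrate import solve_ivp

def f(t, z):
    # Toy stand-in for the learned dynamics f(z(t), t, theta).
    return np.tanh(z)

t_checkpoints = np.linspace(0.0, 1.0, 5)  # checkpoint times (illustrative)

# Forward pass: store z at each checkpoint time.
stored = {t_checkpoints[0]: np.random.randn(10)}
for t0, t1 in zip(t_checkpoints[:-1], t_checkpoints[1:]):
    sol = solve_ivp(f, (t0, t1), stored[t0], rtol=1e-5, atol=1e-5)
    stored[t1] = sol.y[:, -1]

# Backward pass: reconstruct each segment by integrating backwards from the
# *stored* checkpoint value, so numerical error is confined to one segment.
for t0, t1 in zip(t_checkpoints[:-1][::-1], t_checkpoints[1:][::-1]):
    sol = solve_ivp(f, (t1, t0), stored[t1], rtol=1e-5, atol=1e-5)
    z_reconstructed = sol.y[:, -1]  # should closely match stored[t0]
```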
Conclusions and Critiques
Link to Appendices of Paper
https://arxiv.org/pdf/1806.07366.pdf