Training dynamics of Adam
One of the most widely used algorithms for training deep neural networks is Adam. However, despite its empirical success, it remains poorly understood. In particular, existing mathematical theories fail to capture a quantifiable advantage over classic stochastic gradient descent. In this project, we will take a different route: instead of studying Adam as a black box under simplified assumptions, we will carefully analyze its empirical training dynamics, particularly in the first iterations. We aim to pinpoint the key differences between the training dynamics of Adam and those of stochastic gradient descent with momentum. Later, using the knowledge gathered, we will formulate a mathematical model of its behavior.
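For readers unfamiliar with the two optimizers being compared, the following is a minimal sketch of their update rules in plain Python. It is illustrative only and not the project's code. The function names, the learning rate, and the toy gradient are all assumptions made for this example. The momentum variant shown is one common convention (buffer = beta * buffer + gradient); other formulations exist.

```python
import math


def sgd_momentum_step(param, grad, state, lr=0.1, beta=0.9):
    """One step of SGD with momentum on a list of floats.

    Hypothetical helper for illustration; uses the convention
    buf = beta * buf + grad, then param -= lr * buf.
    """
    buf = state.setdefault("buf", [0.0] * len(param))
    new_param = []
    for i, (p, g) in enumerate(zip(param, grad)):
        buf[i] = beta * buf[i] + g
        new_param.append(p - lr * buf[i])
    return new_param


def adam_step(param, grad, state, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One step of Adam with bias correction on a list of floats.

    Hypothetical helper for illustration; hyperparameter values are
    assumptions, not settings taken from the project.
    """
    m = state.setdefault("m", [0.0] * len(param))
    v = state.setdefault("v", [0.0] * len(param))
    state["t"] = state.get("t", 0) + 1
    t = state["t"]
    new_param = []
    for i, (p, g) in enumerate(zip(param, grad)):
        m[i] = beta1 * m[i] + (1.0 - beta1) * g        # first-moment estimate
        v[i] = beta2 * v[i] + (1.0 - beta2) * g * g    # second-moment estimate
        m_hat = m[i] / (1.0 - beta1 ** t)              # bias correction
        v_hat = v[i] / (1.0 - beta2 ** t)
        new_param.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_param


if __name__ == "__main__":
    # Toy gradient with very different scales per coordinate (assumed values).
    x0 = [1.0, 1.0]
    g0 = [100.0, 0.01]
    print("SGD+momentum:", sgd_momentum_step(x0, g0, {}))
    print("Adam:        ", adam_step(x0, g0, {}))
```

On the very first iteration, the bias-corrected Adam update reduces to roughly lr * g / |g|, so each coordinate moves by about lr regardless of the gradient's scale, whereas the first momentum step is proportional to the gradient itself. This is one simple example of how the two methods can differ in early iterations.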
Program - Computer Science
Division - Computer, Electrical and Mathematical Sciences and Engineering
Faculty Lab Link - https://sites.google.com/view/optimal-lab/home
Field of Study - Computer Science, Mathematics or a related discipline
Desired Project Deliverables
Original research – contribution to a research paper