Closing the gap between the upper bound and lower bound of Adam’s iteration complexity
- Bohan Wang,
- Jingwen Fu,
- Huishuai Zhang,
- Nanning Zheng,
- Wei Chen
Recently, Arjevani et al. [1] established a lower bound on the iteration complexity of first-order optimization under an \(L\)-smooth condition and a bounded noise variance assumption. However, a thorough review of the existing literature on Adam's convergence reveals a noticeable gap: none of the existing analyses meets this lower bound. In this paper, we close the gap by deriving a new convergence guarantee for Adam that requires only an \(L\)-smooth condition and a bounded noise variance assumption. Our results remain valid across a broad spectrum of hyperparameters. In particular, with properly chosen hyperparameters, we derive an upper bound on the iteration complexity of Adam and show that it matches the lower bound for first-order optimizers. To the best of our knowledge, this is the first work to establish such a tight upper bound for Adam's convergence. Our proof utilizes novel techniques to handle the entanglement between momentum and the adaptive learning rate and to convert the first-order term in the Descent Lemma into the gradient norm, which may be of independent interest.
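For context, a minimal sketch of the setting referenced above (the notation \(f\), \(g\), \(\sigma\), \(\Delta\), and \(\epsilon\) below is ours and is not taken from the abstract): \(L\)-smoothness means \(\|\nabla f(x)-\nabla f(y)\|\le L\|x-y\|\), bounded noise variance means \(\mathbb{E}\|g(x)-\nabla f(x)\|^2\le\sigma^2\) for the stochastic gradient \(g\), and the goal is to find a point with \(\mathbb{E}\|\nabla f(x)\|\le\epsilon\). Under these assumptions, the lower bound of Arjevani et al. [1] for stochastic first-order methods scales as
\[
\Omega\!\left(\frac{\Delta L \sigma^2}{\epsilon^{4}}\right), \qquad \Delta = f(x_0) - \inf_x f(x),
\]
so an upper bound for Adam of the same \(\epsilon^{-4}\) order is tight in this sense.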