Stochastic Gradient Descent Algorithm in the Computational Network Toolkit
- Brian Guenter
- Dong Yu
- Adam Eversole
- Oleksii Kuchaiev
- Mike Seltzer
OPT2013: NIPS Workshop on Optimization for Machine Learning
We introduce the stochastic gradient descent algorithm used in the Computational Network Toolkit (CNTK), a general-purpose machine learning toolkit written in C++ for training and using models that can be expressed as a computational network. We describe the algorithm used to compute gradients automatically for a given network. We also propose a low-cost automatic learning rate selection algorithm and demonstrate that it works well in practice.
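To make the idea of automatic gradient computation on a computational network concrete, the following is a minimal Python sketch, not CNTK's C++ implementation. It assumes a toy network of scalar nodes, and every name in it (`Node`, `add`, `mul`, `square`, `backward`, `sgd_fit`) is hypothetical. Gradients are obtained by visiting nodes in reverse topological order, and the SGD loop uses a fixed, hand-picked learning rate rather than the automatic selection algorithm proposed in the paper, whose details the abstract does not give.

```python
class Node:
    """A node in a toy computational network: a value, a gradient, and its inputs."""

    def __init__(self, value, parents=(), backward_fns=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents            # input nodes of this node
        self.backward_fns = backward_fns  # one function per parent: output grad -> parent grad


def add(a, b):
    return Node(a.value + b.value, (a, b), (lambda g: g, lambda g: g))


def mul(a, b):
    return Node(a.value * b.value, (a, b),
                (lambda g: g * b.value, lambda g: g * a.value))


def square(a):
    return Node(a.value ** 2, (a,), (lambda g: 2.0 * a.value * g,))


def topo_order(root):
    """Return all nodes reachable from root, inputs before outputs."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent in node.parents:
            visit(parent)
        order.append(node)

    visit(root)
    return order


def backward(root):
    """Accumulate d(root)/d(node) into node.grad for every node in the network."""
    order = topo_order(root)
    for node in order:
        node.grad = 0.0
    root.grad = 1.0
    # Reverse topological order: a node's gradient is complete before it is propagated.
    for node in reversed(order):
        for parent, fn in zip(node.parents, node.backward_fns):
            parent.grad += fn(node.grad)


def sgd_fit(samples, lr=0.05, epochs=200):
    """Fit y = w * x + b by per-sample SGD with a fixed learning rate (an assumption)."""
    w, b = Node(0.0), Node(0.0)
    for _ in range(epochs):
        for x, y in samples:
            pred = add(mul(w, Node(x)), b)
            loss = square(add(pred, Node(-y)))  # squared error for one sample
            backward(loss)
            w.value -= lr * w.grad
            b.value -= lr * b.grad
    return w.value, b.value
```

For example, `sgd_fit([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)])` fits samples drawn from the line y = 2x + 1. The network is rebuilt for each sample to keep the sketch short; a real toolkit would build the network once and reuse it.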