Self-Tuning Networks: Amortizing the Hypergradient Computation for Hyperparameter Optimization

Optimization of many deep learning hyperparameters can be formulated as a bilevel optimization problem. While most black-box and gradient-based approaches require many independent training runs, we aim to adapt hyperparameters online as the network trains. The main challenge is to approximate the response Jacobian, which captures how the minimizer of the inner objective changes as the hyperparameters are perturbed. To do this, we introduce the self-tuning network (STN), which fits a hypernetwork to approximate the best-response function in the vicinity of the current hyperparameters. Differentiating through the hypernetwork lets us efficiently approximate the gradient of the validation loss with respect to the hyperparameters. We train the hypernetwork and hyperparameters jointly. Empirically, we can find hyperparameter settings competitive with Bayesian optimization in a single training run, and in some cases find hyperparameter schedules that outperform any fixed hyperparameter value.
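To make the best-response idea concrete, here is a minimal toy sketch. It is not the talk's STN implementation, which targets neural networks. The following choices are all assumptions made for illustration:

- The inner problem is ridge regression, and the single hyperparameter `lam` is the log of the L2 penalty.
- The hypernetwork is linear in `lam`: w(lam') = a + b (lam' - lam) / sigma.
- Instead of sampling hyperparameter perturbations from a distribution, the hypernetwork is fit at the two fixed points lam ± sigma, so the run is deterministic.
- All function names (`train_grad`, `val_grad`, `exact_best_response`, `stn_ridge`) are hypothetical.

The hypergradient is obtained by differentiating the validation loss through the hypernetwork. There is no direct ∂L_V/∂lam term here, because the toy validation loss depends on `lam` only through the weights.

```python
import numpy as np


def train_grad(w, lam, X, y):
    """Gradient in w of L_T(w, lam) = mean((X w - y)^2) + exp(lam) * ||w||^2."""
    n = X.shape[0]
    return 2.0 / n * X.T @ (X @ w - y) + 2.0 * np.exp(lam) * w


def val_loss(w, Xv, yv):
    """Validation loss L_V(w) = mean((Xv w - yv)^2); depends on lam only through w."""
    return float(np.mean((Xv @ w - yv) ** 2))


def val_grad(w, Xv, yv):
    """Gradient of L_V with respect to w."""
    m = Xv.shape[0]
    return 2.0 / m * Xv.T @ (Xv @ w - yv)


def exact_best_response(lam, X, y):
    """Closed-form argmin of L_T(., lam); used only to check the sketch, not by it."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + np.exp(lam) * np.eye(d), X.T @ y / n)


def stn_ridge(X, y, Xv, yv, lam0=2.0, sigma=0.1, lr_w=0.05, lr_lam=0.1,
              warmup=200, rounds=300, inner_steps=5):
    """Jointly fit a linear hypernetwork w(lam') = a + b * (lam' - lam) / sigma and lam."""
    d = X.shape[1]
    a, b, lam = np.zeros(d), np.zeros(d), lam0

    def inner_step(a, b, lam):
        # One gradient step on 0.5 * [L_T(a + b, lam + sigma) + L_T(a - b, lam - sigma)],
        # i.e. fit the hypernetwork to the training loss at two perturbed hyperparameters.
        g_plus = train_grad(a + b, lam + sigma, X, y)
        g_minus = train_grad(a - b, lam - sigma, X, y)
        return a - lr_w * 0.5 * (g_plus + g_minus), b - lr_w * 0.5 * (g_plus - g_minus)

    for _ in range(warmup):
        a, b = inner_step(a, b, lam)
    for _ in range(rounds):
        for _ in range(inner_steps):
            a, b = inner_step(a, b, lam)
        # Hypergradient through the hypernetwork:
        # dL_V/dlam ~= grad_w L_V(w(lam)) . dw/dlam, with w(lam) = a and dw/dlam = b / sigma.
        hypergrad = val_grad(a, Xv, yv) @ (b / sigma)
        step = -lr_lam * hypergrad
        a = a + b * step / sigma  # re-center so w(.) is unchanged as a function of lam
        lam = lam + step
    return a, b, lam


# Toy data: the penalty exp(2) at lam0 = 2 underfits, so lam is expected to decrease.
rng = np.random.default_rng(0)
n, m, d = 50, 50, 5
w_true = np.ones(d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)
Xv = rng.normal(size=(m, d))
yv = Xv @ w_true + 0.1 * rng.normal(size=m)

a, b, lam = stn_ridge(X, y, Xv, yv)
print(f"final log-regularization lam = {lam:.3f}")
```

Because each round fits the hypernetwork at lam ± sigma, the two fitted points approach the best responses w*(lam + sigma) and w*(lam - sigma). So `a` approaches their average, and `b / sigma` approaches a central-difference estimate of the response Jacobian dw*/dlam. That Jacobian estimate is exactly what the hyperparameter step consumes.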

Learn more about the 2020-2021 Directions in ML: AutoML and Automating Algorithms virtual speaker series.

Speaker details

Roger Grosse is an Assistant Professor of Computer Science at the University of Toronto, and a founding member of the Vector Institute for Artificial Intelligence. He received his Ph.D. in computer science from MIT, and then spent two years as a postdoc at the University of Toronto. He holds a Canada Research Chair in Probabilistic Inference and Deep Learning, an Ontario MRIS Early Researcher Award, and a CIFAR Canadian AI Chair.

Date: March 31, 2021
Speaker: Roger Grosse
Affiliation: University of Toronto

Research areas
- Artificial intelligence
Labs
- Microsoft Research Lab - New England
Projects
- AutoML
Events
- Directions in ML: AutoML and Automating Algorithms
