The goal of neural architecture search (NAS) is to have computers automatically search for the best-performing neural networks. Recent advances in NAS methods have made it possible to build problem-specific networks that are faster, more compact, and less power-hungry than their handcrafted counterparts. Unfortunately, many NAS methods rely on an array of tricks that aren't always documented in a way that's easy to discover. While these tricks result in neural networks with greater accuracy, they often cloud the performance of the search algorithms themselves. Since different NAS methods use different enhancements, and some none at all, NAS techniques have become difficult for researchers to compare. The use of a variety of enhancements has also made NAS methods difficult to reproduce. Once-promising methods may disappoint when an attempt is made to transfer them to other datasets. Additionally, engineers trying to use NAS often find it challenging to understand the implications of advertised advances because of a deluge of research claims, an inability to fairly compare methods side by side, fragmented code bases in research repos, hyperparameters that aren't carefully managed, and a lack of plug-and-play support for individual techniques.
We've sought to address many of these concerns with a goal of making state-of-the-art NAS research more widely usable. We've asked, can we find the right abstractions to unify many of these methods? A unified NAS framework would help enable the adoption of NAS algorithms in industry and support reproducibility, as well as fair evaluation, in research. Such a framework would also accelerate algorithmic innovation by allowing the research community to pursue even higher ambitions in its application of NAS, as well as to conduct searches in novel spaces that might yield architectures we haven't yet imagined. With this goal in mind, we've developed Archai, an open-source project now available on GitHub. Archai, short for Architecture AI, means "first principles," which captures the spirit of the work we're doing.
Archai enables execution of standard NAS algorithms with a single command line. Currently, Differentiable Architecture Search (DARTS), Petridish, Differentiable ArchiTecture Approximation (DATA), and eXperts Neural Architecture Search (XNAS) are implemented. Archai makes it easy to add new algorithms, experiment with many well-known datasets, and add new datasets through unified interfaces. Additionally, Archai enables the isolation of hyperparameters via a configuration system that makes assumptions and settings explicit. The behaviors of architecture search systems are sensitive to these hyperparameters. With unified hyperparameter configuration controls, different algorithms can be tested on the same playing field.
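To make the "same playing field" idea concrete, here is a minimal sketch of running several algorithms against one shared configuration. This is not Archai's actual API: the driver function and configuration keys below are hypothetical, and the real entry points and configuration files are documented in the repository.

```python
# Illustrative sketch only: `run_search` and the config keys are invented
# for this example, not Archai's actual API.

SHARED_CONFIG = {
    "dataset": "cifar10",
    "epochs": 600,          # final-training epochs, identical for every algorithm
    "cutout": True,         # augmentation tricks are explicit settings, so no
    "autoaugment": False,   # algorithm gets a hidden advantage
}

def run_search(algorithm: str, config: dict) -> None:
    """Pretend driver: dispatch to a NAS algorithm with a shared config."""
    print(f"searching with {algorithm} under {config}")

for algo in ["darts", "petridish", "xnas", "data"]:
    run_search(algo, SHARED_CONFIG)   # every method sees the same settings
```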
At the core of Archai are several interfaces that provide abstractions for common components of NAS algorithms. This reduces code duplication, making new algorithm development faster and easier. Archai also uses a common model description language based on YAML that is extensible and "compilable" to a PyTorch model. Because all the algorithms share exactly the same components, including the ones for training and evaluation, they can be written more compactly. Having common components also sets the stage for fairer comparison and easier reproducibility.
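As an illustration of the "compilable" description idea, the sketch below turns a small declarative description into a PyTorch module. The schema is invented and far simpler than Archai's actual YAML format; the description is written here as already-parsed Python data rather than YAML.

```python
# Minimal sketch of a declarative model description "compiled" to PyTorch.
# The node schema below is hypothetical, not Archai's real format.
import torch.nn as nn

# A YAML description would parse into nested dicts/lists like this.
model_desc = [
    {"op": "conv", "in": 3, "out": 16, "kernel": 3},
    {"op": "relu"},
    {"op": "conv", "in": 16, "out": 32, "kernel": 3},
]

def compile_to_module(desc) -> nn.Module:
    """Walk the description and build the corresponding PyTorch layers."""
    layers = []
    for node in desc:
        if node["op"] == "conv":
            layers.append(nn.Conv2d(node["in"], node["out"], node["kernel"], padding=1))
        elif node["op"] == "relu":
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

model = compile_to_module(model_desc)
```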
Key features of Archai
- Declarative approach and reproducibility: Many research works employ a variety of enhancements that, while seemingly small, could make a world of difference to neural network performance. For example, some works use only 600 epochs for final architecture training, while others use 1,500. Some may exploit AutoAugment for data augmentation during training, while others may only use Cutout. We pored over various research codebases to extract these bags of tricks. With Archai, the tricks can now be switched on or off through simple configuration that applies to all algorithms (a configuration sketch follows this list). Extracting these tricks has also allowed us to make Archai a general-purpose framework to train manually designed neural networks efficiently. Recent work has shown that judiciously using these training tricks is usually more important than small differences in the architectures themselves.
- Search-space abstractions: A significant amount of current NAS research focuses on rather small search spaces made popular by a few early efforts. In Archai, we offer abstractions that significantly generalize and expand these search spaces and are available to all algorithms. It's our hope that the research community will find it useful to push the envelope with these expanded search spaces, which haven't been fully explored yet.
- Mixing and matching of different techniques: There are several exciting questions we can explore through mixing and matching different techniques. What if we want to apply the growth method proposed by Petridish to DARTS? Can we apply L1 regularization over architecture weights to other algorithms with just a flip of a configuration switch? What if a researcher wanted to run the online-learning motivated update rules as proposed in XNAS or Geometric NAS in new search spaces or use them inside a new algorithm? Archai offers modularized components of different NAS algorithms so they can be easily mixed and matched; an L1-regularization sketch follows this list.
- Generalized Pareto frontier search: A crucial use case in which NAS becomes a necessity is deploying neural networks on compute-constrained platforms such as smartphones or embedded devices. In these scenarios, one can expect budget constraints for power consumption, latency, memory usage, available FLOPs, and other factors. A model must work within this budget even if it means sacrificing some accuracy. It's difficult to manually design optimal networks with a wide range of specified constraints. Given the difficulty, current NAS algorithms will almost always outperform manual designs. Archai can generate a gallery of architectures with specified compute characteristics. Our NAS method, Petridish, was designed with this primary intention. Petridish is available through Archai, now with a higher-performing, distributed implementation. We plan to generalize the Pareto front generation for all algorithms so that almost any algorithm can leverage this technique to produce a similar gallery of models; a Pareto-filtering sketch follows this list.
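As promised in the first item above, here is a minimal sketch of treating training tricks as explicit configuration. The flag names are illustrative rather than Archai's actual configuration keys, and the augmentations are built with standard torchvision transforms.

```python
# Sketch of the "bag of tricks as explicit configuration" idea.
# The config keys are hypothetical, not Archai's exact names.
from torchvision import transforms

def build_train_transform(cfg: dict):
    ops = [transforms.RandomCrop(32, padding=4), transforms.RandomHorizontalFlip()]
    if cfg.get("autoaugment", False):
        ops.append(transforms.AutoAugment())      # available in torchvision >= 0.10
    ops.append(transforms.ToTensor())
    if cfg.get("cutout", False):
        ops.append(transforms.RandomErasing())    # Cutout-style augmentation on tensors
    return transforms.Compose(ops)

train_tf = build_train_transform({"autoaugment": False, "cutout": True})
```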
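The L1-regularization example from the mixing-and-matching item can be sketched in a few lines of PyTorch. Here `alphas` and `l1_weight` are hypothetical names for the architecture parameters and the penalty strength; this is a generic illustration, not Archai's implementation.

```python
# Sketch: add an L1 penalty over architecture weights (the "alphas" of
# differentiable NAS) to an existing search loss.
import torch

def search_loss(task_loss: torch.Tensor, alphas, l1_weight: float = 1e-3):
    """Task loss plus an L1 penalty that pushes architecture weights toward sparsity."""
    l1 = sum(a.abs().sum() for a in alphas)
    return task_loss + l1_weight * l1

# Usage: `alphas` would be the architecture parameters exposed by the search
# model, e.g. model.arch_parameters() in DARTS-style code.
```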
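Finally, the Pareto-gallery idea can be illustrated with a generic filter that keeps only architectures not dominated on an error/cost trade-off. The candidate names and numbers below are made up, and the cost could stand for latency, memory, or FLOPs.

```python
# Sketch of selecting a Pareto-optimal gallery of candidate architectures.
def pareto_front(candidates):
    """Keep candidates not dominated on (error, cost); lower is better for both."""
    front = []
    for name, err, cost in candidates:
        dominated = any(e <= err and c <= cost and (e < err or c < cost)
                        for _, e, c in candidates)
        if not dominated:
            front.append((name, err, cost))
    return front

models = [("net-a", 0.06, 12.0), ("net-b", 0.05, 30.0), ("net-c", 0.07, 25.0)]
print(pareto_front(models))  # net-c is dropped: net-a has lower error and lower cost
```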
Archai offers several other desirable features, including:

- logging and publication-ready experiment reports
- mixed-precision training and distributed training
- faster and more general implementations of algorithms such as bilevel optimization
- training and evaluation code incorporating several best practices
- support for NVIDIA Data Loading Library (DALI) and Apex
- a "mini-mode" for development on a laptop
- support for TensorWatch and TensorBoard, plus architecture visualization
- NASBench-101/201 support (with NASBench-301 coming soon)
- cross-platform code that runs on Linux and OS X, as well as Windows
For a full list of features, please visit our GitHub page.
Join the Archai community
We hope that researchers and engineers will find Archai useful and contribute to these efforts to accelerate NAS adoption, as well as future research. We formally invite the broader community to join us in this journey with contributions, pull requests, and algorithm implementations. Check out our Archai GitHub repository for more information and join the Archai group to stay up to date.
Acknowledgments: We thank Partner Research Manager John Langford, Technical Fellow and Chief Scientific Officer Eric Horvitz, Senior Principal Researcher Rich Caruana, and Principal Research Manager Alekh Agarwal for providing valuable guidance and rich discussions.