Microsoft Research blog
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model
By Ali Alvi and Paresh Kharya
We are excited to introduce the DeepSpeed- and Megatron-powered Megatron-Turing Natural Language Generation model (MT-NLG), the largest and the most powerful monolithic transformer language model trained to date, with 530 billion parameters. It is the result of a research collaboration…
DeepSpeed powers 8x larger MoE model training with high performance
By the DeepSpeed Team and the Z-code Team
Today, we are proud to announce DeepSpeed MoE, a high-performance system that supports massive-scale mixture of experts (MoE) models as part of the DeepSpeed optimization library. MoE models are an emerging class of sparsely activated…
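To make the idea of a sparsely activated MoE layer concrete, here is a minimal, hedged sketch of how an expert layer is typically declared with DeepSpeed's MoE API. The hidden size, expert count, and the simple feed-forward expert below are illustrative assumptions rather than details from the post, and exact argument names can vary between DeepSpeed releases; the layer is meant to be constructed inside a distributed run started with the `deepspeed` launcher.

```python
# Hedged sketch of a sparsely activated MoE layer using DeepSpeed's MoE API.
# Sizes, expert count, and top-k value are placeholders, not from the post.
import torch
import deepspeed
from deepspeed.moe.layer import MoE


def build_moe_block(hidden_size: int = 1024) -> MoE:
    # Each expert is an ordinary feed-forward block; only the expert(s)
    # selected by the gating network run for a given token, so parameter
    # count grows with num_experts while per-token compute stays roughly flat.
    expert = torch.nn.Sequential(
        torch.nn.Linear(hidden_size, 4 * hidden_size),
        torch.nn.GELU(),
        torch.nn.Linear(4 * hidden_size, hidden_size),
    )
    return MoE(
        hidden_size=hidden_size,  # width of the incoming token representations
        expert=expert,            # expert module to replicate
        num_experts=8,            # total experts across the expert-parallel group
        k=1,                      # top-1 gating: each token is routed to one expert
    )


# Inside a distributed training step (under the deepspeed launcher), the
# forward pass returns the combined expert output, the auxiliary
# load-balancing loss, and per-expert token counts:
#   output, aux_loss, expert_counts = moe_layer(token_embeddings)
```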
DeepSpeed: Accelerating large-scale model inference and training via system optimizations and compression
By the DeepSpeed Team, Rangan Majumder, and Andrey Proskurin
Last month, the DeepSpeed Team announced ZeRO-Infinity, a step forward in training models with tens of trillions of parameters. In addition to creating optimizations for scale, our team strives to introduce features that also improve speed, cost, and usability. As…
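As a rough illustration of the inference side mentioned in this post's title, the sketch below wraps a pretrained Hugging Face model with DeepSpeed's inference engine. The model choice (gpt2), precision, and kernel-injection flag are assumptions made for illustration, not settings from the post, and a CUDA GPU is assumed to be available.

```python
# Hedged sketch: serving a pretrained transformer with DeepSpeed inference.
# The model (gpt2), dtype, and kernel-injection setting are placeholders.
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Wrap the model with DeepSpeed's inference engine; half precision and
# optimized kernel injection are common single-GPU settings.
engine = deepspeed.init_inference(
    model,
    dtype=torch.half,
    replace_with_kernel_inject=True,
)
model = engine.module  # the optimized module exposes the usual generate() API

inputs = tokenizer("DeepSpeed makes large models", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```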
DeepSpeed: Extreme-scale model training for everyone
By the DeepSpeed Team, Rangan Majumder, and Junhua Wang
In February, we announced DeepSpeed, an open-source deep learning training optimization library, and ZeRO (Zero Redundancy Optimizer), a novel memory optimization technology in the library, which vastly advances large model training by improving scale, speed, cost, and usability. DeepSpeed has…
Research Collection: Tools and Data to Advance the State of the Art
“This is a game changer for the big data community. Initiatives like Microsoft Research Open Data reduce barriers to data sharing and encourage reproducibility by leveraging the power of cloud computing.” —Sam Madden, Professor, Massachusetts Institute of Technology. An open…
ZeRO-2 & DeepSpeed: Shattering barriers of deep learning speed & scale
By the DeepSpeed Team, Rangan Majumder, and Junhua Wang
In February, we announced DeepSpeed, an open-source deep learning training optimization library, and ZeRO (Zero Redundancy Optimizer), a novel memory optimization technology in the library, which vastly advances large model training by improving scale, speed, cost, and usability. DeepSpeed has…
Turing-NLG: A 17-billion-parameter language model by Microsoft
By Corby Rosset
[Figure adapted from a similar image published in DistilBERT.] Turing Natural Language Generation (T-NLG) is a 17-billion-parameter language model by Microsoft that outperforms the state of the art on many downstream NLP tasks. We present a…
ZeRO & DeepSpeed: New system optimizations enable training models with over 100 billion parameters
By the DeepSpeed Team, Rangan Majumder, and Junhua Wang
The latest trend in AI is that larger natural language models deliver better accuracy; however, larger models are difficult to train because of their cost, training time, and the difficulty of integrating them into existing code. Microsoft is releasing an open-source library called DeepSpeed, which vastly…
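To ground what integrating DeepSpeed typically looks like in practice, here is a minimal, hedged sketch of wrapping an existing PyTorch model with deepspeed.initialize and a ZeRO configuration. The toy model, batch size, optimizer settings, and ZeRO stage are illustrative assumptions, not the configuration used in the post, and the script is meant to be started with the `deepspeed` launcher on GPU hardware.

```python
# Hedged sketch: enabling ZeRO through a DeepSpeed config and deepspeed.initialize.
# Model, batch size, learning rate, and ZeRO stage are placeholder choices.
# Run with the deepspeed launcher, e.g.:  deepspeed train_sketch.py
import torch
import deepspeed

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 1024),
)

ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    # ZeRO partitions optimizer states (stage 1), gradients (stage 2),
    # and parameters (stage 3) across data-parallel workers to cut memory.
    "zero_optimization": {"stage": 2},
}

# deepspeed.initialize returns an engine that owns the optimizer,
# data parallelism, mixed precision, and the ZeRO partitioning.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# One training step: the engine replaces loss.backward() and optimizer.step().
x = torch.randn(4, 1024, device=engine.device, dtype=torch.half)
loss = engine(x).float().pow(2).mean()
engine.backward(loss)
engine.step()
```

The design point illustrated here is that the training loop barely changes: ZeRO's memory savings are switched on through the config dictionary, while `engine.backward` and `engine.step` stand in for the usual PyTorch calls.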