Microsoft Research blog
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model
By Ali Alvi and Paresh Kharya
We are excited to introduce the DeepSpeed- and Megatron-powered Megatron-Turing Natural Language Generation model (MT-NLG), the largest and the most powerful monolithic transformer language model trained to date, with 530 billion parameters. It is the result of a research collaboration…
DeepSpeed powers 8x larger MoE model training with high performance
By the DeepSpeed Team and the Z-code Team
Today, we are proud to announce DeepSpeed MoE, a high-performance system that supports massive-scale mixture of experts (MoE) models as part of the DeepSpeed optimization library. MoE models are an emerging class of sparsely activated…
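To make the idea of a sparsely activated MoE layer concrete, here is a minimal, hedged sketch of how an expert layer is typically declared with DeepSpeed's MoE API. The hidden size, expert count, and the simple feed-forward expert below are illustrative assumptions rather than details from the post, and exact argument names can vary between DeepSpeed releases; the layer is meant to be constructed inside a distributed run started with the `deepspeed` launcher.

```python
# Hedged sketch of a sparsely activated MoE layer using DeepSpeed's MoE API.
# Sizes, expert count, and top-k value are placeholders, not from the post.
import torch
import deepspeed
from deepspeed.moe.layer import MoE


def build_moe_block(hidden_size: int = 1024) -> MoE:
    # Each expert is an ordinary feed-forward block; only the expert(s)
    # selected by the gating network run for a given token, so parameter
    # count grows with num_experts while per-token compute stays roughly flat.
    expert = torch.nn.Sequential(
        torch.nn.Linear(hidden_size, 4 * hidden_size),
        torch.nn.GELU(),
        torch.nn.Linear(4 * hidden_size, hidden_size),
    )
    return MoE(
        hidden_size=hidden_size,  # width of the incoming token representations
        expert=expert,            # expert module to replicate
        num_experts=8,            # total experts across the expert-parallel group
        k=1,                      # top-1 gating: each token is routed to one expert
    )


# Inside a distributed training step (under the deepspeed launcher), the
# forward pass returns the combined expert output, the auxiliary
# load-balancing loss, and per-expert token counts:
#   output, aux_loss, expert_counts = moe_layer(token_embeddings)
```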
DeepSpeed: Accelerating large-scale model inference and training via system optimizations and compression
By the DeepSpeed Team, Rangan Majumder, and Andrey Proskurin
Last month, the DeepSpeed Team announced ZeRO-Infinity, a step forward in training models with tens of trillions of parameters. In addition to creating optimizations for scale, our team strives to introduce features that also improve speed, cost, and usability. As…
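As a rough illustration of the inference side mentioned in this post's title, the sketch below wraps a pretrained Hugging Face model with DeepSpeed's inference engine. The model choice (gpt2), precision, and kernel-injection flag are assumptions made for illustration, not settings from the post, and a CUDA GPU is assumed to be available.

```python
# Hedged sketch: serving a pretrained transformer with DeepSpeed inference.
# The model (gpt2), dtype, and kernel-injection setting are placeholders.
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Wrap the model with DeepSpeed's inference engine; half precision and
# optimized kernel injection are common single-GPU settings.
engine = deepspeed.init_inference(
    model,
    dtype=torch.half,
    replace_with_kernel_inject=True,
)
model = engine.module  # the optimized module exposes the usual generate() API

inputs = tokenizer("DeepSpeed makes large models", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```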
DeepSpeed: Extreme-scale model training for everyone
By the DeepSpeed Team, Rangan Majumder, and Junhua Wang
In February, we announced DeepSpeed, an open-source deep learning training optimization library, and ZeRO (Zero Redundancy Optimizer), a novel memory optimization technology in the library, which vastly advances large model training by improving scale, speed, cost, and usability. DeepSpeed has…
Research Collection: Tools and Data to Advance the State of the Art
“This is a game changer for the big data community. Initiatives like Microsoft Research Open Data reduce barriers to data sharing and encourage reproducibility by leveraging the power of cloud computing.” —Sam Madden, Professor, Massachusetts Institute of Technology. An open…
ZeRO-2 & DeepSpeed: Shattering barriers of deep learning speed & scale
By the DeepSpeed Team, Rangan Majumder, and Junhua Wang
In February, we announced DeepSpeed, an open-source deep learning training optimization library, and ZeRO (Zero Redundancy Optimizer), a novel memory optimization technology in the library, which vastly advances large model training by improving scale, speed, cost, and usability. DeepSpeed has…
Turing-NLG: A 17-billion-parameter language model by Microsoft
By Corby Rosset
[Figure adapted from a similar image published in DistilBERT.] Turing Natural Language Generation (T-NLG) is a 17-billion-parameter language model by Microsoft that outperforms the state of the art on many downstream NLP tasks. We present a…
ZeRO & DeepSpeed: New system optimizations enable training models with over 100 billion parameters
By the DeepSpeed Team, Rangan Majumder, and Junhua Wang
The latest trend in AI is that larger natural language models deliver better accuracy; however, larger models are difficult to train because of their cost, training time, and the difficulty of integrating them into existing code. Microsoft is releasing an open-source library called DeepSpeed, which vastly…
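To ground what integrating DeepSpeed typically looks like in practice, here is a minimal, hedged sketch of wrapping an existing PyTorch model with deepspeed.initialize and a ZeRO configuration. The toy model, batch size, optimizer settings, and ZeRO stage are illustrative assumptions, not the configuration used in the post, and the script is meant to be started with the `deepspeed` launcher on GPU hardware.

```python
# Hedged sketch: enabling ZeRO through a DeepSpeed config and deepspeed.initialize.
# Model, batch size, learning rate, and ZeRO stage are placeholder choices.
# Run with the deepspeed launcher, e.g.:  deepspeed train_sketch.py
import torch
import deepspeed

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 1024),
)

ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    # ZeRO partitions optimizer states (stage 1), gradients (stage 2),
    # and parameters (stage 3) across data-parallel workers to cut memory.
    "zero_optimization": {"stage": 2},
}

# deepspeed.initialize returns an engine that owns the optimizer,
# data parallelism, mixed precision, and the ZeRO partitioning.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# One training step: the engine replaces loss.backward() and optimizer.step().
x = torch.randn(4, 1024, device=engine.device, dtype=torch.half)
loss = engine(x).float().pow(2).mean()
engine.backward(loss)
engine.step()
```

The design point illustrated here is that the training loop barely changes: ZeRO's memory savings are switched on through the config dictionary, while `engine.backward` and `engine.step` stand in for the usual PyTorch calls.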