Microsoft Research Blog
| Ali Alvi and Paresh Kharya
We are excited to introduce the DeepSpeed- and Megatron-powered Megatron-Turing Natural Language Generation model (MT-NLG), the largest and the most powerful monolithic transformer language model trained to date, with 530 billion parameters. It is the result of a research collaboration…
| DeepSpeed Team and Z-code Team
Today, we are proud to announce DeepSpeed MoE, a high-performance system that supports massive scale mixture of experts (MoE) models as part of the DeepSpeed optimization library. MoE models are an emerging class of sparsely activated…
| DeepSpeed Team, Rangan Majumder, and Andrey Proskurin
Last month, the DeepSpeed Team announced ZeRO-Infinity, a step forward in training models with tens of trillions of parameters. In addition to creating optimizations for scale, our team strives to introduce features that also improve speed, cost, and usability. As…
| DeepSpeed Team, Rangan Majumder, and Junhua Wang
In February, we announced DeepSpeed, an open-source deep learning training optimization library, and ZeRO (Zero Redundancy Optimizer), a novel memory optimization technology in the library, which vastly advances large model training by improving scale, speed, cost, and usability. DeepSpeed has…
“This is a game changer for the big data community. Initiatives like Microsoft Research Open Data reduce barriers to data sharing and encourage reproducibility by leveraging the power of cloud computing.” —Sam Madden, Professor, Massachusetts Institute of Technology. An open…
| DeepSpeed Team, Rangan Majumder, and Junhua Wang
In February, we announced DeepSpeed, an open-source deep learning training optimization library, and ZeRO (Zero Redundancy Optimizer), a novel memory optimization technology in the library, which vastly advances large model training by improving scale, speed, cost, and usability. DeepSpeed has…
| Corby Rosset
Turing Natural Language Generation (T-NLG) is a 17 billion parameter language model by Microsoft that outperforms the state of the art on many downstream NLP tasks. We present a…
ZeRO & DeepSpeed: New system optimizations enable training models with over 100 billion parameters
| DeepSpeed Team, Rangan Majumder, and Junhua Wang
The latest trend in AI is that larger natural language models provide better accuracy; however, larger models are difficult to train because of cost, time, and ease of code integration. Microsoft is releasing an open-source library called DeepSpeed, which vastly…