Microsoft Research Blog
New Research | FLASH: Workflow automation agent for diagnosing recurring incidents; METAREFLECTION: Learning instructions for language agents using past reflections; Boosting LLM training efficiency through faster communication between GPUs; and more.
AI saw unparalleled growth in 2023, reaching millions of users daily. This progress owes much to the extensive work of Microsoft researchers and collaborators. In this review, learn about the advances of 2023, which set the stage for further progress in 2024.
| Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang, Conglong Li, Martin Cai, and Yuxiong He
Editor’s note, Sept. 28, 2023 – The founding collaborators list was updated to correct omissions and the scientific foundation model graph was updated to correct information. In the next decade, deep learning may revolutionize the natural sciences, enhancing our capacity to…
DeepSpeed ZeRO++: A leap in speed for LLM and chat model training with 4X less communication
| DeepSpeed Team and Andrey Proskurin
Large AI models are transforming the digital world. Generative language models like Turing-NLG, ChatGPT, and GPT-4, powered by large language models (LLMs), are incredibly versatile, capable of performing tasks like summarization, coding, and translation. Similarly, large multimodal generative models like…
Welcome to Research Focus, a new series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft. Barun Patra, Saksham Singhal, Shaohan Huang, Zewen Chi, Li Dong, Furu Wei,…
DeepSpeed Compression: A composable library for extreme compression and zero-cost quantization
| DeepSpeed Team and Andrey Proskurin
Large-scale models are revolutionizing deep learning and AI research, driving major improvements in language understanding, creative text generation, multilingual translation, and more. But despite their remarkable capabilities, the models’ large size creates latency and cost constraints that hinder the…
| DeepSpeed Team and Andrey Proskurin
In the last three years, the largest trained dense models have increased in size by over 1,000 times, from a few hundred million parameters to over 500 billion parameters in Megatron-Turing NLG 530B (MT-NLG). Improvements in model quality with size…
Over the past 30 years, Microsoft Research has undergone a shift in how it approaches innovation, broadening its mission to include not only advancing the state of computing but also using technology to tackle some of the world’s most pressing…
| Wei Cui, Yifan Xiong, Peng Cheng, and Rafael Salas
Mixture of experts (MoE) is a deep learning model architecture in which computational cost is sublinear to the number of parameters, making scaling easier. Nowadays, MoE is the only approach demonstrated to scale deep learning models to trillion-plus parameters, paving…
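To make the sublinear-compute property concrete, here is a minimal sketch of a top-1 gated mixture-of-experts layer written in PyTorch. The class name SimpleMoE, the layer shapes, and the routing loop are illustrative assumptions for this post, not DeepSpeed's MoE API: each token is routed to a single expert, so per-token compute stays roughly constant while total parameters grow with the number of experts.

```python
# Illustrative top-1 gated mixture-of-experts layer (not DeepSpeed's API).
# Each token is sent to one expert, so compute per token stays constant
# even as the number of experts (and thus total parameters) grows.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)  # router that scores experts per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)          # (tokens, num_experts)
        top_score, top_idx = scores.max(dim=-1)           # top-1 routing decision
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e                           # tokens assigned to expert e
            if mask.any():
                out[mask] = top_score[mask].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: doubling num_experts doubles parameters but not per-token FLOPs.
tokens = torch.randn(16, 64)
layer = SimpleMoE(d_model=64, d_hidden=256, num_experts=8)
print(layer(tokens).shape)  # torch.Size([16, 64])
```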