Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
- Marah I Abdin
- Sam Ade Jacobs
- Ammar Ahmad Awan
- Jyoti Aneja
- Ahmed Awadallah
- Hany Hassan Awadalla
- Nguyen Bach
- Amit Bahree
- Arash Bakhtiari
- Harkirat Behl
- Alon Benhaim
- Misha Bilenko
- Johan Bjorck
- Sébastien Bubeck
- Martin Cai
- Caio César Teodoro Mendes
- Weizhu Chen
- Vishrav Chaudhary
- Parul Chopra
- Allie Del Giorno
- Gustavo de Rosa
- Matthew Dixon
- Ronen Eldan
- Dan Iter
- Abhishek Goswami
- Suriya Gunasekar
- Emman Haider
- Junheng Hao
- Russell J. Hewett
- Jamie Huynh
- Mojan Javaheripi
- Xin Jin
- Piero Kauffmann
- Nikos Karampatziakis
- Dongwoo Kim
- Mahmoud Khademi
- Lev Kurilenko
- James R. Lee
- Yin Tat Lee
- Yuanzhi Li
- Chen Liang
- Weishung Liu
- Xihui (Eric) Lin
- Zeqi Lin
- Piyush Madan
- Arindam Mitra
- Hardik Modi
- Anh Nguyen
- Brandon Norick
- Barun Patra
- Daniel Perez-Becker
- Thomas Portet
- Reid Pryzant
- Heyang Qin
- Marko Radmilac
- Corby Rosset
- Sambudha Roy
- Olli Saarikivi
- Amin Saied
- Adil Salim
- Michael Santacroce
- Shital Shah
- Ning Shang
- Hiteshi Sharma
- Xia Song
- Olatunji Ruwase
- Xin Wang
- Rachel Ward
- Guanhua Wang
- Philipp Witte
- Michael Wyatt
- Can Xu
- Jiahang Xu
- Weijian Xu
- Sonali Yadav
- Fan Yang
- Ziyi Yang
- Donghan Yu
- Chengruidong Zhang
- Cyril Zhang
- Jianwen Zhang
- Li Lyna Zhang
- Yi Zhang
- Yunan Zhang
- Xiren Zhou
MSR-TR-2024-12 | Published by Microsoft
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset for training, a scaled-up version of the one used for phi-2, composed of heavily filtered web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide some initial parameter-scaling results with 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., respectively 75% and 78% on MMLU, and 8.7 and 8.9 on MT-bench).
Publication Downloads
Phi-3
April 23, 2024
Phi-3-Mini-128K-Instruct is a lightweight, state-of-the-art open model with 3.8B parameters, trained on the Phi-3 datasets, which include both synthetic data and filtered publicly available website data with a focus on high quality and reasoning-dense properties. The model belongs to the Phi-3 family; the Mini version comes in two variants, 4K and 128K, denoting the context length (in tokens) it can support. The model underwent a post-training process that incorporates both supervised fine-tuning and direct preference optimization for instruction following and safety. When assessed against benchmarks testing common sense, language understanding, math, code, long context, and logical reasoning, Phi-3-Mini-4K-Instruct showed robust, state-of-the-art performance among models with fewer than 13 billion parameters.
Research Forum 4 | Keynote: Phi-3-Vision: A highly capable and “small” language vision model
Jianfeng Gao, Distinguished Scientist and Vice President at Microsoft Research Redmond, introduces Phi-3-Vision, an advanced and economical open-source multimodal model. As a member of the Phi-3 model family, Phi-3-Vision extends the language models with multi-sensory skills, seamlessly combining language and vision capabilities.