Microsoft Research blog
LLM profiling guides KV cache optimization
| Liyuan Liu and Jianfeng Gao
LLMs rely on memory-intensive mechanisms like the key-value (KV) cache, which stores the attention keys and values computed for earlier tokens so they can be reused rather than recomputed. FastGen profiles a model's attention behavior to optimize KV cache usage, reducing LLM memory demands by up to 50% while maintaining performance.
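To see why the KV cache dominates inference memory at long contexts, here is a back-of-the-envelope sketch in Python. The model dimensions, fp16 byte width, and 50% retention ratio are illustrative assumptions for a 7B-scale model, not FastGen's actual settings or code.

```python
# Rough KV cache sizing sketch (illustrative assumptions, not FastGen's code).

n_layers, n_heads, head_dim, seq_len = 32, 32, 128, 4096  # assumed 7B-scale config
bytes_per_val = 2  # fp16

# Full KV cache: keys + values for every layer, head, and past token.
full_bytes = 2 * n_layers * n_heads * head_dim * seq_len * bytes_per_val
print(f"full KV cache: {full_bytes / 2**30:.2f} GiB")  # ~2 GiB at 4k tokens

# FastGen-style idea (sketched): profile attention and keep only the cache
# entries each head actually uses, e.g. recent tokens plus a few
# high-attention "heavy hitters". Keeping ~50% of entries halves the cache.
kept_fraction = 0.5  # illustrative, matching the reported up-to-50% savings
print(f"compressed:    {full_bytes * kept_fraction / 2**30:.2f} GiB")
```

Because the cache grows linearly with sequence length, the savings compound as contexts get longer.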
LoftQ: Reimagining LLM fine-tuning with smarter initialization
| Nikos Karampatziakis, Chen Liang, Weizhu Chen, Yixiao Li, Yifan Yu, and Tuo Zhao
LoftQ boosts LLM efficiency by streamlining fine-tuning of quantized models: it initializes low-rank adapters to compensate for quantization error, reducing computational demands while preserving high performance. Innovations like this can help make AI technology more energy-efficient.
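LoftQ's key idea is to choose the quantized weights and the low-rank (LoRA) adapter jointly, so that their sum stays close to the original weight matrix. The NumPy sketch below loosely paraphrases that alternating scheme; `fake_quantize`, `loftq_init`, the uniform 4-bit quantizer, and the rank and iteration counts are illustrative stand-ins, not the authors' code or settings.

```python
import numpy as np

def fake_quantize(w, n_bits=4):
    # Uniform symmetric quantizer as a stand-in; LoftQ targets formats like NF4.
    scale = np.abs(w).max() / (2 ** (n_bits - 1) - 1)
    return np.round(w / scale) * scale

def loftq_init(w, rank=8, n_iters=5, n_bits=4):
    # Alternate: quantize the current residual target, then fit a rank-r
    # correction (truncated SVD) to whatever the quantization lost.
    a = np.zeros((w.shape[0], rank))
    b = np.zeros((rank, w.shape[1]))
    for _ in range(n_iters):
        q = fake_quantize(w - a @ b, n_bits)
        u, s, vt = np.linalg.svd(w - q, full_matrices=False)
        a = u[:, :rank] * s[:rank]
        b = vt[:rank]
    return q, a, b  # quantized backbone + low-rank adapter initialization

w = np.random.randn(256, 256).astype(np.float32)
q, a, b = loftq_init(w)
print("plain quantization error:", np.linalg.norm(w - fake_quantize(w)))
print("LoftQ-style init error:  ", np.linalg.norm(w - (q + a @ b)))
```

Because the adapter starts out canceling quantization error rather than at zero, fine-tuning begins closer to the full-precision starting point.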
Abstracts: May 6, 2024
| Michel Galley and Gretchen Huizinga
Researcher Michel Galley explores how he and fellow researchers combined new and existing data to create MathVista, an open-source benchmark for measuring the mathematical reasoning capabilities of foundation models in scenarios that involve text and images.