Microsoft Research Blog
| Liyuan Liu and Jianfeng Gao
LLMs rely on memory-intensive mechanisms like the key-value (KV) cache, which stores the attention states of earlier tokens so they can be reused during generation. FastGen optimizes KV cache usage, reducing LLM memory demands by up to 50% while maintaining performance.
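To make the memory cost concrete, here is a minimal Python sketch of a per-head KV cache. The `KVCache` class and its keep-most-recent eviction policy are illustrative stand-ins, not FastGen's method: FastGen profiles each attention head and applies head-specific retention policies (e.g., keeping special tokens, punctuation, or recent tokens), which is where its savings come from.

```python
# Toy per-head key-value (KV) cache: shows why the cache is memory-intensive
# (it grows linearly with sequence length) and how evicting entries shrinks it.
# The eviction policy here is a deliberately simple stand-in, not FastGen's.
import numpy as np

class KVCache:
    def __init__(self, head_dim: int):
        self.keys = []    # one (head_dim,) key vector per generated token
        self.values = []  # one (head_dim,) value vector per generated token
        self.head_dim = head_dim

    def append(self, k: np.ndarray, v: np.ndarray) -> None:
        # One key and one value vector are cached for every token generated,
        # so memory grows with context length; this is the cost FastGen targets.
        self.keys.append(k)
        self.values.append(v)

    def evict(self, keep_recent: int) -> None:
        # Toy policy: keep only the most recent entries. FastGen instead
        # chooses a policy per attention head based on profiling.
        self.keys = self.keys[-keep_recent:]
        self.values = self.values[-keep_recent:]

    def nbytes(self) -> int:
        return sum(a.nbytes for a in self.keys + self.values)

rng = np.random.default_rng(0)
cache = KVCache(head_dim=128)
for _ in range(1024):  # simulate generating 1,024 tokens
    cache.append(rng.standard_normal(128).astype(np.float32),
                 rng.standard_normal(128).astype(np.float32))
before = cache.nbytes()
cache.evict(keep_recent=512)  # drop half the cached entries
print(f"{before} bytes -> {cache.nbytes()} bytes")  # roughly 50% smaller
```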
| Nikos Karampatziakis, Chen Liang, Weizhu Chen, Yixiao Li, Yifan Yu, and Tuo Zhao
LoftQ boosts LLM efficiency by pairing quantization with low-rank adaptation during fine-tuning, reducing computational demands while preserving high performance. Innovations like this can help make AI technology more energy-efficient.
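The core idea behind LoftQ's initialization is to find a quantized weight Q plus a low-rank correction A·B that together approximate the original weight W, so quantized LoRA fine-tuning starts close to the full-precision model. The sketch below illustrates this with a toy uniform quantizer and a short alternation loop; the actual method uses NormalFloat-style quantization and is applied per weight matrix of the model, so treat this as a simplified stand-in rather than the released implementation.

```python
# Rough sketch of LoftQ-style initialization: alternate between quantizing
# the residual W - A@B and fitting a rank-r correction to W - Q via SVD.
import numpy as np

def quantize_uniform(w: np.ndarray, bits: int = 4) -> np.ndarray:
    # Toy symmetric uniform quantizer (LoftQ itself uses NF4-style quantization).
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / levels
    return np.round(w / scale) * scale

def loftq_init(w: np.ndarray, rank: int, steps: int = 5):
    a = np.zeros((w.shape[0], rank))
    b = np.zeros((rank, w.shape[1]))
    for _ in range(steps):
        q = quantize_uniform(w - a @ b)              # quantize the residual
        u, s, vt = np.linalg.svd(w - q, full_matrices=False)
        a = u[:, :rank] * s[:rank]                   # best rank-r fit to W - Q
        b = vt[:rank]
    return q, a, b

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)) / 16
q, a, b = loftq_init(w, rank=16)
plain = np.linalg.norm(w - quantize_uniform(w))      # quantization alone
fixed = np.linalg.norm(w - (q + a @ b))              # quantization + correction
print(f"error, quantization only: {plain:.3f}; with low-rank correction: {fixed:.3f}")
```

The low-rank term absorbs much of the quantization error, which is why a model initialized this way loses less accuracy than one that is simply quantized before LoRA fine-tuning.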
Abstracts: May 6, 2024
| Michel Galley and Gretchen Huizinga
Researcher Michel Galley explores how he and fellow researchers combined new and existing data to create MathVista, an open-source benchmark for measuring the mathematical reasoning capabilities of foundation models in scenarios that involve text and images.