News & features
In the news | VentureBeat
Microsoft’s Differential Transformer cancels attention noise in LLMs
Improving LLMs’ ability to retrieve in-prompt information can impact important applications such as retrieval-augmented generation (RAG) and in-context learning (ICL). Microsoft Research and Tsinghua University researchers have introduced Differential Transformer (Diff Transformer), a new LLM architecture that amplifies attention to relevant context while…
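The idea behind the architecture can be pictured with a short sketch. The snippet below is illustrative only (not the authors' released code): it computes two softmax attention maps from separate query/key projections and subtracts them, so noise that shows up in both maps cancels while attention on relevant context remains. The function names, shapes, and the fixed scalar lam (a learnable parameter in the paper) are assumptions made for exposition.

```python
# Minimal sketch of differential attention: two softmax attention maps are
# computed from separate query/key projections and subtracted, cancelling
# noise common to both maps. Shapes, names, and the scalar lam are
# illustrative assumptions, not the paper's implementation.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def differential_attention(X, Wq1, Wk1, Wq2, Wk2, Wv, lam=0.5):
    """X: (seq_len, d_model); W*: (d_model, d_head); lam: learnable scalar in the paper."""
    d = Wq1.shape[1]
    A1 = softmax((X @ Wq1) @ (X @ Wk1).T / np.sqrt(d))  # first attention map
    A2 = softmax((X @ Wq2) @ (X @ Wk2).T / np.sqrt(d))  # second attention map
    return (A1 - lam * A2) @ (X @ Wv)                    # difference cancels shared noise

# Toy usage
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 16))
Ws = [rng.normal(size=(16, 8)) * 0.1 for _ in range(5)]
out = differential_attention(X, *Ws)
print(out.shape)  # (8, 8)
```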
In the news | IEEE Spectrum
1-bit LLMs Could Solve AI’s Energy Demands
“Imprecise” language models are smaller, speedier—and nearly as accurate. Large language models, the AI systems that power chatbots like ChatGPT, are getting better and better—but they’re also getting bigger and bigger, demanding more energy and computational power. For LLMs that…
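For a rough picture of what "imprecise" means here, the sketch below quantizes a weight matrix to the ternary values -1, 0, and +1 with a single per-tensor scale, in the spirit of 1-bit/1.58-bit LLM work; matrix multiplies against such weights reduce to additions and subtractions. The function names and the eps constant are illustrative assumptions, not any library's API.

```python
# Minimal sketch of ternary ("1.58-bit") weight quantization with an
# absmean scale. Names and constants are assumptions for illustration.
import numpy as np

def ternary_quantize(W, eps=1e-8):
    """Quantize a weight matrix to {-1, 0, +1} and keep one scale factor."""
    scale = np.abs(W).mean() + eps            # per-tensor scaling factor
    Wq = np.clip(np.round(W / scale), -1, 1)  # ternary weights
    return Wq, scale

def ternary_matmul(x, Wq, scale):
    """Approximate x @ W using only +/- accumulations plus one rescale."""
    return (x @ Wq) * scale

# Toy usage
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 4))
x = rng.normal(size=(2, 16))
Wq, s = ternary_quantize(W)
print(np.unique(Wq))                      # e.g. [-1.  0.  1.]
print(ternary_matmul(x, Wq, s).shape)     # (2, 4)
```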
Research Focus: Week of March 4, 2024
In this issue: Generative kaleidoscopic networks; Text diffusion with reinforced conditioning; PRISE – Learning temporal action abstractions as a sequence compression problem.
Research Focus: Week of January 8, 2024 | Zinan Lin, Jinyu Li, Bhaskar Mitra, Siân Lindley, Liang Wang, Nan Yang, and Furu Wei
Mixture-of-linear-experts for long-term time series forecasting; Weakly-supervised streaming multilingual speech model with truly zero-shot capability; KBFormer: Diffusion model for structured entity completion; Identifying risks of AI-mediated data access…