Microsoft Research Blog
Microsoft and NVIDIA introduce parameter-efficient multimodal transformers for video representation learning
| Yale Song
Understanding video is one of the most challenging problems in AI, and an important underlying requirement is learning multimodal representations that capture information about objects, actions, sounds, and their long-range statistical dependencies from audio-visual signals. Recently, transformers have been successful in…
Conversations with data: Advancing the state of the art in language-driven data exploration
| Alex Polozov, Chris Meek, and Ahmed Awadallah
One key aspiration of AI is to develop natural and effective task-oriented conversational systems. Task-oriented conversational systems use a natural language interface to collaborate with and support people in accomplishing specific goals and activities. They go beyond chitchat conversation. For…
Factorized layers revisited: Compressing deep networks without playing the lottery
| Misha Khodak, Neil Tenenholtz, Lester Mackey, and Nicolo Fusi
From BiT (928 million parameters) to GPT-3 (175 billion parameters), state-of-the-art machine learning models are rapidly growing in size. With the greater expressivity and easier trainability of these models come skyrocketing training costs, deployment difficulties, and even climate impact. As…