微软研究院博客
| Yale Song
Understanding video is one of the most challenging problems in AI, and an important underlying requirement is learning multimodal representations that capture information about objects, actions, sounds, and their long-range statistical dependencies from audio-visual signals. Recently, transformers have been successful in…
| Alex Polozov, Chris Meek, 和 Ahmed Awadallah
One key aspiration of AI is to develop natural and effective task-oriented conversational systems. Task-oriented conversational systems use a natural language interface to collaborate with and support people in accomplishing specific goals and activities. They go beyond chitchat conversation. For…
| Misha Khodak, Neil Tenenholtz, Lester Mackey, 和 Nicolo Fusi
From BiT (928 million parameters) to GPT-3 (175 billion parameters), state-of-the-art machine learning models are rapidly growing in size. With the greater expressivity and easier trainability of these models come skyrocketing training costs, deployment difficulties, and even climate impact. As…