MindAgent: Emergent Gaming Interaction

We collaborated with the Xbox and Mesh teams to explore a new gaming infrastructure and to design a dynamic real-time system for human players and NPCs with GPT-X on a multi-agent platform.

GitHub: MindAgent

ArXiv: https://arxiv.org/abs/2309.09971

Demo: MindAgent.mp4

People

Hoi Vo

Technical Fellow

Xbox Emerging Technologies

Steven Gong

Internship

UCLA, MSR

Zane Durante

Internship

Stanford, MSR

Yusuke Noda

Principal Software Engineer

Microsoft Gaming-Xbox Team

Song-chun Zhu

Professor

UCLA

Demetri Terzopoulos

Chancellor's Professor

UCLA

Fei-Fei Li

Professor

Stanford University

Jianfeng Gao

Distinguished Scientist & Vice President

We are excited to share that our project, “MindAgent: Emergent Gaming Interaction”, was recently made public. We seek to develop a unified interaction infrastructure and architecture that can jointly understand large language corpora and visual (image and video) inputs and provide meaningful action-based outputs. We evaluate our model on a broad range of gaming video tasks and show the efficacy of the agent action stream across a range of tasks, including interactive agents and visual and natural language understanding.

In this work, we propose a novel infrastructure, MindAgent, to evaluate emergent planning and coordination capabilities for gaming interaction. In particular, our infrastructure leverages existing gaming frameworks to i) require understanding of the coordinator for a multi-agent system, ii) collaborate with human players via properly designed instructions without fine-tuning, and iii) establish in-context learning from few-shot prompts with feedback. Furthermore, we introduce CuisineWorld, a new gaming scenario and related benchmark that measures multi-agent collaboration efficiency and supervises multiple agents playing the game simultaneously. We conduct comprehensive evaluations with a new automatic metric, CoS, for calculating collaboration efficiency.

Finally, our infrastructure can be deployed into real-world gaming scenarios in a customized VR version of CuisineWorld and adapted to the broader existing Minecraft gaming domain. By creating a powerful, general-purpose foundation model with visual, language, and action capabilities, we can have a significant impact across many industries, both within Microsoft and externally.

Minecraft VR demo – YouTube