Computer Vision

Project

Compositional 3D-aware Video Generation

In a Magician’s magical cabin alone in a serene forest, an alien walking on the floor, starting from the cabin’s door to the mow near the bottom right corner of this image. Four characters stood…

Career Opportunity

Researcher – AI technology innovation

Posted: August 21, 2024

Location: Beijing, China; Shanghai, China

Research Area(s): Algorithms, Artificial intelligence, Computer vision, Data platforms and analytics

As Researcher, you will conduct research and lead research collaborations that yield new insights, theories, analyses, data, algorithms, and/or prototypes that advance the state-of-the-art of computer science and engineering, as well as general scientific knowledge,…

Publication

Virchow2: Scaling Self-Supervised Mixed Magnification Models in Pathology

Eric Zimmermann, Eugene Vorontsov, Julian Viret, Adam Casson, Michal Zelechowski, George Shaikovski, Neil Tenenholtz, Jimmy Hall, Thomas J. Fuchs, Nicolo Fusi, Siqi Liu, Kristen Severson

July 2024

Project

Video In-Context Learning

Driving large vision models with video demonstrations. In-context learning for vision data has been underexplored compared with that in natural language. Previous works studied image in-context learning, urging models to generate a single image guided…

Publication

Video In-context Learning

Wentao Zhang, Junliang Guo, Tianyu He, Li Zhao, Linli Xu, Jiang Bian

July 2024

Publication

MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization

Tianchen Zhao, Xuefei Ning, Tongcheng Fang, Enshu Liu, Guyue Huang, Zinan Lin, Shengen Yan, Guohao Dai, Yu Wang

ECCV 2024 | July 2024

Publication

Scribble: Auto-Generated 2D Avatars with Diverse and Inclusive Art-Direction

Lohit Petikam, Charlie Hewitt, Shideh Rezaeifar

SIGGRAPH | July 2024

Publication

Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions

Minghan Li, Heng Li, Zhi-Qi Cheng, Yifei Dong, Yuxuan Zhou, Jun-Yan He, Qi Dai, Teruko Mitamura, Alexander G. Hauptmann

NeurIPS 2024 | June 2024

Project

A Dynamic Benchmark for Image Understanding

We have created a procedurally generatable, synthetic dataset for testing spatial reasoning, visual prompting, object recognition and detection. A key question for understanding multimodal model performance is how well it can understand images, in particular…

Publication

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan

CVPR 2024 | June 2024

Microsoft at CVPR 2024: Innovations in computer vision and AI research

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

Exploring how context, culture, and character matter in avatar research

FeatUp: A Model-Agnostic Framework for Features at Any Resolution
