Tightly Connecting Vision and Language

Remarkable progress has been made at the intersection of vision and language. While showing great promise, current vision and language models may only weakly “connect” the two modalities and often fail in the wild. In this talk, I will present our recent efforts to bridge this gap along two dimensions: informativeness and controllability. In particular, I will describe how we can leverage large-scale datasets, including our recently released CC12M and Localized Narratives, to benefit existing vision-and-language tasks as well as to enable new applications.


Speaker Details

Soravit (Beer) Changpinyo is a Software Engineer at Google Research. His research interests are in machine learning with applications to computer vision and natural language processing. Prior to joining Google, he was a PhD candidate and an Annenberg Fellow at the University of Southern California, advised by Fei Sha.

Series: Microsoft Vision+Language Summer Talk Series
Date: August 25, 2021
Speaker: Soravit (Beer) Changpinyo
Affiliation: Google

- Chunyuan Li, Principal Researcher
- Jianwei Yang, Principal Researcher
- Pengchuan Zhang, Senior Researcher
- Zhe Gan, Principal Researcher
Research areas
- Artificial intelligence
Research labs
- Microsoft Research Lab - Redmond
Groups
- Deep Learning Group