Three Explorations on Pre-Training: an Analysis, an Approach, and an Architecture
In this talk, I am going to cover three of our recent explorations on pre-training. First is an analysis of object/attribute detection pre-training, which produces the bottom-up attention features extensively used in vision-and-language research. The main finding is that plain grid features can work equally well without object proposals, while being significantly faster. Next is an approach for self-supervised visual representation learning. The main message is that a simple Siamese network can learn competitive representations without components commonly believed to be essential, such as contrastive pairs or momentum encoders. Last is an architectural extension of major self-supervised learning frameworks from convolutional networks to Transformers. We find vision Transformers can work out of the box, subject to instability issues which we call out for awareness.
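To make the second point concrete, below is a minimal, PyTorch-style sketch of the "simple Siamese" idea: two augmented views, a shared encoder, a small predictor head, and a stop-gradient on the target branch, with no negative pairs and no momentum encoder. The names (`encoder`, `predictor`, `aug`) are illustrative assumptions, not the exact code presented in the talk.

```python
import torch
import torch.nn.functional as F

def siamese_step(encoder, predictor, images, aug):
    # Two random augmented views of the same batch of images.
    x1, x2 = aug(images), aug(images)
    # Shared encoder (backbone + projection MLP) applied to both views.
    z1, z2 = encoder(x1), encoder(x2)
    # Prediction MLP on each branch.
    p1, p2 = predictor(z1), predictor(z2)

    def neg_cosine(p, z):
        # Stop-gradient on the target branch is the key ingredient.
        z = z.detach()
        return -F.cosine_similarity(p, z, dim=-1).mean()

    # Symmetrized loss; no contrastive (negative) pairs are used.
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)
```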
Speaker Details
Xinlei Chen has been a research scientist at Facebook AI Research since 2018. He obtained a Ph.D. from the School of Computer Science at Carnegie Mellon University, and before that a Bachelor's degree from Zhejiang University, China. He is mainly interested in computer vision, machine learning, and natural language processing, and recently in pre-training in particular. He is a recipient of a CVPR 2021 Best Paper Honorable Mention and an ICML 2021 Outstanding Paper Honorable Mention award for his work in self-supervised learning.
- Date:
- Speaker: Xinlei Chen
- Affiliation: Facebook AI Research
- Chunyuan Li, Principal Researcher
- Jianwei Yang, Principal Researcher
- Pengchuan Zhang, Senior Researcher
- Zhe Gan, Principal Researcher