Three Explorations on Pre-Training: an Analysis, an Approach, and an Architecture
In this talk, I am going to cover three of our recent explorations on pre-training. First is an analysis of object/attribute detection pre-training, which produces the bottom-up attention features extensively used in vision-and-language research. The main finding is that plain grid features can work equally well without object proposals, while being significantly faster. Next is an approach for self-supervised visual representation learning. The main message is that a simple Siamese network can learn competitive representations without components commonly believed to be essential, such as contrastive pairs or momentum encoders. Last is an architectural extension of major self-supervised learning frameworks from convolutional networks to Transformers. We find vision Transformers can work out of the box, subject to instability issues which we call out for awareness.
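To make the second point concrete, below is a minimal, PyTorch-style sketch of the "simple Siamese" idea: two augmented views, a shared encoder, a small predictor head, and a stop-gradient on the target branch, with no negative pairs and no momentum encoder. The names (`encoder`, `predictor`, `aug`) are illustrative assumptions, not the exact code presented in the talk.

```python
import torch
import torch.nn.functional as F

def siamese_step(encoder, predictor, images, aug):
    # Two random augmented views of the same batch of images.
    x1, x2 = aug(images), aug(images)
    # Shared encoder (backbone + projection MLP) applied to both views.
    z1, z2 = encoder(x1), encoder(x2)
    # Prediction MLP on each branch.
    p1, p2 = predictor(z1), predictor(z2)

    def neg_cosine(p, z):
        # Stop-gradient on the target branch is the key ingredient.
        z = z.detach()
        return -F.cosine_similarity(p, z, dim=-1).mean()

    # Symmetrized loss; no contrastive (negative) pairs are used.
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)
```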
Speaker Details
Xinlei Chen has been a research scientist at Facebook AI Research since 2018. He obtained a Ph.D. from the School of Computer Science at Carnegie Mellon University, and before that a Bachelor's degree from Zhejiang University, China. He is mainly interested in computer vision, machine learning, and natural language processing, and recently in pre-training in particular. He is a recipient of a CVPR 2021 Best Paper Honorable Mention and an ICML 2021 Outstanding Paper Honorable Mention award for his work in self-supervised learning.
- Date:
- Speaker: Xinlei Chen
- Affiliation: Facebook AI Research
- Chunyuan Li, Principal Researcher
- Jianwei Yang, Principal Researcher
- Pengchuan Zhang, Senior Researcher
- Zhe Gan, Principal Researcher