Collaboration between MSRA and Yonsei University generates computer vision breakthrough
By Miran Lee, Senior Principal Research Program Manager, Microsoft Research Asia
Four researchers. Two institutions. One successful computer vision breakthrough. That’s the power of collaboration.
The project, “Unified Depth Prediction and Intrinsic Image Decomposition from a Single Image via Joint Convolutional Neural Fields,” presented at the 2016 European Conference on Computer Vision (ECCV), will help AI systems better predict what an image represents. This collaboration between Yonsei University in Korea and Microsoft Research Asia (MSRA) in Beijing, from Sept. 2015 to Sept. 2016, resulted in one of the top 5 percent of papers submitted to this prestigious conference.
The project team included Seungryong Kim and Kihong Park, both PhD candidates at Yonsei University; Dr. Kwanghoon Sohn, a professor of electrical and electronic engineering at Yonsei University; and Dr. Steve Lin, a senior researcher at MSRA. Kim also participated as a research intern at MSRA, supervised by Lin.
The goal of the project was to create an algorithm that jointly predicts depth and intrinsic images from a single image, a capability needed for computer vision applications such as identifying an object. The team found that solving for depth and intrinsic image decomposition at the same time outperformed previous solutions that solved them sequentially.
How the solution works
For computers to understand real-world imagery in context, they have to understand the depth, shading and reflectance of each scene surface. Depth prediction recovers depth, and intrinsic image decomposition recovers shading and reflectance; this project created a way for each task to assist in solving the other.
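For readers new to the term, intrinsic image decomposition is commonly modeled as a per-pixel product: image = reflectance × shading. The article does not say which formulation the paper uses, so the following is only a minimal sketch of that common model on a toy grayscale grid. The function names and pixel values are illustrative assumptions, not taken from the project.

```python
def compose(reflectance, shading):
    """Per-pixel product: image = reflectance * shading (common intrinsic model)."""
    return [[r * s for r, s in zip(r_row, s_row)]
            for r_row, s_row in zip(reflectance, shading)]


def recover_reflectance(image, shading, eps=1e-8):
    """Divide out shading to get reflectance back; eps guards against zero shading."""
    return [[i / max(s, eps) for i, s in zip(i_row, s_row)]
            for i_row, s_row in zip(image, shading)]


# Toy 2x2 scene: two surface colors (reflectance) lit unevenly (shading).
reflectance = [[0.8, 0.8], [0.2, 0.2]]
shading = [[1.0, 0.5], [1.0, 0.5]]

image = compose(reflectance, shading)            # what a camera would observe
estimate = recover_reflectance(image, shading)   # only possible if shading is known
```

With a real photograph, neither factor is known in advance, so many reflectance and shading pairs explain the same pixels. That ambiguity is one reason an additional cue such as depth can help.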
Specifically, the project used convolutional neural networks (CNNs) to address the individual problems of single-image depth prediction and intrinsic image decomposition. It solved the two tasks synergistically in a joint conditional random field (CRF) that uses a novel CNN architecture, the joint convolutional neural field (JCNF) model.
The architecture of this model allowed the networks for each task to share convolutional activations and layers, and to apply machine learning in the image gradient domain. Within this system, depth, shading and reflectance are predicted in a manner that yields more globally consistent results.
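One commonly cited reason to work in the gradient domain is that, under the multiplicative model image = reflectance × shading (an assumption here, not something the article spells out), the gradient of the log image splits into a reflectance gradient plus a shading gradient. The sketch below checks this on a single toy image row; the helper names and values are hypothetical and do not reproduce the JCNF model.

```python
import math


def log_row(row):
    """Element-wise natural log of a row of positive intensities."""
    return [math.log(v) for v in row]


def forward_diff(row):
    """Horizontal gradient as forward differences between neighboring pixels."""
    return [b - a for a, b in zip(row, row[1:])]


# Toy row: a reflectance edge in the middle, with smoothly falling shading.
reflectance = [0.8, 0.8, 0.2, 0.2]
shading = [1.0, 0.9, 0.8, 0.7]
image = [r * s for r, s in zip(reflectance, shading)]

grad_image = forward_diff(log_row(image))
grad_reflectance = forward_diff(log_row(reflectance))
grad_shading = forward_diff(log_row(shading))

# In the log-gradient domain the two components simply add up.
grad_sum = [gr + gs for gr, gs in zip(grad_reflectance, grad_shading)]
```

In this toy row the reflectance edge produces one large gradient, while the shading gradients stay small and smooth. That kind of separation is what gradient-domain approaches can try to exploit.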
Lin noted that the Microsoft Fellowship program, which places PhD interns at MSRA (similar to a Microsoft Research PhD Fellowship program offered in the U.S.), benefited both Microsoft and the intern. Said Lin, “This collaboration provided me with the opportunity to work with top PhD students and faculty from Korea, while giving Seungryong a chance to experience the stimulating research environment of MSRA. The opportunity to present our research helped his growth as a researcher, and Microsoft gets to introduce itself to the top-of-class researchers at leading universities. The relationship has turned into a lasting collaboration that continues to this day.”