MobiDepth: Real-Time Depth Estimation Using On-Device Dual Cameras
- Jinrui Zhang
- Huan Yang
- Ju Ren
- Deyu Zhang
- Bangwen He
- Yuanchun Li
- Ting Cao
- Yaoxue Zhang
- Yunxin Liu
The 28th Annual International Conference on Mobile Computing and Networking (MobiCom'22)
Published by ACM
Real-time depth estimation is critical for the increasingly popular augmented reality and virtual reality applications on mobile devices. Existing solutions fall short: they require expensive depth sensors, depend on motion of the device, or incur high latency. We propose MobiDepth, a real-time depth estimation system that uses the widely available on-device dual cameras. Although binocular depth estimation is a mature technique, realizing it on commodity mobile devices is challenging because the on-device dual cameras have different focal lengths and unsynchronized frame flows, and because the stereo-matching algorithm is computationally heavy.
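To make the focal-length challenge concrete, the following minimal sketch uses the textbook pinhole stereo relation Z = f · B / d for a rectified pair. This relation is standard background and is not stated in the abstract; the function name and the numeric values are hypothetical. It assumes both images share a single focal length `focal_px` (in pixels), which is exactly what mismatched dual cameras do not provide out of the box.

```python
import math


def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth (metres) of a point in a rectified stereo pair.

    Assumes the textbook pinhole model: both cameras share the same
    focal length (in pixels) and are separated by `baseline_m` metres.
    """
    if focal_px <= 0 or baseline_m <= 0:
        raise ValueError("focal length and baseline must be positive")
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity.
        return math.inf
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical numbers: 1500 px focal length, 1 cm baseline, 30 px disparity.
    print(depth_from_disparity(30.0, 1500.0, 0.01))
```

Because depth is inversely proportional to disparity, a small matching error or a focal-length mismatch between the two views translates directly into depth error, especially with the short baselines typical of phone cameras.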
To address these challenges, MobiDepth integrates three novel techniques: 1) iterative field-of-view cropping, which crops the fields of view of the dual cameras to achieve equivalent focal lengths for accurate epipolar rectification; 2) heterogeneous camera synchronization, which synchronizes the frame flows captured by the dual cameras so that moving objects are not displaced between the two frames of a pair; 3) mobile GPU-friendly stereo matching, which effectively reduces the latency of stereo matching on a mobile GPU. We implement MobiDepth on multiple commodity mobile devices and conduct comprehensive evaluations. Experimental results show that MobiDepth achieves real-time depth estimation at 22 frames per second with a significantly lower depth-estimation error than the baselines. Using MobiDepth, we further build an example application of 3D pose estimation, which significantly outperforms the state-of-the-art 3D pose-estimation method, reducing the pose-estimation latency and error by up to 57.1% and 29.5%, respectively.
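The abstract does not give the algorithms behind the first two techniques, so the sketch below illustrates only the underlying ideas with simplified stand-ins: a single-pass center crop (not the paper's iterative procedure) and nearest-timestamp frame pairing (not the paper's synchronization method). All function names, parameters, and values are hypothetical. The crop assumes both cameras have the same resolution and pixel size, focal lengths expressed in pixels, and a principal point at the image center.

```python
from bisect import bisect_left


def center_crop_for_focal_match(width, height, f_wide, f_tele):
    """Return (x0, y0, crop_w, crop_h) for the wider-angle image.

    Cropping to this window and resizing back to (width, height)
    scales the effective focal length from f_wide to roughly f_tele.
    Simplifying assumption: principal point at the image center.
    """
    if not 0 < f_wide <= f_tele:
        raise ValueError("expected 0 < f_wide <= f_tele")
    scale = f_wide / f_tele
    crop_w = round(width * scale)
    crop_h = round(height * scale)
    x0 = (width - crop_w) // 2
    y0 = (height - crop_h) // 2
    return x0, y0, crop_w, crop_h


def pair_frames(ts_a, ts_b, max_skew_ms):
    """Pair each frame of camera A with the nearest-in-time frame of camera B.

    ts_a and ts_b are sorted capture timestamps in milliseconds.
    A pair is kept only if the two timestamps differ by at most
    max_skew_ms; otherwise the frame of camera A is dropped.
    Returns a list of (index_in_a, index_in_b) tuples.
    """
    pairs = []
    for i, t in enumerate(ts_a):
        k = bisect_left(ts_b, t)
        # Only the neighbours on either side of t can be the nearest.
        candidates = [j for j in (k - 1, k) if 0 <= j < len(ts_b)]
        if not candidates:
            continue
        j = min(candidates, key=lambda idx: abs(ts_b[idx] - t))
        if abs(ts_b[j] - t) <= max_skew_ms:
            pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    # Hypothetical 4000x3000 sensors with focal lengths of 1500 px and 3000 px.
    print(center_crop_for_focal_match(4000, 3000, 1500.0, 3000.0))
    # Hypothetical timestamps (ms) from two free-running cameras.
    print(pair_frames([0, 33, 66, 100], [5, 45, 70, 130], max_skew_ms=10))
```

Dropping pairs whose skew exceeds the threshold trades frame rate for consistency: a moving object appears at nearly the same position in both frames of every retained pair, so the disparity reflects depth rather than motion.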