

TextureFusion: Enabling high-quality texture acquisition for real-time RGB-D scanning


Real-time RGB-D 3D scanning is now widely used to progressively scan objects or scenes with a hand-held RGB-D camera, such as Microsoft Kinect. The depth stream from the camera is accumulated into a voxel grid that stores the signed distance to the surface; the truncated signed distance values in the grid form an implicit surface representation. The color stream is generally accumulated into the same voxel grid, capturing a per-voxel color representation in much the same way as depth.
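
To make this concrete, here is a minimal Python sketch of the standard per-voxel fusion update described above: each voxel keeps a truncated signed distance, a color, and an integration weight, and every new observation is blended in as a weighted running average. The function and constant names are hypothetical and the truncation band is an assumed value; this is a generic illustration of per-voxel fusion, not the system's actual GPU implementation.

```python
import numpy as np

TRUNC = 0.05  # truncation band in meters (assumed value)

def integrate_voxel(D, C, W, d_obs, c_obs, w_obs=1.0):
    """Fuse one depth/color observation into a single voxel.

    D, C, W : current truncated signed distance, RGB color, and weight
    d_obs   : signed distance from the voxel to the newly observed surface
    c_obs   : color observed at the voxel's projection into the RGB frame
    w_obs   : weight of the new observation
    """
    d_obs = np.clip(d_obs, -TRUNC, TRUNC)  # truncate the signed distance
    W_new = W + w_obs
    D_new = (W * D + w_obs * d_obs) / W_new              # running average of distance
    C_new = (W * C + w_obs * np.asarray(c_obs)) / W_new  # same update for color
    return D_new, C_new, W_new
```

Because the color update mirrors the depth update, color resolution is tied to the voxel grid's resolution, which is exactly the tradeoff discussed below.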

Figure 1. A comparison of the per-voxel color representation (a), conventional texture representation without optimization (b), and texture with global optimization only (c). The proposed method (d), called TextureFusion, achieves high-quality color texture in real-time RGB-D scanning.

However, the current color representation in real-time RGB-D scanning remains suboptimal due to several challenges. First, since color information is stored in each voxel of the grid, there is an inevitable tradeoff between spatial resolution and time performance: when we decrease the spatial resolution of the grid for fast performance, we must also sacrifice the spatial resolution of the color information. Figure 1(a) shows an example. Second, the imperfections of the RGB-D camera introduce several artifacts: depth noise, distortion in both depth and color images, and a lack of synchronization between the depth and color frames. These lead to imperfect geometry, inaccurate estimates of the color camera pose, and mismatches between the geometry and the color images (Figures 1a and 1b). Such challenges can be mitigated by texture mapping methods that apply global or local warping of texture to geometry. However, traditional texture mapping methods assume that the reconstructed geometry and all input views are known in advance, so that mesh parameterization can build a texture atlas. Likewise, existing texture optimization methods compute local texture warps to register texture and geometry accurately, but they require long optimization times and are therefore unsuitable for real-time RGB-D scanning.

Professor Min H. Kim at KAIST in South Korea and his collaborators, Yue Dong and Xin Tong of Microsoft Research Asia, together with KAIST PhD students Joo Ho Lee and Hyunho Ha, have been investigating a new algorithmic technique to address the challenges of high-quality texture acquisition in real-time RGB-D scanning. Their pioneering idea is described in a paper entitled “TextureFusion: High-Quality Texture Acquisition for Real-Time RGB-D Scanning,” presented as an oral at the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2020), June 13–19, where it was selected as a Best Paper Finalist. In the paper, they propose a progressive texture fusion method designed specifically for real-time RGB-D scanning. See Figure 2.

Figure 2. The researchers enable real-time texture fusion by proposing (a) a tile-based texture data structure, (b) a real-time texture reconstruction framework, and (c) a spatially-varying perspective warp for real-time 3D scanning.

The researchers first develop a novel texture-tile voxel grid, in which texture tiles are embedded in the structure of the signed distance function (Figure 2a). They then integrate input color views, as warped texture tiles, into this grid. Doing so establishes and updates the relationship between geometry and texture information in real time without computing a high-resolution texture atlas (Figure 2b): instead of using expensive mesh parameterization, they associate vertices of the implicit geometry directly with texture coordinates. Second, they introduce an efficient texture warping method (Figure 2c) that applies a spatially-varying perspective mapping of each camera view to the current geometry. This effectively mitigates the mismatch between geometry and texture, achieving a good tradeoff between quality and performance in texture mapping. The local perspective warp registers each color frame to the canonical texture space precisely and seamlessly, so the quality of the texture improves over time, as shown in Figure 1(d).
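
The sketch below illustrates the data-structure idea in Python: each voxel carries a small texture tile instead of a single color, and an incoming color frame is blended into the tile through a local perspective (homography) warp. All names, the tile size, and the nearest-neighbor sampling are illustrative assumptions; the actual system runs on the GPU and estimates the spatially-varying warp by optimization rather than taking it as given.

```python
import numpy as np

TILE = 8  # texels per tile side (assumed; the actual tile size may differ)

class TextureTileVoxel:
    """Sketch of a texture-tile voxel: a signed distance plus a small color
    tile, replacing the single per-voxel color of Figure 1(a)."""
    def __init__(self):
        self.sdf = 1.0                           # truncated signed distance
        self.tile = np.zeros((TILE, TILE, 3))    # accumulated color tile
        self.weight = np.zeros((TILE, TILE, 1))  # per-texel blend weight

def warp_and_blend(voxel, frame, H, w_obs=1.0):
    """Blend a color frame into a voxel's tile through a local perspective
    warp H (a 3x3 homography standing in for the spatially-varying warp)."""
    h, w = frame.shape[:2]
    # Texel centers in the tile's canonical [0, 1]^2 texture space.
    v, u = np.meshgrid((np.arange(TILE) + 0.5) / TILE,
                       (np.arange(TILE) + 0.5) / TILE, indexing="ij")
    pts = np.stack([u, v, np.ones_like(u)], axis=-1) @ H.T  # homogeneous warp
    px = np.clip(pts[..., 0] / pts[..., 2], 0, w - 1)       # image x coords
    py = np.clip(pts[..., 1] / pts[..., 2], 0, h - 1)       # image y coords
    obs = frame[py.round().astype(int), px.round().astype(int)]  # sample frame
    W = voxel.weight
    voxel.tile = (W * voxel.tile + w_obs * obs) / (W + w_obs)  # running average
    voxel.weight = W + w_obs
```

Because each tile lives in the voxel grid alongside the signed distance, texture and geometry stay registered as both are refined frame by frame.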

The researchers demonstrated that the quality of their real-time texture mapping is highly competitive with that of existing offline texture warping methods. While updating the geometry in real time, their method continues to enhance the quality of the texture over time. They anticipate that it could be used broadly, as it can be easily integrated into existing RGB-D scanning frameworks.