3D Feature Prediction for Masked-AutoEncoder-Based Point Cloud Pretraining

  • Siming Yan,
  • Yu-Qi Yang,
  • Yu-Xiao Guo,
  • Hao Pan,
  • Peng-Shuai Wang,
  • Xin Tong,
  • Qi-Xing Huang

ICLR 2024

PDF

Masked autoencoders (MAE) have recently been introduced to 3D self-supervised pretraining for point clouds, following their great success in NLP and computer vision. Unlike MAEs in the image domain, where the pretext task is to restore features at the masked pixels, such as colors, existing 3D MAE works reconstruct the missing geometry only, i.e., the locations of the masked points. In contrast to previous studies, we argue that recovering point locations is inessential and that restoring intrinsic point features is far more effective. To this end, we propose to forgo point-position reconstruction and instead recover high-order features at the masked points, including surface normals and surface variations, through a novel attention-based decoder that is independent of the encoder design. We validate the effectiveness of our pretext task and decoder design with different encoder structures for 3D pretraining, and demonstrate the advantages of our pretrained networks on various point cloud analysis tasks.
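To make the pretext task concrete, here is a minimal PyTorch sketch, not the authors' released code: the coordinates of the masked points are supplied as queries to a cross-attention decoder that predicts a surface normal and a scalar surface variation per masked point, rather than reconstructing positions. `AttentionFeatureDecoder`, `surface_variation`, `pretext_loss`, and all hyperparameters are illustrative assumptions; the surface-variation target follows the common Pauly-style definition λ₀/(λ₀+λ₁+λ₂) over local neighborhoods, which we assume approximates the paper's usage.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionFeatureDecoder(nn.Module):
    """Hypothetical attention-based decoder. Masked-point coordinates act as
    queries that cross-attend to visible tokens produced by any encoder,
    keeping the decoder independent of the encoder design."""

    def __init__(self, dim=256, num_heads=8, num_layers=2):
        super().__init__()
        # Embed the (known) coordinates of masked points into query tokens.
        self.pos_embed = nn.Sequential(nn.Linear(3, dim), nn.GELU(), nn.Linear(dim, dim))
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        # Heads predict high-order point features, not point positions.
        self.normal_head = nn.Linear(dim, 3)     # per-point surface normal
        self.variation_head = nn.Linear(dim, 1)  # per-point surface variation

    def forward(self, masked_xyz, visible_tokens):
        # masked_xyz: (B, M, 3); visible_tokens: (B, V, dim) from the encoder.
        queries = self.pos_embed(masked_xyz)
        feats = self.decoder(queries, visible_tokens)
        normals = F.normalize(self.normal_head(feats), dim=-1)
        variation = self.variation_head(feats).squeeze(-1)
        return normals, variation

def surface_variation(neighbors, eps=1e-8):
    """Ground-truth surface variation from k-NN patches (Pauly-style):
    lambda_0 / (lambda_0 + lambda_1 + lambda_2) of the local covariance,
    with neighbors of shape (N, k, 3)."""
    centered = neighbors - neighbors.mean(dim=1, keepdim=True)
    cov = centered.transpose(1, 2) @ centered / neighbors.shape[1]
    evals = torch.linalg.eigvalsh(cov)           # ascending eigenvalues, (N, 3)
    return evals[..., 0] / evals.sum(dim=-1).clamp_min(eps)

def pretext_loss(pred_n, pred_v, gt_n, gt_v):
    # Cosine distance on normals plus L2 on surface variation; one plausible
    # choice of losses, not necessarily the paper's exact formulation.
    normal_loss = (1 - F.cosine_similarity(pred_n, gt_n, dim=-1)).mean()
    return normal_loss + F.mse_loss(pred_v, gt_v)
```

In a pretraining loop, one would encode only the visible points, query this decoder at the masked coordinates, and supervise with normals and surface variations precomputed from the full point cloud, so no masked-point positions are ever reconstructed.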