Attention-Based Deep Metric Learning for Near-Duplicate Video Retrieval

Kuan-Hsun Wang; Chia Chun Cheng; Yi-Ling Chen; Yale Song; Shang-Hong Lai

Attention-Based Deep Metric Learning for Near-Duplicate Video Retrieval

Kuan-Hsun Wang ,
Chia Chun Cheng ,
Yi-Ling Chen ,
Yale Song ,
Shang-Hong Lai

International Conference on Pattern Recognition (ICPR) | August 2020

Download BibTex

Near-duplicate video retrieval (NDVR) is an important and challenging problem due to the increasing amount of videos uploaded to the Internet. In this paper, we propose an attention-based deep metric learning method for NDVR. Our method is based on well-established principles: We leverage two-stream networks to combine RGB and optical flow features, and incorporate an attention module to effectively deal with distractor frames commonly observed in near-duplicate videos. We further aggregate the features corresponding to multiple video segments to enhance the discriminative power. The whole system is trained using a deep metric learning objective with a Siamese architecture. Our experiments show that the attention module helps eliminate redundant and noisy frames, while focusing on visually relevant frames for solving NVDR. We evaluate our approach on recent large-scale NDVR datasets, CC\_WEB\_VIDEO, VCDB, FIVR and SVD. To demonstrate the generalization ability of our approach, we report results in both within- and cross-dataset settings, and show that the proposed method significantly outperforms state-of-the-art approaches.