Disentangled Prompt Representation for Domain Generalization
- De Cheng,
- Zhipeng Xu,
- Xinyang Jiang,
- Nannan Wang,
- Dongsheng Li,
- Xinbo Gao
CVPR 2024
Domain Generalization (DG) aims to develop a versatile model capable of performing well on unseen target domains. Recent advances in pre-trained Visual Foundation Models (VFMs), such as CLIP, show significant potential for enhancing the generalization ability of deep models. Although there is growing interest in VFM-based domain prompt tuning for DG, effectively learning prompts that disentangle the features invariant across all domains remains a major challenge. In this paper, we propose to address this challenge by leveraging the controllable and flexible language prompt of the VFM. Observing that the text modality of VFMs is inherently easier to disentangle, we introduce a novel text-feature-guided visual prompt tuning framework. The framework first automatically disentangles the text prompt using a large language model (LLM), and then learns domain-invariant visual representations guided by the disentangled text features. Moreover, we devise domain-specific prototype learning that exploits domain-specific information and combines it with the invariant-feature prediction. Extensive experiments on mainstream DG datasets, namely PACS, VLCS, OfficeHome, DomainNet, and TerraInc, demonstrate that the proposed method outperforms state-of-the-art DG methods. Our source code is available in the supplementary materials.
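For intuition, the sketch below illustrates the two ingredients the abstract describes for a CLIP-style backbone: learnable visual prompt tokens aligned to a frozen, LLM-disentangled (domain-invariant) text feature, and a fusion of the invariant text-based prediction with a domain-specific prototype prediction. This is a minimal reading of the abstract, not the authors' released code; all names (`TextGuidedVisualPrompt`, `invariant_alignment_loss`, `combined_logits`, the weight `alpha`) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextGuidedVisualPrompt(nn.Module):
    """Learnable visual prompt tokens prepended to ViT patch embeddings.

    Hypothetical module: the CLIP image encoder is assumed frozen and only
    these prompt tokens are trained.
    """
    def __init__(self, num_prompts=4, token_dim=768):
        super().__init__()
        # Small random init, common practice in prompt tuning.
        self.prompts = nn.Parameter(torch.randn(num_prompts, token_dim) * 0.02)

    def forward(self, patch_tokens):
        # patch_tokens: (B, N, token_dim) patch embeddings from the frozen encoder.
        batch = patch_tokens.size(0)
        prompts = self.prompts.unsqueeze(0).expand(batch, -1, -1)
        # Returns (B, num_prompts + N, token_dim) for the remaining ViT blocks.
        return torch.cat([prompts, patch_tokens], dim=1)

def invariant_alignment_loss(image_feat, invariant_text_feat):
    # Cosine-distance loss pulling the visual feature toward the frozen CLIP
    # embedding of the LLM-disentangled, domain-invariant text prompt.
    image_feat = F.normalize(image_feat, dim=-1)
    invariant_text_feat = F.normalize(invariant_text_feat, dim=-1)
    return 1.0 - (image_feat * invariant_text_feat).sum(dim=-1).mean()

def combined_logits(image_feat, class_text_feats, domain_prototypes, alpha=0.5):
    # Fuse the invariant (text-based) prediction with a domain-specific
    # prototype prediction; `alpha` is an assumed mixing weight.
    img = F.normalize(image_feat, dim=-1)
    logits_inv = img @ F.normalize(class_text_feats, dim=-1).t()   # (B, C)
    logits_dom = img @ F.normalize(domain_prototypes, dim=-1).t()  # (B, C)
    return alpha * logits_inv + (1.0 - alpha) * logits_dom
```

Under this reading, only the prompt tokens (and possibly the domain prototypes) receive gradients while both CLIP encoders stay frozen, which is the standard parameter-efficiency trade-off in prompt tuning; the paper's actual losses and fusion scheme may differ in detail.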