Overview of the MEDIQA-MAGIC Task at ImageCLEF 2024: Multimodal And Generative TelemedICine in Dermatology
- Wen-wai Yim,
- Asma Ben Abacha,
- Yujuan Fu,
- Zhaoyi Sun,
- Meliha Yetisgen,
- Fei Xia
Multimodal processing and language generation require models to internally represent both language and vision, and then to generate contextually appropriate responses. Doing so with arbitrary images and textual inputs in the medical domain demands an even higher level of performance and fidelity. This paper presents an overview of the MEDIQA-MAGIC shared task at ImageCLEF 2024. In this dermatological visual question-answering (VQA) task, participants receive an image and a textual consumer health query as input, and are expected to output a textual medical answer. A total of twenty-two runs were submitted, using a variety of general-purpose language-vision models as well as fine-tuned models, with the best team achieving 8.969 BLEU points. We hope that the findings and insights presented here will inspire future research directions that support improved patient care.
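As an illustration of how generated answers can be scored with corpus-level BLEU, the following is a minimal sketch; the example hypotheses, references, and the use of the sacrebleu package are assumptions for illustration only and are not taken from the task's official evaluation scripts.

```python
# Minimal sketch of corpus-level BLEU scoring for generated answers.
# Assumes the sacrebleu package; the shared task's actual evaluation
# pipeline (tokenization, references per query) may differ.
import sacrebleu

# Hypothetical generated answers, one per consumer health query.
hypotheses = [
    "Apply a fragrance-free moisturizer twice daily and avoid hot showers.",
    "This looks like contact dermatitis; avoid the suspected irritant.",
]

# sacrebleu expects a list of reference streams: references[k][i] is the
# k-th reference answer for the i-th hypothesis.
references = [[
    "Use a gentle emollient regularly and avoid hot water on the area.",
    "Likely contact dermatitis; removing the trigger should help it clear.",
]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.3f}")
```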