Overview of the MEDIQA-MAGIC Task at ImageCLEF 2024: Multimodal And Generative TelemedICine in Dermatology

CLEF 2024 Conference and Labs of the Evaluation Forum

Multimodal processing and language generation require models to internally represent both language and vision and to generate contextually appropriate responses. Doing so with arbitrary images and textual inputs in the medical domain demands an additional degree of performance and fidelity. This paper presents an overview of the MEDIQA-MAGIC shared task at ImageCLEF 2024. In this dermatological visual question-answering (VQA) task, participants receive an image and a textual consumer health query as input and are expected to produce a textual medical answer. A total of twenty-two runs were submitted, using a variety of general-purpose vision-language models and fine-tuned models, with the best team achieving 8.969 BLEU points. We hope that the findings and insights presented here will inspire future research directions to support improved patient care.
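
To illustrate the evaluation setting, the sketch below shows how corpus-level BLEU might be computed over a run's generated answers. The use of the sacreBLEU library and the example answer texts are assumptions for illustration; the official task scorer may differ in tokenization and configuration.

```python
# Minimal sketch of corpus-level BLEU scoring for a run, assuming sacreBLEU
# is installed (pip install sacrebleu). Not the official task scorer.
from sacrebleu.metrics import BLEU

# Hypothetical example data: one generated answer per query, paired with one
# reference answer each (texts are illustrative only).
system_answers = [
    "This appears consistent with contact dermatitis; consult a dermatologist.",
]
reference_answers = [
    "The lesion is likely contact dermatitis and should be seen by a dermatologist.",
]

bleu = BLEU()
# sacreBLEU expects a list of reference streams, hence the extra nesting.
score = bleu.corpus_score(system_answers, [reference_answers])
print(f"BLEU: {score.score:.3f}")  # corpus BLEU in points, e.g. 8.969
```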