Overview of ImageCLEFmedical 2024 – Caption Prediction and Concept Detection
- Johannes Rückert,
- Asma Ben Abacha,
- Alba G. Seco de Herrera,
- Louise Bloch,
- Raphael Brüngel,
- Ahmad Idrissi-Yaghir,
- Henning Schäfer,
- Benjamin Bracke,
- Hendrik Damm,
- Tabea M. G. Pakull,
- Cynthia Sabrina Schmidt,
- Henning Müller,
- Christoph M. Friedrich
The ImageCLEFmedical 2024 Caption task on caption prediction and concept detection follows similar challenges held from 2017 to 2023. The goal is to extract Unified Medical Language System (UMLS) concept annotations and/or generate captions from image data; predictions are compared to the original image captions. Images for both tasks are part of the Radiology Objects in COntext version 2 (ROCOv2) dataset. For concept detection, multi-label predictions are compared via the F1-score against UMLS terms extracted from the original captions, supplemented with additional manually curated concepts. For caption prediction, the semantic similarity of the predictions to the original captions is evaluated using BERTScore. The task attracted strong participation: of 50 registered teams, 14 submitted a total of 82 graded runs across the two subtasks. Participants mainly used multi-label classification systems for the concept detection subtask; the winning team, DBS-HHU, utilized an ensemble of four different Convolutional Neural Networks (CNNs). For the caption prediction subtask, most teams used encoder-decoder frameworks with various backbones, including transformer-based decoders and Long Short-Term Memories (LSTMs); the winning team, PCLmed, used medical vision-language foundation models (Med-VLFMs), combining general and specialist vision models.
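The concept-detection metric described above is an example-based F1: precision and recall are computed per image between the predicted and reference UMLS concept sets, and the resulting F1 values are averaged over all images. The sketch below illustrates this idea only; the function names and the handling of images with empty concept sets are assumptions, not the official evaluation script.

```python
def f1_per_image(predicted, reference):
    """Example-based F1 between predicted and reference UMLS concept sets
    for a single image (concepts identified by their CUIs, e.g. 'C0040405')."""
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        # Assumed convention: both sets empty counts as a perfect match.
        return 1.0
    tp = len(predicted & reference)  # correctly predicted concepts
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)


def mean_f1(predictions, references):
    """Average the per-image F1 over all images in the test set."""
    scores = [f1_per_image(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores)
```

For instance, predicting `{C1, C2}` when the reference is `{C2, C3}` gives precision and recall of 0.5 each, hence an F1 of 0.5 for that image.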