MMC | PURRlab @ IT University of Copenhagen

Machine learning has shown promising results in medical image diagnosis, at times with claims of expert-level performance. The availability of large public datasets has shifted the interest of the medical community towards high-performance algorithms. However, little attention is paid to the quality of the data or annotations. Algorithms with high reported performance have been shown to suffer from overfitting or shortcuts, i.e. spurious correlations between artifacts in images and diagnostic labels. Examples include pen marks in skin lesion classification, patient position in detection of COVID-19, and chest drains in pneumothorax classification. Performance may appear high when training and evaluating on data with shortcuts, but degrade when the shortcut is removed. This happens because the algorithm has learned the artifact rather than the features actually related to the diagnosis, so it cannot generalize.
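As a rough illustration of this effect, the sketch below compares the AUC of a classifier on all cases with the AUC after positive cases that carry the shortcut (for example, pneumothorax images with a chest drain) are dropped. The data are hand-made toy values and the function names `auc` and `auc_without_shortcut` are hypothetical; this is not the evaluation protocol used in the project.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_without_shortcut(labels, scores, has_shortcut):
    """AUC after dropping positive cases that carry the shortcut artifact."""
    kept = [(y, s) for y, s, a in zip(labels, scores, has_shortcut)
            if not (y == 1 and a)]
    ys, ss = zip(*kept)
    return auc(ys, ss)


# Toy data (assumption): the two positives with a shortcut get very high
# scores, while the positives without it are scored like the negatives.
labels       = [1, 1, 1, 1, 0, 0, 0, 0]
has_shortcut = [True, True, False, False, False, False, False, False]
scores       = [0.95, 0.90, 0.40, 0.30, 0.35, 0.20, 0.45, 0.10]

overall = auc(labels, scores)
no_shortcut = auc_without_shortcut(labels, scores, has_shortcut)

print(f"AUC, all cases:              {overall:.4f}")      # 0.8125
print(f"AUC, shortcut cases removed: {no_shortcut:.4f}")  # 0.6250
```

A large gap between the two numbers suggests that the score depends on the artifact rather than on the pathology itself. Computing the second number requires knowing which images contain the artifact.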

Our goal is to redefine how meta-data is used and thus improve the robustness of algorithms. We plan to:

investigate which shortcuts (based on demographics or image artifacts) might occur and how they affect the performance and fairness of the algorithms ⚖️.
investigate meta-data-aware methods that aim to avoid learning biases or shortcuts ⚔️🛡.

Some students have done work related to this project:

Bianca Ida Pedersen and Max Andreas de Visser investigated shortcuts in 3D lung nodules using the publicly available LIDC-IDRI dataset.
Casper Anton Poulsen and Michelle Hestbek-Møller explored generating synthetic X-ray images using SyntheX.
Trine Naja Eriksen and Cathrine Damgaard developed a chest drain detector with their non-expert annotations that generalizes well to expert labels.
Paula Victoria Menshikoff and Katarina Kraljevic investigated shortcut learning across different demographic attributes for chest X-ray classification.

People

Amelia Jiménez-Sánchez, Théo Sourget, Veronika Cheplygina.

Webinar

We are organizing a webinar series, Datasets through the L👀king-Glass, to better understand what researchers are doing with their (meta-)data.

Workshop

We organized a two-day workshop, In the Picture: Medical Imaging Datasets, at Nyborg Strand (DK). It focused on the challenges within medical imaging datasets that hinder the development of fair and robust AI algorithms. The program included several invited talks, but consisted mostly of group sessions centered on engagement and collaboration.

Dataset

NEATX: Non-Expert Annotations of Tubes in X-rays, hosted on Zenodo.

This dataset contains 3.5k chest drain annotations for the NIH-CXR14 dataset and 1k annotations for four different tube types (chest drain, tracheostomy, nasogastric, and endotracheal) in the PadChest dataset. All annotations were made by two data science students.
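The accompanying paper reports "moderate" to "almost perfect" agreement between these non-expert annotations and expert ones. The sketch below shows one common way to arrive at such labels: Cohen's kappa, interpreted with the Landis and Koch bands. This page does not state which agreement statistic the paper uses, so both the choice of kappa and the toy labels are assumptions, and the function names are hypothetical.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of the annotators' marginal label frequencies.
    expected = sum(count_a[k] * count_b[k]
                   for k in count_a.keys() | count_b.keys()) / (n * n)
    if expected == 1.0:  # both annotators used a single, identical label
        return 1.0
    return (observed - expected) / (1.0 - expected)


def landis_koch(kappa):
    """Map a kappa value to the Landis and Koch (1977) verbal band."""
    if kappa < 0:
        return "poor"
    for upper, name in [(0.20, "slight"), (0.40, "fair"),
                        (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return name
    return "almost perfect"


# Toy labels (assumption): 1 = tube present, 0 = tube absent.
non_expert = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
expert     = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]

kappa = cohen_kappa(non_expert, expert)
print(f"kappa = {kappa:.3f} ({landis_koch(kappa)})")  # kappa = 0.583 (moderate)
```

Kappa corrects raw agreement for agreement expected by chance, which matters here because tubes are absent in most images.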

References

Augmenting Chest X-ray Datasets with Non-Expert Annotations

Veronika Cheplygina, Cathrine Damgaard, Trine Naja Eriksen, Dovile Juodelyte, and Amelia Jiménez-Sánchez

In Medical Image Understanding and Analysis, 2026

The advancement of machine learning algorithms in medical image analysis requires the expansion of training datasets. A popular and cost-effective approach is automated annotation extraction from free-text medical reports, primarily due to the high costs associated with expert clinicians annotating medical images, such as chest X-rays. However, it has been shown that the resulting datasets are susceptible to biases and shortcuts. Another strategy to increase the size of a dataset is crowdsourcing, a widely adopted practice in general computer vision with some success in medical image analysis. In a similar vein to crowdsourcing, we enhance two publicly available chest X-ray datasets by incorporating non-expert annotations. However, instead of using diagnostic labels, we annotate shortcuts in the form of tubes. We collect 3.5k chest drain annotations for NIH-CXR14, and 1k annotations for four different tube types in PadChest, and create the Non-Expert Annotations of Tubes in X-rays (NEATX) dataset. We train a chest drain detector with the non-expert annotations that generalizes well to expert labels. Moreover, we compare our annotations to those provided by experts and show “moderate” to “almost perfect” agreement. Finally, we present a pathology agreement study to raise awareness about the quality of ground truth annotations. We make our dataset available on Zenodo at https://zenodo.org/records/14944064 and our code available at https://github.com/purrlab/chestxr-label-reliability.

In the Picture: Medical Imaging Datasets, Artifacts, and their Living Review

Amelia Jiménez-Sánchez, Natalia-Rozalia Avlona, Sarah Boer, Víctor M. Campello, Aasa Feragen, and 24 more authors

In Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency, 2025

Datasets play a critical role in medical imaging research, yet issues such as label quality, shortcuts, and metadata are often overlooked. This lack of attention may harm the generalizability of algorithms and, consequently, negatively impact patient outcomes. While existing medical imaging literature reviews mostly focus on machine learning (ML) methods, with only a few focusing on datasets for specific applications, these reviews remain static – they are published once and not updated thereafter. This fails to account for emerging evidence, such as biases, shortcuts, and additional annotations that other researchers may contribute after the dataset is published. We refer to these newly discovered findings of datasets as research artifacts. To address this gap, we propose a living review that continuously tracks public datasets and their associated research artifacts across multiple medical imaging applications. Our approach includes a framework for the living review to monitor data documentation artifacts, and an SQL database to visualize the citation relationships between research artifact and dataset. Lastly, we discuss key considerations for creating medical imaging datasets, review best practices for data annotation, discuss the significance of shortcuts and demographic diversity, and emphasize the importance of managing datasets throughout their entire lifecycle. Our demo is publicly available at http://inthepicture.itu.dk/.

Copycats: the many lives of a publicly available medical imaging dataset

Amelia Jiménez-Sánchez, Natalia-Rozalia Avlona, Dovile Juodelyte, Théo Sourget, Caroline Vang-Larsen, and 3 more authors

In The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2024

Detecting Shortcuts in Medical Images - A Case Study in Chest X-Rays

Amelia Jiménez-Sánchez, Dovile Juodelyte, Bethany Chamberlain, and Veronika Cheplygina

In 2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI), 2023

Funding

DFF (Independent Research Fund Denmark), Inge Lehmann grant 1134-00017B