In the Picture: Medical Imaging Datasets

Machine learning has shown promising results in medical image diagnosis, at times with claims of expert-level performance. However, algorithms with high reported performance do not always generalize to real-life settings, which can lead to incorrect and/or biased diagnoses. Two key dataset-related issues contribute to this challenge: (i) the presence of shortcuts, i.e. spurious correlations between artifacts in images and diagnostic labels, and (ii) how representative the training data are of the patient population, in terms of demographics and/or disease sub-type.

Our workshop will focus on challenges within medical imaging datasets that hinder the development of fair and robust AI algorithms. We will have several invited talks, but the emphasis will be on engagement and collaboration. Participants will work in groups on various projects, such as tools for reviewing and documenting datasets, writing (living) reviews, and formulating strategies to continue this and similar community-led projects, for example through additional funding opportunities.

When and where:

19-20 September 2024, Nyborg Strand Hotel (Denmark) and partially online


We are currently inviting selected participants to join the workshop. If spots remain, we will open registration to others. Parts of the workshop will also offer virtual participation, to allow more people to join. If you would like to stay updated or join the waiting list, fill in this form.

Organizers

  • Amelia Jiménez-Sánchez (IT University of Copenhagen)
  • Veronika Cheplygina (IT University of Copenhagen)
  • Enzo Ferrante (Argentina’s National Research Council)
  • Leo Joskowicz (Hebrew University of Jerusalem)
  • Judy Gichoya (Emory University)

Support from

  • Danish Data Science Academy (DDSA)
  • Independent Research Fund Denmark (DFF) - Inge Lehmann number 1134-00017B
  • IT University of Copenhagen (ITU)