CATS | PURRlab @ IT University of Copenhagen

Banner image: Transfer learning from nonmedical or medical image data sets. A network is first trained on a source data set. This network can then be used for feature extraction or further training on the medical target data.

Description

Machine learning (ML) with neural networks has recently made significant progress in medical image diagnosis, and some algorithms are now approved for clinical use. However, in some applications the datasets remain small, for example because imaging techniques are new or because annotating the data is costly. In such cases, neural networks can be trained through transfer learning: training first on data from a related source domain (such as publicly available data) and then on data from the target domain (such as clinical data).
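As a rough illustration of the two-step idea, the sketch below trains a tiny model on a large toy "source" set and then reuses it on a small toy "target" set, once with frozen features and once with further training of all weights. Everything here is an assumption made for illustration: the two-parameter feature extractor, the synthetic data, the thresholds and the hyperparameters are hypothetical and are not the models or datasets used in the project.

```python
import math
import random


def predict(model, x):
    """Return (feature, probability) for a 2-D input x."""
    f = math.tanh(model["w"][0] * x[0] + model["w"][1] * x[1])  # feature extractor
    z = model["a"] * f + model["b"]                             # classification head
    return f, 1.0 / (1.0 + math.exp(-z))


def train(model, data, epochs=50, lr=0.1, freeze_features=False):
    """Plain SGD on binary cross-entropy; optionally keep the feature weights fixed."""
    for _ in range(epochs):
        for x, y in data:
            f, p = predict(model, x)
            err = p - y  # gradient of the loss w.r.t. the head's pre-activation
            if not freeze_features:
                scale = err * model["a"] * (1.0 - f * f)
                model["w"] = [w - lr * scale * xi for w, xi in zip(model["w"], x)]
            model["a"] -= lr * err * f
            model["b"] -= lr * err
    return model


def make_data(n, threshold, rng):
    """Toy data (hypothetical): label 1 when x0 + x1 exceeds a domain-specific threshold."""
    points = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    return [(x, 1.0 if x[0] + x[1] > threshold else 0.0) for x in points]


rng = random.Random(0)
source = make_data(500, 0.0, rng)  # large stand-in for public source data
target = make_data(20, 0.3, rng)   # small stand-in for clinical target data

# 1) Train on the source domain.
pretrained = train({"w": [0.1, 0.1], "a": 1.0, "b": 0.0}, source)

# 2a) Feature extraction: keep the source features frozen, train a fresh head.
extractor = {"w": list(pretrained["w"]), "a": 1.0, "b": 0.0}
train(extractor, target, freeze_features=True)

# 2b) Further training (fine-tuning): update all weights on the target data.
finetuned = {"w": list(pretrained["w"]), "a": pretrained["a"], "b": pretrained["b"]}
train(finetuned, target)
```

In practice the two variants would be compared on held-out target data; which one works better depends on how similar the source and target domains are.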

A popular strategy for image data of any kind is to use ImageNet, a dataset of natural images such as cats, as the source data. Networks pretrained on ImageNet are also readily available, which makes this approach easy to adopt. Surprisingly, despite the dissimilar image content, ImageNet can still improve performance on medical targets. However, since medical images have different characteristics (3D, high resolution, few classes, among others), this strategy can also hurt the diagnostic performance of the algorithm. Unfortunately, comparisons between ImageNet and medical source data are limited, possibly due to the prohibitive computational costs. In this project we investigate how to find a good source dataset in a way that reduces the overall carbon footprint. We do so by defining dataset similarity measures that reflect the transferability of datasets, and by studying how researchers choose which datasets to use as sources.

People

Dovile Juodelyte, Yucheng Lu, Théo Sourget, Veronika Cheplygina.

Funding

Novo Nordisk Foundation Starting Package - NNF21OC0068816

References

Learning to Harmonize Cross-Vendor X-ray Images by Non-linear Image Dynamics Correction

Yucheng Lu, Shunxin Wang, Dovile Juodelyte, and Veronika Cheplygina

In Medical Image Understanding and Analysis, 2026

Exploring connections of spectral analysis and transfer learning in medical imaging

Yucheng Lu, Dovile Juodelyte, Jonathan D Victor, and Veronika Cheplygina

In Medical Imaging 2025: Image Processing, 2025

Intuitions of Machine Learning Researchers about Transfer Learning for Medical Image Classification

Yucheng Lu, Hubert Dariusz Zając, Veronika Cheplygina, and Amelia Jiménez-Sánchez

arXiv preprint arXiv:2510.00902, 2025

On dataset transferability in medical image classification

Dovile Juodelyte, Enzo Ferrante, Yucheng Lu, Prabhant Singh, Joaquin Vanschoren, and 1 more author

arXiv preprint arXiv:2412.20172, 2024

Source Matters: Source Dataset Impact on Model Robustness in Medical Imaging

Dovile Juodelyte, Yucheng Lu, Amelia Jiménez-Sánchez, Sabrina Bottazzi, Enzo Ferrante, and 1 more author

International Workshop on Applications of Medical AI (AMAI), 2024

Revisiting Hidden Representations in Transfer Learning for Medical Imaging

Dovile Juodelyte, Amelia Jiménez-Sánchez, and Veronika Cheplygina

Transactions on Machine Learning Research, 2023

Cats, not CAT scans: a study of dataset similarity in transfer learning for 2D medical image classification

Irma van den Brandt, Floris Fok, Bas Mulders, Joaquin Vanschoren, and Veronika Cheplygina

arXiv preprint arXiv:2107.05940, 2021

Cats or CAT scans: Transfer learning from natural or medical image source data sets?

Veronika Cheplygina

Current Opinion in Biomedical Engineering, 2019
