This repository accompanies the paper: Has Multimodal Learning Delivered Universal Intelligence in Healthcare? A Comprehensive Survey. We will keep updating this repository.
If this work is helpful to you, please consider citing our paper using the following format:
```bibtex
@article{LIN2025102795,
  title   = {Has multimodal learning delivered universal intelligence in healthcare? A comprehensive survey},
  journal = {Information Fusion},
  volume  = {116},
  pages   = {102795},
  year    = {2025},
  issn    = {1566-2535},
  doi     = {https://doi.org/10.1016/j.inffus.2024.102795},
  url     = {https://www.sciencedirect.com/science/article/pii/S1566253524005736},
  author  = {Qika Lin and Yifan Zhu and Xin Mei and Ling Huang and Jingying Ma and Kai He and Zhen Peng and Erik Cambria and Mengling Feng}
}
```
- Update on 2024/08/23: Version 1.0 released
For Report Generation:
- IU X-ray: [Link] Preparing a collection of radiology examinations for distribution and retrieval. `2016`
- ICLEF-Caption-2017: [Link] Overview of ImageCLEFcaption 2017: image caption prediction and concept detection for biomedical images. `2017`
- ICLEF-Caption-2018: [Link] Overview of the ImageCLEF 2018 Caption Prediction Tasks. `2018`
- PEIR Gross: [ACL] On the Automatic Generation of Medical Imaging Reports. `2018`
- ROCO: [Link] Radiology Objects in COntext (ROCO): A Multimodal Image Dataset. `2018`
- PadChest: [Link] PadChest: A large chest x-ray image dataset with multi-label annotated reports. `2020`
- MedICaT: [EMNLP] MedICaT: A Dataset of Medical Images, Captions, and Textual References. `2020`
- ARCH: [CVPR] Multiple Instance Captioning: Learning Representations from Histopathology Textbooks and Articles. `2021`
- FFA-IR: [NeurIPS] FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark. `2021`
- CTRG: [Link] Work like a doctor: Unifying scan localizer and dynamic generator for automated computed tomography report generation. `2024`
For VQA:
- VQA-Med-2018: [Link] Overview of ImageCLEF 2018 medical domain visual question answering task. `2018`
- VQA-RAD: [Link] A dataset of clinically generated visual questions and answers about radiology images. `2018`
- VQA-Med-2019: [Link] VQA-Med: overview of the medical visual question answering task at ImageCLEF 2019. `2019`
- VQA-Med-2020: [Link] Overview of the VQA-Med task at ImageCLEF 2020: visual question answering and generation in the medical domain. `2020`
- RadVisDial-Silver: [Link] Towards Visual Dialog for Radiology. `2020`
- RadVisDial-Gold: [Link] Towards Visual Dialog for Radiology. `2020`
- PathVQA: [Link] PathVQA: 30000+ Questions for Medical Visual Question Answering. `2020`
- VQA-Med-2021: [Link] Overview of the VQA-Med task at ImageCLEF 2021: visual question answering and generation in the medical domain. `2021`
- SLAKE: [Link] SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical Visual Question Answering. `2021`
- MIMIC-Diff-VQA: [KDD] Expert Knowledge-Aware Image Difference Graph Representation Learning for Difference-Aware Medical Visual Question Answering. `2023`
For Vision-Language Pre-training (a sketch of the shared contrastive objective follows this list):
- ConVIRT: [Link] Contrastive Learning of Medical Visual Representations from Paired Images and Text. `10/2020`
- PubMedCLIP: [ACL] PubMedCLIP: How Much Does CLIP Benefit Visual Question Answering in the Medical Domain? `12/2021`
- CheXzero: [Nature Biomedical Engineering] Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning. `09/2022`
- BiomedCLIP: [Link] BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs. `03/2023`
- PLIP: [Nature Medicine] A visual–language foundation model for pathology image analysis using medical Twitter. `03/2023`
- PathCLIP: [AAAI] PathAsst: A Generative Foundation AI Assistant towards Artificial General Intelligence of Pathology. `05/2023`
- CT-CLIP: [Link] A foundation model utilizing chest CT volumes and radiology reports for supervised-level zero-shot detection of abnormalities. `03/2024`
- PairAug: [CVPR] PairAug: What Can Augmented Image-Text Pairs Do for Radiology? `04/2024`
- GLoRIA: [ICCV] GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-efficient Medical Image Recognition. `10/2021`
- BioViL: [ECCV] Making the Most of Text Semantics to Improve Biomedical Vision–Language Processing. `04/2022`
- MedCLIP: [EMNLP] MedCLIP: Contrastive Learning from Unpaired Medical Images and Text. `10/2022`
- MGCA: [NeurIPS] Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning. `10/2022`
- BioViL-T: [CVPR] Learning to Exploit Temporal Structure for Biomedical Vision–Language Processing. `01/2023`
- MedKLIP: [ICCV] MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training for X-ray Diagnosis. `01/2023`
- KAD: [Nature Communications] Knowledge-enhanced visual-language pre-training on chest radiology images. `02/2023`
- PTUnifier: [ICCV] Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts. `02/2023`
- Med-UniC: [NeurIPS] Med-UniC: Unifying Cross-Lingual Medical Vision-Language Pre-Training by Diminishing Bias. `05/2023`
- MCR: [Link] Masked Contrastive Reconstruction for Cross-modal Medical Image-Report Retrieval. `12/2023`
- MLIP: [CVPR] MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning. `02/2024`
- MAVL: [CVPR] Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework. `03/2024`
- KEP: [Link] Knowledge-enhanced Visual-Language Pretraining for Computational Pathology. `04/2024`
- DeViDe: [Link] DeViDe: Faceted medical knowledge for improved medical vision-language pre-training. `04/2024`
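Most of the pre-training methods listed above (e.g., ConVIRT, GLoRIA, MedCLIP, BiomedCLIP) build on a symmetric image-text contrastive objective. The snippet below is a minimal, illustrative PyTorch sketch of that shared objective, not the implementation of any particular model; the encoders, projection dimension, and batch size are placeholders.

```python
import torch
import torch.nn.functional as F

def clip_style_contrastive_loss(image_emb: torch.Tensor,
                                text_emb: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/report embeddings.

    image_emb, text_emb: (batch_size, dim) projections from the two encoders.
    The i-th image and the i-th report form the only positive pair in the batch.
    """
    # L2-normalize so that dot products are cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch_size, batch_size) similarity matrix, scaled by the temperature.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions: image-to-text and text-to-image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

# Toy usage with random features standing in for encoder outputs.
if __name__ == "__main__":
    img = torch.randn(8, 512)   # e.g., ViT/ResNet image projections
    txt = torch.randn(8, 512)   # e.g., BERT-style report projections
    print(clip_style_contrastive_loss(img, txt).item())
```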
For Multimodal Large Language Models (an illustrative instruction-tuning record follows this list):
- SkinGPT-4: [Nature Communications] Pre-trained multimodal large language model enhances dermatological diagnosis using SkinGPT-4. `04/2023`
- PathAsst: [AAAI] PathAsst: A Generative Foundation AI Assistant towards Artificial General Intelligence of Pathology. `05/2023`
- MedBLIP: [Link] MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts. `05/2023`
- LLM-CXR: [ICLR] LLM-CXR: Instruction-Finetuned LLM for CXR Image Understanding and Generation. `05/2023`
- BiomedGPT: [Nature Medicine] A generalist vision–language foundation model for diverse biomedical tasks. `05/2023`
- XrayGPT: [Link] XrayGPT: Chest Radiographs Summarization using Large Medical Vision-Language Models. `06/2023`
- LLaVA-Med: [Link] LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day. `06/2023`
- Med-Flamingo: [Link] Med-Flamingo: a Multimodal Medical Few-shot Learner. `07/2023`
- Med-PaLM M: [Link] Towards Generalist Biomedical AI. `07/2023`
- RadFM: [Link] Towards Generalist Foundation Model for Radiology by Leveraging Web-scale 2D&3D Medical Data. `08/2023`
- RaDialog: [Link] RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance. `10/2023`
- Qilin-Med-VL: [Link] Qilin-Med-VL: Towards Chinese Large Vision-Language Model for General Healthcare. `10/2023`
- MAIRA-1: [Link] MAIRA-1: A specialised large multimodal model for radiology report generation. `11/2023`
- PathChat: [Link] A Foundational Multimodal Vision Language AI Assistant for Human Pathology. `12/2023`
- MedXChat: [Link] MedXChat: A Unified Multimodal Large Language Model Framework towards CXRs Understanding and Generation. `12/2023`
- CheXagent: [Link] CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation. `01/2024`
- CONCH: [Nature Medicine] A visual-language foundation model for computational pathology. `03/2024`
- M3D-LaMed: [Link] M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models. `03/2024`
- Dia-LLaMA: [Link] Dia-LLaMA: Towards Large Language Model-driven CT Report Generation. `03/2024`
- LLaVA-Rad: [Link] Towards a clinically accessible radiology multimodal model: open-access and lightweight, with automatic evaluation. `03/2024`
- WoLF: [Link] WoLF: Wide-scope Large Language Model Framework for CXR Understanding. `03/2024`
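Many of the multimodal LLMs above (e.g., LLaVA-Med, XrayGPT, RaDialog) are trained with visual instruction tuning, where each image is paired with a multi-turn conversation. The snippet below sketches one plausible training record in the LLaVA-style conversation format; the field names, file paths, and report text are illustrative assumptions, not the schema or data of any specific release listed here.

```python
import json

# Hypothetical visual instruction-tuning record in a LLaVA-style
# conversation format. Field names ("image", "conversations", "from",
# "value") follow common practice but are assumptions, and the report
# text is invented purely for illustration.
record = {
    "id": "example-0001",
    "image": "images/chest_xray_0001.png",  # hypothetical path
    "conversations": [
        {"from": "human",
         "value": "<image>\nDescribe the findings in this chest X-ray."},
        {"from": "gpt",
         "value": "The cardiac silhouette is within normal limits. "
                  "No focal consolidation, pleural effusion, or "
                  "pneumothorax is identified."},
        {"from": "human",
         "value": "Is there any evidence of cardiomegaly?"},
        {"from": "gpt",
         "value": "No, the cardiothoracic ratio appears normal."},
    ],
}

# Write a one-record JSON file in the shape such pipelines typically expect.
with open("instruction_tuning_sample.json", "w") as f:
    json.dump([record], f, indent=2)
```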