This repository accompanies the paper: Has Multimodal Learning Delivered Universal Intelligence in Healthcare? A Comprehensive Survey. We will keep updating this repository.
If this work is helpful to you, please consider citing our paper using the following format:
```bibtex
@article{LIN2025102795,
  title   = {Has multimodal learning delivered universal intelligence in healthcare? A comprehensive survey},
  journal = {Information Fusion},
  volume  = {116},
  pages   = {102795},
  year    = {2025},
  issn    = {1566-2535},
  doi     = {https://doi.org/10.1016/j.inffus.2024.102795},
  url     = {https://www.sciencedirect.com/science/article/pii/S1566253524005736},
  author  = {Qika Lin and Yifan Zhu and Xin Mei and Ling Huang and Jingying Ma and Kai He and Zhen Peng and Erik Cambria and Mengling Feng}
}
```
- Update on 2024/08/23: Version 1.0 released
For Report Generation:
- IU X-ray: [Link] Preparing a collection of radiology examinations for distribution and retrieval. `2016`
- ICLEF-Caption-2017: [Link] Overview of ImageCLEFcaption 2017: image caption prediction and concept detection for biomedical images. `2017`
- ICLEF-Caption-2018: [Link] Overview of the ImageCLEF 2018 Caption Prediction Tasks. `2018`
- PEIR Gross: [ACL] On the Automatic Generation of Medical Imaging Reports. `2018`
- ROCO: [Link] Radiology Objects in COntext (ROCO): A Multimodal Image Dataset. `2018`
- PadChest: [Link] PadChest: A large chest x-ray image dataset with multi-label annotated reports. `2020`
- MedICaT: [EMNLP] MedICaT: A Dataset of Medical Images, Captions, and Textual References. `2020`
- ARCH: [CVPR] Multiple Instance Captioning: Learning Representations from Histopathology Textbooks and Articles. `2021`
- FFA-IR: [NeurIPS] FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark. `2021`
- CTRG: [Link] Work like a doctor: Unifying scan localizer and dynamic generator for automated computed tomography report generation. `2024`
For VQA:
- VQA-Med-2018: [Link] Overview of ImageCLEF 2018 medical domain visual question answering task. `2018`
- VQA-RAD: [Link] A dataset of clinically generated visual questions and answers about radiology images. `2018`
- VQA-Med-2019: [Link] VQA-Med: overview of the medical visual question answering task at ImageCLEF 2019. `2019`
- VQA-Med-2020: [Link] Overview of the VQA-Med task at ImageCLEF 2020: visual question answering and generation in the medical domain. `2020`
- RadVisDial-Silver: [Link] Towards Visual Dialog for Radiology. `2020`
- RadVisDial-Gold: [Link] Towards Visual Dialog for Radiology. `2020`
- PathVQA: [Link] PathVQA: 30000+ Questions for Medical Visual Question Answering. `2020`
- VQA-Med-2021: [Link] Overview of the VQA-Med task at ImageCLEF 2021: visual question answering and generation in the medical domain. `2021`
- SLAKE: [Link] SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical Visual Question Answering. `2021`
- MIMIC-Diff-VQA: [KDD] Expert Knowledge-Aware Image Difference Graph Representation Learning for Difference-Aware Medical Visual Question Answering. `2023`
For Vision-Language Pre-training (a sketch of the shared contrastive objective follows this list):
- ConVIRT: [Link] Contrastive Learning of Medical Visual Representations from Paired Images and Text. `10/2020`
- PubMedCLIP: [ACL] PubMedCLIP: How Much Does CLIP Benefit Visual Question Answering in the Medical Domain? `12/2021`
- CheXzero: [Nature Biomedical Engineering] Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning. `09/2022`
- BiomedCLIP: [Link] BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs. `03/2023`
- PLIP: [Nature Medicine] A visual–language foundation model for pathology image analysis using medical Twitter. `03/2023`
- PathCLIP: [AAAI] PathAsst: A Generative Foundation AI Assistant towards Artificial General Intelligence of Pathology. `05/2023`
- CT-CLIP: [Link] A foundation model utilizing chest CT volumes and radiology reports for supervised-level zero-shot detection of abnormalities. `03/2024`
- PairAug: [CVPR] PairAug: What Can Augmented Image-Text Pairs Do for Radiology? `04/2024`
- GLoRIA: [ICCV] GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-efficient Medical Image Recognition. `10/2021`
- BioViL: [ECCV] Making the Most of Text Semantics to Improve Biomedical Vision–Language Processing. `04/2022`
- MedCLIP: [EMNLP] MedCLIP: Contrastive Learning from Unpaired Medical Images and Text. `10/2022`
- MGCA: [NeurIPS] Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning. `10/2022`
- BioViL-T: [CVPR] Learning to Exploit Temporal Structure for Biomedical Vision–Language Processing. `01/2023`
- MedKLIP: [ICCV] MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training for X-ray Diagnosis. `01/2023`
- KAD: [Nature Communications] Knowledge-enhanced visual-language pre-training on chest radiology images. `02/2023`
- PTUnifier: [ICCV] Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts. `02/2023`
- Med-UniC: [NeurIPS] Med-UniC: Unifying Cross-Lingual Medical Vision-Language Pre-Training by Diminishing Bias. `05/2023`
- MCR: [Link] Masked Contrastive Reconstruction for Cross-modal Medical Image-Report Retrieval. `12/2023`
- MLIP: [CVPR] MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning. `02/2024`
- MAVL: [CVPR] Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework. `03/2024`
- KEP: [Link] Knowledge-enhanced Visual-Language Pretraining for Computational Pathology. `04/2024`
- DeViDe: [Link] DeViDe: Faceted medical knowledge for improved medical vision-language pre-training. `04/2024`
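Most of the pre-training methods listed above (e.g., ConVIRT, GLoRIA, MedCLIP, BiomedCLIP) build on a symmetric image-text contrastive objective. The snippet below is a minimal, illustrative PyTorch sketch of that shared objective, not the implementation of any particular model; the encoders, projection dimension, and batch size are placeholders.

```python
import torch
import torch.nn.functional as F

def clip_style_contrastive_loss(image_emb: torch.Tensor,
                                text_emb: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/report embeddings.

    image_emb, text_emb: (batch_size, dim) projections from the two encoders.
    The i-th image and the i-th report form the only positive pair in the batch.
    """
    # L2-normalize so that dot products are cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch_size, batch_size) similarity matrix, scaled by the temperature.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions: image-to-text and text-to-image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

# Toy usage with random features standing in for encoder outputs.
if __name__ == "__main__":
    img = torch.randn(8, 512)   # e.g., ViT/ResNet image projections
    txt = torch.randn(8, 512)   # e.g., BERT-style report projections
    print(clip_style_contrastive_loss(img, txt).item())
```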
For Multimodal Large Language Models (an illustrative instruction-tuning record follows this list):
- SkinGPT-4: [Nature Communications] Pre-trained multimodal large language model enhances dermatological diagnosis using SkinGPT-4. `04/2023`
- PathAsst: [AAAI] PathAsst: A Generative Foundation AI Assistant towards Artificial General Intelligence of Pathology. `05/2023`
- MedBLIP: [Link] MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts. `05/2023`
- LLM-CXR: [ICLR] LLM-CXR: Instruction-Finetuned LLM for CXR Image Understanding and Generation. `05/2023`
- BiomedGPT: [Nature Medicine] A generalist vision–language foundation model for diverse biomedical tasks. `05/2023`
- XrayGPT: [Link] XrayGPT: Chest Radiographs Summarization using Large Medical Vision-Language Models. `06/2023`
- LLaVA-Med: [Link] LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day. `06/2023`
- Med-Flamingo: [Link] Med-Flamingo: a Multimodal Medical Few-shot Learner. `07/2023`
- Med-PaLM M: [Link] Towards Generalist Biomedical AI. `07/2023`
- RadFM: [Link] Towards Generalist Foundation Model for Radiology by Leveraging Web-scale 2D&3D Medical Data. `08/2023`
- RaDialog: [Link] RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance. `10/2023`
- Qilin-Med-VL: [Link] Qilin-Med-VL: Towards Chinese Large Vision-Language Model for General Healthcare. `10/2023`
- MAIRA-1: [Link] MAIRA-1: A specialised large multimodal model for radiology report generation. `11/2023`
- PathChat: [Link] A Foundational Multimodal Vision Language AI Assistant for Human Pathology. `12/2023`
- MedXChat: [Link] MedXChat: A Unified Multimodal Large Language Model Framework towards CXRs Understanding and Generation. `12/2023`
- CheXagent: [Link] CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation. `01/2024`
- CONCH: [Nature Medicine] A visual-language foundation model for computational pathology. `03/2024`
- M3D-LaMed: [Link] M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models. `03/2024`
- Dia-LLaMA: [Link] Dia-LLaMA: Towards Large Language Model-driven CT Report Generation. `03/2024`
- LLaVA-Rad: [Link] Towards a clinically accessible radiology multimodal model: open-access and lightweight, with automatic evaluation. `03/2024`
- WoLF: [Link] WoLF: Wide-scope Large Language Model Framework for CXR Understanding. `03/2024`
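Many of the multimodal LLMs above (e.g., LLaVA-Med, XrayGPT, RaDialog) are trained with visual instruction tuning, where each image is paired with a multi-turn conversation. The snippet below sketches one plausible training record in the LLaVA-style conversation format; the field names, file paths, and report text are illustrative assumptions, not the schema or data of any specific release listed here.

```python
import json

# Hypothetical visual instruction-tuning record in a LLaVA-style
# conversation format. Field names ("image", "conversations", "from",
# "value") follow common practice but are assumptions, and the report
# text is invented purely for illustration.
record = {
    "id": "example-0001",
    "image": "images/chest_xray_0001.png",  # hypothetical path
    "conversations": [
        {"from": "human",
         "value": "<image>\nDescribe the findings in this chest X-ray."},
        {"from": "gpt",
         "value": "The cardiac silhouette is within normal limits. "
                  "No focal consolidation, pleural effusion, or "
                  "pneumothorax is identified."},
        {"from": "human",
         "value": "Is there any evidence of cardiomegaly?"},
        {"from": "gpt",
         "value": "No, the cardiothoracic ratio appears normal."},
    ],
}

# Write a one-record JSON file in the shape such pipelines typically expect.
with open("instruction_tuning_sample.json", "w") as f:
    json.dump([record], f, indent=2)
```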