Multimodal Learning in Healthcare

This repository accompanies the paper "Has Multimodal Learning Delivered Universal Intelligence in Healthcare? A Comprehensive Survey" and will be kept up to date.

If this work is helpful to you, please consider citing our paper using the following BibTeX entry:

@article{LIN2025102795,
title = {Has multimodal learning delivered universal intelligence in healthcare? A comprehensive survey},
journal = {Information Fusion},
volume = {116},
pages = {102795},
year = {2025},
issn = {1566-2535},
doi = {10.1016/j.inffus.2024.102795},
url = {https://www.sciencedirect.com/science/article/pii/S1566253524005736},
author = {Qika Lin and Yifan Zhu and Xin Mei and Ling Huang and Jingying Ma and Kai He and Zhen Peng and Erik Cambria and Mengling Feng}
}

News

  • Update on 2024/08/23: Version 1.0 released

Datasets and Resources

For Report Generation:

  • IU X-ray: [Link] Preparing a collection of radiology examinations for distribution and retrieval. 2016
  • ImageCLEF-Caption-2017: [Link] Overview of ImageCLEFcaption 2017: image caption prediction and concept detection for biomedical images. 2017
  • ImageCLEF-Caption-2018: [Link] Overview of the ImageCLEF 2018 Caption Prediction Tasks. 2018
  • PEIR Gross: [ACL] On the Automatic Generation of Medical Imaging Reports. 2018
  • ROCO: [Link] Radiology Objects in COntext (ROCO): A Multimodal Image Dataset. 2018
  • PadChest: [Link] Padchest: A large chest x-ray image dataset with multi-label annotated reports. 2020
  • MedICaT: [EMNLP] MedICaT: A Dataset of Medical Images, Captions, and Textual References. 2020
  • ARCH: [CVPR] Multiple Instance Captioning: Learning Representations from Histopathology Textbooks and Articles. 2021
  • FFA-IR: [NeurIPS] FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark. 2021
  • CTRG: [Link] Work like a doctor: Unifying scan localizer and dynamic generator for automated computed tomography report generation. 2024
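
These report-generation benchmarks are commonly evaluated with n-gram overlap metrics such as BLEU and ROUGE. The snippet below is a minimal, illustrative sketch of scoring one generated report against a reference report with NLTK's sentence-level BLEU; the report strings are placeholders rather than samples from any of the datasets above.

# Minimal sketch: BLEU-4 for one generated report vs. one reference report.
# The report texts are illustrative placeholders, not dataset samples.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "heart size is normal . no acute cardiopulmonary abnormality .".split()
candidate = "heart size is normal . no acute abnormality is seen .".split()

# Smoothing avoids zero scores when a higher-order n-gram has no overlap.
smooth = SmoothingFunction().method1
bleu4 = sentence_bleu([reference], candidate,
                      weights=(0.25, 0.25, 0.25, 0.25),
                      smoothing_function=smooth)
print(f"BLEU-4: {bleu4:.3f}")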

For VQA:

  • VQA-Med-2018: [Link] Overview of ImageCLEF 2018 medical domain visual question answering task. 2018
  • VQA-RAD: [Link] A dataset of clinically generated visual questions and answers about radiology images. 2018
  • VQA-Med-2019: [Link] VQA-Med: overview of the medical visual question answering task at ImageCLEF 2019. 2019
  • VQA-Med-2020: [Link] Overview of the VQA-Med task at ImageCLEF 2020: visual question answering and generation in the medical domain. 2020
  • RadVisDial-Silver: [Link] Towards Visual Dialog for Radiology. 2020
  • RadVisDial-Gold: [Link] Towards Visual Dialog for Radiology. 2020
  • PathVQA: [Link] PathVQA: 30000+ Questions for Medical Visual Question Answering. 2020
  • VQA-Med-2021: [Link] Overview of the VQA-Med task at ImageCLEF 2021: visual question answering and generation in the medical domain. 2021
  • SLAKE: [Link] Slake: A Semantically-Labeled Knowledge-Enhanced Dataset For Medical Visual Question Answering. 2021
  • MIMIC-Diff-VQA: [KDD] Expert Knowledge-Aware Image Difference Graph Representation Learning for Difference-Aware Medical Visual Question Answering. 2023

Contrastive Foundation Models

  • ConVIRT: [Link] Contrastive Learning of Medical Visual Representations from Paired Images and Text. 10/2020

  • PubMedCLIP: [ACL] PubMedCLIP: How Much Does CLIP Benefit Visual Question Answering in the Medical Domain? 12/2021

  • CheXzero: [Nature Biomedical Engineering] Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning. 09/2022

  • BiomedCLIP: [Link] BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs. 03/2023

  • PLIP: [Nature Medicine] A visual–language foundation model for pathology image analysis using medical Twitter. 03/2023

  • PathCLIP: [AAAI] PathAsst: A Generative Foundation AI Assistant towards Artificial General Intelligence of Pathology. 05/2023

  • CT-CLIP: [Link] A foundation model utilizing chest CT volumes and radiology reports for supervised-level zero-shot detection of abnormalities. 03/2024

  • PairAug: [CVPR] PairAug: What Can Augmented Image-Text Pairs Do for Radiology? 04/2024

  • GLoRIA: [ICCV] GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-efficient Medical Image Recognition. 10/2021

  • BioViL: [ECCV] Making the Most of Text Semantics to Improve Biomedical Vision–Language Processing. 04/2022

  • MedCLIP: [EMNLP] MedCLIP: Contrastive Learning from Unpaired Medical Images and Text. 10/2022

  • MGCA: [NeurIPS] Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning. 10/2022

  • BioViL-T: [CVPR] Learning to Exploit Temporal Structure for Biomedical Vision–Language Processing. 01/2023

  • MedKLIP: [ICCV] MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training for X-ray Diagnosis. 01/2023

  • KAD: [Nature Communications] Knowledge-enhanced visual-language pre-training on chest radiology images. 02/2023

  • PTUnifier: [ICCV] Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts. 02/2023

  • Med-UniC: [NeurIPS] Med-UniC: Unifying Cross-Lingual Medical Vision-Language Pre-Training by Diminishing Bias. 05/2023

  • MCR: [Link] Masked Contrastive Reconstruction for Cross-modal Medical Image-Report Retrieval. 12/2023

  • MLIP: [CVPR] MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning. 02/2024

  • MAVL: [CVPR] Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework. 03/2024

  • KEP: [Link] Knowledge-enhanced Visual-Language Pretraining for Computational Pathology. 04/2024

  • DeViDe: [Link] DeViDe: Faceted medical knowledge for improved medical vision-language pre-training. 04/2024
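
The contrastive models above share a CLIP-style objective: embeddings of paired images and reports are pulled together while mismatched pairs in the same batch are pushed apart. Below is a minimal PyTorch sketch of that symmetric InfoNCE loss over assumed encoder outputs; it illustrates the common objective rather than the exact implementation of any model listed here.

import torch
import torch.nn.functional as F

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired image/text embeddings.

    image_emb, text_emb: (batch, dim) tensors assumed to be the outputs of
    an image encoder and a text encoder projected to a shared space.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Cosine-similarity logits; the diagonal holds the true pairs.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Average the image-to-text and text-to-image cross-entropy terms.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage with random embeddings standing in for encoder outputs.
print(clip_style_loss(torch.randn(8, 512), torch.randn(8, 512)).item())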

Multimodal Large Language Models

  • SkinGPT-4: [Nature Communications] Pre-trained multimodal large language model enhances dermatological diagnosis using SkinGPT-4. 04/2023
  • PathAsst: [AAAI] PathAsst: A Generative Foundation AI Assistant towards Artificial General Intelligence of Pathology. 05/2023
  • MedBLIP: [Link] MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts. 05/2023
  • LLM-CXR: [ICLR] LLM-CXR: Instruction-Finetuned LLM for CXR Image Understanding and Generation. 05/2023
  • BiomedGPT: [Nature Medicine] A generalist vision–language foundation model for diverse biomedical tasks. 05/2023
  • XrayGPT: [Link] XrayGPT: Chest Radiographs Summarization using Large Medical Vision-Language Models. 06/2023
  • LLaVA-Med: [Link] LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day. 06/2023
  • Med-Flamingo: [Link] Med-Flamingo: a Multimodal Medical Few-shot Learner. 07/2023
  • Med-PaLM M: [Link] Towards Generalist Biomedical AI. 07/2023
  • RadFM: [Link] Towards Generalist Foundation Model for Radiology by Leveraging Web-scale 2D&3D Medical Data. 08/2023
  • RaDialog: [Link] RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance. 10/2023
  • Qilin-Med-VL: [Link] Qilin-Med-VL: Towards Chinese Large Vision-Language Model for General Healthcare. 10/2023
  • MAIRA-1: [Link] Maira-1: A specialised large multimodal model for radiology report generation. 11/2023
  • PathChat: [Link] A Foundational Multimodal Vision Language AI Assistant for Human Pathology. 12/2023
  • MedXChat: [Link] MedXChat: A Unified Multimodal Large Language Model Framework towards CXRs Understanding and Generation. 12/2023
  • CheXagent: [Link] CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation. 01/2024
  • CONCH: [Nature Medicine] A visual-language foundation model for computational pathology. 03/2024
  • M3D-LaMed: [Link] M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models. 03/2024
  • Dia-LLaMA: [Link] Dia-LLaMA: Towards Large Language Model-driven CT Report Generation. 03/2024
  • LLaVA-Rad: [Link] Towards a clinically accessible radiology multimodal model: open-access and lightweight, with automatic evaluation. 03/2024
  • WoLF: [Link] WoLF: Wide-scope Large Language Model Framework for CXR Understanding. 03/2024
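
Most of the models above expose a LLaVA-style interface: an image and a natural-language instruction are encoded together, and an autoregressive decoder produces the answer or report. The sketch below shows that usage pattern with the Hugging Face transformers LLaVA classes; the checkpoint id, image path, and prompt template are assumptions for illustration (the medical models listed here ship their own weights and chat formats), not the interface of any specific model above.

# Minimal sketch of prompting a LLaVA-style vision-language model.
# The checkpoint id, image path, and prompt template are placeholders.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # placeholder general-domain checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto")

image = Image.open("chest_xray.png")  # placeholder image path
prompt = "USER: <image>\nDescribe any abnormal findings in this chest X-ray.\nASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output_ids[0], skip_special_tokens=True))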
