[NEWS.20240405] The related survey paper has been released.
[NOTE] If you have any questions, please don't hesitate to contact us.
Foundation models, which are pre-trained on broad data and can adapt to a wide range of tasks, are advancing healthcare. They promote the development of healthcare artificial intelligence (AI), resolving the mismatch between narrow, task-specific AI models and diverse healthcare practices. Far more healthcare scenarios stand to benefit from the development of healthcare foundation models (HFMs) and the advanced intelligent services they enable.
This repository is a collection of AWESOME things about Foundation models in healthcare, including language foundation models (LFMs), vision foundation models (VFMs), bioinformatics foundation models (BFMs), and multimodal foundation models (MFMs). Feel free to star and fork.
This repository tracks the advancement of current healthcare foundation models, based on the following paper:
Foundation Model for Advancing Healthcare: Challenges, Opportunities, and Future Directions [Chinese translation]
Yuting He, Fuxiang Huang, Xinrui Jiang, Yuxiang Nie, Minghao Wang, Jiguang Wang, Hao Chen
SMART Lab, The Hong Kong University of Science and Technology
If you find this repository useful, please cite our paper:
```bibtex
@misc{he2024foundation,
      title={Foundation Model for Advancing Healthcare: Challenges, Opportunities, and Future Directions},
      author={Yuting He and Fuxiang Huang and Xinrui Jiang and Yuxiang Nie and Minghao Wang and Jiguang Wang and Hao Chen},
      year={2024},
      eprint={2404.03264},
      archivePrefix={arXiv},
      primaryClass={cs.CY}
}
```
Survey Papers

2024
- [arXiv] Foundation models for biomedical image segmentation: A survey. [Paper]
- [arXiv] Progress and opportunities of foundation models in bioinformatics. [Paper]
- [arXiv] Large language models in bioinformatics: applications and perspectives. [Paper]
- [arXiv] Data-centric foundation models in computational healthcare: A survey. [Paper]
- [arXiv] Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review. [Paper]
2023
- [ACM Computing Surveys] Pre-trained language models in biomedical domain: A systematic survey. [Paper]
- [Nature medicine] Large language models in medicine. [Paper]
- [arXiv] A survey of large language models in medicine: Progress, application, and challenge. [Paper]
- [arXiv] A survey of large language models for healthcare: from data, technology, and applications to accountability and ethics. [Paper]
- [arXiv] Large language models illuminate a progressive pathway to artificial healthcare assistant: A review. [Paper]
- [arXiv] Foundational models in medical imaging: A comprehensive survey and future vision. [Paper]
- [arXiv] CLIP in medical imaging: A comprehensive survey. [Paper]
- [arXiv] Medical vision language pretraining: A survey. [Paper]
- [MIR] Pre-training in medical data: A survey. [Paper]
- [J-BHI] Large AI models in health informatics: Applications, challenges, and the future. [Paper]
- [MedComm–Future Medicine] Accelerating the integration of ChatGPT and other large-scale AI models into biomedical research and healthcare. [Paper]
- [Nature] Foundation models for generalist medical artificial intelligence. [Paper]
- [MedIA] On the challenges and perspectives of foundation models for medical image analysis. [Paper]
Language Foundation Models (LFMs)

2024
- [AAAI] Zhongjing: Enhancing the Chinese medical capabilities of large language model through expert feedback and real-world multi-turn dialogue. [Paper] [Code]
- [NeurIPS] MDAgents: An adaptive collaboration of LLMs for medical decision-making. [Paper] [Code]
- [arXiv] Me LLaMA: Foundation large language models for medical applications. [Paper] [Code]
- [arXiv] BioMistral: A collection of open-source pretrained large language models for medical domains. [Paper] [Code]
- [arXiv] BiMediX: Bilingual medical mixture of experts LLM. [Paper] [Code]
- [arXiv] OncoGPT: A medical conversational model tailored with oncology domain expertise on a large language model Meta-AI (LLaMA). [Paper] [Code]
- [arXiv] JMLR: Joint medical LLM and retrieval training for enhancing reasoning and professional question answering capability. [Paper]
2023
- [Bioinformatics] MedCPT: A method for zero-shot biomedical information retrieval using contrastive learning with PubMedBERT. [Paper] [Code]
- [arXiv] PMC-LLaMA: Towards building open-source language models for medicine. [Paper] [Code]
- [arXiv] MEDITRON-70B: Scaling medical pretraining for large language models. [Paper] [Code]
- [arXiv] Qilin-Med: Multi-stage knowledge injection advanced medical large language model. [Paper] [Code]
- [arXiv] HuatuoGPT-II, one-stage training for medical adaptation of LLMs. [Paper] [Code]
- [NPJ Digit. Med.] A study of generative large language model for medical research and healthcare. [Paper] [Code]
- [arXiv] From beginner to expert: Modeling medical knowledge into general LLMs. [Paper]
- [arXiv] HuaTuo: Tuning LLaMA model with Chinese medical knowledge. [Paper] [Code]
- [arXiv] ChatDoctor: A medical chat model fine-tuned on a large language model Meta-AI (LLaMA) using medical domain knowledge. [Paper] [Code]
- [arXiv] MedAlpaca: An open-source collection of medical conversational AI models and training data. [Paper] [Code]
- [arXiv] AlpaCare: Instruction-tuned large language models for medical application. [Paper] [Code]
- [arXiv] HuatuoGPT, towards taming language model to be a doctor. [Paper] [Code]
- [arXiv] DoctorGLM: Fine-tuning your Chinese doctor is not a herculean task. [Paper] [Code]
- [arXiv] BianQue: Balancing the questioning and suggestion ability of health LLMs with multi-turn health conversations polished by ChatGPT. [Paper] [Code]
- [arXiv] Taiyi: A bilingual fine-tuned large language model for diverse biomedical tasks. [Paper] [Code]
- [GitHub] Visual Med-Alpaca: A parameter-efficient biomedical LLM with visual capabilities. [Code]
- [arXiv] OphGLM: Training an ophthalmology large language-and-vision assistant based on instructions and dialogue. [Paper] [Code]
- [arXiv] ChatCAD: Interactive computer-aided diagnosis on medical image using large language models. [Paper] [Code]
- [arXiv] ChatCAD+: Towards a universal and reliable interactive CAD using LLMs. [Paper] [Code]
- [arXiv] DeID-GPT: Zero-shot medical text de-identification by GPT-4. [Paper] [Code]
- [arXiv] Can generalist foundation models outcompete special-purpose tuning? Case study in medicine. [Paper] [Code]
- [arXiv] MedAgents: Large language models as collaborators for zero-shot medical reasoning. [Paper] [Code]
- [AIME] Soft-prompt tuning to predict lung cancer using primary care free-text Dutch medical notes. [Paper] [Code]
- [arXiv] Clinical decision transformer: Intended treatment recommendation through goal prompting. [Paper] [Code]
- [Nature] Large language models encode clinical knowledge. [Paper]
- [arXiv] Towards expert-level medical question answering with large language models. [Paper]
- [arXiv] GPT-Doctor: Customizing large language models for medical consultation. [Paper]
- [arXiv] ClinicalGPT: Large language models finetuned with diverse medical data and comprehensive evaluation. [Paper]
- [arXiv] Leveraging a medical knowledge graph into large language models for diagnosis prediction. [Paper]
2022
- [NPJ Digit. Med.] A large language model for electronic health records. [Paper] [Code]
- [AMIA Annu. Symp. Proc.] HealthPrompt: A zero-shot learning paradigm for clinical natural language processing. [Paper]
- [BioNLP] Position-based prompting for health outcome generation. [Paper]
- [BioNLP] BioBART: Pretraining and evaluation of a biomedical generative language model. [Paper] [Code]
2021
- [ACM Trans. Comput. Healthc.] Domain-specific language model pretraining for biomedical natural language processing. [Paper] [Code]
2020
- [JMIR Med. Info.] Modified bidirectional encoder representations from transformers extractive summarization model for hospital information systems based on character-level tokens (AlphaBERT): development and performance evaluation. [Paper] [Code]
- [Scientific Reports] BEHRT: Transformer for electronic health records. [Paper] [Code]
2019
- [NPJ Digit. Med.] ClinicalBERT: A hybrid learning model for natural language inference in healthcare using BERT. [Paper] [Code]
- [Method. Biochem. Anal.] BioBERT: A pre-trained biomedical language representation model for biomedical text mining. [Paper] [Code]
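Most of the encoder-style LFMs above (BioBERT, the PubMed-pretrained BERTs, BEHRT, etc.) are used by loading the released pre-trained weights and fine-tuning on a downstream clinical task. Below is a minimal sketch with the HuggingFace `transformers` library; the BioBERT checkpoint name `dmis-lab/biobert-v1.1` and the binary classification task are illustrative assumptions, not part of any specific paper above.

```python
# Minimal sketch: adapting a pre-trained biomedical encoder to a downstream
# task. Assumes `pip install transformers torch` and that the checkpoint
# "dmis-lab/biobert-v1.1" (an assumption) is reachable on the HF Hub.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "dmis-lab/biobert-v1.1"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint,
    num_labels=2,  # e.g., a binary clinical classification task (illustrative)
)

inputs = tokenizer(
    "The patient denies chest pain or shortness of breath.",
    return_tensors="pt",
    truncation=True,
)
# The classification head is randomly initialized: fine-tune on labeled
# clinical data before trusting these logits.
logits = model(**inputs).logits
```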
Vision Foundation Models (VFMs)

2024
- [arXiv] USFM: A universal ultrasound foundation model generalized to tasks and organs towards label efficient image analysis. [paper]
- [CVPR] VoCo: A simple-yet-effective volume contrastive learning framework for 3D medical image analysis. [paper] [Code]
- [NeurIPS] LVM-Med: Learning large-scale self-supervised vision models for medical imaging via second-order graph matching. [paper] [Code]
- [Nature Medicine] Towards a general-purpose foundation model for computational pathology. [paper] [Code]
- [arXiv] RudolfV: A foundation model by pathologists for pathologists. [paper] [Code]
- [Nature Communications] Segment anything in medical images. [paper] [Code]
- [ICASSP] SAM-OCTA: A fine-tuning strategy for applying foundation model to OCTA image segmentation tasks. [paper] [Code]
- [WACV] AFTer-SAM: Adapting SAM with axial fusion transformer for medical imaging segmentation. [paper]
- [MIDL] AdaptiveSAM: Towards efficient tuning of SAM for surgical scene segmentation. [paper] [Code]
- [arXiv] SegmentAnyBone: A universal model that segments any bone at any location on MRI. [paper] [Code]
- [SSRN] SwinSAM: Fine-grained polyp segmentation in colonoscopy images via segment anything model integrated with a Swin transformer decoder. [paper]
- [AAAI] SurgicalSAM: Efficient class promptable surgical instrument segmentation. [paper] [Code]
- [Medical Image Analysis] Prompt tuning for parameter-efficient medical image segmentation. [paper] [Code]
2023
- [ICCV] UniverSeg: Universal medical image segmentation. [paper] [Code]
- [arXiv] STU-Net: Scalable and transferable medical image segmentation models empowered by large-scale supervised pre-training. [paper] [Code]
- [arXiv] SAM-Med3D. [paper] [Code]
- [Nature] A foundation model for generalizable disease detection from retinal images. [paper]
- [arXiv] VisionFM: a multi-modal multi-task vision foundation model for generalist ophthalmic Artificial Intelligence. [paper]
- [arXiv] SegVol: Universal and interactive volumetric medical image segmentation. [paper] [Code]
- [MICCAI] Deblurring masked autoencoder is better recipe for ultrasound image recognition. [paper] [Code]
- [arXiv] MIS-FM: 3D medical image segmentation using foundation models pretrained on a large-scale unannotated dataset. [paper] [Code]
- [MICCAI] Foundation model for endoscopy video analysis via large-scale self-supervised pre-train. [paper] [Code]
- [arXiv] BROW: Better features for whole slide image based on self-distillation. [paper]
- [arXiv] Computational pathology at health system scale: Self-supervised foundation models from three billion images. [paper]
- [CVPR] Geometric visual similarity learning in 3D medical image self-supervised pre-training. [paper] [Code]
- [arXiv] Virchow: A million-slide digital pathology foundation model. [paper] [Code]
- [arXiv] MA-SAM: Modality-agnostic SAM adaptation for 3D medical image segmentation. [paper] [Code]
- [ICCV] Comprehensive multimodal segmentation in medical imaging: Combining YOLOv8 with SAM and HQ-SAM models. [paper]
- [arXiv] 3DSAM-adapter: Holistic adaptation of SAM from 2D to 3D for promptable medical image segmentation. [paper] [Code]
- [arXiv] Part to whole: Collaborative prompting for surgical instrument segmentation. [paper] [Code]
- [arXiv] Towards general purpose vision foundation models for medical image analysis: An experimental study of DINOv2 on radiology benchmarks. [paper] [Code]
- [arXiv] SkinSAM: Empowering skin cancer segmentation with segment anything model. [paper]
- [arXiv] Polyp-SAM: Transfer SAM for polyp segmentation. [paper] [Code]
- [arXiv] Customized segment anything model for medical image segmentation. [paper] [Code]
- [arXiv] Ladder fine-tuning approach for SAM integrating complementary network. [paper] [Code]
- [arXiv] Cheap lunch for medical image segmentation by fine-tuning SAM on few exemplars. [paper]
- [arXiv] SemiSAM: Exploring SAM for enhancing semi-supervised medical image segmentation with extremely limited annotations. [paper]
- [IWMLMI] Mammo-SAM: Adapting foundation segment anything model for automatic breast mass segmentation in whole mammograms. [paper]
- [arXiv] ProMISe: Prompt-driven 3D medical image segmentation using pretrained image foundation models. [paper] [Code]
- [arXiv] Medical SAM Adapter: Adapting segment anything model for medical image segmentation. [paper] [Code]
- [arXiv] SAM-Med2D. [paper] [Code]
- [arXiv] MediViSTA-SAM: Zero-shot medical video analysis with spatio-temporal SAM adaptation. [paper] [Code]
- [arXiv] SAMUS: Adapting segment anything model for clinically-friendly and generalizable ultrasound image segmentation. [paper]
- [MICCAI] Input augmentation with SAM: Boosting medical image segmentation with segmentation foundation model. [paper] [Code]
- [arXiv] AutoSAM: Adapting SAM to medical images by overloading the prompt encoder. [paper]
- [arXiv] DeSAM: Decoupling segment anything model for generalizable medical image segmentation. [paper] [Code]
- [bioRxiv] A foundation model for cell segmentation. [paper] [Code]
- [MICCAI] SAM-U: Multi-box prompts triggered uncertainty estimation for reliable SAM in medical image. [paper]
- [MICCAI] SAM-Path: A segment anything model for semantic segmentation in digital pathology. [paper]
- [arXiv] All-in-SAM: From weak annotation to pixel-wise nuclei segmentation with prompt-based finetuning. [paper]
- [arXiv] Polyp-SAM++: Can a text-guided SAM perform better for polyp segmentation? [paper] [Code]
- [arXiv] Segment anything model with uncertainty rectification for auto-prompting medical image segmentation. [paper]
- [arXiv] MedLSAM: Localize and segment anything model for 3D medical images. [paper] [Code]
- [arXiv] nnSAM: Plug-and-play segment anything model improves nnUNet performance. [paper] [Code]
- [arXiv] EviPrompt: A training-free evidential prompt generation method for segment anything model in medical images. [paper]
- [arXiv] One-shot localization and segmentation of medical images with foundation models. [paper]
- [arXiv] SAMM (Segment Any Medical Model): A 3D Slicer integration to SAM. [paper] [Code]
- [arXiv] Task-driven prompt evolution for foundation models. [paper]
2022
- [Machine Learning with Applications] Self-supervised contrastive learning for digital histopathology. [paper] [Code]
- [Medical Image Analysis] Transformer-based unsupervised contrastive learning for histopathological image classification. [paper] [Code]
- [arXiv] Self-supervised learning from 100 million medical images. [paper]
- [CVPR] Self-supervised pre-training of Swin transformers for 3D medical image analysis. [paper] [Code]
2021
- [Medical Image Analysis] Models Genesis. [paper] [Code]
- [Medical Imaging with Deep Learning] MoCo pretraining improves representation and transferability of chest X-ray models. [paper]
- [IEEE Transactions on Medical Imaging] Transferable visual words: Exploiting the semantics of anatomical patterns for self-supervised learning. [paper]
2020
- [MICCAI] Comparing to learn: Surpassing ImageNet pretraining on radiographs by comparing image representations. [paper] [Code]
2019
- [MICCAI] Models Genesis: Generic autodidactic models for 3D medical image analysis. [paper] [Code]
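Many of the 2023-2024 VFM entries above adapt the Segment Anything Model (SAM) to medical images via prompts or lightweight adapters. Below is a minimal sketch of the plain prompt-based SAM inference these works build on, assuming the official `segment_anything` package; the checkpoint path and the dummy image are illustrative assumptions.

```python
# Sketch of prompt-based SAM inference, the starting point for the medical
# SAM adaptations listed above. Assumes `pip install segment-anything` and a
# locally downloaded ViT-B checkpoint (the file name below is an assumption).
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

image = np.zeros((512, 512, 3), dtype=np.uint8)  # stand-in for an RGB slice
predictor.set_image(image)  # expects HxWx3 uint8 RGB

# A single positive point prompt (label 1) placed on the target structure.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[256, 256]]),
    point_labels=np.array([1]),
    multimask_output=True,  # returns 3 candidate masks with quality scores
)
best_mask = masks[scores.argmax()]  # (H, W) boolean mask
```

The medical adaptations above typically replace the manual point/box prompts with automatically generated ones, or insert small trainable adapter layers while keeping the SAM backbone frozen.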
Bioinformatics Foundation Models (BFMs)

2024
- [Nucleic Acids Research] Multiple sequence alignment-based RNA language model and its application to structural inference. [Paper], [Code]
- [Nature Methods] scGPT: toward building a foundation model for single-cell multi-omics using generative AI. [Paper], [Code]
- [Nature Machine Intelligence] A 5’ UTR language model for decoding untranslated regions of mRNA and function predictions. [Paper], [Code]
- [ICLR] CellPLM: Pre-training of Cell Language Model Beyond Single Cells. [Paper], [Code]

2023
- [arXiv] DNAGPT: A generalized pre-trained tool for versatile DNA sequence analysis tasks. [Paper], [Code]
- [arXiv] HyenaDNA: Long-range genomic sequence modeling at single nucleotide resolution. [Paper], [Code]
- [Nature Biotechnology] Large language models generate functional protein sequences across diverse families. [Paper], [Code]
- [Cell Systems] ProGen2: Exploring the boundaries of protein language models. [Paper], [Code]
- [Nature] Transfer learning enables predictions in network biology. [Paper], [Code]
- [arXiv] DNABERT-2: Efficient foundation model and benchmark for multi-species genome. [Paper], [Code]
- [bioRxiv] The nucleotide transformer: Building and evaluating robust foundation models for human genomics. [Paper], [Code]
- [bioRxiv] GENA-LM: A family of open-source foundational models for long DNA sequences. [Paper], [Code]
- [bioRxiv] Self-supervised learning on millions of pre-mRNA sequences improves sequence-based RNA splicing prediction. [Paper], [Code]
- [bioRxiv] Deciphering 3’ UTR mediated gene regulation using interpretable deep representation learning. [Paper], [Code]
- [Science] Evolutionary-scale prediction of atomic-level protein structure with a language model. [Paper], [Code]
- [bioRxiv] Universal cell embeddings: A foundation model for cell biology. [Paper], [Code]
- [bioRxiv] Large scale foundation model on single-cell transcriptomics. [Paper], [Code]
- [arXiv] Large-scale cell representation learning via divide-and-conquer contrastive learning. [Paper], [Code]
- [bioRxiv] CodonBERT: Large language models for mRNA design and optimization. [Paper], [Code]
- [bioRxiv] xTrimoPGLM: Unified 100B-scale pre-trained transformer for deciphering the language of protein. [Paper]
- [bioRxiv] GenePT: A simple but effective foundation model for genes and cells built from ChatGPT. [Paper], [Code]
- [bioRxiv] scELMo: Embeddings from language models are good learners for single-cell data analysis. [Paper], [Code]
- [bioRxiv] Evaluating the Utilities of Foundation Models in Single-cell Data Analysis. [Paper], [Code]
- [bioRxiv] GeneCompass: Deciphering Universal Gene Regulatory Mechanisms with Knowledge-Informed Cross-Species Foundation Model. [Paper], [Code]
2022
- [Nature Machine Intelligence] scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. [Paper], [Code]
- [bioRxiv] Interpretable RNA foundation model from unannotated data for highly accurate RNA structure and function predictions. [Paper], [Code]
- [NAR Genomics & Bioinformatics] Informative RNA base embedding for RNA structural alignment and clustering by deep representation learning. [Paper], [Code]
- [Nature Biotechnology] Single-sequence protein structure prediction using language models and deep learning. [Paper], [Code]
2021
- [Bioinformatics] DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome. [Paper], [Code]
- [IEEE TPAMI] ProtTrans: Toward understanding the language of life through self-supervised learning. [Paper], [Code]
- [ICML] MSA Transformer. [Paper], [Code]
- [PNAS] Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. [Paper], [Code]
- [Nature] Highly accurate protein structure prediction with AlphaFold. [Paper], [Code]
- [arXiv] Multi-modal self-supervised pre-training for regulatory genome across cell types. [Paper], [Code]
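Several of the DNA language models above (e.g., DNABERT) first tokenize a genome sequence into overlapping k-mers before BERT-style masked pre-training. A minimal pure-Python sketch of that preprocessing step follows; DNABERT trains variants for k = 3 to 6, and the k = 6 default here is just one of those configurations.

```python
# Sketch of overlapping k-mer tokenization as used by DNA language models
# such as DNABERT. Stride-1 windows turn a sequence of length L into
# L - k + 1 tokens, each drawn from a 4**k-word vocabulary.
def kmer_tokenize(sequence: str, k: int = 6) -> list[str]:
    """Split a DNA sequence into overlapping k-mers with stride 1."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

print(kmer_tokenize("ATGCGTAC", k=6))
# ['ATGCGT', 'TGCGTA', 'GCGTAC']
```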
Multimodal Foundation Models (MFMs)

2024
- [ICASSP] ETP: Learning transferable ECG representations via ECG-text pretraining. [Paper]
- [NeurIPS] Med-UniC: Unifying cross-lingual medical vision-language pre-training by diminishing bias. [Paper] [Code]
- [NeurIPS] Quilt-1M: One million image-text pairs for histopathology. [Paper] [Code]
- [Nature Medicine] A visual-language foundation model for computational pathology. [Paper]
- [NeurIPS] LLaVA-Med: Training a large language-and-vision assistant for biomedicine in one day. [Paper] [Code]
- [AAAI] PathAsst: Generative foundation AI assistant for pathology. [Paper] [Code]
- [WACV] I-AI: A controllable & interpretable AI system for decoding radiologists’ intense focus for accurate CXR diagnoses. [Paper] [Code]
- [arXiv] M3D: Advancing 3D medical image analysis with multi-modal large language models. [Paper] [Code]
2023
- [ICLR] Advancing radiograph representation learning with masked record modeling. [Paper] [Code]
- [arXiv] BiomedGPT: A unified and generalist biomedical generative pre-trained transformer for vision, language, and multimodal tasks. [Paper] [Code]
- [arXiv] BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs. [Paper] [Code]
- [arXiv] Towards generalist foundation model for radiology by leveraging web-scale 2D&3D medical data. [Paper] [Code]
- [CVPR] Visual language pretrained multiple instance zero-shot transfer for histopathology images. [Paper] [Code]
- [ICCV] MedKLIP: Medical knowledge enhanced language-image pre-training. [Paper] [Code]
- [arXiv] UniBrain: Universal brain MRI diagnosis with hierarchical knowledge-enhanced pre-training. [Paper] [Code]
- [EACL] PubMedCLIP: How much does CLIP benefit visual question answering in the medical domain. [Paper] [Code]
- [MICCAI] M-FLAG: Medical vision-language pre-training with frozen language models and latent space geometry optimization. [Paper] [Code]
- [arXiv] IMITATE: Clinical prior guided hierarchical vision-language pre-training. [Paper]
- [arXiv] CXR-CLIP: Toward large scale chest X-ray language-image pre-training. [Paper] [Code]
- [BIBM] UMCL: Unified medical image-text-label contrastive learning with continuous prompt. [Paper]
- [Nature Communications] Knowledge-enhanced visual-language pre-training on chest radiology images. [Paper]
- [Nature Machine Intelligence] Multi-modal molecule structure–text model for text-based retrieval and editing. [Paper] [Code]
- [MICCAI] CLIP-Lung: Textual knowledge-guided lung nodule malignancy prediction. [Paper]
- [MICCAI] PMC-CLIP: Contrastive language-image pre-training using biomedical documents. [Paper] [Code]
- [arXiv] Enhancing representation in radiography-reports foundation model: A granular alignment algorithm using masked contrastive learning. [Paper] [Code]
- [ICCV] PRIOR: Prototype representation joint learning from medical images and reports. [Paper] [Code]
- [MICCAI] Masked vision and language pre-training with unimodal and multimodal contrastive losses for medical visual question answering. [Paper] [Code]
- [arXiv] T3D: Towards 3D medical image understanding through vision-language pre-training. [Paper]
- [MICCAI] Gene-induced multimodal pre-training for imageomic classification. [Paper] [Code]
- [arXiv] A text-guided protein design framework. [Paper] [Code]
- [Nature Medicine] A visual-language foundation model for pathology image analysis using medical Twitter. [Paper] [Code]
- [arXiv] Towards generalist biomedical AI. [Paper] [Code]
- [ML4H] Med-Flamingo: A multimodal medical few-shot learner. [Paper] [Code]
- [MLMIW] Exploring the transfer learning capabilities of CLIP on domain generalization for diabetic retinopathy. [Paper] [Code]
- [MICCAI] Open-ended medical visual question answering through prefix tuning of language models. [Paper] [Code]
- [arXiv] Qilin-Med-VL: Towards Chinese large vision-language model for general healthcare. [Paper] [Code]
- [arXiv] A foundational multimodal vision language AI assistant for human pathology. [Paper]
- [arXiv] Effectively fine-tune to improve large multimodal models for radiology report generation. [Paper]
- [MLMIW] Multi-modal adapter for medical vision-and-language learning. [Paper]
- [arXiv] Text-guided foundation model adaptation for pathological image classification. [Paper] [Code]
- [arXiv] XrayGPT: Chest radiographs summarization using medical vision-language models. [Paper] [Code]
- [MICCAI] Xplainer: From X-Ray observations to explainable zero-shot diagnosis. [Paper] [Code]
- [MICCAI] Multiple prompt fusion for zero-shot lesion detection using vision-language models. [Paper]
2022
- [JMLR] Contrastive learning of medical visual representations from paired images and text. [Paper] [Code]
- [ECCV] Joint learning of localized representations from medical images and reports. [Paper]
- [NeurIPS] Multi-granularity cross-modal alignment for generalized medical visual representation learning. [Paper] [Code]
- [AAAI] Clinical-BERT: Vision-language pre-training for radiograph diagnosis and reports generation. [Paper]
- [MICCAI] Multi-modal masked autoencoders for medical vision-and-language pre-training. [Paper] [Code]
- [JBHI] Multi-modal understanding and generation for medical images and text via vision-language pre-training. [Paper] [Code]
- [ACM MM] Align, reason and learn: Enhancing medical vision-and-language pre-training with knowledge. [Paper] [Code]
- [ECCV] Making the most of text semantics to improve biomedical vision–language processing. [Paper]
- [Nature Biomedical Engineering] Expert-level detection of pathologies from unannotated chest x-ray images via self-supervised learning. [Paper] [Code]
- [arXiv] RoentGen: Vision-language foundation model for chest X-ray generation. [Paper]
- [arXiv] Adapting pretrained vision-language foundational models to medical imaging domains. [Paper]
- [arXiv] Medical image understanding with pretrained vision language models: A comprehensive study. [Paper]
- [EMNLP] MedCLIP: Contrastive learning from unpaired medical images and text. [Paper] [Code]
- [MICCAI] Breaking with fixed set pathology recognition through report-guided contrastive training. [Paper]
2021
- [arXiv] MMBERT: Multimodal BERT pretraining for improved medical VQA. [Paper] [Code]
- [ICCV] GLoRIA: A multimodal global-local representation learning framework for label-efficient medical image recognition. [Paper] [Code]
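Many of the MFM entries above (MedCLIP, BiomedCLIP, PMC-CLIP, GLoRIA, etc.) follow the CLIP recipe: matched image-text pairs are pulled together and mismatched in-batch pairs pushed apart with a symmetric contrastive objective. A minimal PyTorch sketch of that objective is below; the random embeddings stand in for encoder outputs, and the temperature value is illustrative rather than taken from any specific paper above.

```python
# Sketch of the symmetric contrastive (CLIP-style) objective underlying many
# medical vision-language models. Encoders are stubbed with random tensors.
import torch
import torch.nn.functional as F

def clip_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
              temperature: float = 0.07) -> torch.Tensor:
    """img_emb, txt_emb: (N, D) embeddings for N matched image-text pairs."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature  # (N, N) cosine similarities
    targets = torch.arange(len(logits), device=logits.device)
    # Matched pairs lie on the diagonal: contrast each image against all
    # texts in the batch, and each text against all images.
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2

loss = clip_loss(torch.randn(8, 512), torch.randn(8, 512))
```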
Datasets

LFM Datasets

Dataset Name | Text Types | Scale | Task | Link |
---|---|---|---|---|
PubMed | Literature | 18B tokens | Language modeling | * |
MedC-I | Literature | 79.2B tokens | Dialogue | * |
Guidelines | Literature | 47K instances | Language modeling | * |
PMC-Patients | Literature | 167K instances | Information retrieval | * |
MIMIC-III | Health records | 122K instances | Language modeling | * |
MIMIC-IV | Health records | 299K instances | Language modeling | * |
eICU-CRD v2.0 | Health records | 200K instances | Language modeling | * |
EHRs | Health records | 82B tokens | Named entity recognition, Relation extraction, Semantic textual similarity, Natural language inference, Dialogue | - |
MD-HER | Health records | 96K instances | Dialogue, Question answering | - |
IMCS-21 | Dialogue | 4K instances | Dialogue | * |
Huatuo-26M | Dialogue | 26M instances | Question answering | * |
MedInstruct-52k | Dialogue | 52K instances | Dialogue | * |
MASH-QA | Dialogue | 35K instances | Dialogue | * |
MedQuAD | Dialogue | 47K instances | Dialogue | * |
MedDG | Dialogue | 17K instances | Dialogue | * |
CMExam | Dialogue | 68K instances | Dialogue | * |
cMedQA2 | Dialogue | 108K instances | Dialogue | * |
CMtMedQA | Dialogue | 70K instances | Dialogue | * |
CliCR | Dialogue | 100K instances | Dialogue | * |
webMedQA | Dialogue | 63K instances | Dialogue | * |
ChiMed | Dialogue | 1.59B tokens | Dialogue | * |
MedDialog | Dialogue | 20K instances | Dialogue | * |
CMD | Dialogue | 882K instances | Dialogue | * |
BianqueCorpus | Dialogue | 2.4M instances | Dialogue | * |
MedQA | Dialogue | 4K instances | Dialogue | * |
HealthcareMagic | Dialogue | 100K instances | Dialogue | * |
iCliniq | Dialogue | 10K instances | Dialogue | * |
CMeKG-8K | Dialogue | 8K instances | Dialogue | * |
Hybrid SFT | Dialogue | 226K instances | Dialogue | * |
VariousMedQA | Dialogue | 54K instances | Dialogue | * |
Medical Meadow | Dialogue | 160K instances | Dialogue | * |
MultiMedQA | Dialogue | 193K instances | Dialogue | - |
BiMed1.3M | Dialogue | 250K instances | Dialogue | * |
OncoGPT | Dialogue | 180K instances | Dialogue | * |
VFM Datasets

Dataset Name | Modality | Scale | Task | Link |
---|---|---|---|---|
LIMUC | Endoscopy | 1,043 videos (11,276 frames) | Detection | * |
SUN | Endoscopy | 1,018 videos (158,690 frames) | Detection | * |
Kvasir-Capsule | Endoscopy | 117 videos (4,741,504 frames) | Detection | * |
EndoSLAM | Endoscopy | 1,020 videos (158,690 frames) | Detection, Registration | * |
LDPolypVideo | Endoscopy | 263 videos (895,284 frames) | Detection | * |
HyperKvasir | Endoscopy | 374 videos (1,059,519 frames) | Detection | * |
CholecT45 | Endoscopy | 45 videos (90,489 frames) | Segmentation, Detection | * |
DeepLesion | CT slices (2D) | 32,735 images | Segmentation, Registration | * |
LIDC-IDRI | 3D CT | 1,018 volumes | Segmentation | * |
TotalSegmentator | 3D CT | 1,204 volumes | Segmentation | * |
TotalSegmentatorv2 | 3D CT | 1,228 volumes | Segmentation | * |
AutoPET | 3D CT, 3D PET | 1,214 PET-CT pairs | Segmentation | * |
ULS | 3D CT | 38,842 volumes | Segmentation | * |
FLARE 2022 | 3D CT | 2,300 volumes | Segmentation | * |
FLARE 2023 | 3D CT | 4,500 volumes | Segmentation | * |
AbdomenCT-1K | 3D CT | 1,112 volumes | Segmentation | * |
CTSpine1K | 3D CT | 1,005 volumes | Segmentation | * |
CTPelvic1K | 3D CT | 1,184 volumes | Segmentation | * |
MSD | 3D CT, 3D MRI | 1,411 CT, 1,222 MRI | Segmentation | * |
BraTS21 | 3D MRI | 2,040 volumes | Segmentation | * |
BraTS2023-MEN | 3D MRI | 1,650 volumes | Segmentation | * |
ADNI | 3D MRI | - | Clinical study | * |
PPMI | 3D MRI | - | Clinical study | * |
ATLAS v2.0 | 3D MRI | 1,271 volumes | Segmentation | * |
PI-CAI | 3D MRI | 1,500 volumes | Segmentation | * |
MRNet | 3D MRI | 1,370 volumes | Segmentation | * |
Retinal OCT-C8 | 2D OCT | 24,000 images | Classification | * |
Ultrasound Nerve Segmentation | US | 11,143 images | Segmentation | * |
Fetal Planes | US | 12,400 images | Classification | * |
EchoNet-LVH | US | 12,000 videos | Detection, Clinical study | * |
EchoNet-Dynamic | US | 10,030 videos | Function assessment | * |
AIROGS | CFP | 113,893 images | Classification | * |
ISIC 2020 | Dermoscopy | 33,126 images | Classification | * |
LC25000 | Pathology | 25,000 images | Classification | * |
DeepLIIF | Pathology | 1,667 WSIs | Classification | * |
PAIP | Pathology | 2,457 WSIs | Segmentation | * |
TissueNet | Pathology | 1,016 WSIs | Classification | * |
NLST | 3D CT, Pathology | 26,254 CT, 451 WSIs | Clinical study | * |
CRC | Pathology | 100K images | Classification | * |
MURA | X-ray | 40,895 images | Detection | * |
ChestX-ray14 | X-ray | 112,120 images | Detection | * |
SNOW | Synthetic pathology | 20K image tiles | Segmentation | * |
BFM Datasets

Dataset Name | Modality | Scale | Task | Link |
---|---|---|---|---|
CellxGene Corpus | scRNA-seq | over 72M scRNA-seq data | Single cell omics study | * |
NCBI GenBank | DNA | 3.7B sequences | Genomics study | * |
SCP | scRNA-seq | over 40M scRNA-seq data | Single cell omics study | * |
Gencode | DNA | - | Genomics study | * |
10x Genomics | scRNA-seq, DNA | - | Single cell omics and genomics study | * |
ABC Atlas | scRNA-seq | over 15M scRNA-seq data | Single cell omics study | * |
Human Cell Atlas | scRNA-seq | over 50M scRNA-seq data | Single cell omics study | * |
UCSC Genome Browser | DNA | - | Genomics study | * |
CPTAC | DNA, RNA, protein | - | Genomics and proteomics study | * |
Ensembl Project | Protein | - | Proteomics study | * |
RNAcentral database | RNA | 36M sequences | Transcriptomics study | * |
AlphaFold DB | Protein | 214M structures | Proteomics study | * |
PDBe | Protein | - | Proteomics study | * |
UniProt | Protein | over 250M sequences | Proteomics study | * |
LINCS L1000 | Small molecules | 1,000 genes with 41K small molecules | Disease research, drug response | * |
GDSC | Small molecules | 1,000 cancer cells with 400 compounds | Disease research, drug response | * |
CCLE | - | - | Bioinformatics study | * |
MFM Datasets

Dataset Name | Modalities | Scale | Task | Link |
---|---|---|---|---|
MIMIC-CXR | X-ray, Medical report | 377K images, 227K texts | Vision-Language Learning | * |
PadChest | X-ray, Medical report | 160K images, 109K texts | Vision-Language Learning | * |
CheXpert | X-ray, Medical report | 224K images, 224K texts | Vision-Language Learning | * |
ImageCLEF2018 | Multimodal, Captions | 232K images, 232K texts | Image captioning | * |
OpenPath | Pathology, Tweets | 208K images, 208K texts | Vision-Language learning | * |
PathVQA | Pathology, QA | 4K images, 32K QA pairs | VQA | * |
Quilt-1M | Pathology Images, Mixed-source text | 1M images, 1M texts | Vision-Language learning | * |
PatchGastricADC22 | Pathology, Captions | 991 WSIs, 991 texts | Image captioning | * |
PTB-XL | ECG, Medical report | 21K records, 21K texts | Vision-Language learning | * |
ROCO | Multimodal, Captions | 87K images, 87K texts | Vision-Language learning | * |
MedICaT | Multimodal, Captions | 217K images, 217K texts | Vision-Language learning | * |
PMC-OA | Multimodal, Captions | 1.6M images, 1.6M texts | Vision-Language learning | * |
ChiMed-VL | Multimodal, Medical report | 580K images, 580K texts | Vision-Language learning | * |
PMC-VQA | Multimodal, QA | 149K images, 227K QA pairs | VQA | * |
SwissProtCLAP | Protein sequence, Text | 441K protein sequences, 441K texts | Protein-Language learning | * |
Duke Breast Cancer MRI | Genomic, MRI images, Clinical data | 922 patients | Multimodal learning | * |
I-SPY2 | MRI images, Clinical data | 719 patients | Multimodal learning | * |
Large-Scale Databases

Database | Description | Link |
---|---|---|
CGGA | Chinese Glioma Genome Atlas (CGGA) database contains clinical and sequencing data of over 2,000 brain tumor samples from Chinese cohorts. | * |
UK Biobank | UK Biobank is a large-scale biomedical database and research resource containing de-identified genetic, lifestyle and health information and biological samples from half a million UK participants. | * |
TCGA | The Cancer Genome Atlas (TCGA) program molecularly characterized over 20,000 primary cancers and matched normal samples spanning 33 cancer types, generating over 2.5 petabytes of genomic, epigenomic, transcriptomic, and proteomic data. | * |
TCIA | The Cancer Imaging Archive (TCIA) is a service which de-identifies and hosts a large publicly available archive of medical images of cancer. | * |