Widely Used Datasets

  • [MedTrinity-25M] MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine, Yunfei Xie, Ce Zhou, Lang Gao, Juncheng Wu, Xianhang Li, Hong-Yu Zhou, Sheng Liu, Lei Xing, James Zou, Cihang Xie, Yuyin Zhou [Paper] [Project] [Huggingface] [Github]

  • CheXpert Plus: Augmenting a Large Chest X-ray Dataset with Text Radiology Reports, Patient Demographics and Additional Image Formats, Pierre Chambon, Jean-Benoit Delbrouck, Thomas Sounack, Shih-Cheng Huang, Zhihong Chen, Maya Varma, Steven QH Truong, Chu The Chuong, Curtis P. Langlotz [Paper] [Code]

  • [arXiv:2405.19538] CheXpert Plus: Hundreds of Thousands of Aligned Radiology Texts, Images and Patients, Pierre Chambon, Jean-Benoit Delbrouck, Thomas Sounack, Shih-Cheng Huang, Zhihong Chen, Maya Varma, Steven QH Truong, Chu The Chuong, Curtis P. Langlotz [Paper] [Dataset] [Model]

  • [ROCO Dataset] Pelka, Obioma, et al. "Radiology Objects in COntext (ROCO): a multimodal image dataset." Intravascular Imaging and Computer Assisted Stenting and Large-Scale Annotation of Biomedical Data and Expert Label Synthesis: 7th Joint International Workshop, CVII-STENT 2018 and Third International Workshop, LABELS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16, 2018, Proceedings 3. Springer International Publishing, 2018. [Paper] [Github]

  • [MedICaT Dataset] Subramanian, Sanjay, et al. "MedICaT: A Dataset of Medical Images, Captions, and Textual References." Findings of the Association for Computational Linguistics: EMNLP 2020. 2020. [Paper] [Code]