MIT-LCP/vector-embedding

Vector Embedding Project for Medical Data

Overview

This repository provides pipelines for generating vector embeddings from the MIMIC-III and MIMIC-IV datasets, as well as related healthcare datasets. The focus is on producing vector representations that can be reused in downstream machine learning tasks in healthcare AI research.

Key Features

  • Multi-dataset Support: Compatibility with MIMIC-III, MIMIC-IV, and related healthcare datasets.
  • State-of-the-art Models: Integration with embedding models for chest X-ray (CXR), echocardiography (ECHO), ECG, and PPG data, listed in the model sections of this README.
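
As an illustration of what downstream use of embeddings can look like, the following is a minimal sketch that is not taken from this repository. It compares embedding vectors with cosine similarity and picks the closest candidate. The function names `cosine_similarity` and `most_similar` are hypothetical, and the vectors in the usage note are made-up toy values rather than real model outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def most_similar(query, candidates):
    """Return the key of the candidate embedding closest to `query`.

    `candidates` maps an identifier (hypothetical, e.g. a study ID) to its
    embedding vector.
    """
    return max(candidates, key=lambda key: cosine_similarity(query, candidates[key]))
```

For example, `most_similar([1.0, 0.1], {"a": [0.9, 0.0], "b": [0.0, 1.0]})` selects `"a"`, because that vector points in nearly the same direction as the query.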

Data Access and Setup

MIMIC Dataset Access

  1. Register for PhysioNet Access:

    • Create an account at PhysioNet
    • Complete the required training for human subjects research
    • Request access to the MIMIC-III and/or MIMIC-IV datasets
  2. Download Datasets:

    # Example for MIMIC-IV version 3.1
    wget -r -N -c -np --user YOUR_USERNAME --ask-password \
      https://physionet.org/files/mimiciv/3.1/
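
After a long recursive download it can be worth checking file integrity. The sketch below assumes the downloaded directory contains a `SHA256SUMS.txt` manifest in the usual `sha256sum` format (hex digest, whitespace, relative path). That is an assumption about the download, not something this README states, and the helper names `sha256_of` and `verify_checksums` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(root: Path, manifest_name: str = "SHA256SUMS.txt") -> list[str]:
    """Return relative paths that are missing or whose digest does not match.

    Assumes `root / manifest_name` exists and uses the sha256sum line format.
    """
    failures = []
    for line in (root / manifest_name).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        expected, rel_path = line.split(maxsplit=1)
        rel_path = rel_path.lstrip("*")  # sha256sum marks binary mode with '*'
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected.lower():
            failures.append(rel_path)
    return failures
```

With the `wget -r` command above, files typically land under a directory named after the host, so a call might look like `verify_checksums(Path("physionet.org/files/mimiciv/3.1"))`. An empty list means every file listed in the manifest was found and matched.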

CXR Model Information

CheXagent

CheXFound

EVA-X

CXR Foundation Model (ELIXR)

MedSigLIP

TorchXRayVision

ECHO Model Information

EchoPrime

R3D-Transformer

PanEcho

ECG Model Information

HuBERT-ECG

ECGFM-KED

ECGFounder

PPG Model Information

PaPaGei

Related Publications

  • Johnson, A. E. W., et al. "MIMIC-IV, a freely accessible electronic health record dataset." Scientific Data 10.1 (2023): 1.
  • Goldberger, A. L., et al. "PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals." Circulation 101.23 (2000): e215-e220.
  • Tohyama, T., et al. "Multi-view echocardiographic embedding for accessible AI development." medRxiv (2025). doi:10.1101/2025.08.15.25333725.
  • Chung, D. J., et al. "Echocardiogram Vector Embeddings Via R3D Transformer for the Advancement of Automated Echocardiography." JACC Adv 3 (2024): 101196.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Embedding Library Availability

Support and Contact

Organizations

  • PhysioNet
  • MIT Critical Data

Acknowledgments

  • Funding: This research was supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: RS-2024-00439677).

Disclaimer: This software is provided for research purposes only. It is not intended for clinical use. Always comply with your institution's ethics and data usage policies when working with healthcare data.
