The primary goal of BioMapAI is to connect high-dimensional biology data,
BioMapAI is a deep learning framework for multi-stage modeling of biological data. It first predicts intermediate targets (omics scores) and then maps them into a final outcome or classification label.
- OmicScoreModel: Train a model to predict intermediate omic scores (Y).
- ScoreLayer: Build a simple layer (or sub-model) that converts omic scores (Y) into the final target (y0).
- ScoreYModel: Combine the trained omic score model with the ScoreLayer to get final predictions and metrics.
- WeightsAdjust (Optional): Fine-tune the relationship between Y and y0 for better performance.
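The staged design above can be illustrated with a minimal NumPy sketch. This is not the BioMapAI implementation: the least-squares map and the equal-weight sigmoid are simple stand-ins for OmicScoreModel and ScoreLayer, and the synthetic data, sizes, and names such as `score_layer` and `score_weights` are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 200 samples, 30 omics features (X), 4 intermediate scores (Y).
n_samples, n_features, n_scores = 200, 30, 4
X = rng.normal(size=(n_samples, n_features))
true_W = rng.normal(size=(n_features, n_scores))
Y = X @ true_W + 0.1 * rng.normal(size=(n_samples, n_scores))

# The final binary label y0 depends only on the intermediate scores.
y0 = (Y.sum(axis=1) > 0).astype(int)

# Stage 1 (stand-in for OmicScoreModel): least-squares map from X to Y.
W_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_pred = X @ W_hat


def score_layer(scores, weights):
    """Stage 2 (stand-in for ScoreLayer): weighted sum of scores through a sigmoid."""
    return 1.0 / (1.0 + np.exp(-(scores @ weights)))


# Combined pipeline (stand-in for ScoreYModel): X -> predicted Y -> predicted y0.
score_weights = np.ones(n_scores)  # hypothetical equal weights; WeightsAdjust would tune these
y0_prob = score_layer(Y_pred, score_weights)
y0_pred = (y0_prob >= 0.5).astype(int)
accuracy = float((y0_pred == y0).mean())
print(f"final-stage accuracy on synthetic data: {accuracy:.3f}")
```

The point of the sketch is the data flow: the final prediction never sees X directly, only the predicted intermediate scores.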
- Install Dependencies:
  pip install numpy pandas tensorflow
- Clone or Download This Repository
  This repository should include:
  - BioMapAI.py: Contains the classes and methods (OmicScoreModel, ScoreLayer, etc.).
  - example_data/: Folder containing train_data.csv and test_data.csv.
  - BioMapAI_Training_Tutorial.ipynb: Detailed notebook tutorial.
- Run the Tutorial Notebook
  - Open OmicScoreModel_Tutorial.ipynb in Jupyter Notebook or JupyterLab.
  - Follow the cells step-by-step to:
    - Load training and test data.
    - Train the OmicScoreModel to predict intermediate scores (Y).
    - Build a ScoreLayer to convert those scores into final predictions (y0).
    - Evaluate the model performance on a test set.
    - (Optional) Adjust weights to improve performance.
- Customize or Extend
  - Tune hyperparameters (epochs, optimizer, batch size, etc.).
  - Add or remove features in the data CSV files.
  - Modify BioMapAI.py to create custom network architectures or loss functions.
  - Integrate advanced data preprocessing or feature engineering techniques.
For an in-depth guide, check out the BioMapAI_Training_Tutorial.ipynb. It covers:
- Data loading and organization
- Model instantiation and training procedures
- How to evaluate intermediate and final predictions
- Strategies for adjusting the model to improve performance
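As a rough illustration of evaluating the intermediate predictions, the sketch below computes a per-score mean squared error and Pearson correlation with NumPy. The function name `per_score_metrics`, the score names, and the toy arrays are hypothetical; the notebook may report different metrics.

```python
import numpy as np


def per_score_metrics(Y_true, Y_pred, score_names):
    """Return {score_name: (mse, pearson_r)} for each intermediate score column."""
    Y_true = np.asarray(Y_true, dtype=float)
    Y_pred = np.asarray(Y_pred, dtype=float)
    results = {}
    for j, name in enumerate(score_names):
        mse = float(np.mean((Y_true[:, j] - Y_pred[:, j]) ** 2))
        r = float(np.corrcoef(Y_true[:, j], Y_pred[:, j])[0, 1])
        results[name] = (mse, r)
    return results


# Toy data: 4 samples, 2 hypothetical intermediate scores.
Y_true = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
Y_pred = np.array([[1.5, 12.0], [2.5, 18.0], [3.5, 33.0], [4.5, 41.0]])

metrics = per_score_metrics(Y_true, Y_pred, ["score_a", "score_b"])
for name, (mse, r) in metrics.items():
    print(f"{name}: mse={mse:.3f}, r={r:.3f}")
```

Looking at each score separately helps show whether a weak final prediction comes from one poorly predicted intermediate score or from the Y to y0 mapping.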
We have used BioMapAI to build pretrained models specifically for ME/CFS omics data, called DeepMECFS. We trained BioMapAI on gut microbiome data (species abundance and KEGG gene abundance), plasma metabolome, high-throughput immune flow cytometry data, Quest lab measurements, and a combined omics file containing key features from all datasets. These models are located in the folder pretrained_model_DeepMECFS/ and can be applied directly to new ME/CFS datasets. Here we use a public metabolome dataset as an example to walk through how to load and use our pretrained models.
- DeepMECFS_metabolome/: Directory containing the trained TensorFlow model.
- Y2y_metabolome/: Secondary model for converting intermediate features (Y) into final ME/CFS classification.
- metabolome_feature_metadata.csv: Required features and metadata for alignment with your dataset.
- Install/Clone the repository containing the pretrained_model_DeepMECFS/ folder.
- Prepare Your Data:
  - Ensure your metabolomics data columns match the names (or COMP_IDs) in metabolome_feature_metadata.csv.
  - Scale or normalize your data consistently (e.g., via StandardScaler).
- Run the Tutorial:
  - Open DeepMECFS_Tutorial.ipynb (or equivalent notebook/script).
  - Follow each step to:
    - Load the pretrained models (DeepMECFS_metabolome/ and Y2y_metabolome/).
    - Align your dataset columns to the model’s expected features.
    - Generate predictions (ME/CFS vs. Control).
    - Evaluate performance metrics (accuracy, AUC, precision, etc.).
- Interpret Results:
  - The model outputs a probability (0 to 1) for ME/CFS classification.
  - You can threshold this probability (e.g., 0.5) to get a binary label (CFS vs. Control).
- Explore Further:
  - You can experiment with different preprocessing or consider re-training parts of the pipeline if your data differs significantly from the original study.
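The thresholding and evaluation steps above can be sketched with NumPy alone. This is a minimal example, assuming labels are encoded as 1 for CFS and 0 for Control (the encoding is not stated here) and using made-up probabilities; `binary_metrics` is a hypothetical helper, not part of the repository.

```python
import numpy as np


def binary_metrics(y_true, prob, threshold=0.5):
    """Threshold probabilities and compute accuracy, precision, and AUC."""
    y_true = np.asarray(y_true, dtype=int)
    prob = np.asarray(prob, dtype=float)
    y_pred = (prob >= threshold).astype(int)

    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    accuracy = float(np.mean(y_pred == y_true))
    precision = tp / (tp + fp) if (tp + fp) else float("nan")

    # AUC as the fraction of (positive, negative) pairs where the positive
    # sample gets the higher probability; ties count as half.
    pos = prob[y_true == 1]
    neg = prob[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    auc = float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg)))

    labels = np.where(y_pred == 1, "CFS", "Control")
    return {"accuracy": accuracy, "precision": precision, "auc": auc, "labels": labels}


# Made-up example: 1 = CFS, 0 = Control (assumed encoding).
y_true = [1, 1, 1, 0, 0, 0]
prob = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
result = binary_metrics(y_true, prob, threshold=0.5)
print(result["labels"], result["accuracy"], result["precision"], result["auc"])
```

AUC does not depend on the threshold, so it is a useful check when 0.5 is not a natural cut-off for a new cohort.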
The metabolomics data used to train DeepMECFS is described in:
Arnaud Germain, et al. “Plasma metabolomics reveals disrupted response and recovery following maximal exercise in myalgic encephalomyelitis/chronic fatigue syndrome.” JCI Insight. 2022;7(9):e157621.
DOI: 10.1172/jci.insight.157621
For detailed instructions, see the Pretrained_DeepMECFS_Tutorial.ipynb. It includes code snippets for loading the data, aligning it to the model’s features, and running inference.
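For orientation, column alignment and scaling might look roughly like the pandas sketch below. It is an assumption-laden illustration, not the notebook's code: the feature names are invented, the expected-feature list is passed in directly instead of being read from metabolome_feature_metadata.csv (whose column layout is not described here), and filling absent features with 0 after z-scoring is one possible choice among several. Whether scaling should use the new dataset's statistics or those of the training data is also not specified here.

```python
import pandas as pd


def align_and_scale(data, expected_features):
    """Reorder columns to the expected feature list, z-score them, and report gaps."""
    # Features the model expects but the new dataset lacks.
    missing = [f for f in expected_features if f not in data.columns]

    # Keep only expected features, in the expected order; absent ones become NaN.
    aligned = data.reindex(columns=expected_features)

    # Z-score each column (population std, as StandardScaler does).
    mean = aligned.mean()
    std = aligned.std(ddof=0).replace(0, 1.0)  # avoid dividing by zero for constant columns
    scaled = (aligned - mean) / std

    # Assumption: absent features are set to 0, the post-scaling mean.
    return scaled.fillna(0.0), missing


# Hypothetical new dataset with one extra column and one expected feature absent.
data = pd.DataFrame(
    {"met_b": [1.0, 2.0, 3.0], "met_a": [10.0, 10.0, 10.0], "extra": [0, 1, 2]}
)
expected = ["met_a", "met_b", "met_c"]

scaled, missing = align_and_scale(data, expected)
print(scaled)
print("missing features:", missing)
```

Reporting the missing features explicitly matters because a model fed many zero-filled columns may still return probabilities that look plausible.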
This project is provided under the MIT License (or whichever license you choose). Feel free to modify or redistribute under its terms.