Part of the contribution to Ersilia Model Hub for the Outreachy 2024 internship

Malikbadmus/model-validation-eos30f3

Model-Validation-eos30f3

🏚️ Structure

This repository is organised by folders:

  • Data: Contains the raw data, processed data and model predictions.
  • Figures: Contains the visualizations, saved as PNG files.
  • Notebooks: Contains the Jupyter notebooks where most of the development took place.
  • Src: Contains helper functions that are reused throughout the repository.

Model Abstract

This model leverages the ChemProp network (D-MPNN; see the original Stokes et al, Cell, 2020 for more information) to build a predictor of hERG-mediated cardiotoxicity. The model was trained on a dataset published by Cai et al, J Chem Inf Model, 2019, which contains 7889 molecules labeled at several cut-offs for hERG blocking activity. The authors select a 10 µM cut-off. This implementation of the model does not use any specific featurizer, though the authors suggest that the moe206 descriptors (closed-source) improve performance further.

Model Characteristics

  • Input: Compound
  • Input Shape: Single
  • Task: Classification
  • Output: Score
  • Output Type: Float
  • Output Shape: Single
  • Interpretation: Probability of blocking hERG (cut-off: 10 µM)
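
Because the output is a probability, turning it into a blocker / non-blocker call requires a decision threshold. The following is a minimal sketch, assuming a threshold of 0.5. That value is a hypothetical default chosen for illustration and is not stated by the model authors; the 10 µM cut-off refers to the experimental activity used to label the data, not to this threshold. The function name `classify_herg` is also made up for this example.

```python
def classify_herg(score, threshold=0.5):
    """Turn a predicted hERG-blocking probability into a binary label.

    The threshold is an assumed value for illustration, not one
    prescribed by the model.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be a probability in [0, 1], got {score}")
    # 1 = predicted blocker, 0 = predicted non-blocker
    return 1 if score >= threshold else 0


scores = [0.08, 0.5, 0.93]  # made-up example scores, not model output
labels = [classify_herg(s) for s in scores]
print(labels)
```

A different threshold may suit a given use case better, for example a lower one when missing a true blocker is more costly than a false alarm.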

Installation Environment

Check here

Getting Started

  1. Install Ersilia Model Hub

    • Follow this step to install the Hub.
  2. Fetch the Model and Install it locally

    ersilia -v fetch eos30f3
    
  3. Serve the Model

    ersilia -v serve eos30f3
    
  4. Make predictions with the Processed Datasets here

    ersilia -v api run -i data/Processed/1000_Molecules.csv -o data/Model_predictions/eos30f3_output.csv
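
The fetch, serve and run steps above can also be chained from a script. The sketch below only reuses the three commands shown in this section; the helper names `build_commands` and `run_pipeline` are hypothetical, and it assumes the `ersilia` executable is on the `PATH`.

```python
import shlex
import subprocess


def build_commands(model_id, input_csv, output_csv):
    """Return the Ersilia CLI calls from the steps above as argument lists."""
    return [
        ["ersilia", "-v", "fetch", model_id],
        ["ersilia", "-v", "serve", model_id],
        ["ersilia", "-v", "api", "run", "-i", input_csv, "-o", output_csv],
    ]


def run_pipeline(model_id, input_csv, output_csv):
    """Run each command in order, stopping at the first failure."""
    for cmd in build_commands(model_id, input_csv, output_csv):
        # check=True raises CalledProcessError on a non-zero exit code
        subprocess.run(cmd, check=True)


commands = build_commands(
    "eos30f3",
    "data/Processed/1000_Molecules.csv",
    "data/Model_predictions/eos30f3_output.csv",
)
# Print the commands instead of running them, so the sketch is safe to execute
for cmd in commands:
    print(shlex.join(cmd))
```

Calling `run_pipeline(...)` with the same three arguments executes the commands for real.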
    
    

Data Procurement Process

  1. ChEMBL Data Procurement:

    • A dataset of 3592 small-molecule compounds that have been approved for use was downloaded in CSV format and stored in /data/Raw.
  2. DMPNN-hERG Datasets Procurement:

    • The model was trained on 7889 compounds with well-defined experimental hERG data and diverse chemical structures, assembled by Cai et al in their work published in J Chem Inf Model, 2019 (here). The dataset provides target variables at several cut-offs for hERG blocking activity.
  3. External Datasets Procurement:

    • This hERG dataset provides 648 drugs with binary labels. It was sourced from Therapeutics Data Commons, a collection of datasets for drug discovery.
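
Since the external dataset carries binary labels, it can be used to check binarized model predictions. The sketch below shows one possible way to do that with a confusion matrix and three summary metrics. The source does not say which metrics were used in this repository, so the choice here is an assumption, and the label lists are made-up toy values rather than real data.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (0 or 1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def summary_metrics(y_true, y_pred):
    """Return accuracy, sensitivity and specificity as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,  # recall on blockers
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,  # recall on non-blockers
    }


y_true = [1, 1, 0, 0, 1, 0]  # toy experimental labels (made up)
y_pred = [1, 0, 0, 1, 1, 0]  # toy binarized predictions (made up)
metrics = summary_metrics(y_true, y_pred)
print(metrics)
```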

References

License

All the code in this repository is licensed under the GPLv3 License.
