This repository is organised into the following folders:
- Data: Contains the raw data, the processed data and the model predictions.
- Figures: Contains the visualizations, in PNG format.
- Notebooks: Houses the Jupyter notebooks where most of the development took place.
- Src: Contains important functions I will re-use throughout the repository, to avoid typing them each time.
This model leverages the ChemProp network (D-MPNN; see the original publication, Stokes et al, Cell, 2020, for more information) to build a predictor of hERG-mediated cardiotoxicity. The model was trained on a dataset published by Cai et al, J Chem Inf Model, 2019, which contains 7889 molecules labelled at several cut-offs for hERG blocking activity. The authors select a 10 µM cut-off. This implementation of the model does not use any specific featurizer, though the authors suggest that the moe206 descriptors (closed-source) improve performance even further.
- Input: Compound
- Input Shape: Single
- Task: Classification
- Output: Score
- Output Type: Float
- Output Shape: Single
- Interpretation: Probability that the compound blocks hERG (activity cut-off: 10 µM)
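Because the output is a probability rather than a class, a downstream decision threshold is needed to obtain blocker / non-blocker calls. The sketch below is illustrative only: the column names (`smiles`, `score`), the example values and the 0.5 probability threshold are assumptions, not taken from the model's actual output file. Note that the 10 µM cut-off refers to the experimental activity used to label the training data, not to this probability threshold.

```python
import csv
import io

# Hypothetical model output: the column names ("smiles", "score") and the
# values are illustrative assumptions, not a real prediction file.
EXAMPLE_OUTPUT = """smiles,score
CCO,0.12
c1ccccc1CCN,0.81
CC(=O)O,0.50
"""


def classify_herg(rows, threshold=0.5):
    """Flag a row as a predicted hERG blocker when its score >= threshold.

    The default threshold of 0.5 is an assumption; it should be tuned
    on validation data for the use case at hand.
    """
    labelled = []
    for row in rows:
        score = float(row["score"])
        labelled.append(
            {
                "smiles": row["smiles"],
                "score": score,
                "predicted_blocker": score >= threshold,
            }
        )
    return labelled


rows = list(csv.DictReader(io.StringIO(EXAMPLE_OUTPUT)))
results = classify_herg(rows)
for r in results:
    print(r["smiles"], r["score"], r["predicted_blocker"])
```

To use this on a real prediction file, replace the in-memory example with `open(path, newline="")` and adjust the column names to match the file.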
Check here
- Install Ersilia Model Hub
  - Follow this step to install the Hub.
- Fetch the model and install it locally
  ersilia -v fetch eos30f3
- Serve the model
  ersilia -v serve eos30f3
- Make predictions with the processed datasets here
  ersilia -v api run -i data/Processed/1000_Molecules.csv -o data/Model_predictions/eos30f3_output.csv
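For use from a notebook, the same three CLI calls can be assembled in Python. This is a minimal sketch: the helper name `build_commands` is hypothetical, and the block only builds and prints the commands listed above without executing them.

```python
import shlex

MODEL_ID = "eos30f3"


def build_commands(input_csv, output_csv, model_id=MODEL_ID):
    """Return the fetch, serve and run commands, in order, as argument lists."""
    return [
        ["ersilia", "-v", "fetch", model_id],
        ["ersilia", "-v", "serve", model_id],
        ["ersilia", "-v", "api", "run", "-i", input_csv, "-o", output_csv],
    ]


commands = build_commands(
    "data/Processed/1000_Molecules.csv",
    "data/Model_predictions/eos30f3_output.csv",
)
for cmd in commands:
    # shlex.join renders the argument list as a shell-safe command string.
    print(shlex.join(cmd))
```

To actually run the steps, each argument list can be passed to `subprocess.run(cmd, check=True)`; this assumes the Ersilia Model Hub is installed and available on the `PATH`.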
- ChEMBL Data Procurement:
  - A dataset of 3592 approved small-molecule compounds was downloaded in CSV format and stored in /data/Raw.
- DMPNN-hERG Datasets Procurement:
  - The model was trained on 7889 compounds with diverse chemical structures and well-defined experimental hERG data, assembled by Cai et al in their work published in J Chem Inf Model, 2019 (here). The dataset provides target variables at various cut-offs for hERG blocking activity.
- External Datasets Procurement:
  - This hERG dataset provides 648 drugs with binary labels. It was sourced from Therapeutics Data Commons, a collection of datasets for drug discovery.
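Since the external set carries binary labels while the model returns probabilities, the two can be compared by thresholding the scores and counting agreements. The following is a sketch under stated assumptions: it assumes a label of 1 marks a blocker, uses a 0.5 threshold, and runs on made-up toy values rather than the actual 648 drugs; the function names are hypothetical.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count true/false positives and negatives at the given threshold."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def summarize(y_true, y_prob, threshold=0.5):
    """Compute accuracy, sensitivity and specificity from the counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    return {
        "accuracy": safe_ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": safe_ratio(tp, tp + fn),  # recall on blockers
        "specificity": safe_ratio(tn, tn + fp),  # recall on non-blockers
    }


# Toy values for illustration only; not real labels or predictions.
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1]

metrics = summarize(y_true, y_prob)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Threshold-free metrics such as ROC AUC would complement these numbers, since the choice of 0.5 here is arbitrary.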
All the code in this repository is licensed under a GPLv3 License.