RINDTI

This repository aims to simplify the drug-target interaction prediction process which is based on protein residue interaction networks (RINs)

Overview

The repository aims to go from a simple collections of inputs - structures of proteins, interactions data on drugs to a fully-function GNN model

Installation

clone the repository with git clone https://github.com/ilsenatorov/rindti
change in the root directory with cd rindti
(Optional) install mamba with conda install -n base -c conda-forge mamba
create the conda environment with mamba env create -f workflow/envs/main.yml (might take some time)
activate the environment with conda activate rindti
clone rinerator repository into your home dir with git clone --branch perf https://wibi-git.helmholtz-hzi.de/ske18/rinerator/ ~/rinerator
install rinerator with pip install ~/rinerator

Snakemake pipeline

Snakemake pipeline (located in workflow directory) is responsible for parsing all the data. Usually the result is a single pickle file containing all the necessary information for training, testing etc. The pipeline relies on the resources folder which should contain resources/structures/ folder with pdb files for acquiring protein networks. For DTI prediction the model also needs csv files in resources/drugs, more info will be provided later. The pipeline is controlled by a config file config/snakemake.yaml which can be modified to change the output of the pipeline

DTI data

In order to obtain the pickle file necessary for training the DTI model please ensure the only_proteins entry in the yaml config file is set to false. Once pipeline is run with snakemake -j <num_cores> --use-conda, it will generate a pickle file in the results/prepare_all/ directory.

Pretraining data

In order to omit the drug data, one can set the only_proteins to true in the config file. Running the pipeline with snakemake -j <num_cores> --use-conda will generate a pickle file in results/prepare_proteins/ directory.

Since snakemake sometimes struggles with calculating DAG for really large pipelines, steps of this pipeline could be ran manually

For rinerator-based networks please run the rinerator on all the necessary pdb files, then use the workflow/scripts/prepare_proteins.py script to generate the necessary pickle file.
For distance-based networks please run the workflow/scripts/distance_based.py script to generate the necessary pickle file.

Information on both scripts can be found through workflow/scripts/distance_based.py --help or workflow/scripts/prepare_proteins.py --help

GNN model

The GNN model is shipped as a package, located in the rindti directory.

Additionally, scripts that give an example of model training are provided (train.py and pretrain.py), which can be used as plug-and-play, or as inspiration to create custom training approaches.

Training

A model can be trained with a simple train.py results/prepare_all/<pickled_data>.pkl, assuming pipeline has been ran in advance.

Three types of models are currently available - class, reg and noisy nodes model.

Both class and noisy nodes require the final label to be a bool value (0 or 1), however noisy nodes introduces corruption of nodes and auxiliary loss of predicting the original labels of such nodes.

Pretraining

In order to pretrain the model one can use the pretrain.py script, which has three types of self-supervised pretraining models available

GraphLog and Infograph can be used directly with the result of prepare_proteins step of the workflow (located in the results/prepare_proteins directory).

Pfam siamese network will require adding the information on the protein family to that dataframe in order to function.

Name		Name	Last commit message	Last commit date
Latest commit History 338 Commits
.github/workflows		.github/workflows
config		config
dash		dash
docs		docs
rindti		rindti
tests		tests
workflow		workflow
.flake8		.flake8
.gitattributes		.gitattributes
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
.readthedocs.yaml		.readthedocs.yaml
README.md		README.md
embedding.ipynb		embedding.ipynb
hyperparameter.py		hyperparameter.py
interactive_losses.ipynb		interactive_losses.ipynb
parse_glass.ipynb		parse_glass.ipynb
pfam.ipynb		pfam.ipynb
pretrain.py		pretrain.py
pyproject.toml		pyproject.toml
setup.py		setup.py
train.py		train.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RINDTI

Overview

Installation

Snakemake pipeline

DTI data

Pretraining data

GNN model

Training

Pretraining

About

Releases

Packages

Languages

kalininalab/rindti

Folders and files

Latest commit

History

Repository files navigation

RINDTI

Overview

Installation

Snakemake pipeline

DTI data

Pretraining data

GNN model

Training

Pretraining

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages