This pipeline produces tractability data for a list of input Ensembl Gene IDs. This implementation is based on the public version of the GSK tractability pipeline, published here
The pipeline produces a TSV file with one target per row. Small molecule tractability buckets are denoted with "Bucket_X", antibody buckets with "Bucket_X_ab" and PROTAC buckets with "Bucket_X_PROTAC".
Alongside the PROTAC tractability buckets, there is a "PROTAC_location_Bucket", which allows you to assess whether a target's location is suitable for the PROTAC approach. The location categories are:
- High confidence good location
- Med confidence good location
- High confidence grey location
- Med confidence grey location
- Unknown location
- Med confidence bad location
- High confidence bad location
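The column naming convention described above lends itself to grouping the output columns by modality. The following is a minimal sketch, not part of the pipeline itself. It assumes that X in "Bucket_X" is an integer, and the modality labels and the `ensembl_gene_id` column used in the usage note are hypothetical names chosen for illustration.

```python
import csv
import io
import re

# Assumption: X in "Bucket_X" is an integer; an optional suffix selects the modality.
BUCKET_PATTERN = re.compile(r"^Bucket_(\d+)(?:_(ab|PROTAC))?$")

# Hypothetical labels for the three modalities described in this README.
MODALITY_BY_SUFFIX = {None: "small_molecule", "ab": "antibody", "PROTAC": "protac"}


def group_bucket_columns(header):
    """Map each modality to the bucket columns that belong to it."""
    groups = {"small_molecule": [], "antibody": [], "protac": []}
    for name in header:
        match = BUCKET_PATTERN.match(name)
        if match:
            # group(2) is None when the column has no modality suffix.
            groups[MODALITY_BY_SUFFIX[match.group(2)]].append(name)
    return groups


def read_bucket_columns(tsv_text):
    """Read the header row of tab-separated text and group its bucket columns."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    return group_bucket_columns(header)
```

For example, passing a header such as `ensembl_gene_id`, `Bucket_1`, `Bucket_1_ab`, `Bucket_1_PROTAC`, `PROTAC_location_Bucket` to `group_bucket_columns` places one column in each group. "PROTAC_location_Bucket" does not match the pattern and is left out, so it can be handled separately.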
Change to the directory containing this file and run:
pip install .
Install cx_Oracle (the Python Oracle database driver)
Set the following environment variables:
CHEMBL_DB=oracle://address:to@local.chembl
CHEMBL_VERSION=25
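Because the pipeline depends on these two variables, it can be convenient to check that they are set before starting a run. The helper below is a hypothetical sketch and not part of the pipeline. It only tests that the variables are present and non-empty, and does not verify that the database is reachable.

```python
import os

# The two variables named in this README.
REQUIRED_VARIABLES = ("CHEMBL_DB", "CHEMBL_VERSION")


def missing_variables(environ=None):
    """Return the names of required variables that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]
```

Calling `missing_variables()` with no arguments inspects the current environment. An empty list means both variables are set.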
Run the pipeline with the following command:
run-ot-pipeline genes.csv
where genes.csv is a file with one Ensembl Gene ID per line and no header row.
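The input format can be checked before running the pipeline. The sketch below is a hypothetical helper, not part of the pipeline. It assumes human Ensembl Gene IDs, which consist of "ENSG" followed by 11 digits, so a header row, a blank line, or an ID from another species would be reported.

```python
import re

# Assumption: human Ensembl Gene IDs, i.e. "ENSG" followed by 11 digits.
ENSEMBL_GENE_ID = re.compile(r"^ENSG\d{11}$")


def invalid_lines(lines):
    """Return (line_number, text) pairs for lines that are not a bare gene ID."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not ENSEMBL_GENE_ID.match(text):
            problems.append((number, text))
    return problems
```

To check a file, open it and pass the file object, for example `invalid_lines(open("genes.csv"))`. An empty result means every line holds a single well-formed ID.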