A program to find key complex patterns in SAR data
The program has been tested on Python 3.6.
If you've installed RDKit already (e.g., with Anaconda), you can install from PyPI with:
$ pip install nonadditivity
If you haven't already installed RDKit, you can either follow the instructions
on http://rdkit.org/ or simply install from PyPI using the [rdkit]
extra:
$ pip install nonadditivity[rdkit]
Install directly from source with:
$ pip install git+https://github.com/KramerChristian/NonadditivityAnalysis.git
Install the code in development mode with:
$ git clone https://github.com/KramerChristian/NonadditivityAnalysis.git
$ cd NonadditivityAnalysis
$ pip install -e .
If a special salt clean-up is required, the path to the salt definitions can be set on line 43.
The code runs as a simple command-line tool. Command line options are printed via
$ python -m nonadditivity -h
Using the supplied test files, an example run is:
$ python -m nonadditivity -in hERG_ChEMBL.txt -delimiter tab -series_column ASSAY_CHEMBLID -props PCHEMBL_VALUE -units nM
The input file is expected to have the following format:
IDENTIFIER [sep] SMILES [sep] DATA ...
where [sep] is the separator and can be chosen from tab, space, comma, and semicolon.
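As an illustration of the format above, here is a minimal sketch that maps the separator names to characters and reads a small file into records. It assumes a header row (the example command refers to columns by name); the function name `read_records` and the column names in the example are hypothetical and not taken from the NonadditivityAnalysis source.

```python
import csv
import io

# Separator names as listed above, mapped to the characters they stand for.
SEPARATORS = {"tab": "\t", "space": " ", "comma": ",", "semicolon": ";"}


def read_records(text, delimiter="tab"):
    """Parse delimited text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter=SEPARATORS[delimiter])
    return list(reader)


# Hypothetical three-column example; real files may carry more columns.
example = "ID\tSMILES\tVALUE\ncpd1\tCCO\t120\ncpd2\tCCN\t35\n"
records = read_records(example, "tab")
```

Each record is a dictionary keyed by the header names, so a property column can be looked up by the name passed on the command line.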
If you use this code for a publication, please cite Kramer, C. Nonadditivity Analysis. J. Chem. Inf. Model. 2019, 59, 9, 4034–4042.
https://pubs.acs.org/doi/10.1021/acs.jcim.9b00631
The overall process is:
1.) Parse input:
  - read structures
  - clean and transform activity data
  - remove salts
2.) Compute MMPs
3.) Find double-transformation cycles
4.) Write to output & calculate statistics
Ideally, the compounds are already standardized when input into nonadditivity analysis. The code will not correct tautomers and charge state, but it will attempt to desalt the input.
Since nonadditivity analysis only makes sense on normally distributed data, the input activity data can be transformed depending on the input units. You can choose from "M", "mM", "uM", "nM", "pM", and "noconv". Values given in one of the molar units are converted to pActivity (the negative decadic logarithm of the molar concentration) using the corresponding factor. "noconv" keeps the input as is and applies no transformation.
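The unit handling described above can be sketched as follows. This is a minimal illustration, assuming pActivity is the negative decadic logarithm of the concentration in mol/L; the names `UNIT_TO_MOLAR` and `to_pactivity` are hypothetical and not taken from the NonadditivityAnalysis source.

```python
import math

# Multipliers that bring each supported unit to mol/L.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def to_pactivity(value, unit):
    """Return -log10(value in mol/L), or the value unchanged for 'noconv'."""
    if unit == "noconv":
        return value
    if value <= 0:
        raise ValueError("activity must be positive to take a logarithm")
    return -math.log10(value * UNIT_TO_MOLAR[unit])
```

Under this convention an activity of 100 nM corresponds to a pActivity of about 7, and 10 uM to about 5.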
For duplicate structures, only the first occurrence is kept.
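A minimal sketch of keeping only the first occurrence, assuming each record is a dictionary and that duplicates can be recognized by comparing a structure key such as a cleaned SMILES string. The function name and the `SMILES` key are hypothetical; how the actual code decides that two structures are identical is not shown here.

```python
def keep_first_occurrence(records, key="SMILES"):
    """Drop records whose structure key was already seen, preserving order."""
    seen = set()
    unique = []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(record)
    return unique


# Hypothetical example: cpd3 repeats the structure of cpd1 and is dropped.
compounds = [
    {"ID": "cpd1", "SMILES": "CCO"},
    {"ID": "cpd2", "SMILES": "CCN"},
    {"ID": "cpd3", "SMILES": "CCO"},
]
deduplicated = keep_first_occurrence(compounds)
```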
Matched pairs are computed from the cleaned structures. This is done by a subprocess call to the external mmpdb program. By default, 20 parallel jobs are used for the fragmentation. This can be changed on line 681.
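For orientation, a subprocess call of this kind might look like the sketch below. It is an assumption about how such a call can be assembled, not a copy of the NonadditivityAnalysis source: the function names and file names are hypothetical, and the `--num-jobs` and `-o` options should be checked against `mmpdb fragment --help` for the installed mmpdb version.

```python
import subprocess


def build_fragment_command(smiles_file, fragments_file, num_jobs=20):
    """Assemble an `mmpdb fragment` call as an argument list."""
    return [
        "mmpdb", "fragment", smiles_file,
        "--num-jobs", str(num_jobs),  # assumed option name, see lead-in
        "-o", fragments_file,
    ]


def run_fragmentation(smiles_file, fragments_file, num_jobs=20):
    """Run the fragmentation and raise if mmpdb exits with an error."""
    subprocess.run(
        build_fragment_command(smiles_file, fragments_file, num_jobs),
        check=True,
    )
```

Passing the command as a list avoids shell quoting problems with file names, and `check=True` turns a failed mmpdb run into a Python exception instead of a silent error.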
This is the heart of the nonadditivity algorithm. Here, sets of four compounds that are linked by two transformations are identified. For more details about the interpretation, see the publication cited above.
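The idea of a double-transformation cycle can be sketched as follows. This is a simplified illustration and not the implementation used in the code: it assumes matched pairs are available as a dictionary from a directed compound pair to a transformation label, and it uses one common sign convention for the nonadditivity of a cycle. All names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def find_cycles(pairs):
    """Find compound quadruples (a, b, c, d) closed by two transformations.

    `pairs` maps a directed pair (x, y) to a transformation label. A cycle
    needs a->b and c->d sharing one label, and a->c and b->d sharing another.
    """
    by_label = defaultdict(list)
    for (x, y), label in pairs.items():
        by_label[label].append((x, y))

    cycles = {}
    for label, edges in by_label.items():
        for (a, b), (c, d) in combinations(edges, 2):
            if len({a, b, c, d}) < 4:
                continue  # a cycle needs four distinct compounds
            second = pairs.get((a, c))
            if second is None or second == label:
                continue
            if pairs.get((b, d)) == second:
                # Key on the compound set so each cycle is reported once.
                cycles.setdefault(frozenset((a, b, c, d)), (a, b, c, d))
    return list(cycles.values())


def nonadditivity(cycle, pact):
    """Effect of a transformation in one context minus its effect in the other."""
    a, b, c, d = cycle
    return (pact[d] - pact[c]) - (pact[b] - pact[a])


# Hypothetical data: T1 links A->B and C->D, T2 links A->C and B->D.
pairs = {("A", "B"): "T1", ("C", "D"): "T1", ("A", "C"): "T2", ("B", "D"): "T2"}
pact = {"A": 6.0, "B": 7.0, "C": 6.5, "D": 9.0}
cycles = find_cycles(pairs)
values = [nonadditivity(cycle, pact) for cycle in cycles]
```

In this example T1 raises the pActivity by 1.0 going from A to B but by 2.5 going from C to D, so the cycle has a nonadditivity of 1.5. For perfectly additive data the value would be zero.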
Information about the compounds making up the cycles and the distribution of nonadditivity is written to output files. [...] denotes the input file name. The file named
"Additivity_diffs"[...]".txt"
contains information about the cycles and the probability distribution.
The file named
"Additivity_diffs"[...]"_perCompound.txt"
contains information about the nonadditivity aggregated per compound across all cycles in which a given compound occurs.
The file named
"Additivity_diffs"[...]"_c2c.txt"
links the two files above and can be used, for example, for visualizations in Spotfire.
The NonadditivityAnalysis code is copyright 2015-2019 by F. Hoffmann-La Roche Ltd and distributed under the 3-clause BSD license (see LICENSE.txt).