This GitHub page contains the pGENMi code for associating TFs with drugs. The original paper, in which v1 was applied, was published in Genome Research: https://www.ncbi.nlm.nih.gov/pubmed/29898900.
Modify the Makefile to suit your platform-specific needs. This code uses C++11. The given Makefile assumes that a .obj and a bin directory exist. Then simply run make all as below:
make all
This should produce pgenmi, the main program, which outputs the likelihoods of various models. By default there are two: one with all features (H_ALL or H_FULL or H1) and one with only the intercept (H0 or H_0 or H_NULL or H_NULL_X0).
It should also produce postodds, a program that, given model strings and a data file, outputs the posterior probability that each gene's latent variable is 1, conditional on the model parameters, the evidences, and the p-values. It also outputs posterior odds, which are simply the ratio of two separate models' posteriors for a given gene. The rest of this README.md covers pgenmi; it will be updated in the future to include postodds.
The input data for pgenmi is a TSV file. It includes a header, like below:
GENE PVAL A0 A1 A2
GENE_A 0.01 1 1 0
GENE_B 0.30 1 0 1
GENE_C 0.04 1 1 1
...
GENE_D 1.00 1 0 0
All data lines should begin with a gene symbol in the first column, followed by a p-value in the second column, a 1 in the third column (the intercept), and then the respective regulatory evidences in the remaining columns.
Regulatory evidences need not be binary; they can be continuous. However, C++ pGENMi was tested using binary feature data, and its behavior on continuous data may be unexpected.
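As a quick sanity check before running pgenmi, the layout described above can be validated with a few lines of Python. This is only a sketch: the function name `read_pgenmi_table` and the specific checks are illustrative assumptions and are not part of the pGENMi code.

```python
import csv
import io


def read_pgenmi_table(handle):
    """Parse a pGENMi-style TSV and check the layout described above.

    Hypothetical helper, not part of pGENMi itself.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if len(header) < 3:
        raise ValueError("expected at least GENE, PVAL and intercept columns")
    rows = []
    for line_no, fields in enumerate(reader, start=2):
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != len(header):
            raise ValueError(f"line {line_no}: expected {len(header)} columns")
        gene = fields[0]
        pval = float(fields[1])
        features = [float(x) for x in fields[2:]]
        if not 0.0 <= pval <= 1.0:
            raise ValueError(f"line {line_no}: p-value {pval} outside [0, 1]")
        if features[0] != 1.0:
            raise ValueError(f"line {line_no}: third column must be 1")
        rows.append((gene, pval, features))
    return header, rows


# The first three data lines of the example above, tab-separated.
example = (
    "GENE\tPVAL\tA0\tA1\tA2\n"
    "GENE_A\t0.01\t1\t1\t0\n"
    "GENE_B\t0.30\t1\t0\t1\n"
    "GENE_C\t0.04\t1\t1\t1\n"
)
header, rows = read_pgenmi_table(io.StringIO(example))
```

For a real file, pass an open file handle instead of the `io.StringIO` wrapper.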
The pgenmi program is given the path to the file with the aforementioned input data, plus options such as a drug label (-d) and a TF label (-t):
./pgenmi OPTIONS /path/to/input/file
The output is in the following format, with the commas replaced by tabs. Each model is a row in the file. The log-likelihood of the learned model and of its initialization, along with the parameters, are printed.
TF, DRUG, MODELNAME, NUM_FEATURES, MODEL BITMASK, INITIAL_MODEL, TRAINED_MODEL
A model is printed as follows:
MODEL_ITER, MODEL_LOGLIK, MODEL_LOGLIK_PREV, MODEL_WEIGHTS, MODEL_ALPHA
Here MODEL is either trained or initial. Thus, each row has both initial and trained parameters and performance.
What distinguishes rows from one another are the bitmasks, which have the same length as the weight vector. A bitmask removes certain parameters from the model. The first bit in a bitmask, which controls whether the intercept term of the model is included, is always 1.
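To make the bitmask idea concrete, here is a small Python sketch. It assumes a mask is a list of 0/1 flags, one per weight, and that masked-out weights are simply held at zero; the exact representation and printing used inside pgenmi may differ, and both function names are hypothetical.

```python
def feature_bitmask(active, num_features):
    """Build a 0/1 mask from 1-based feature indices; the intercept bit is always 1."""
    mask = [0] * num_features
    mask[0] = 1  # feature 1 is the intercept and is always kept
    for idx in active:
        if not 1 <= idx <= num_features:
            raise ValueError(f"feature index {idx} out of range")
        mask[idx - 1] = 1
    return mask


def apply_mask(weights, mask):
    """Zero out the weights whose mask bit is 0 (assumed masking behavior)."""
    return [w if bit else 0.0 for w, bit in zip(weights, mask)]


# A model over three features (intercept, A1, A2) that keeps only A2.
mask = feature_bitmask([3], num_features=3)
masked = apply_mask([0.5, 1.2, -0.3], mask)
```

With three features, the full model corresponds to the mask `[1, 1, 1]` and the null model to `[1, 0, 0]`.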
-a, --all_disable Disable both H_ALL and H_NULL model runs.
-d, --drug Drug name (Default is DRUG).
-f, --full_disable Disable full (H1) model.
-h, --help Print this usage guide.
-n, --null_disable Disable null (H0) model.
-r, --restarts=NUM Restart model runs NUM times.
-s, --singles Run all models with only one feature
and intercept active.
-t, --tf TF name (Default is TF).
-w, --without_singles Run all models missing only one feature
excluding the intercept.
-i, --names="CSV" A CSV of model names corresponding
to and in the order of --models.
If --models unset, this is ignored.
Default model names are generated for
any extra models when there are more
models than names.
The CSV should be in double quotes.
Example: --names="H2,H3".
-m, --models="CSV" Run models on CSV feature lists.
Features are column indices in FILE
using 1 based indexing.
A model is a comma separated value
(CSV) list of feature indices.
To specify multiple models, separate
individual CSV models by '|'.
The entire INDICES list should be in
double quotes.
Example: --models="1,2|3"
This says run two models:
The 1st on features 1 and 2.
The 2nd on features 1 and 3.
Note that the 2nd list omitted 1.
But, feature 1 (intercept) is
auto included when omitted.
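The way a --models string pairs up with --names can be sketched in Python as follows. This mirrors the behavior described in the usage text, not the actual C++ parser; the function names and the `MODEL_<k>` default naming scheme are assumptions made for illustration.

```python
def parse_models(spec):
    """Split a --models string such as "1,2|3" into sorted feature-index lists."""
    models = []
    for chunk in spec.split("|"):
        indices = {int(tok) for tok in chunk.split(",") if tok.strip()}
        indices.add(1)  # feature 1 (the intercept) is included even when omitted
        models.append(sorted(indices))
    return models


def name_models(models, names_csv=""):
    """Pair each model with a name; extra models get a placeholder name.

    The "MODEL_<k>" default is hypothetical; pgenmi's generated names may differ.
    """
    names = [n for n in names_csv.split(",") if n]
    return [
        names[k] if k < len(names) else f"MODEL_{k + 1}"
        for k in range(len(models))
    ]


models = parse_models("1,2|3")
labels = name_models(models, "H2")
```

Here `models` holds the two feature lists from the usage example above (features 1 and 2, then features 1 and 3), and the second model falls back to a placeholder name because only one name was supplied.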
The TensorFlow directory contains a pgenmi.py file that behaves nearly identically to C++ pgenmi; however, it leverages built-in optimizers that can be modified and customized. This of course presumes that your machine has Python 3 and TensorFlow installed.
The TensorFlow/example directory contains a NOTES file on how to run TensorFlow pGENMi properly. It follows a similar structure to the C++ version.
python3 pgenmi.py --input_file /path/to/input/file
Getting the output in the same format as the C++ version will require some post-processing. Additionally, the null hypothesis, H0, is not tested within each run. Instead, a separate file for H0 must be created and run through pgenmi.py. The example directory contains example files for this.