Kingsford-Group/localtadsim
This repo contains the Go source code to run the method described in "Quantifying the similarity of topological domains across normal and cancer human cell types" (Sauerwald and Kingsford), as well as a Python script to reproduce all statistics and figures used in the paper.

EVALUATING TAD SET SIMILARITY

    ~/localtadsim/go$ ./localdiff -h
    Usage of ./localdiff:
      -gamma string
            if optimizing gamma, use 'opt,n' where n is the median TAD size to optimize for
      -o string
            output filename
      -res int
            resolution of Hi-C data (default 1)
      -tad string
            comma-separated list of two TAD filenames or file patterns if optimizing gamma

We will go through an example using the files provided in the go/example folder. These example TAD sets were generated by the open-source Armatus software. To compare these two TAD sets, run the following command:

    localtadsim/go$ time ./localdiff -tad=example/EncodeChr18_A549_combo_IC_100kb_gmax1.0.gamma.0.8.0.txt,example/RaoChr18_KBM7_IC_100kb_gmax1.0.gamma.0.8.0.txt -res=100000 -o=example/tadcomp_A549_KBM7_chr18.txt
    Wrote output values to example/tadcomp_A549_KBM7_chr18.txt

    real 0m1.802s
    user 0m2.084s
    sys 0m0.108s

The output file will contain any intervals of significant similarity between these two TAD sets, along with the VI value of each interval and its p-value.

This example does not use the -gamma flag, but if you would like to use the gamma that produces a particular median TAD length (e.g. 880kb, as in the paper), use "-gamma=opt,8.8". Note that the median TAD length given should already be divided by the resolution of the data: 880kb at 100kb resolution gives 8.8. In addition, in this case the filenames passed to -tad should be the file path prefixes under which the files for all gamma options are found.
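To make the units of the n in "-gamma=opt,n" concrete, here is a minimal Python sketch that computes the median TAD length of one TAD set in bins, that is, divided by the resolution. It is not part of localdiff and does not reproduce its gamma optimization. The helper names and the sample lines are hypothetical, the column layout (start in the 2nd column, end in the 3rd) is the one this README requires of input TAD files, and taking a TAD's length as end minus start is an assumption.

```python
from statistics import median


def parse_tads(lines):
    """Return (start, end) pairs from TAD lines with at least 3 columns.

    Assumes whitespace-separated columns with the TAD start in the
    2nd column and the TAD end in the 3rd.
    """
    tads = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        tads.append((int(fields[1]), int(fields[2])))
    return tads


def median_tad_size_in_bins(tads, resolution):
    """Median TAD length divided by the Hi-C resolution.

    Length is taken as end - start (an assumption about the coordinates).
    """
    lengths = [end - start for start, end in tads]
    return median(lengths) / resolution


# Made-up TADs for illustration only, not taken from the example files.
example_lines = [
    "chr18\t0\t800000",
    "chr18\t800000\t1680000",
    "chr18\t1680000\t2700000",
]

n = median_tad_size_in_bins(parse_tads(example_lines), 100000)
print(f"-gamma=opt,{n}")  # prints -gamma=opt,8.8
```

The middle TAD is 880kb long, so at 100kb resolution the median is 8.8 bins, matching the value used in the flag above.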
For example, when optimizing gamma, pass "-tad=armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0,armatusresults/K562/100kb/RaoChr18_KBM7_IC_100kb_gmax1.0", where we get:

    $ ls armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0*
    armatusresults/A549/100kb/EncodeChr22_A549_combo_IC_100kb_gmax1.0.consensus.txt
    armatusresults/A549/100kb/EncodeChr22_A549_combo_IC_100kb_gmax1.0.gamma.0.0.txt
    armatusresults/A549/100kb/EncodeChr22_A549_combo_IC_100kb_gmax1.0.gamma.0.1.0.txt
    armatusresults/A549/100kb/EncodeChr22_A549_combo_IC_100kb_gmax1.0.gamma.0.2.0.txt
    armatusresults/A549/100kb/EncodeChr22_A549_combo_IC_100kb_gmax1.0.gamma.0.3.0.txt
    armatusresults/A549/100kb/EncodeChr22_A549_combo_IC_100kb_gmax1.0.gamma.0.4.0.txt
    armatusresults/A549/100kb/EncodeChr22_A549_combo_IC_100kb_gmax1.0.gamma.0.5.0.txt
    armatusresults/A549/100kb/EncodeChr22_A549_combo_IC_100kb_gmax1.0.gamma.0.6.0.txt
    armatusresults/A549/100kb/EncodeChr22_A549_combo_IC_100kb_gmax1.0.gamma.0.7.0.txt
    armatusresults/A549/100kb/EncodeChr22_A549_combo_IC_100kb_gmax1.0.gamma.0.8.0.txt
    armatusresults/A549/100kb/EncodeChr22_A549_combo_IC_100kb_gmax1.0.gamma.0.9.0.txt
    armatusresults/A549/100kb/EncodeChr22_A549_combo_IC_100kb_gmax1.0.gamma.1.0.txt

If you are not using Armatus files as input TAD files, note that the format of the files must be the same: each file must contain at least 3 columns, where the 2nd column gives TAD start locations and the 3rd gives TAD end locations.

ANALYZING RESULTS OF LARGE-SCALE COMPARISON

To reproduce the statistics and figures from the paper, run the Python script analyzelocaldiffresults.py with the path to all results files from running the above code on all comparisons.

    localtadsim$ python analyzelocaldiffresults.py -h
    usage: analyzelocaldiffresults.py [-h] [-i I] [-r R] [-a A [A ...]] [-cl CL]
                                      [-cm CM] [-gl GL] [-p P] [-c C [C ...]]
                                      [-n N [N ...]] [-o O]

    optional arguments:
      -h, --help      show this help message and exit
      -i I            fileseed of files to analyze, with wildcard characters
      -r R            resolution of Hi-C data
      -a A [A ...]    Armatus files for plotting TADs
      -cl CL          File containing chromosome lengths
      -cm CM          File containing centromere locations
      -gl GL          File listing all human gene locations
      -p P            File containing points to plot in matrix (like in Fig 1)
      -c C [C ...]    Cancer cell types
      -n N [N ...]    Normal cell types
      -o O            Path to location for output files/figures to be written

Many of these arguments are optional. Based on our file structure and cell type names, we ran the following command:

    localtadsim$ python analyzelocaldiffresults.py -i 'go/outputs/tadcomp_armatus*' -r 100000 -c K562_R K562_LA KBM7 A549 Caki2 G401 LNCaP-FGC NCI-H460 Panc1 RPMI-7951 SJCRH30 SKMEL5 SKNDZ SKNMC T47D -n hESC IMR90_D IMR90_R GM06990 GM12878 HMEC HUVEC NHEK -o go/outputs/figures/tadcomp_results_
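
The results files matched by -i come from running localdiff on each comparison. As a rough sketch of how those runs could be enumerated, the Python below builds one localdiff command per unordered pair of the cell types listed above. It assumes that "all comparisons" means every pair of cell types. The `tad_path` naming scheme, the output filename pattern, and the function names are hypothetical and would need to match your own file layout. Only the -tad, -res and -o flags come from the localdiff usage shown earlier.

```python
from itertools import combinations

# Cell type names copied from the analysis command above.
CANCER = ["K562_R", "K562_LA", "KBM7", "A549", "Caki2", "G401", "LNCaP-FGC",
          "NCI-H460", "Panc1", "RPMI-7951", "SJCRH30", "SKMEL5", "SKNDZ",
          "SKNMC", "T47D"]
NORMAL = ["hESC", "IMR90_D", "IMR90_R", "GM06990", "GM12878", "HMEC",
          "HUVEC", "NHEK"]


def tad_path(cell_type, chrom):
    """Hypothetical location of a TAD file; adapt to your own layout."""
    return f"armatusresults/{cell_type}/chr{chrom}.txt"


def localdiff_commands(cell_types, chrom, resolution):
    """Build one localdiff argument list per unordered pair of cell types."""
    commands = []
    for a, b in combinations(cell_types, 2):
        commands.append([
            "./localdiff",
            f"-tad={tad_path(a, chrom)},{tad_path(b, chrom)}",
            f"-res={resolution}",
            # Hypothetical output name, chosen to match 'tadcomp_armatus*'.
            f"-o=outputs/tadcomp_armatus_{a}_{b}_chr{chrom}.txt",
        ])
    return commands


commands = localdiff_commands(CANCER + NORMAL, 18, 100000)
print(len(commands))          # number of pairwise comparisons for one chromosome
print(" ".join(commands[0]))  # first command, ready to inspect before running
```

Each argument list can be passed to `subprocess.run` from the go directory once the paths point at real files. Printing the commands first makes it easy to check the naming before launching the comparisons.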