This repo contains the Go source code to run the method described in "Quantifying the similarity of topological domains across normal and cancer human cell types" (Sauerwald and Kingsford), as well as a Python script to reproduce all statistics and figures used in the paper.
EVALUATING TAD SET SIMILARITY
    ~/localtadsim/go$ ./localdiff -h
    Usage of ./localdiff:
      -gamma string
            if optimizing gamma, use 'opt,n' where n is the median TAD size to optimize for
      -o string
            output filename
      -res int
            resolution of Hi-C data (default 1)
      -tad string
            comma-separated list of two TAD filenames or file patterns if optimizing gamma
We will go through an example using the files provided in the go/example folder. These example TAD sets were generated by the open-source Armatus software. To compare the two TAD sets, run the following command:
    localtadsim/go$ time ./localdiff -tad=example/EncodeChr18_A549_combo_IC_100kb_gmax1.0.gamma.0.8.0.txt,example/RaoChr18_KBM7_IC_100kb_gmax1.0.gamma.0.8.0.txt -res=100000 -o=example/tadcomp_A549_KBM7_chr18.txt
    Wrote output values to example/tadcomp_A549_KBM7_chr18.txt

    real    0m1.802s
    user    0m2.084s
    sys     0m0.108s
The output file lists every interval in which the two TAD sets are significantly similar, together with the variation of information (VI) value for that interval and its p-value. This example does not use the -gamma flag. To instead use the Armatus gamma that produces a particular median TAD length (e.g., 880 kb, as in the paper), pass "-gamma=opt,8.8". The median TAD length must already be divided by the resolution of the data (here, 880 kb / 100 kb = 8.8 bins). In this case, the arguments to -tad should be file prefixes that match the files for all gamma values, rather than individual files. For example, with "-tad=armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0,armatusresults/K562/100kb/RaoChr18_KBM7_IC_100kb_gmax1.0", the first prefix matches:
    $ ls armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0*
    armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0.consensus.txt
    armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0.gamma.0.0.txt
    armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0.gamma.0.1.0.txt
    armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0.gamma.0.2.0.txt
    armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0.gamma.0.3.0.txt
    armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0.gamma.0.4.0.txt
    armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0.gamma.0.5.0.txt
    armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0.gamma.0.6.0.txt
    armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0.gamma.0.7.0.txt
    armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0.gamma.0.8.0.txt
    armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0.gamma.0.9.0.txt
    armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0.gamma.1.0.txt
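The arithmetic behind "-gamma=opt,8.8", and one plausible way to choose a gamma by median TAD size, can be sketched in Python. The helper names are hypothetical, and the selection rule (take the gamma file whose median TAD length in bins is closest to n) is our assumption about what "optimize for" means; localdiff's actual rule may differ. TAD length is assumed here to be end minus start, divided by the resolution.

```python
import glob
import statistics


def gamma_arg(median_tad_bp, resolution_bp):
    """Value for -gamma: 'opt,n', where n is the median TAD length in bins."""
    return "opt,{:g}".format(median_tad_bp / resolution_bp)


def median_tad_bins(lines, resolution_bp):
    """Median TAD length in bins (column 2 = start, column 3 = end).

    Assumption: a TAD's length is (end - start) / resolution.
    Returns None if no TAD lines are found.
    """
    lengths = []
    for line in lines:
        cols = line.split()
        if len(cols) < 3:
            continue  # skip blank or short lines
        start, end = int(cols[1]), int(cols[2])
        lengths.append((end - start) / resolution_bp)
    return statistics.median(lengths) if lengths else None


def pick_gamma(median_by_file, target_bins):
    """Return the file whose median TAD length is closest to target_bins."""
    return min(median_by_file, key=lambda f: abs(median_by_file[f] - target_bins))


def closest_gamma_file(prefix, target_bins, resolution_bp):
    """Scan prefix.gamma.*.txt (Armatus-style names) and pick the closest median.

    The consensus file does not match this pattern and is ignored.
    """
    medians = {}
    for path in sorted(glob.glob(glob.escape(prefix) + ".gamma.*.txt")):
        with open(path) as fh:
            median = median_tad_bins(fh, resolution_bp)
        if median is not None:  # ignore files with no TADs
            medians[path] = median
    if not medians:
        raise FileNotFoundError("no non-empty gamma files match " + prefix)
    return pick_gamma(medians, target_bins)
```

For example, `closest_gamma_file("armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0", 8.8, 100000)` would return whichever listed gamma file has a median TAD length nearest 8.8 bins under this rule.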
If your TAD files were not produced by Armatus, they must use the same format: each file needs at least three columns, with TAD start locations in the 2nd column and TAD end locations in the 3rd.
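As a rough illustration of this format and of the VI score reported in the output, the Python sketch below parses TAD lines into bin ranges and computes the unnormalized variation of information, VI(A, B) = H(A) + H(B) - 2 I(A; B) = 2 H(A, B) - H(A) - H(B), between two TAD sets over an interval of bins. It is not the localdiff implementation. Rounding end coordinates up to the next bin boundary, treating bins outside any TAD as singleton domains, and reporting VI in nats without normalization are all assumptions made here.

```python
import math
from collections import Counter


def parse_tads(lines, resolution):
    """Parse TAD lines (>= 3 columns; 2nd = start, 3rd = end) into half-open bin ranges."""
    tads = []
    for line in lines:
        cols = line.split()
        if len(cols) < 3:
            continue  # skip blank or short lines
        start, end = int(cols[1]), int(cols[2])
        # Assumption: end is either the last base (e.g. 399999) or the exclusive
        # boundary (400000); rounding up maps both to the same end bin.
        tads.append((start // resolution, -(-end // resolution)))
    return tads


def bin_labels(tads, lo, hi):
    """Label each bin in [lo, hi) by its TAD; bins in no TAD become singletons."""
    labels = {}
    for i, (s, e) in enumerate(tads):
        for b in range(max(s, lo), min(e, hi)):
            labels[b] = ("tad", i)
    return [labels.get(b, ("gap", b)) for b in range(lo, hi)]


def entropy(counts, n):
    """Shannon entropy (nats) of a partition given its block sizes."""
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def variation_of_information(tads_a, tads_b, lo, hi):
    """VI between two TAD partitions of bins [lo, hi): 2 H(A,B) - H(A) - H(B)."""
    a = bin_labels(tads_a, lo, hi)
    b = bin_labels(tads_b, lo, hi)
    n = len(a)
    h_a = entropy(Counter(a), n)
    h_b = entropy(Counter(b), n)
    h_ab = entropy(Counter(zip(a, b)), n)  # joint partition
    return 2 * h_ab - h_a - h_b
```

Identical TAD sets give a VI of 0, and VI grows as the two domain partitions of the interval disagree more.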
ANALYZING RESULTS OF LARGE-SCALE COMPARISON
To reproduce the statistics and figures from the paper, run the Python script analyzelocaldiffresults.py, pointing it to the output files produced by running localdiff on all pairwise comparisons.
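Each localdiff run compares two TAD sets and writes one output file. A minimal Python sketch for generating the runs for every pair of cell types, using only the -tad, -res, -o and -gamma flags from localdiff's help output, might look like this. The function name, the cell-type-to-file mapping, and the tadcomp_<A>_<B>.txt output naming are hypothetical; choose output names so that the wildcard pattern later passed to -i matches them, and call the function once per chromosome if your TAD files are split by chromosome.

```python
import itertools
import shlex


def localdiff_commands(tad_files, resolution, out_dir, gamma=None, binary="./localdiff"):
    """One localdiff argument list per unordered pair of cell types.

    tad_files maps a cell-type name to its TAD file, or to a file prefix when
    gamma is 'opt,n'. The output naming scheme here is hypothetical.
    """
    commands = []
    pairs = itertools.combinations(sorted(tad_files.items()), 2)
    for (name_a, file_a), (name_b, file_b) in pairs:
        args = [
            binary,
            "-tad={},{}".format(file_a, file_b),
            "-res={}".format(resolution),
            "-o={}/tadcomp_{}_{}.txt".format(out_dir, name_a, name_b),
        ]
        if gamma is not None:
            args.append("-gamma={}".format(gamma))
        commands.append(args)
    return commands


if __name__ == "__main__":
    # Hypothetical inputs: the example files shipped in go/example.
    files = {
        "A549": "example/EncodeChr18_A549_combo_IC_100kb_gmax1.0.gamma.0.8.0.txt",
        "KBM7": "example/RaoChr18_KBM7_IC_100kb_gmax1.0.gamma.0.8.0.txt",
    }
    for args in localdiff_commands(files, 100000, "example"):
        # Print a shell-safe command line; to run it directly from Python,
        # use subprocess.run(args, check=True) instead.
        print(" ".join(shlex.quote(a) for a in args))
```

Returning argument lists rather than shell strings lets the same output feed either a shell script or subprocess.run without quoting issues.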
    localtadsim$ python analyzelocaldiffresults.py -h
    usage: analyzelocaldiffresults.py [-h] [-i I] [-r R] [-a A [A ...]] [-cl CL]
                                      [-cm CM] [-gl GL] [-p P] [-c C [C ...]]
                                      [-n N [N ...]] [-o O]

    optional arguments:
      -h, --help    show this help message and exit
      -i I          fileseed of files to analyze, with wildcard characters
      -r R          resolution of Hi-C data
      -a A [A ...]  Armatus files for plotting TADs
      -cl CL        File containing chromosome lengths
      -cm CM        File containing centromere locations
      -gl GL        File listing all human gene locations
      -p P          File containing points to plot in matrix (like in Fig 1)
      -c C [C ...]  Cancer cell types
      -n N [N ...]  Normal cell types
      -o O          Path to location for output files/figures to be written
Many of these arguments are optional. Based on our file structure and cell type names, we ran the following command:
    localtadsim$ python analyzelocaldiffresults.py -i 'go/outputs/tadcomp_armatus*' -r 100000 \
        -c K562_R K562_LA KBM7 A549 Caki2 G401 LNCaP-FGC NCI-H460 Panc1 RPMI-7951 SJCRH30 SKMEL5 SKNDZ SKNMC T47D \
        -n hESC IMR90_D IMR90_R GM06990 GM12878 HMEC HUVEC NHEK \
        -o go/outputs/figures/tadcomp_results_