Kingsford-Group/localtadsim

This repo contains the Go source code to run the method described in "Quantifying the similarity of topological domains across normal and cancer human cell types" (Sauerwald and Kingsford), as well as a Python script to reproduce all statistics and figures used in the paper.


EVALUATING TAD SET SIMILARITY

~/localtadsim/go$ ./localdiff -h
Usage of ./localdiff:
  -gamma string
    	 if optimizing gamma, use 'opt,n' where n is the median TAD size to optimize for
  -o string
     output filename
  -res int
       resolution of Hi-C data (default 1)
  -tad string
       comma-separated list of two TAD filenames or file patterns if optimizing gamma


The following example uses the files provided in the go/example folder. These example TAD sets were generated by the open-source Armatus software. To compare the two TAD sets, run the following command:

localtadsim/go$ time ./localdiff -tad=example/EncodeChr18_A549_combo_IC_100kb_gmax1.0.gamma.0.8.0.txt,example/RaoChr18_KBM7_IC_100kb_gmax1.0.gamma.0.8.0.txt -res=100000 -o=example/tadcomp_A549_KBM7_chr18.txt

Wrote output values to example/tadcomp_A549_KBM7_chr18.txt

real  0m1.802s
user  0m2.084s
sys   0m0.108s

The output file lists every interval of significant similarity between the two TAD sets, along with the variation of information (VI) value of each interval and its p-value. This example does not use the -gamma flag. To use the gamma that produces a particular median TAD length (i.e., 880 kb, as in the paper), pass "-gamma=opt,8.8". Note that the median TAD length must already be divided by the resolution of the data (here, 880 kb / 100 kb = 8.8). In this case, the names given to -tad must be path prefixes matching the files that contain all gamma options. For example, "-tad=armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0,armatusresults/K562/100kb/RaoChr18_KBM7_IC_100kb_gmax1.0", where we get:

$ ls armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0*
armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0.consensus.txt
armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0.gamma.0.0.txt
armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0.gamma.0.1.0.txt
armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0.gamma.0.2.0.txt
armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0.gamma.0.3.0.txt
armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0.gamma.0.4.0.txt
armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0.gamma.0.5.0.txt
armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0.gamma.0.6.0.txt
armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0.gamma.0.7.0.txt
armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0.gamma.0.8.0.txt
armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0.gamma.0.9.0.txt
armatusresults/A549/100kb/EncodeChr18_A549_combo_IC_100kb_gmax1.0.gamma.1.0.txt
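
The number after "opt," is the target median TAD length expressed in bins, that is, the length in base pairs divided by the Hi-C resolution. The arithmetic can be sketched in Python as below. The helper name gamma_flag is ours for illustration and is not part of localdiff.

```python
def gamma_flag(median_tad_bp: int, resolution_bp: int) -> str:
    """Build the -gamma argument for a target median TAD length.

    Hypothetical helper (not part of localdiff). It only encodes the rule
    stated above: divide the median TAD length by the data resolution.
    """
    if resolution_bp <= 0:
        raise ValueError("resolution must be positive")
    bins = median_tad_bp / resolution_bp
    # The :g format drops a trailing ".0", so 8.8 stays 8.8 and 10.0 becomes 10.
    return f"-gamma=opt,{bins:g}"


# 880 kb median TAD length at 100 kb resolution, as in the example above.
print(gamma_flag(880_000, 100_000))
```

With the values used in this README, the call yields the same "-gamma=opt,8.8" string quoted above.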

If your input TAD files were not produced by Armatus, they must follow the same format: each file must contain at least 3 columns, where the 2nd column gives TAD start locations and the 3rd gives TAD end locations.
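
To check a non-Armatus file against this format before running localdiff, a small reader along the following lines may help. This is a sketch under two assumptions of ours: columns are separated by whitespace, and coordinates are integers. The function name read_tads is hypothetical and does not come from this repo.

```python
from typing import List, Tuple


def read_tads(path: str) -> List[Tuple[int, int]]:
    """Return (start, end) pairs taken from the 2nd and 3rd columns.

    Assumes whitespace-delimited columns and integer coordinates.
    Blank lines are skipped; malformed lines raise ValueError.
    """
    tads = []
    with open(path) as handle:
        for lineno, line in enumerate(handle, start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if len(fields) < 3:
                raise ValueError(f"{path}:{lineno}: expected at least 3 columns")
            start, end = int(fields[1]), int(fields[2])
            if end < start:
                raise ValueError(f"{path}:{lineno}: end {end} precedes start {start}")
            tads.append((start, end))
    return tads
```

For instance, read_tads("example/EncodeChr18_A549_combo_IC_100kb_gmax1.0.gamma.0.8.0.txt") should return one (start, end) pair per TAD if the file follows the layout described above.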


ANALYZING RESULTS OF LARGE-SCALE COMPARISON

To reproduce the statistics and figures from the paper, run the Python script analyzelocaldiffresults.py, passing it the path to all results files produced by running the above code on every comparison.
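
The results files come from running localdiff once per pair of TAD sets. One way such a batch could be generated is sketched below. The flags -tad, -res and -o are the ones documented above, but the cell type labels, input paths and output naming scheme are placeholders of ours, not the layout the authors used.

```python
from itertools import combinations
from typing import Dict, List


def localdiff_commands(tad_files: Dict[str, str], resolution: int,
                       out_prefix: str) -> List[List[str]]:
    """Build one localdiff argument list per unordered pair of cell types.

    tad_files maps a cell type label to its TAD file. Labels, paths and the
    output naming scheme are illustrative placeholders.
    """
    commands = []
    for a, b in combinations(sorted(tad_files), 2):
        commands.append([
            "./localdiff",
            f"-tad={tad_files[a]},{tad_files[b]}",
            f"-res={resolution}",
            f"-o={out_prefix}{a}_{b}.txt",
        ])
    return commands


# Hypothetical inputs: three cell types on one chromosome give three pairs.
example = {
    "A549": "tads/A549_chr18.txt",
    "KBM7": "tads/KBM7_chr18.txt",
    "IMR90": "tads/IMR90_chr18.txt",
}
for cmd in localdiff_commands(example, 100000, "outputs/tadcomp_"):
    print(" ".join(cmd))
```

Each returned list can be handed to subprocess.run from the go directory. The commands are only printed here so the sketch has no side effects.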

localtadsim$ python analyzelocaldiffresults.py -h
usage: analyzelocaldiffresults.py [-h] [-i I] [-r R] [-a A [A ...]] [-cl CL]
                                  [-cm CM] [-gl GL] [-p P] [-c C [C ...]]
                                  [-n N [N ...]] [-o O]

optional arguments:
  -h, --help    show this help message and exit
  -i I          fileseed of files to analyze, with wildcard characters
  -r R          resolution of Hi-C data
  -a A [A ...]  Armatus files for plotting TADs
  -cl CL        File containing chromosome lengths
  -cm CM        File containing centromere locations
  -gl GL        File listing all human gene locations
  -p P          File containing points to plot in matrix (like in Fig 1)
  -c C [C ...]  Cancer cell types
  -n N [N ...]  Normal cell types
  -o O          Path to location for output files/figures to be written

Many of these arguments are optional. Based on our file structure and cell type names, we ran the following command:

localtadsim$ python analyzelocaldiffresults.py -i 'go/outputs/tadcomp_armatus*' -r 100000 -c K562_R K562_LA KBM7 A549 Caki2 G401 LNCaP-FGC NCI-H460 Panc1 RPMI-7951 SJCRH30 SKMEL5 SKNDZ SKNMC T47D -n hESC IMR90_D IMR90_R GM06990 GM12878 HMEC HUVEC NHEK -o go/outputs/figures/tadcomp_results_
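
The -i pattern is quoted so that the shell passes the wildcard to the script unexpanded. To preview which results files such a pattern would match, a short check with Python's glob module may be useful. We have not confirmed that the script expands the pattern in exactly this way, and the helper name matching_results is ours.

```python
import glob
from typing import List


def matching_results(fileseed: str) -> List[str]:
    """Return the sorted list of paths matching a wildcard pattern.

    Illustrative helper only: it shows what the pattern matches on disk,
    not how analyzelocaldiffresults.py processes its -i argument.
    """
    return sorted(glob.glob(fileseed))


# Same pattern as in the command above; prints 0 if no results exist yet.
print(len(matching_results("go/outputs/tadcomp_armatus*")), "results files match")
```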
