GitHub - HormozdiariLab/TAD-fusion-score

What is TAD-fusion score?

TAD-fusion score is a score to quantify deletions based on their potential disruption of the 3D genome structure. More specifically, TAD-fusion score is defined as the expected number of additional genomic interactions created as a result of the deletion.
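To make the definition concrete, here is a toy Python sketch. It assumes each bin pair has some probability of interacting before and after the deletion, so that, by linearity of expectation, the expected number of additional interactions is the sum of the per-pair increases. The function name, the dictionary layout, and the probabilities are all hypothetical; this is not the tool's actual model, which is fitted from Hi-C data.

```python
def expected_additional_interactions(p_before, p_after):
    """Toy illustration: sum of per-pair increases in interaction probability.

    p_before, p_after: dicts mapping a bin pair (i, j) to a hypothetical
    probability that the two bins interact before / after the deletion.
    """
    total = 0.0
    for pair, after in p_after.items():
        before = p_before.get(pair, 0.0)
        # Assumption: only gains count, since the score is about
        # interactions that the deletion creates.
        total += max(after - before, 0.0)
    return total


# Made-up numbers: two bin pairs spanning a deleted boundary gain contact,
# while a pair on the same side of the boundary is unchanged.
before = {(10, 14): 0.05, (11, 14): 0.10, (10, 11): 0.80}
after = {(10, 14): 0.45, (11, 14): 0.60, (10, 11): 0.80}
score = expected_additional_interactions(before, after)
print(round(score, 2))
```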

How to calculate the TAD-fusion scores of a deletion set?

Option 1: From 5kb Hi-C data of GM12878 of Rao et al.

Compile the TAD-fusion score tool by running the script
```
./compile_cal_tad_fusion_score.sh
```
Prepare the input deletion file (three columns, in the same format as the sample file, with coordinates on the hg19 reference)
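A deletion file of this shape can be generated programmatically. The sketch below is a minimal example, assuming the three columns are chromosome, start, and end, that they are tab-separated, and that chromosome names carry a "chr" prefix as in the sample row "chr2    221278232       223014332". The helper name `format_deletions` is hypothetical and not part of the tool.

```python
def format_deletions(deletions):
    """Return the text of a 3-column deletion file.

    deletions: iterable of (chromosome, start, end) tuples.
    Assumed layout: tab-separated chromosome, start, end (hg19 coordinates).
    """
    lines = []
    for chrom, start, end in deletions:
        if not chrom.startswith("chr"):
            raise ValueError(f"unexpected chromosome name: {chrom!r}")
        if not 0 <= start < end:
            raise ValueError(f"invalid interval: {chrom} {start} {end}")
        lines.append(f"{chrom}\t{start}\t{end}")
    return "\n".join(lines) + "\n"


# The single row here is the example row quoted in this README.
deletion_text = format_deletions([("chr2", 221278232, 223014332)])
print(deletion_text, end="")
```

Writing `deletion_text` to a file then gives an input that can be passed to the tool with `-f`.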

Run the tool with the default parameters:

./../src/cal_tad_fusion_score -md ../Model/GM_Rao_5kb -f ../Data/disease_del.dat -mnl 10000 -mxl 5000000 -w 100 -d 0.06 -o Output/disease_del_TAD_fusion_score.dat

The output file is "Output/disease_del_TAD_fusion_score.dat"; its last column is the TAD-fusion score.
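Once the output file exists, the deletions can be ranked by score. The following sketch assumes only that the columns are whitespace-separated and that the last one is numeric; the helper name `rank_by_score` and the score values in the example are made up for illustration.

```python
def rank_by_score(output_text):
    """Parse output text and sort rows by the last column, highest first.

    Returns a list of (leading_fields, score) tuples, where leading_fields
    is the list of all columns before the score, kept as strings.
    """
    records = []
    for line in output_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        records.append((fields[:-1], float(fields[-1])))
    return sorted(records, key=lambda record: record[1], reverse=True)


# Hypothetical output content; the scores are invented for this example.
example_output = (
    "chr2\t221278232\t223014332\t12.5\n"
    "chr7\t1000000\t1200000\t40.0\n"
)
ranked = rank_by_score(example_output)
for leading, score in ranked:
    print(" ".join(leading), score)
```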

Sample scripts are in the folder Examples. The options for calculating the TAD-fusion score are:

-md       Model directory; model files must be named chr1.model, chr2.model, ..., chrX.model
-f        File listing the deletions to score; it has three columns (e.g. one row is "chr2    221278232       223014332")
-mnl      Minimum length in base pairs; any deletion shorter than this threshold is skipped
-mxl      Maximum length in base pairs; any deletion longer than this threshold is skipped
-w        Window length (in bins) around the deletion used to calculate the TAD-fusion score
-d        Delta threshold used to decide whether a bin pair is interacting
-o        Output file; it has four columns, the last of which is the TAD-fusion score
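The effect of `-mnl` and `-mxl` can be previewed before running the tool. The sketch below mirrors the description of those two options, assuming that a deletion's length is `end - start` and that a length exactly equal to a threshold is kept (the text only says "shorter than" and "longer than"). The default values are taken from the example command above; the function name is hypothetical.

```python
def passes_length_filter(start, end, min_len=10000, max_len=5000000):
    """Return True if a deletion would not be skipped by -mnl / -mxl.

    Assumption: length is end - start, and lengths equal to a threshold
    are kept.
    """
    length = end - start
    return min_len <= length <= max_len


# The sample row from the -f description: 223014332 - 221278232 = 1736100 bp.
print(passes_length_filter(221278232, 223014332))
# A 500 bp deletion is shorter than the 10000 bp minimum.
print(passes_length_filter(1000, 1500))
```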

Option 2: From a new Hi-C dataset

Fit the model with Hi-C data

a. Install CPLEX
b. Set variables CPLEX_INCLUDE and CPLEX_LIB (in file make_fit_hic_model) to the directory where CPLEX is installed
c. Compile the source by running the script
```
./compile_fit_hic_model.sh
```
d. If the compilation is successful, an executable file "fit_hic_model" will be generated in the folder "src"

e. Options for fitting the model

 -fn       Data file path
 -ff       Data file format ("full_matrix_format" of Schmitt et al. data or "sparse_matrix_format" of Rao et al. data)
 -res      Hi-C matrix bin resolution (e.g. 40kb, 10kb, 5kb)
 -mn       Minimum distance (in bins); bin pairs closer than this threshold are not used to fit the model
 -mx       Maximum distance (in bins); bin pairs farther apart than this threshold are not used to fit the model
 -method   Fitting method: "full" fits the model from the whole Hi-C data at once; "segmentation" partitions the matrix into segments and fits the model for each segment
 -sg       Segment length in bins (used when the method is "segmentation")
 -mso      Minimum overlap in bins between two segments (used when the method is "segmentation")
 -zero     Constant that replaces zero values so that the log can be taken
 -of       Output model file (four columns, where alpha, beta, and the insulation are the 1st, 3rd, and 4th columns respectively)

f. Example: the script "fitting.sh" (in the folder "Examples") fits the model for chr22 of GM12878 from the Schmitt et al. data
- Run the script with
```
cd Examples
./fitting.sh
```
- The output model file is "GM12878.40kb.chr22.model" in folder "Examples/Output".
- In the model file, 1st, 2nd and 4th columns are alpha, beta, and the insulator respectively.
g. For your convenience, we also provide models (in the folder "Model") that we fitted for GM12878 from Rao et al. data at 5kb resolution.
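The options `-mn`, `-mx`, `-zero`, `-sg`, and `-mso` listed under step e can be pictured with a small Python sketch. It is an illustration under stated assumptions, not the code in "fit_hic_model": contacts are held in a dictionary keyed by bin-index pairs, the natural log is used, and segments advance by `seg_len - min_overlap` bins with the last segment placed flush with the end of the chromosome. Both function names are hypothetical.

```python
import math


def log_contacts(contacts, min_dist, max_dist, zero_value):
    """Keep bin pairs whose distance is within [min_dist, max_dist] bins
    and log-transform their counts, replacing zeros by zero_value first."""
    result = {}
    for (i, j), count in contacts.items():
        dist = abs(i - j)
        if dist < min_dist or dist > max_dist:
            continue  # outside the distance range, as with -mn / -mx
        value = count if count > 0 else zero_value  # as with -zero
        result[(i, j)] = math.log(value)  # assumption: natural log
    return result


def segment_starts(n_bins, seg_len, min_overlap):
    """Start bins of segments of seg_len bins that overlap their
    neighbours by at least min_overlap bins (one possible scheme)."""
    if seg_len >= n_bins:
        return [0]
    step = seg_len - min_overlap
    if step <= 0:
        raise ValueError("seg_len must be larger than min_overlap")
    starts = list(range(0, n_bins - seg_len, step))
    starts.append(n_bins - seg_len)  # last segment ends at the final bin
    return starts


# Made-up contact counts between bin pairs.
toy_contacts = {(0, 1): 5, (0, 3): 0, (0, 10): 7}
print(log_contacts(toy_contacts, min_dist=2, max_dist=5, zero_value=0.1))
print(segment_starts(n_bins=100, seg_len=40, min_overlap=10))
```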

Run the TAD-fusion score tool with the new model to get the TAD-fusion scores, as described under Option 1 above

Support

If you have any questions about TAD-fusion score, please contact Linh Huynh (huynh@ucdavis.edu) or Fereydoun Hormozdiari (fhormozd@ucdavis.edu).

Citation

Huynh L, Hormozdiari F. TAD-fusion score: discovery and ranking the contribution of deletions to genome structure. Genome Biology. 2019; 20:60.

Licence

See the LICENSE file for license rights and limitations (BSD-2).

Acknowledgement

This work is supported in part by the Sloan Research Fellowship number G-2017-9159 to Fereydoun Hormozdiari.
