InterTADs is an open-source tool written in R for integrating multi-omics data (e.g. DNA methylation, expression, mutation) from the same physical source (e.g. patient), taking into account the chromatin configuration of the genome, i.e. the topologically associating domains (TADs).
You can simply clone the repository by using git:
git clone https://github.com/nikopech/InterTADs
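Before running any scripts, make sure the following packages are installed on your machine (devtools and BiocManager are included because the GitHub and Bioconductor installs below depend on them):
install.packages(c("data.table", "tidyverse", "gplots", "png", "gghalves", "devtools", "BiocManager"))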
devtools::install_github("stephenturner/annotables")
...and from Bioconductor:
BiocManager::install(c("TxDb.Hsapiens.UCSC.hg19.knownGene", "TxDb.Hsapiens.UCSC.hg38.knownGene", "GenomicRanges", "org.Hs.eg.db", "systemPipeR", "karyoploteR"))
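To see which of the packages listed above are still missing before running the scripts, a minimal sketch along these lines may help. The helper name `missing_packages` is hypothetical and not part of InterTADs; it relies only on base R's `requireNamespace()`.

```r
# Hypothetical helper (not part of InterTADs): return the packages from
# a character vector that cannot be loaded from the current R library.
missing_packages <- function(pkgs) {
  installed <- vapply(
    pkgs,
    function(p) requireNamespace(p, quietly = TRUE),
    logical(1)
  )
  pkgs[!installed]
}

# Package names copied from the installation commands above
required <- c(
  "data.table", "tidyverse", "gplots", "png", "gghalves", "annotables",
  "TxDb.Hsapiens.UCSC.hg19.knownGene", "TxDb.Hsapiens.UCSC.hg38.knownGene",
  "GenomicRanges", "org.Hs.eg.db", "systemPipeR", "karyoploteR"
)

# Prints character(0) when everything is available
print(missing_packages(required))
```

Any names printed by the last line still need to be installed with the commands above.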
There are three main scripts for integrating your multi-omics data:
Data_Integration.R
TADiff.R
Visualization.R
For the Data Integration part, all datasets are separated into two folders, freq and counts, based on the information they are carrying (frequency or score count values).
The two folders are placed into a directory, along with a meta-data file which provides information about the mapping between the columns for each dataset. For more details regarding the structure of this file please see here.
The script allows the user to define different folder (or file) names. The user can also choose a folder name for the output table and an option specifying which human genome build is being used (accepted values are hg19 or hg38).
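The input layout described above can be checked before the script is run. The sketch below is only an illustration: the function `check_inputs`, its argument names, and the metadata file name `meta-data.csv` are assumptions, not names taken from InterTADs.

```r
# Hypothetical pre-flight check (not part of InterTADs).
# Assumes one input directory holding the two data folders and one
# metadata file; every name can be overridden by the caller.
check_inputs <- function(dir_name,
                         freq_folder = "freq",
                         counts_folder = "counts",
                         meta_file = "meta-data.csv",
                         genome = c("hg19", "hg38")) {
  # match.arg() stops with an error for anything other than hg19/hg38
  genome <- match.arg(genome)

  paths <- c(
    freq   = file.path(dir_name, freq_folder),
    counts = file.path(dir_name, counts_folder),
    meta   = file.path(dir_name, meta_file)
  )
  found <- c(
    dir.exists(paths[["freq"]]),
    dir.exists(paths[["counts"]]),
    file.exists(paths[["meta"]])
  )
  if (!all(found)) {
    stop("Missing input(s): ", paste(paths[!found], collapse = ", "))
  }
  invisible(list(paths = paths, genome = genome))
}
```

Calling `check_inputs("my_inputs", genome = "hg38")`, where `my_inputs` is a placeholder directory name, stops with a message naming whatever is missing, or silently returns the resolved paths and genome build.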
Once every input is provided, the script can be run by:
source("Data_Integration.R")
For the TADiff part, the paths to the input and output folders must be provided, together with a BED file describing the TADs. To run the script:
source("TADiff.R")
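As a rough illustration of the TAD input, the sketch below reads a BED file and assigns genomic positions to TADs. It assumes only the standard first three BED columns (chromosome, 0-based start, exclusive end) and a file without track or browser header lines. The function names `read_tads` and `assign_tad` and the toy coordinates are hypothetical, and this is not a description of how TADiff.R itself works.

```r
# Hypothetical helpers (not taken from TADiff.R).
# Read the first three BED columns: chrom, chromStart, chromEnd.
read_tads <- function(bed_path) {
  bed <- read.table(bed_path, header = FALSE, sep = "\t",
                    stringsAsFactors = FALSE, comment.char = "#")
  tads <- bed[, 1:3]
  names(tads) <- c("chr", "start", "end")
  tads$tad_id <- paste0("TAD", seq_len(nrow(tads)))
  tads
}

# Return the id of the first TAD containing each (chr, pos), or NA if
# none does. `pos` is 1-based; BED intervals are 0-based and half-open,
# so the test is start <= pos - 1 < end.
assign_tad <- function(chr, pos, tads) {
  vapply(seq_along(pos), function(i) {
    hit <- which(tads$chr == chr[i] &
                 tads$start <= pos[i] - 1 &
                 tads$end > pos[i] - 1)
    if (length(hit) == 0) NA_character_ else tads$tad_id[hit[1]]
  }, character(1))
}

# Toy example with made-up coordinates
bed_file <- tempfile(fileext = ".bed")
writeLines(c("chr1\t0\t1000", "chr1\t1000\t5000", "chr2\t0\t2000"), bed_file)
tads <- read_tads(bed_file)
print(assign_tad(c("chr1", "chr1", "chr3"), c(500, 1001, 10), tads))
```

The linear scan per position keeps the sketch dependency-free; for real datasets an interval-overlap routine such as the one in GenomicRanges, which is already among the required packages, would scale better.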
For the visualization of the results, the paths to input and output data need to be provided:
source("Visualization.R")
The proposed method was evaluated on data from chronic lymphocytic leukemia (DNA methylation and expression values). The datasets have been deposited in the ArrayExpress database at EMBL-EBI under the accession numbers E-MTAB-6955 and E-MTAB-6962, respectively.
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the MIT License; see the LICENSE file for details.