Skip to content

wmalab/scHiCDiff

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

49 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

scHiCDiff

scHiCDiff is a novel statistical algorithm to detect differential chromatin interactions (DCIs) between two Hi-C experiments at single-cell level. Here, we introduced 4 ways to capture the DCIs: two non-parametric tests (Kolmogorov–Smirnov test/ Cramér-von Mises test) and parametric likelihood ratio test with two regression models (Negative Binomial/ Zero-inflated Negative Binomial). Non-parametric tests are advantageous by allowing us detecting DCIs without any assumption on data distribution; negative binomial(NB) is the most common assumption for interaction counts in bulk Hi-C parametric approaches, while zero-inflated Negative Binomial(ZINB) regression models is specially designated for the interaction comparison at single-cell level by taking the excessive zeros feature into consideration.

Installation

To install and load the developmental version of scHiCDiff in R:


install.packages("path/scHiCDiff_1.0.tar.gz", repos = NULL, type ="source")
library(scHiCDiff)

License

MIT

Usage

The functions in scHiCDiff can be classified as two types: The first type is the normalization function (scHiCDiff.sim) and the other type is the detection function (scHiCDiff.KS, scHiCDiff.CVM, scHiCDiff.NB and scHiCDiff.ZINB).

Normalization Function

The inputs of the normalization function scHiCDiff.norm are illustrated below:

bias.info.path      The pathway of the three local features (effective length,GC content 
                    and mappability of fragment ends) of all bins. The generation of these 
                    items is available at http://dna.cs.miami.edu/scHiCNorm.
dat_HiC             A N*N scHi-C matrix.

The function returns the normalized Hi-C matrix.

Detection Functions

The inputs for all detection functions are illustrated below:

count.table      A non-negative  matrix of scHi-C normalized read counts.The rows of the 
                 matrix are bin pair and columns are samples/cells.
group            A vector of factor which mentions the two condition to be compared, 
                 corresponding to the columns in the count table.

The detection function will return a data frame containing the differential chromatin interaction (DCI) analysis results, rows are bin pairs and columns lists the related statistics.

The outputs for the two parametric models are listed below:

bin_1,bin_2          The interacting region of the bin pair.
mu_1,mu_2,           MLE of the parameters of NB/ZINB/NBH of group 1 and group 2,
theta_1,theta_2      where mu and theta represent the mean and dispersion estimate of
(pi_1,pi_2)          negative binomial, pi denotes the estimate of zero percentange
norm_total_mean_1,   Mean of normalized read counts of group 1 and group 2.
norm_total_mean_2
norm_foldChange      norm_total_mean_1/norm_total_mean_2.
chi2LR1              Chi-square statistic for hypothesis testing of H0.
pvalue               P value of hypothesis testing of H0 (underlying whether a bin pair 
                     is a DCI).
pvalue.adj.FDR       Adjusted P value of H0's pvalue using Benjamini & Hochberg's method.
Remark               Record of abnormal program information.

The outputs for the non-parametric tests are shown below:

bin_1,bin_2          The interacting region of the bin pair.
test.statistic       The statistic given by KS/CVM test.
pvalue               P value of hypothesis testing of H0 (underlying whether a bin pair 
                     is a DCI).
pvalue.adj           Adjusted P value of H0's pvalue using Benjamini & Hochberg's method.

Example: The simulated data getting from chr1 of 1CDX1 with cell num=50, fold change=5 and resolution=200kb (Nagano et.al.) were untilized as sample data. In the sample data file, it lists all bin pairs with at least one non-zero counts in one of cell types. The first two columns represent the interacting region of each listed bin pair, then followed 50 columns denote the normalized read counts for condition 1 and the last 50 columns denote the normalized read counts for condition 2.

count.table <- readRDS(paste("path/sampledata/sim 1 .rds")
count.table <- as.matrix(count.table)
group <- factor(c(rep(1,50), rep(2,50)))
result.ks <- scHiCDiff.KS(count.table,group)
result.cvm <- scHiCDiff.CVM(count.table,group)
result.nb <- scHiCDiff.NB(count.table,group)
result.zinb <- scHiCDiff.ZINB(count.table,group)
#common DCIs identified by four methods
common.DCIs <- scHiCDiff.common.DCI(result.ks,result.cvm,result.nb,result.zinb)

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages