brentp edited this page Aug 6, 2012 · 3 revisions

Q1. Why is the ACF calculated more than once?

In short, we first calculate the ACF out to a user-specified distance. After finding regions, we then calculate the ACF out to the length of the longest region. In detail:

First, we calculate the ACF only out to the user-specified distance. Usually, this is the farthest distance at which we still see any autocorrelation. The ACF calculated at this step is the basis for the SLK correction for each probe.
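As a rough illustration of an ACF computed only out to a fixed distance, the sketch below bins probe pairs by genomic distance and computes one correlation per bin. The function names, the bin width `step`, and the use of a Pearson correlation on the raw values are assumptions made for this example, not necessarily what the software itself does.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (nan if undefined)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def acf_by_distance(positions, values, max_dist, step):
    """Correlation between probe pairs, binned by distance, out to max_dist.

    positions must be sorted ascending. Bin k holds pairs whose distance d
    satisfies k*step < d <= (k+1)*step. Returns (lo, hi, r, n_pairs) per bin.
    """
    n_bins = -(-max_dist // step)  # ceiling division
    bins = [([], []) for _ in range(n_bins)]
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            d = positions[j] - positions[i]
            if d > max_dist:
                break  # positions are sorted, so later j are farther away
            if d < 1:
                continue  # skip probes at the same position
            xs, ys = bins[(d - 1) // step]
            xs.append(values[i])
            ys.append(values[j])
    result = []
    for k, (xs, ys) in enumerate(bins):
        r = pearson(xs, ys) if len(xs) > 1 else float("nan")
        result.append((k * step + 1, min((k + 1) * step, max_dist), r, len(xs)))
    return result

# Hypothetical toy data: six probes spaced 10 bases apart.
positions = [0, 10, 20, 30, 40, 50]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
acf = acf_by_distance(positions, values, max_dist=20, step=10)
```

Choosing a larger `max_dist` only adds bins at the far end, which is why the same routine can be run a second time out to the length of the longest region.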

Second, in a later step, we want to determine the significance of the regions. To do this, we treat each region as a group of p-values. To determine the combined probability of the region, we first need to calculate the autocorrelation out as far as the length of the longest region. We then use that autocorrelation to calculate a single p-value for each region.

In between those two steps, we apply FDR correction and then find regions using the q-values. Those regions are passed to the second step above.
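The region-finding step can be pictured with a small sketch like the one below. It merges probes whose q-value is under a threshold into a region whenever they lie within a maximum gap of each other. The function name `find_regions`, the `threshold` and `max_gap` parameters, and the merging rule are all hypothetical choices for this example; the actual rule used by the software may differ.

```python
def find_regions(positions, qvalues, threshold=0.05, max_gap=100):
    """Group significant probes (q < threshold) into regions.

    positions must be sorted ascending. A significant probe joins the
    current region if it is within max_gap bases of that region's last
    significant probe; non-significant probes in between are skipped.
    Returns a list of (start, end, n_significant_probes).
    """
    regions = []
    current = None  # [start, end, n_probes]
    for pos, q in zip(positions, qvalues):
        if q >= threshold:
            continue
        if current is not None and pos - current[1] <= max_gap:
            current[1] = pos
            current[2] += 1
        else:
            if current is not None:
                regions.append(tuple(current))
            current = [pos, pos, 1]
    if current is not None:
        regions.append(tuple(current))
    return regions

# Hypothetical toy data.
positions = [100, 150, 200, 900, 950, 2000]
qvalues = [0.01, 0.20, 0.03, 0.04, 0.01, 0.50]
regions = find_regions(positions, qvalues, threshold=0.05, max_gap=100)
longest = max(end - start for start, end, _ in regions)
```

The value `longest` is the distance out to which the ACF would then need to be recalculated.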

Q2. Why use FDR correction and the one-step Sidak?

The FDR correction is applied to the raw p-values. We can do this correction either using the traditional Benjamini-Hochberg procedure (which assumes the null is a uniform distribution), or by specifying a null distribution (likely generated by shuffling the clinical data relative to the probe data and then generating p-values). We cannot do this type of correction for the regions because they vary in size and p-value.
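For reference, a minimal pure-Python sketch of the traditional Benjamini-Hochberg procedure on a list of raw p-values is shown here. The function name is a placeholder, and the sketch covers only the uniform-null case, not the variant that uses a user-supplied null distribution.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg q-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum of
    # p * m / rank over this rank and all larger ranks (keeps q monotone).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues

# Hypothetical raw p-values for four probes.
raw = [0.01, 0.04, 0.03, 0.005]
q = benjamini_hochberg(raw)
```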

For the regions, we instead use the one-step Sidak correction, taking the number of tests to be the number of regions of that size that could fit in the total bases of coverage. For example, if there are 120 total bases (in most cases our numbers will be 1000-fold higher) and a region of 12 bases, then we say that there are 10 possible regions of size 12 in our data and we use that number for the Sidak correction. In practice, when each region is much smaller than the total bases of coverage, the Sidak correction gives nearly the same result as the Bonferroni correction for small p-values.
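The arithmetic for the 120-base example can be sketched as follows, using the standard one-step Sidak formula 1 - (1 - p)^n with n taken as total coverage divided by region length. The function names and the region p-value of 0.001 are illustrative assumptions.

```python
def sidak(p, region_len, total_coverage):
    """One-step Sidak correction with n = total_coverage / region_len tests."""
    n_tests = total_coverage / region_len
    return 1.0 - (1.0 - p) ** n_tests

def bonferroni(p, region_len, total_coverage):
    """Bonferroni correction with the same number of tests, capped at 1."""
    return min(1.0, p * total_coverage / region_len)

# 120 total bases and a 12-base region give 10 possible regions.
region_p = 0.001  # hypothetical combined p-value for the region
sidak_p = sidak(region_p, region_len=12, total_coverage=120)
bonf_p = bonferroni(region_p, region_len=12, total_coverage=120)
```

With these inputs the Sidak value comes out slightly below the Bonferroni value of 0.01, and the two move closer together as the region p-value shrinks.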