Description
AT_DROPOUT calculation is described as "For each GC bin [0..50] we calculate a = % of target territory, and b = % of aligned reads that align to these targets. AT DROPOUT is then abs(sum(a-b when a-b < 0))."
I have a few questions to help clarify:
1. Is each "bin" a target region?
2. Is "a" equivalent to the expected coverage of reads?
3. Is "b" equivalent to the observed coverage of reads?
4. If we only sum "a-b" when "a-b < 0", does this mean we only sum regions where the observed coverage is greater than expected? Would this measure an enrichment in coverage, not a depletion?
5. If (4) is true, is the explanation "if the value is 5% this implies that 5% of total reads that should have mapped to GC<=50% regions mapped elsewhere" correct? It seems like the calculation is measuring enrichment. So maybe the equation should be "when a-b > 0" instead?
Thank you so much!
Hi @jcharlton67, I'll try to provide some clarifications to the best of my knowledge:
1. Not exactly, but somewhat related. Each bin corresponds to one GC percentage and tracks two counts: the number of windows with that GC content and the number of read starts within those windows. That is why there are 101 bins (from 0 to 100 inclusive). For example, GC bin[0] would contain the number of windows in the target regions with 0% GC content and the number of aligned reads whose start positions fall within these windows. The default window size should be 100 if I'm not mistaken.
2. I believe it would be fair to say that, yes.
3. Same as above, I believe that's one way to look at it.
4 and 5. I believe you are correct in your observation. I've checked the code, and the calculation itself seems to be correct, but the documentation seems to be incorrect: it should indeed say "when a - b > 0".
I'll open a PR regarding the documentation issue and if it is indeed a typo, we'll change it.
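To make the "when a - b > 0" reading concrete, here is a minimal Python sketch of the calculation as it is described in this thread. It is not Picard's code: the function name `at_dropout` and the two list arguments are hypothetical, and the sketch simply assumes 101 GC bins holding a window count and a read-start count each.

```python
def at_dropout(windows_by_gc, read_starts_by_gc):
    """Sum of positive (a - b) over GC bins 0..50, in percentage points.

    windows_by_gc[i]     : number of target windows with i% GC (hypothetical input)
    read_starts_by_gc[i] : number of read starts inside those windows (hypothetical input)
    """
    if len(windows_by_gc) != 101 or len(read_starts_by_gc) != 101:
        raise ValueError("expected 101 bins (GC 0..100 inclusive)")

    total_windows = sum(windows_by_gc)
    total_reads = sum(read_starts_by_gc)
    if total_windows == 0 or total_reads == 0:
        return 0.0

    dropout = 0.0
    for gc in range(0, 51):  # bins 0..50 inclusive
        a = 100.0 * windows_by_gc[gc] / total_windows    # % of target territory
        b = 100.0 * read_starts_by_gc[gc] / total_reads  # % of aligned reads
        if a - b > 0:  # fewer reads than the territory share predicts
            dropout += a - b
    return dropout


# Toy data: half the territory is at 30% GC but only 20% of reads land there.
windows = [0] * 101
reads = [0] * 101
windows[30], windows[70] = 50, 50
reads[30], reads[70] = 20, 80
print(at_dropout(windows, reads))  # 30.0
```

With the "a - b < 0" condition from the current documentation, the same toy data would give 0, even though the low-GC bin is clearly under-covered. That is why the "> 0" condition matches the "reads that should have mapped to GC<=50% regions mapped elsewhere" explanation.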
Documentation request
Tool involved : CollectHsMetrics AT_DROPOUT