Inconsistent statistic result #188

Open
hiiiyilingzhang opened this issue Feb 22, 2024 · 0 comments

Describe the bug
When I run the test data through hic-pipeline, the resulting stats are very different from the tests/data/stats.txt provided in the repo. When I then ran hic-pipeline and Juicer separately on in-house data, the stats also differed substantially. Which step causes the difference, and which result should I use?

OS/Platform

  • OS/Platform: Ubuntu 20.04.4 LTS
  • Singularity version: v3.11.4
  • Pipeline version: v1.15.1
  • Caper version: v2.2.3

Caper configuration file

backend=local

# Local directory for localized files and Cromwell's intermediate files.
# If not defined then Caper will make .caper_tmp/ on CWD or `local-out-dir`.
# /tmp is not recommended since Caper store localized data files here.
local-loc-dir=

cromwell=/home/myname/.caper/cromwell_jar/cromwell-82.jar
womtool=/home/myname/.caper/womtool_jar/womtool-82.jar

Input JSON file

caper run /home/myname/Tools/hic-pipeline/hic.wdl --singularity \
-i /home/myname/Tools/hic-pipeline/tests/functional/json/test_hic.json \
-m /home/myname/Tools/hic-pipeline/tests/testPipeline/testrun_metadata.json

Statistic from hic-pipeline

Contents of tests/data/stats.txt in the repo:

Intra-fragment Reads: 6,969(57.59%)
Hi-C Contacts: 5,132(42.41%)
 Ligation Motif Present: 3 (0.02%)
 3' Bias (Long Range): 65% - 35%
 Pair Type %(L-I-O-R): 25% - 23% - 27% - 25%
Inter-chromosomal: 6 (0.05%)
Intra-chromosomal: 5,126 (42.36%)
Short Range (<20Kb): 4,537 (37.49%)
Long Range (>20Kb): 589 (4.87%)

When I run hic-pipeline on the test data myself, the statistics are as follows:

Read type: Paired End
Sequenced Read Pairs:  332888
No chimera found: 11303 (3.40%)
 One or both reads unmapped: 11303 (3.40%)
2 alignments: 321559 (96.60%)
 2 alignments (A...B): 321558 (96.60%)
 2 alignments (A1...A2B; A1B2...B1A2): 1 (0.00%)
3 or more alignments: 26 (0.01%)
Ligation Motif Present: 96 (0.03%)
Average insert size: 496.10
Total Unique: 310081 (96.43%, 93.15%)
Total Duplicates: 11478 (3.57%, 3.45%)
Library Complexity Estimate*: 4,396,440
Intra-fragment Reads: 150,908 (45.33% / 48.67%)
Below MAPQ Threshold: 44,764 (13.45% / 14.44%)
Hi-C Contacts: 114,409 (34.37% / 36.90%)
 3' Bias (Long Range): 80% - 20%
 Pair Type %(L-I-O-R): 25% - 25% - 25% - 25%
 L-I-O-R Convergence: 10000000000
Inter-chromosomal: 193 (0.06% / 0.06%)
Intra-chromosomal: 114,216 (34.31% / 36.83%)
Short Range (<20Kb):
  <500BP: 70,381 (21.14% / 22.70%)
  500BP-5kB: 31,397 (9.43% / 10.13%)
  5kB-20kB: 2,338 (0.70% / 0.75%)
Long Range (>20Kb): 10,100 (3.03% / 3.26%)
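
Before comparing the two Hi-C Contacts percentages, it helps to check whether both files describe the same number of read pairs. The sketch below back-calculates the denominator implied by a count and its first percentage. The helper name `implied_total` and the regular expression are illustrative assumptions, not part of hic-pipeline or Juicer.

```python
import re


def implied_total(line: str) -> float:
    """Back-calculate the denominator from a 'Label: count (pct%...)' line.

    Hypothetical helper for illustration; it uses the first percentage
    that follows the count.
    """
    m = re.search(r":\s*([\d,]+)\s*\(\s*([\d.]+)%", line)
    if m is None:
        raise ValueError(f"no count/percentage found in: {line!r}")
    count = int(m.group(1).replace(",", ""))
    pct = float(m.group(2))
    return count / (pct / 100.0)


# Lines copied from the two stats blocks quoted above.
reference = "Hi-C Contacts: 5,132(42.41%)"
rerun = "Hi-C Contacts: 114,409 (34.37% / 36.90%)"

ref_total = implied_total(reference)
run_total = implied_total(rerun)
print(round(ref_total), round(run_total))
```

With the lines quoted above, the reference file implies roughly 12,101 pairs (which is also 6,969 + 5,132), while the re-run implies roughly 332,900, close to its Sequenced Read Pairs line. That suggests, without proving it, that tests/data/stats.txt was not computed from the same set of pairs as the re-run.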

Hi-C Contacts: 5,132(42.41%) and Hi-C Contacts: 114,409 (34.37% / 36.90%) differ a lot. I then ran Juicer alone using the command below:

bash /home/myname/juicer1.6/scripts/juicer.sh  \
    -d /home/myname/Tools/hic-pipeline/tests/testJuicer \
    -D /home/myname/juicer1.6 \
    -y /home/myname/juicer1.6/restriction_sites/ce10_MboI.txt \
    -g ce10 \
    -z /home/myname/juicer1.6/references/ce10_selected.fa.gz \
    -p /home/myname/juicer1.6/restriction_sites/ce10_selected.chrom.sizes.tsv \
    -s MboI \
    -t 6  &> /home/myname/Tools/hic-pipeline/tests/testJuicer/test_juicer.log &

The output in inter_30.txt looks similar to my hic-pipeline re-run:

Sequenced Read Pairs:  332,888
 Normal Paired: 321,558 (96.60%)
 Chimeric Paired: 17 (0.01%)
 Chimeric Ambiguous: 3 (0.00%)
 Unmapped: 11,310 (3.40%)
 Ligation Motif Present: 96 (0.03%)
Alignable (Normal+Chimeric Paired): 321,575 (96.60%)
Unique Reads: 310,096 (93.15%)
PCR Duplicates: 11,479 (3.45%)
Optical Duplicates: 0 (0.00%)
Library Complexity Estimate: 4,396,491
Intra-fragment Reads: 176,096 (52.90% / 56.79%)
Below MAPQ Threshold: 19,594 (5.89% / 6.32%)
Hi-C Contacts: 114,406 (34.37% / 36.89%)
 Ligation Motif Present: 83  (0.02% / 0.03%)
 3' Bias (Long Range): 64% - 36%
 Pair Type %(L-I-O-R): 25% - 25% - 25% - 25%
Inter-chromosomal: 193  (0.06% / 0.06%)
Intra-chromosomal: 114,213  (34.31% / 36.83%)
Short Range (<20Kb): 104,108  (31.27% / 33.57%)
Long Range (>20Kb): 10,098  (3.03% / 3.26%)
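
To see where the two test-data runs actually disagree, the counts can be compared category by category. The sketch below assumes that Intra-fragment Reads, Below MAPQ Threshold, and Hi-C Contacts together partition the unique reads; that assumption is checked in the code rather than taken from any documentation. The dictionary keys are illustrative names, and the numbers are copied from the two blocks above.

```python
CATEGORIES = ("intra_fragment", "below_mapq", "hic_contacts")

# Counts copied from the hic-pipeline re-run and the Juicer inter_30.txt above.
test_runs = {
    "hic-pipeline": {
        "unique": 310_081,
        "intra_fragment": 150_908,
        "below_mapq": 44_764,
        "hic_contacts": 114_409,
    },
    "juicer": {
        "unique": 310_096,
        "intra_fragment": 176_096,
        "below_mapq": 19_594,
        "hic_contacts": 114_406,
    },
}


def partition_gap(stats: dict) -> int:
    """Unique reads minus the sum of the three categories (0 means they partition)."""
    return stats["unique"] - sum(stats[c] for c in CATEGORIES)


gaps = {name: partition_gap(s) for name, s in test_runs.items()}
# Positive delta: Juicer reports more pairs in that category than hic-pipeline.
deltas = {c: test_runs["juicer"][c] - test_runs["hic-pipeline"][c] for c in CATEGORIES}

print(gaps)
print(deltas)
```

For these numbers both gaps are zero and the Hi-C Contacts counts differ by only 3. The visible difference is that about 25,000 pairs are counted under Intra-fragment Reads by Juicer but under Below MAPQ Threshold by hic-pipeline.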

Inconsistency in in-house data

I also ran a test on mm10 5G in-house data. The hic-pipeline result again differs from the Juicer-only result. The statistics are shown below.
hic-pipeline (stats_30.txt):

Read type: Paired End
Sequenced Read Pairs:  44334059
No chimera found: 221708 (0.50%)
 One or both reads unmapped: 221708 (0.50%)
2 alignments: 36958351 (83.36%)
 2 alignments (A...B): 10224763 (23.06%)
 2 alignments (A1...A2B; A1B2...B1A2): 26733588 (60.30%)
3 or more alignments: 7154000 (16.14%)
Ligation Motif Present: 37372441 (84.30%)
Average insert size: 215.17
Total Unique: 30990053 (83.85%, 69.90%)
Total Duplicates: 5968298 (16.15%, 13.46%)
Library Complexity Estimate*: 101,748,185
Intra-fragment Reads: 16,460,941 (37.13% / 53.12%)
Below MAPQ Threshold: 5,811,532 (13.11% / 18.75%)
Hi-C Contacts: 8,717,580 (19.66% / 28.13%)
 3' Bias (Long Range): N/A
 Pair Type %(L-I-O-R): N/A
Inter-chromosomal: 8,717,580 (19.66% / 28.13%)
Intra-chromosomal: 0 (0.00% / 0.00%)
Short Range (<20Kb):
  <500BP: 0 (0.00% / 0.00%)
  500BP-5kB: 0 (0.00% / 0.00%)
  5kB-20kB: 0 (0.00% / 0.00%)
Long Range (>20Kb): 0 (0.00% / 0.00%)
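
The block above has a pattern that can be tested for automatically: Hi-C contacts are reported, yet none of them is intra-chromosomal. The sketch below parses `Label: count` lines from a stats file and flags that case. The function name `parse_counts` and the regular expression are illustrative assumptions about the text layout shown here, not an API of hic-pipeline or Juicer.

```python
import re

# A few lines copied from the stats_30.txt block above.
STATS_TEXT = """\
Hi-C Contacts: 8,717,580 (19.66% / 28.13%)
 3' Bias (Long Range): N/A
Inter-chromosomal: 8,717,580 (19.66% / 28.13%)
Intra-chromosomal: 0 (0.00% / 0.00%)
Long Range (>20Kb): 0 (0.00% / 0.00%)
"""


def parse_counts(text: str) -> dict:
    """Map each label to its integer count; lines without a plain count are skipped."""
    counts = {}
    for line in text.splitlines():
        m = re.match(r"\s*(.+?):\s*([\d,]+)(?:\s|\(|$)", line)
        if m:
            counts[m.group(1)] = int(m.group(2).replace(",", ""))
    return counts


counts = parse_counts(STATS_TEXT)
# Contacts exist, but every one of them is inter-chromosomal.
suspicious = counts["Hi-C Contacts"] > 0 and counts["Intra-chromosomal"] == 0
print(counts)
print("no intra-chromosomal contacts:", suspicious)
```

A check like this only reports the symptom. It does not say which input or pipeline step produced it.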

juicer (inter_30.txt):

Sequenced Read Pairs:  44,334,059
 Normal Paired: 10,235,807 (23.09%)
 Chimeric Paired: 27,159,485 (61.26%)
 Chimeric Ambiguous: 6,681,622 (15.07%)
 Unmapped: 257,145 (0.58%)
 Ligation Motif Present: 37,372,441 (84.30%)
Alignable (Normal+Chimeric Paired): 37,395,292 (84.35%)
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARN [2024-02-21T14:06:36,684]  [Globals.java:138] [main]  Development mode is enabled
Unique Reads: 31,288,886 (70.58%)
PCR Duplicates: 2,344,475 (5.29%)
Optical Duplicates: 3,761,931 (8.49%)
Library Complexity Estimate: 229,902,226
Intra-fragment Reads: 489,331 (1.10% / 1.56%)
Below MAPQ Threshold: 5,890,557 (13.29% / 18.83%)
Hi-C Contacts: 24,908,998 (56.18% / 79.61%)
 Ligation Motif Present: 21,773,154  (49.11% / 69.59%)
 3' Bias (Long Range): 77% - 23%
 Pair Type %(L-I-O-R): 25% - 25% - 25% - 25%
Inter-chromosomal: 8,785,879  (19.82% / 28.08%)
Intra-chromosomal: 16,123,119  (36.37% / 51.53%)
Short Range (<20Kb): 6,129,919  (13.83% / 19.59%)
Long Range (>20Kb): 9,992,801  (22.54% / 31.94%)

hic-pipeline reports Hi-C Contacts: 8,717,580 (19.66% / 28.13%), while Juicer reports Hi-C Contacts: 24,908,998 (56.18% / 79.61%). It also seems odd to me that hic-pipeline detected no intra-chromosomal contacts at all.
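
The same category-by-category arithmetic can be applied to the in-house numbers to see where the missing contacts went. This is a sketch using counts copied from the two blocks above; the dictionary keys are illustrative names, and the comparison of Intra-fragment Reads against Juicer's intra-fragment plus intra-chromosomal counts is an exploratory check, not a documented relationship between the two tools.

```python
# Counts copied from stats_30.txt (hic-pipeline) and inter_30.txt (Juicer) above.
pipeline = {
    "unique": 30_990_053,
    "intra_fragment": 16_460_941,
    "below_mapq": 5_811_532,
    "inter_chrom": 8_717_580,
    "intra_chrom": 0,
}
juicer = {
    "unique": 31_288_886,
    "intra_fragment": 489_331,
    "below_mapq": 5_890_557,
    "inter_chrom": 8_785_879,
    "intra_chrom": 16_123_119,
}


def same_chrom_pairs(stats: dict) -> int:
    """Intra-fragment reads plus intra-chromosomal contacts."""
    return stats["intra_fragment"] + stats["intra_chrom"]


def rel_diff(a: int, b: int) -> float:
    """Relative difference of a with respect to b, as a fraction."""
    return abs(a - b) / b


for key in ("below_mapq", "inter_chrom"):
    print(key, f"{rel_diff(pipeline[key], juicer[key]):.2%}")

print("same-chromosome pairs",
      same_chrom_pairs(pipeline), same_chrom_pairs(juicer),
      f"{rel_diff(same_chrom_pairs(pipeline), same_chrom_pairs(juicer)):.2%}")
```

On these numbers, the below-MAPQ and inter-chromosomal counts agree to within about 1.5% between the two tools. hic-pipeline's Intra-fragment Reads (16,460,941) is within about 1% of Juicer's intra-fragment plus intra-chromosomal total (16,612,450). In other words, the pairs that Juicer reports as intra-chromosomal contacts appear to be counted as intra-fragment reads by hic-pipeline. The arithmetic only shows where the counts moved, not which input or step caused it.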
