Find the source of the 5% HaplotypeCaller performance regression in the past year #8327

Open
jamesemery opened this issue May 17, 2023 · 0 comments

Collaborator

jamesemery commented May 17, 2023

In the discussion on this branch, #6351 (review), we were tripped up by the Carrot tests showing a slight runtime regression (between 5 and 7% on the aggregated $ time command output across 50 shards) in the current version of HaplotypeCaller compared with a misconfigured older version of the tool. Specifically, the faster older version was broadinstitute/gatk-nightly:2022-03-04-4.2.5.0-9-gb097f75c5-NIGHTLY-SNAPSHOT, which predates the Java 17 migration (a high-likelihood culprit from the past year).
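For anyone reproducing the comparison, here is a minimal sketch of how an aggregated regression across shards could be computed, and how shards could be ranked by slowdown. The function names and the per-shard timings in the usage note are hypothetical; this is not how Carrot itself reports the numbers.

```python
from typing import List, Sequence, Tuple


def aggregate_regression_pct(baseline_secs: Sequence[float],
                             current_secs: Sequence[float]) -> float:
    """Percent change in total runtime, summed over all shards.

    Positive values mean the current version is slower than the baseline.
    """
    if len(baseline_secs) != len(current_secs):
        raise ValueError("both versions must report the same number of shards")
    baseline_total = sum(baseline_secs)
    if baseline_total <= 0:
        raise ValueError("baseline total runtime must be positive")
    return 100.0 * (sum(current_secs) - baseline_total) / baseline_total


def rank_shards_by_slowdown(baseline_secs: Sequence[float],
                            current_secs: Sequence[float]) -> List[Tuple[int, float]]:
    """Return (shard_index, extra_seconds) pairs, largest slowdown first."""
    deltas = [(i, cur - base)
              for i, (base, cur) in enumerate(zip(baseline_secs, current_secs))]
    return sorted(deltas, key=lambda pair: pair[1], reverse=True)
```

With made-up timings such as `aggregate_regression_pct([100, 200], [110, 205])`, the total goes from 300 s to 315 s, which is a 5% regression. Ranking the same inputs puts shard 0 first, since it gained the most seconds.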

Somebody should spend a few hours with a profiler to make sure there isn't some obvious culprit.

Here is the command that Carrot was running:

 -R /cromwell_root/dsp-methods-carrot-data/test_data/haplotypecaller_tests/Homo_sapiens_assembly38.fasta \
 -I gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/chm1_chm13_hiseqx_sm_hf3mo.bam \
 -L /cromwell_root/dsde-methods-carrot-prod-cromwell/VariantCallingCarrotOrchestrated/9886a710-334a-41eb-a495-6968d322730a/call-CHMSampleHeadToHead/VariantCallingCarrot/63594353-145d-4c4a-a713-352ad41ff3e6/call-ScatterIntervalList/cacheCopy/glob-cb4648beeaff920acb03de7603c06f98/10scattered.interval_list \
 -O CHM113.g.vcf.gz \
 -contamination 0.0 \
 -G StandardAnnotation -G StandardHCAnnotation -G AS_StandardAnnotation \
  \
  \
  \
 -GQB 10 -GQB 20 -GQB 30 -GQB 40 -GQB 50 -GQB 60 -GQB 70 -GQB 80 -GQB 90 \
 -ERC GVCF \

One shard where a significant slowdown was observed spanned the region chr3:55313816 -> chr3:113699078, which should provide a good starting point for anybody investigating this.
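One possible way to narrow down the slow region is to split it into smaller intervals and time each one separately. The sketch below parses the region as written above and emits `contig:start-end` strings, assuming 1-based inclusive coordinates; the helper names and the choice of chunk count are illustrative assumptions, not part of the Carrot setup.

```python
import re
from typing import List, Tuple


def parse_region(text: str) -> Tuple[str, int, int]:
    """Parse a region written as 'chr3:55313816 -> chr3:113699078'."""
    match = re.fullmatch(r"\s*(\w+):(\d+)\s*->\s*(\w+):(\d+)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized region: {text!r}")
    contig, start, end_contig, end = match.groups()
    if contig != end_contig:
        raise ValueError("region must start and end on the same contig")
    if int(start) > int(end):
        raise ValueError("region start must not exceed its end")
    return contig, int(start), int(end)


def split_region(contig: str, start: int, end: int, n_chunks: int) -> List[str]:
    """Split [start, end] into up to n_chunks contiguous intervals."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    total = end - start + 1
    size, extra = divmod(total, n_chunks)
    intervals = []
    pos = start
    for i in range(n_chunks):
        # The first `extra` chunks absorb the remainder, one base each.
        length = size + (1 if i < extra else 0)
        if length == 0:
            continue
        intervals.append(f"{contig}:{pos}-{pos + length - 1}")
        pos += length
    return intervals
```

For example, `split_region(*parse_region("chr3:55313816 -> chr3:113699078"), 8)` yields eight adjacent intervals covering the whole shard. Each could then be substituted for the interval list in the command above and timed on both versions.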
