More refactoring PDHCE and preparing for joint detection #8467

Closed
davidbenjamin wants to merge 28 commits

davidbenjamin
Contributor

No description provided.

@gatk-bot

Github actions tests reported job failures from actions build 5826613584
Failures in the following jobs:

| Test Type | JDK | Job ID | Logs |
| --- | --- | --- | --- |
| variantcalling | 17.0.6+10 | 5826613584.2 | logs |

@davidbenjamin
Contributor Author

#carrot(HaplotypeCaller CARROT Regression Tests, VariantCallingCarrotOrchestrated.gatk_docker, )

@CarrotBroadBot

🥕CARROT🥕 run started

Test: HaplotypeCaller CARROT Regression Tests | Status: building

Run: HaplotypeCaller CARROT Regression Tests_run_2023-08-21 22:38:12.285896770 UTC

Full details
 
 {
  "run_id": "83db9528-9aa0-4963-a382-95c0dc24102d",
  "test_id": "c3de522b-7ce5-4a51-8b57-1cea628dd93a",
  "run_group_ids": [],
  "name": "HaplotypeCaller CARROT Regression Tests_run_2023-08-21 22:38:12.285896770 UTC",
  "status": "building",
  "test_wdl": "gs://dsp-methods-carrot-data/wdl-prod/8fce9006-acbf-48ed-984a-2ec988d82eea/test.wdl",
  "test_wdl_hash": "272dc41890e3710cc96c32af03df25065cc4aa9dc389e3c2473bddba7be237db3e0698c15ef287c4619cff83e9b2e8e5e0a486eb4534658604e4bb312f308611",
  "test_wdl_dependencies": null,
  "test_wdl_dependencies_hash": null,
  "eval_wdl": "gs://dsp-methods-carrot-data/wdl-prod/7e3704ce-f26c-4465-a6ab-f64faca619f4/eval.wdl",
  "eval_wdl_hash": "8cecc1e6a3ade904ed3bfaae834df6aeac9b50fbc08966557f9e0c1628058b2c64d080f78d0cad222b4b02400db47d540d3a1bcdb8275c475b49a027bf330605",
  "eval_wdl_dependencies": null,
  "eval_wdl_dependencies_hash": null,
  "test_input": {
    "VariantCallingCarrotOrchestrated.CHM_base_file_name": "CHM113",
    "VariantCallingCarrotOrchestrated.CHM_calling_interval_list": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/wgs_calling_regions.hg38.interval_list",
    "VariantCallingCarrotOrchestrated.CHM_contamination": 0.0,
    "VariantCallingCarrotOrchestrated.CHM_input_bam": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/chm1_chm13_hiseqx_sm_hf3mo.bam",
    "VariantCallingCarrotOrchestrated.CHM_input_bam_index": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/NA24385_NA24385_O1D1_SM-G947H_v1_NS.bai",
    "VariantCallingCarrotOrchestrated.NIST_base_file_name": "NA24385",
    "VariantCallingCarrotOrchestrated.NIST_calling_interval_list": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/wgs_calling_regions.hg38.interval_list",
    "VariantCallingCarrotOrchestrated.NIST_contamination": 0.0383312,
    "VariantCallingCarrotOrchestrated.NIST_input_bam": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/NA24385_NA24385_O1D1_SM-G947H_v1_NS.bam",
    "VariantCallingCarrotOrchestrated.NIST_input_bam_index": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/NA24385_NA24385_O1D1_SM-G947H_v1_NS.bai",
    "VariantCallingCarrotOrchestrated.agg_preemptible_tries": 3,
    "VariantCallingCarrotOrchestrated.break_bands_at_multiples_of": 100000,
    "VariantCallingCarrotOrchestrated.contamination": 0.0,
    "VariantCallingCarrotOrchestrated.exome1_base_file_name": "NA12878Exome1",
    "VariantCallingCarrotOrchestrated.exome1_calling_interval_list": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/exome_calling_regions.v1.interval_list",
    "VariantCallingCarrotOrchestrated.exome1_input_bam": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/NA12878_forPCRplus_1.cram",
    "VariantCallingCarrotOrchestrated.exome1_input_bam_index": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/NA12878_forPCRplus_1.cram.crai",
    "VariantCallingCarrotOrchestrated.gatk_control_docker": "broadinstitute/gatk-nightly:latest",
    "VariantCallingCarrotOrchestrated.gatk_docker": "image_build:gatk|51e776f8938f2e9f763a24489bc6f5d33d7bd020",
    "VariantCallingCarrotOrchestrated.haplotype_scatter_count": 50,
    "VariantCallingCarrotOrchestrated.monitoring_script": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/cromwell_monitoring_script.sh",
    "VariantCallingCarrotOrchestrated.ref_dict": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/Homo_sapiens_assembly38.dict",
    "VariantCallingCarrotOrchestrated.ref_fasta": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/Homo_sapiens_assembly38.fasta",
    "VariantCallingCarrotOrchestrated.ref_fasta_index": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/Homo_sapiens_assembly38.fasta.fai",
    "VariantCallingCarrotOrchestrated.use_gatk3_haplotype_caller": true
  },
  "test_options": {
    "read_from_cache": false
  },
  "eval_input": {
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_confidenceInterval": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/chm.full.m38.interval_list",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_controlLabel": "CONTROLSNAPSHOT2018HG002",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_controlMonitoringExample": "test_output:VariantCallingCarrotOrchestrated.CHM_control_representative_benchmarking",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_controlRuntimeSummaries": "test_output:VariantCallingCarrotOrchestrated.CHM_control_output_runtimes",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_controlVcf": "test_output:VariantCallingCarrotOrchestrated.CHM_control_vcf",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_controlVcfIndex": "test_output:VariantCallingCarrotOrchestrated.CHM_control_vcf_index",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_evalLabel": "TESTSNAPSHOT2018HG002",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_evalMonitoringExample": "test_output:VariantCallingCarrotOrchestrated.CHM_representative_benchmarking",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_evalRuntimeSummaries": "test_output:VariantCallingCarrotOrchestrated.CHM_output_runtimes",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_evalVcf": "test_output:VariantCallingCarrotOrchestrated.CHM_output_vcf",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_evalVcfIndex": "test_output:VariantCallingCarrotOrchestrated.CHM_output_vcf_index",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_truthLabel": "CHM_GRCh38_SYNDIPv20180222",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_truthVcf": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/chm.full.m38.vcf.gz",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_truthVcfIndex": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/chm.full.m38.vcf.gz.tbi",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_confidenceInterval": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/GIAB_v3.3.2_NA12878_hg38.twist_exome.interval_list",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_controlLabel": "CONTROLSNAPSHOT2018HG002",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_controlMonitoringExample": "test_output:VariantCallingCarrotOrchestrated.EXOME1_control_representative_benchmarking",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_controlRuntimeSummaries": "test_output:VariantCallingCarrotOrchestrated.EXOME1_control_output_runtimes",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_controlVcf": "test_output:VariantCallingCarrotOrchestrated.EXOME1_control_vcf",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_controlVcfIndex": "test_output:VariantCallingCarrotOrchestrated.EXOME1_control_vcf_index",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_evalLabel": "TESTSNAPSHOT2018HG002",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_evalMonitoringExample": "test_output:VariantCallingCarrotOrchestrated.EXOME1_representative_benchmarking",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_evalRuntimeSummaries": "test_output:VariantCallingCarrotOrchestrated.EXOME1_output_runtimes",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_evalVcf": "test_output:VariantCallingCarrotOrchestrated.EXOME1_output_vcf",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_evalVcfIndex": "test_output:VariantCallingCarrotOrchestrated.EXOME1_output_vcf_index",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_truthLabel": "NA12878_GRCh38_TWISTExome",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_truthVcf": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/GIAB_v3.3.2_NA12878_hg38.vcf.gz",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_truthVcfIndex": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/GIAB_v3.3.2_NA12878_hg38.vcf.gz.tbi",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_confidenceInterval": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/HG002_GRCh38_GIAB_1_22_v4.2.1_benchmark_noinconsistent.bed",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_controlLabel": "CONTROLSNAPSHOT2018HG002",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_controlMonitoringExample": "test_output:VariantCallingCarrotOrchestrated.NIST_control_representative_benchmarking",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_controlRuntimeSummaries": "test_output:VariantCallingCarrotOrchestrated.NIST_control_output_runtimes",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_controlVcf": "test_output:VariantCallingCarrotOrchestrated.NIST_control_vcf",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_controlVcfIndex": "test_output:VariantCallingCarrotOrchestrated.NIST_control_vcf_index",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_evalLabel": "TESTSNAPSHOT2018HG002",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_evalMonitoringExample": "test_output:VariantCallingCarrotOrchestrated.NIST_representative_benchmarking",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_evalRuntimeSummaries": "test_output:VariantCallingCarrotOrchestrated.NIST_output_runtimes",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_evalVcf": "test_output:VariantCallingCarrotOrchestrated.NIST_output_vcf",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_evalVcfIndex": "test_output:VariantCallingCarrotOrchestrated.NIST_output_vcf_index",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_truthLabel": "HG002_GRCh38_GIAB",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_truthVcf": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/HG002_GRCh38_GIAB_1_22_v4.2.1_benchmark.broad-header.vcf.gz",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_truthVcfIndex": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/HG002_GRCh38_GIAB_1_22_v4.2.1_benchmark.broad-header.vcf.gz.tbi",
    "BenchmarkVCFsHeadToHeadOrchestrated.hapMap": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/Homo_sapiens_assembly38.haplotype_database.txt",
    "BenchmarkVCFsHeadToHeadOrchestrated.refDict": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/Homo_sapiens_assembly38.dict",
    "BenchmarkVCFsHeadToHeadOrchestrated.refIndex": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/Homo_sapiens_assembly38.fasta.fai",
    "BenchmarkVCFsHeadToHeadOrchestrated.reference": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/Homo_sapiens_assembly38.fasta",
    "BenchmarkVCFsHeadToHeadOrchestrated.referenceVersion": "HG38",
    "BenchmarkVCFsHeadToHeadOrchestrated.stratIntervals": [
      "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/HCR_hg38.bed",
      "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/LCR_Hg38.interval_list"
    ],
    "BenchmarkVCFsHeadToHeadOrchestrated.stratLabels": [
      "HCR",
      "LCR"
    ]
  },
  "eval_options": {
    "read_from_cache": false
  },
  "test_cromwell_job_id": null,
  "eval_cromwell_job_id": null,
  "created_at": "2023-08-21T22:38:12.285936",
  "created_by": null,
  "finished_at": null,
  "results": null,
  "errors": null
} 
 

@CarrotBroadBot

🥕CARROT🥕 run finished

Test: HaplotypeCaller CARROT Regression Tests | Status: succeeded

Run: HaplotypeCaller CARROT Regression Tests_run_2023-08-21 22:38:12.285896770 UTC

Results
CHM controlHCprocesshours 90.613975
CHM controlHCsystemhours 0.19898611111111109
CHM controlHCwallclockhours 63.943677777777786
CHM controlHCwallclockmax 3.1089944444444444
CHM controlMonitoringLogs View in the GCS Console
CHM controlindelF1Score 0.8724
CHM controlindelPrecision 0.8814
CHM controlsnpF1Score 0.9784
CHM controlsnpPrecision 0.9706
CHM controlsnpRecall 0.9863
CHM controlsummary View in the GCS Console
CHM evalHCprocesshours 93.63756388888888
CHM evalHCsystemhours 0.6379805555555556
CHM evalHCwallclockhours 70.50882222222222
CHM evalHCwallclockmax 3.5186027777777777
CHM evalMonitoringLogs View in the GCS Console
CHM evalindelF1Score 0.8724
CHM evalindelPrecision 0.8814
CHM evalsnpF1Score 0.9784
CHM evalsnpPrecision 0.9706
CHM evalsnpRecall 0.9863
CHM evalsummary View in the GCS Console
EXOME1 controlindelF1Score 0.727
EXOME1 controlindelPrecision 0.632
EXOME1 controlsnpF1Score 0.9878
EXOME1 controlsnpPrecision 0.9815
EXOME1 controlsnpRecall 0.9941
EXOME1 controlsummary View in the GCS Console
EXOME1 evalindelF1Score 0.727
EXOME1 evalindelPrecision 0.632
EXOME1 evalsnpF1Score 0.9878
EXOME1 evalsnpPrecision 0.9815
EXOME1 evalsnpRecall 0.9941
EXOME1 evalsummary View in the GCS Console
NIST controlHCprocesshours 108.95665833333332
NIST controlHCsystemhours 0.21568055555555551
NIST controlHCwallclockhours 78.62844166666666
NIST controlHCwallclockmax 4.166558333333334
NIST controlMonitoringLogs View in the GCS Console
NIST controlindelF1Score 0.9902
NIST controlindelPrecision 0.9903
NIST controlsnpF1Score 0.9899
NIST controlsnpPrecision 0.9887
NIST controlsnpRecall 0.9911
NIST controlsummary View in the GCS Console
NIST evalHCprocesshours 112.84528333333336
NIST evalHCsystemhours 0.8645277777777777
NIST evalHCwallclockhours 88.01737777777778
NIST evalHCwallclockmax 4.8386555555555555
NIST evalMonitoringLogs View in the GCS Console
NIST evalindelF1Score 0.9902
NIST evalindelPrecision 0.9903
NIST evalsnpF1Score 0.9899
NIST evalsnpPrecision 0.9887
NIST evalsnpRecall 0.9911
NIST evalsummary View in the GCS Console
ROC_Plots_Reported View in the GCS Console
Full details
 
 {
  "run_id": "83db9528-9aa0-4963-a382-95c0dc24102d",
  "test_id": "c3de522b-7ce5-4a51-8b57-1cea628dd93a",
  "run_group_ids": [],
  "name": "HaplotypeCaller CARROT Regression Tests_run_2023-08-21 22:38:12.285896770 UTC",
  "status": "succeeded",
  "test_wdl": "gs://dsp-methods-carrot-data/wdl-prod/8fce9006-acbf-48ed-984a-2ec988d82eea/test.wdl",
  "test_wdl_hash": "272dc41890e3710cc96c32af03df25065cc4aa9dc389e3c2473bddba7be237db3e0698c15ef287c4619cff83e9b2e8e5e0a486eb4534658604e4bb312f308611",
  "test_wdl_dependencies": null,
  "test_wdl_dependencies_hash": null,
  "eval_wdl": "gs://dsp-methods-carrot-data/wdl-prod/7e3704ce-f26c-4465-a6ab-f64faca619f4/eval.wdl",
  "eval_wdl_hash": "8cecc1e6a3ade904ed3bfaae834df6aeac9b50fbc08966557f9e0c1628058b2c64d080f78d0cad222b4b02400db47d540d3a1bcdb8275c475b49a027bf330605",
  "eval_wdl_dependencies": null,
  "eval_wdl_dependencies_hash": null,
  "test_input": {
    "VariantCallingCarrotOrchestrated.CHM_base_file_name": "CHM113",
    "VariantCallingCarrotOrchestrated.CHM_calling_interval_list": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/wgs_calling_regions.hg38.interval_list",
    "VariantCallingCarrotOrchestrated.CHM_contamination": 0.0,
    "VariantCallingCarrotOrchestrated.CHM_input_bam": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/chm1_chm13_hiseqx_sm_hf3mo.bam",
    "VariantCallingCarrotOrchestrated.CHM_input_bam_index": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/NA24385_NA24385_O1D1_SM-G947H_v1_NS.bai",
    "VariantCallingCarrotOrchestrated.NIST_base_file_name": "NA24385",
    "VariantCallingCarrotOrchestrated.NIST_calling_interval_list": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/wgs_calling_regions.hg38.interval_list",
    "VariantCallingCarrotOrchestrated.NIST_contamination": 0.0383312,
    "VariantCallingCarrotOrchestrated.NIST_input_bam": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/NA24385_NA24385_O1D1_SM-G947H_v1_NS.bam",
    "VariantCallingCarrotOrchestrated.NIST_input_bam_index": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/NA24385_NA24385_O1D1_SM-G947H_v1_NS.bai",
    "VariantCallingCarrotOrchestrated.agg_preemptible_tries": 3,
    "VariantCallingCarrotOrchestrated.break_bands_at_multiples_of": 100000,
    "VariantCallingCarrotOrchestrated.contamination": 0.0,
    "VariantCallingCarrotOrchestrated.exome1_base_file_name": "NA12878Exome1",
    "VariantCallingCarrotOrchestrated.exome1_calling_interval_list": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/exome_calling_regions.v1.interval_list",
    "VariantCallingCarrotOrchestrated.exome1_input_bam": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/NA12878_forPCRplus_1.cram",
    "VariantCallingCarrotOrchestrated.exome1_input_bam_index": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/NA12878_forPCRplus_1.cram.crai",
    "VariantCallingCarrotOrchestrated.gatk_control_docker": "broadinstitute/gatk-nightly:latest",
    "VariantCallingCarrotOrchestrated.gatk_docker": "image_build:gatk|51e776f8938f2e9f763a24489bc6f5d33d7bd020",
    "VariantCallingCarrotOrchestrated.haplotype_scatter_count": 50,
    "VariantCallingCarrotOrchestrated.monitoring_script": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/cromwell_monitoring_script.sh",
    "VariantCallingCarrotOrchestrated.ref_dict": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/Homo_sapiens_assembly38.dict",
    "VariantCallingCarrotOrchestrated.ref_fasta": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/Homo_sapiens_assembly38.fasta",
    "VariantCallingCarrotOrchestrated.ref_fasta_index": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/Homo_sapiens_assembly38.fasta.fai",
    "VariantCallingCarrotOrchestrated.use_gatk3_haplotype_caller": true
  },
  "test_options": {
    "read_from_cache": false
  },
  "eval_input": {
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_confidenceInterval": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/chm.full.m38.interval_list",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_controlLabel": "CONTROLSNAPSHOT2018HG002",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_controlMonitoringExample": "test_output:VariantCallingCarrotOrchestrated.CHM_control_representative_benchmarking",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_controlRuntimeSummaries": "test_output:VariantCallingCarrotOrchestrated.CHM_control_output_runtimes",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_controlVcf": "test_output:VariantCallingCarrotOrchestrated.CHM_control_vcf",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_controlVcfIndex": "test_output:VariantCallingCarrotOrchestrated.CHM_control_vcf_index",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_evalLabel": "TESTSNAPSHOT2018HG002",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_evalMonitoringExample": "test_output:VariantCallingCarrotOrchestrated.CHM_representative_benchmarking",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_evalRuntimeSummaries": "test_output:VariantCallingCarrotOrchestrated.CHM_output_runtimes",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_evalVcf": "test_output:VariantCallingCarrotOrchestrated.CHM_output_vcf",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_evalVcfIndex": "test_output:VariantCallingCarrotOrchestrated.CHM_output_vcf_index",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_truthLabel": "CHM_GRCh38_SYNDIPv20180222",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_truthVcf": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/chm.full.m38.vcf.gz",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_truthVcfIndex": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/chm.full.m38.vcf.gz.tbi",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_confidenceInterval": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/GIAB_v3.3.2_NA12878_hg38.twist_exome.interval_list",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_controlLabel": "CONTROLSNAPSHOT2018HG002",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_controlMonitoringExample": "test_output:VariantCallingCarrotOrchestrated.EXOME1_control_representative_benchmarking",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_controlRuntimeSummaries": "test_output:VariantCallingCarrotOrchestrated.EXOME1_control_output_runtimes",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_controlVcf": "test_output:VariantCallingCarrotOrchestrated.EXOME1_control_vcf",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_controlVcfIndex": "test_output:VariantCallingCarrotOrchestrated.EXOME1_control_vcf_index",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_evalLabel": "TESTSNAPSHOT2018HG002",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_evalMonitoringExample": "test_output:VariantCallingCarrotOrchestrated.EXOME1_representative_benchmarking",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_evalRuntimeSummaries": "test_output:VariantCallingCarrotOrchestrated.EXOME1_output_runtimes",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_evalVcf": "test_output:VariantCallingCarrotOrchestrated.EXOME1_output_vcf",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_evalVcfIndex": "test_output:VariantCallingCarrotOrchestrated.EXOME1_output_vcf_index",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_truthLabel": "NA12878_GRCh38_TWISTExome",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_truthVcf": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/GIAB_v3.3.2_NA12878_hg38.vcf.gz",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_truthVcfIndex": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/GIAB_v3.3.2_NA12878_hg38.vcf.gz.tbi",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_confidenceInterval": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/HG002_GRCh38_GIAB_1_22_v4.2.1_benchmark_noinconsistent.bed",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_controlLabel": "CONTROLSNAPSHOT2018HG002",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_controlMonitoringExample": "test_output:VariantCallingCarrotOrchestrated.NIST_control_representative_benchmarking",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_controlRuntimeSummaries": "test_output:VariantCallingCarrotOrchestrated.NIST_control_output_runtimes",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_controlVcf": "test_output:VariantCallingCarrotOrchestrated.NIST_control_vcf",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_controlVcfIndex": "test_output:VariantCallingCarrotOrchestrated.NIST_control_vcf_index",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_evalLabel": "TESTSNAPSHOT2018HG002",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_evalMonitoringExample": "test_output:VariantCallingCarrotOrchestrated.NIST_representative_benchmarking",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_evalRuntimeSummaries": "test_output:VariantCallingCarrotOrchestrated.NIST_output_runtimes",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_evalVcf": "test_output:VariantCallingCarrotOrchestrated.NIST_output_vcf",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_evalVcfIndex": "test_output:VariantCallingCarrotOrchestrated.NIST_output_vcf_index",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_truthLabel": "HG002_GRCh38_GIAB",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_truthVcf": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/HG002_GRCh38_GIAB_1_22_v4.2.1_benchmark.broad-header.vcf.gz",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_truthVcfIndex": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/HG002_GRCh38_GIAB_1_22_v4.2.1_benchmark.broad-header.vcf.gz.tbi",
    "BenchmarkVCFsHeadToHeadOrchestrated.hapMap": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/Homo_sapiens_assembly38.haplotype_database.txt",
    "BenchmarkVCFsHeadToHeadOrchestrated.refDict": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/Homo_sapiens_assembly38.dict",
    "BenchmarkVCFsHeadToHeadOrchestrated.refIndex": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/Homo_sapiens_assembly38.fasta.fai",
    "BenchmarkVCFsHeadToHeadOrchestrated.reference": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/Homo_sapiens_assembly38.fasta",
    "BenchmarkVCFsHeadToHeadOrchestrated.referenceVersion": "HG38",
    "BenchmarkVCFsHeadToHeadOrchestrated.stratIntervals": [
      "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/HCR_hg38.bed",
      "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/LCR_Hg38.interval_list"
    ],
    "BenchmarkVCFsHeadToHeadOrchestrated.stratLabels": [
      "HCR",
      "LCR"
    ]
  },
  "eval_options": {
    "read_from_cache": false
  },
  "test_cromwell_job_id": "7b1f3c2d-059a-4391-92d7-b2f88045d8d5",
  "eval_cromwell_job_id": "ba9f32d5-7b46-462c-8d1f-5692eee05534",
  "created_at": "2023-08-21T22:38:12.285936",
  "created_by": null,
  "finished_at": "2023-08-22T09:23:01.973",
  "results": {
    "CHM controlHCprocesshours": "90.613975",
    "CHM controlHCsystemhours": "0.19898611111111109",
    "CHM controlHCwallclockhours": "63.943677777777786",
    "CHM controlHCwallclockmax": "3.1089944444444444",
    "CHM controlMonitoringLogs": "gs://dsde-methods-carrot-prod-cromwell/BenchmarkVCFsHeadToHeadOrchestrated/ba9f32d5-7b46-462c-8d1f-5692eee05534/call-CHMSampleHeadToHead/BenchmarkComparison/b7ddd5f2-fded-4076-b163-33ad637fb5bd/call-CONTROLRuntimeTask/monitoring.pdf",
    "CHM controlindelF1Score": "0.8724",
    "CHM controlindelPrecision": "0.8814",
    "CHM controlsnpF1Score": "0.9784",
    "CHM controlsnpPrecision": "0.9706",
    "CHM controlsnpRecall": "0.9863",
    "CHM controlsummary": "gs://dsde-methods-carrot-prod-cromwell/BenchmarkVCFsHeadToHeadOrchestrated/ba9f32d5-7b46-462c-8d1f-5692eee05534/call-CHMSampleHeadToHead/BenchmarkComparison/b7ddd5f2-fded-4076-b163-33ad637fb5bd/call-BenchmarkVCFControlSample/Benchmark/10080eab-b0ad-4752-80cb-fc6d34bd9ad9/call-CombineSummaries/summary.csv",
    "CHM evalHCprocesshours": "93.63756388888888",
    "CHM evalHCsystemhours": "0.6379805555555556",
    "CHM evalHCwallclockhours": "70.50882222222222",
    "CHM evalHCwallclockmax": "3.5186027777777777",
    "CHM evalMonitoringLogs": "gs://dsde-methods-carrot-prod-cromwell/BenchmarkVCFsHeadToHeadOrchestrated/ba9f32d5-7b46-462c-8d1f-5692eee05534/call-CHMSampleHeadToHead/BenchmarkComparison/b7ddd5f2-fded-4076-b163-33ad637fb5bd/call-EVALRuntimeTask/monitoring.pdf",
    "CHM evalindelF1Score": "0.8724",
    "CHM evalindelPrecision": "0.8814",
    "CHM evalsnpF1Score": "0.9784",
    "CHM evalsnpPrecision": "0.9706",
    "CHM evalsnpRecall": "0.9863",
    "CHM evalsummary": "gs://dsde-methods-carrot-prod-cromwell/BenchmarkVCFsHeadToHeadOrchestrated/ba9f32d5-7b46-462c-8d1f-5692eee05534/call-CHMSampleHeadToHead/BenchmarkComparison/b7ddd5f2-fded-4076-b163-33ad637fb5bd/call-BenchmarkVCFTestSample/Benchmark/c718736b-bf86-491f-9f9c-56c07cbd0c90/call-CombineSummaries/summary.csv",
    "EXOME1 controlindelF1Score": "0.727",
    "EXOME1 controlindelPrecision": "0.632",
    "EXOME1 controlsnpF1Score": "0.9878",
    "EXOME1 controlsnpPrecision": "0.9815",
    "EXOME1 controlsnpRecall": "0.9941",
    "EXOME1 controlsummary": "gs://dsde-methods-carrot-prod-cromwell/BenchmarkVCFsHeadToHeadOrchestrated/ba9f32d5-7b46-462c-8d1f-5692eee05534/call-EXOME1SampleHeadToHead/BenchmarkComparison/85b07a68-f04f-4396-80b4-f153b2d0020d/call-BenchmarkVCFControlSample/Benchmark/efb3b5ff-3860-46c3-8c6c-9141d1ff0e0a/call-CombineSummaries/summary.csv",
    "EXOME1 evalindelF1Score": "0.727",
    "EXOME1 evalindelPrecision": "0.632",
    "EXOME1 evalsnpF1Score": "0.9878",
    "EXOME1 evalsnpPrecision": "0.9815",
    "EXOME1 evalsnpRecall": "0.9941",
    "EXOME1 evalsummary": "gs://dsde-methods-carrot-prod-cromwell/BenchmarkVCFsHeadToHeadOrchestrated/ba9f32d5-7b46-462c-8d1f-5692eee05534/call-EXOME1SampleHeadToHead/BenchmarkComparison/85b07a68-f04f-4396-80b4-f153b2d0020d/call-BenchmarkVCFTestSample/Benchmark/272d076b-7300-4ea4-bbf7-d63f80fad94b/call-CombineSummaries/summary.csv",
    "NIST controlHCprocesshours": "108.95665833333332",
    "NIST controlHCsystemhours": "0.21568055555555551",
    "NIST controlHCwallclockhours": "78.62844166666666",
    "NIST controlHCwallclockmax": "4.166558333333334",
    "NIST controlMonitoringLogs": "gs://dsde-methods-carrot-prod-cromwell/BenchmarkVCFsHeadToHeadOrchestrated/ba9f32d5-7b46-462c-8d1f-5692eee05534/call-NISTSampleHeadToHead/BenchmarkComparison/043115ef-b68a-49a3-8272-8352b304c3aa/call-CONTROLRuntimeTask/monitoring.pdf",
    "NIST controlindelF1Score": "0.9902",
    "NIST controlindelPrecision": "0.9903",
    "NIST controlsnpF1Score": "0.9899",
    "NIST controlsnpPrecision": "0.9887",
    "NIST controlsnpRecall": "0.9911",
    "NIST controlsummary": "gs://dsde-methods-carrot-prod-cromwell/BenchmarkVCFsHeadToHeadOrchestrated/ba9f32d5-7b46-462c-8d1f-5692eee05534/call-NISTSampleHeadToHead/BenchmarkComparison/043115ef-b68a-49a3-8272-8352b304c3aa/call-BenchmarkVCFControlSample/Benchmark/b7031327-e5c1-4869-a5d9-98e5a8934db9/call-CombineSummaries/summary.csv",
    "NIST evalHCprocesshours": "112.84528333333336",
    "NIST evalHCsystemhours": "0.8645277777777777",
    "NIST evalHCwallclockhours": "88.01737777777778",
    "NIST evalHCwallclockmax": "4.8386555555555555",
    "NIST evalMonitoringLogs": "gs://dsde-methods-carrot-prod-cromwell/BenchmarkVCFsHeadToHeadOrchestrated/ba9f32d5-7b46-462c-8d1f-5692eee05534/call-NISTSampleHeadToHead/BenchmarkComparison/043115ef-b68a-49a3-8272-8352b304c3aa/call-EVALRuntimeTask/monitoring.pdf",
    "NIST evalindelF1Score": "0.9902",
    "NIST evalindelPrecision": "0.9903",
    "NIST evalsnpF1Score": "0.9899",
    "NIST evalsnpPrecision": "0.9887",
    "NIST evalsnpRecall": "0.9911",
    "NIST evalsummary": "gs://dsde-methods-carrot-prod-cromwell/BenchmarkVCFsHeadToHeadOrchestrated/ba9f32d5-7b46-462c-8d1f-5692eee05534/call-NISTSampleHeadToHead/BenchmarkComparison/043115ef-b68a-49a3-8272-8352b304c3aa/call-BenchmarkVCFTestSample/Benchmark/d4de27fe-6aca-42a5-8a9f-6daff7b890e8/call-CombineSummaries/summary.csv",
    "ROC_Plots_Reported": "gs://dsde-methods-carrot-prod-cromwell/BenchmarkVCFsHeadToHeadOrchestrated/ba9f32d5-7b46-462c-8d1f-5692eee05534/call-CreateHTMLReport/report.html"
  },
  "errors": null
} 
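The accuracy metrics in the results above are identical for control and eval, while the HaplotypeCaller runtimes differ. A minimal sketch for quantifying the runtime difference follows. The class and method names are hypothetical, and the numbers are copied from the `HCwallclockhours` and `HCprocesshours` entries of this run; a single run does not separate a real slowdown from cloud timing noise.

```java
/** Illustrative helper only; the names here are hypothetical and not part of GATK or CARROT. */
final class RuntimeDeltaSketch {

    /** Percent change of the eval value relative to the control value. */
    static double percentChange(double control, double eval) {
        return 100.0 * (eval - control) / control;
    }

    public static void main(String[] args) {
        // Values copied from the "results" object of the CARROT run above.
        System.out.printf("CHM  HCwallclockhours: %+.2f%%%n",
                percentChange(63.943677777777786, 70.50882222222222));
        System.out.printf("CHM  HCprocesshours:   %+.2f%%%n",
                percentChange(90.613975, 93.63756388888888));
        System.out.printf("NIST HCwallclockhours: %+.2f%%%n",
                percentChange(78.62844166666666, 88.01737777777778));
        System.out.printf("NIST HCprocesshours:   %+.2f%%%n",
                percentChange(108.95665833333332, 112.84528333333336));
    }
}
```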
 

@CarrotBroadBot

🥕CARROT🥕 report map stub finished

for test HaplotypeCaller CARROT Regression Tests (run: 83db9528-9aa0-4963-a382-95c0dc24102d)

File URI
empty_notebook View in the GCS Console
html_report View in the GCS Console
populated_notebook View in the GCS Console
run_csv_zip View in the GCS Console

@gatk-bot

Github actions tests reported job failures from actions build 5941400999
Failures in the following jobs:

| Test Type | JDK | Job ID | Logs |
| --- | --- | --- | --- |
| variantcalling | 17.0.6+10 | 5941400999.2 | logs |

Collaborator

@jamesemery jamesemery left a comment

I tried to dig into this closely. There are a lot of documentation/naming/clarifying comments and a few meaty comments about potential behavior oversights or changes. I do think that if we do not have test coverage for the "determinePDHaps" code path, we must add it, because that's the most heavily changed and messiest part of the code, and it does not mesh well at all with the flipped implementation of event groups.

There are several nested loops here:

Layer 1: iterate over all determined event positions
Layer 2: iterate over all alleles at that position, including the reference allele unless we are making determined haplotypes, to set as
Collaborator

confusing wording

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better now.

sourceSet.setPartiallyDeterminedMode();
}
Utils.printIf(debug, () -> "Returning "+outputHaplotypes.size()+" to the HMM");
sourceSet.setPartiallyDeterminedMode(!pileupArgs.determinePDHaps);
Collaborator

Add a comment that this is necessary to flag later PDHMM genotyping (and that we want to set it at the end here because the code above might fail out, so this acts as a bellwether).

* Partition events into clusters that must be considered together, either because they overlap or because they belong to the
* same mutually exclusive pair or trio. To find this clustering we calculate the connected components of an undirected graph
* with an edge connecting events that overlap or are mutually excluded.
* Partition events into the largest possible clusters such that events in distinct clusters are mutually compatible
Collaborator

I'm not sure this description covers the fact that in the average case you end up with a bunch of singleton event groups... you could say "the maximum number of clusters such that...", but I would also maybe spell out the consequences of this. Maybe even add an example with an external SNP A and a Del + SNP that overlap, showing that this would give you two event groups?

Collaborator

I would make a note for posterity here or elsewhere that this is a significant divergence from the DRAGEN code.

Contributor Author

@davidbenjamin davidbenjamin Sep 11, 2023

I added a note about the common case where all events are compatible and singleton EventGroups result.

As far as I understand the EventGroups that this yields are identical to DRAGEN's, but I added a note that the algorithm is significantly different.
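
For readers following this thread, the clustering in the earlier Javadoc wording (connected components of a graph whose edges join events that overlap or are mutually excluded) can be sketched as below. This is only an illustration under stated assumptions, not the code in this PR: `EventClusteringSketch`, its `Event` class, and the representation of mutexes as index arrays are all hypothetical, and a small union-find stands in for whatever algorithm the PR actually uses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from the GATK code base.
final class EventClusteringSketch {

    /** Stand-in for a variant event: a label plus a closed reference interval. */
    static final class Event {
        final String name;
        final int start;
        final int end;

        Event(String name, int start, int end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }

        boolean overlaps(Event other) {
            return start <= other.end && other.start <= end;
        }
    }

    /**
     * Groups events into connected components. Two events share a component if they are linked
     * by a chain of overlaps or mutexes. Each mutex is an array of indices into the event list
     * (a pair or a trio).
     */
    static List<List<Event>> cluster(List<Event> events, List<int[]> mutexes) {
        int n = events.size();
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }

        // Edge type 1: overlapping events.
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (events.get(i).overlaps(events.get(j))) {
                    union(parent, i, j);
                }
            }
        }
        // Edge type 2: events in the same mutually exclusive pair or trio.
        for (int[] mutex : mutexes) {
            for (int k = 1; k < mutex.length; k++) {
                union(parent, mutex[0], mutex[k]);
            }
        }

        // Collect events by the representative of their component, in order of first appearance.
        Map<Integer, List<Event>> byRoot = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            byRoot.computeIfAbsent(find(parent, i), key -> new ArrayList<>()).add(events.get(i));
        }
        return new ArrayList<>(byRoot.values());
    }

    private static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]]; // path halving
            i = parent[i];
        }
        return i;
    }

    private static void union(int[] parent, int a, int b) {
        parent[find(parent, a)] = find(parent, b);
    }
}
```

With the reviewer's example (an isolated SNP A plus a deletion and a SNP that overlap each other) and no mutexes, `cluster` returns two groups: a singleton containing SNP A and a pair containing the deletion and the overlapping SNP. When all events are mutually compatible, every group is a singleton.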

// Special case (if we are determining bases outside of this mutex cluster we can reuse the work from previous iterations)
if (locusOverlapSet.isEmpty() && cachedEventSets != null) {
// We use a cache for the recurring case where the determined events do not belong to this event group
if (determinedSubset.isEmpty() && cachedEventSets != null) {
Collaborator

I'm not sure this logic is correct anymore. The determined subset will almost never be empty here, because it comes from the list of alleles and you haven't actually checked whether it is present in this event group.

Collaborator

I think it's safer to put this subset-checking code inside this method rather than outside.

Contributor Author

I improved this part.
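
As a rough illustration of the reviewer's suggestion, the sketch below intersects the determined events with the group's own events inside the method, and only then decides whether the cached result applies. It is not the code from this PR: `CachedEventGroupSketch`, its string-named events, the `computations` counter, and the placeholder enumeration in `compute` are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: none of these names come from the GATK code base.
final class CachedEventGroupSketch {
    private final Set<String> eventsInGroup;           // events identified by name for simplicity
    private List<Set<String>> cachedEventSets = null;  // result for the "nothing determined here" case
    int computations = 0;                              // exposed only so the caching can be observed

    CachedEventGroupSketch(Collection<String> events) {
        this.eventsInGroup = new LinkedHashSet<>(events);
    }

    List<Set<String>> eventSets(Collection<String> determinedEvents) {
        // The subset check happens here, so callers may pass the full list of determined events.
        Set<String> determinedSubset = new LinkedHashSet<>(determinedEvents);
        determinedSubset.retainAll(eventsInGroup);

        // Recurring case: no determined event belongs to this group, so reuse earlier work.
        if (determinedSubset.isEmpty()) {
            if (cachedEventSets == null) {
                cachedEventSets = compute(determinedSubset);
            }
            return cachedEventSets;
        }
        return compute(determinedSubset);
    }

    // Placeholder enumeration: every subset of the group that contains the determined events.
    // The real computation would also have to respect event compatibility.
    private List<Set<String>> compute(Set<String> determinedSubset) {
        computations++;
        List<Set<String>> result = new ArrayList<>();
        result.add(new LinkedHashSet<>(determinedSubset));
        for (String event : eventsInGroup) {
            if (determinedSubset.contains(event)) {
                continue;
            }
            int size = result.size();
            for (int i = 0; i < size; i++) {
                Set<String> extended = new LinkedHashSet<>(result.get(i));
                extended.add(event);
                result.add(extended);
            }
        }
        return result;
    }
}
```

Because the filtering is done inside `eventSets`, an unfiltered allele list from the caller cannot defeat the emptiness test, which is the failure mode described above.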

@jamesemery
Collaborator

@davidbenjamin back to you. I notice a test is broken?

@jamesemery
Collaborator

@davidbenjamin I notice this PR was uploaded in two places?

@davidbenjamin
Contributor Author

@jamesemery back to you -- in the other PR, which was an accident. While you take another look I think I will work on a unit test for the new branching code.

@davidbenjamin
Contributor Author

@jamesemery To be clear, comments are here, but code to review is in the other PR!!!!

@davidbenjamin davidbenjamin deleted the db_pdhce_august_2023 branch September 21, 2023 16:07