Fix GenotypeGVCFs with mixed ploidy sites #8862

Merged
merged 8 commits into from
Jun 24, 2024

meganshand
Contributor

Currently, if a site has too many alleles, GenomicsDB doesn't output PLs for the diploid samples at that site. When all samples at such a site are diploid, the site is skipped successfully. If the site has a mix of haploid and diploid calls (for example, chrX in WGS data from the latest versions of DRAGEN), the site wasn't being skipped correctly, and GenotypeGVCFs would throw an error at the annotation step.
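A likely reason only the diploid samples lose their PLs is that a PL array has one entry per unordered genotype, so for n alleles and ploidy p it has C(n + p - 1, p) entries: n for a haploid sample, but n(n + 1)/2 for a diploid one. The sketch below only illustrates that growth. It assumes, without checking GenomicsDB's source, that the cutoff is driven by the number of PL entries, and the class name `PlCount` is hypothetical.

```java
/**
 * Illustrative helper (hypothetical, not GATK or GenomicsDB code): the number of
 * PL entries, i.e. unordered genotypes, for a given allele count and ploidy.
 */
public final class PlCount {

    private PlCount() {}

    /** Returns C(numAlleles + ploidy - 1, ploidy). */
    public static long genotypeCount(int numAlleles, int ploidy) {
        long count = 1;
        for (int i = 1; i <= ploidy; i++) {
            // After iteration i, count == C(numAlleles - 1 + i, i), so the division is exact.
            count = count * (numAlleles - 1 + i) / i;
        }
        return count;
    }

    public static void main(String[] args) {
        // Compare haploid (e.g. male chrX outside the PARs) with diploid PL lengths.
        for (int alleles : new int[] {2, 7, 51}) {
            System.out.printf("alleles=%d haploid=%d diploid=%d%n",
                    alleles, genotypeCount(alleles, 1), genotypeCount(alleles, 2));
        }
    }
}
```

Running `main` prints `alleles=2 haploid=2 diploid=3`, `alleles=7 haploid=7 diploid=28` and `alleles=51 haploid=51 diploid=1326`: the haploid PL length grows linearly while the diploid one grows quadratically, so at a high-alt site a diploid sample can exceed a size limit that its haploid neighbors stay under.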

Now the site is skipped if any called genotype, excluding no-calls and hom-ref genotypes, is missing PLs. If the site has only no-calls or hom-ref genotypes, it keeps the current behavior: the site is skipped only when every genotype is missing PLs. I'm not sure how often this happens in the wild, but it is a common edge case in our tests.
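The rule above can be summarized as a small predicate. This is a hypothetical sketch, not the code in this PR: `SimpleGenotype` is a made-up stand-in for htsjdk's `Genotype` that keeps only the three properties the rule needs. Ploidy is deliberately absent because the rule doesn't depend on it.

```java
import java.util.List;

/**
 * Hypothetical sketch of the per-site skip rule described in this PR.
 * SimpleGenotype is an illustrative stand-in for htsjdk's Genotype.
 */
public final class MixedPloidySkipCheck {

    /** Illustrative stand-in: call state plus whether the genotype carries PLs. */
    public record SimpleGenotype(boolean isNoCall, boolean isHomRef, boolean hasPL) {}

    private MixedPloidySkipCheck() {}

    /** Returns true if the site should be skipped. */
    public static boolean shouldSkipSite(List<SimpleGenotype> genotypes) {
        // Called genotypes that are not hom-ref.
        List<SimpleGenotype> altCalls = genotypes.stream()
                .filter(g -> !g.isNoCall() && !g.isHomRef())
                .toList();
        if (!altCalls.isEmpty()) {
            // New behavior: a single such genotype without PLs is enough to skip the site.
            return altCalls.stream().anyMatch(g -> !g.hasPL());
        }
        // Only no-calls and hom-refs: previous behavior, skip only if no genotype has PLs.
        return genotypes.stream().noneMatch(SimpleGenotype::hasPL);
    }

    public static void main(String[] args) {
        // A haploid alt call with PLs next to a diploid alt call whose PLs were dropped.
        List<SimpleGenotype> mixedSite = List.of(
                new SimpleGenotype(false, false, true),
                new SimpleGenotype(false, false, false));
        System.out.println(shouldSkipSite(mixedSite)); // prints "true"
    }
}
```

Hom-ref genotypes and no-calls are excluded from the first check because, under this sketch's assumptions, a missing PL on them doesn't block recalculating site and allele qualities for the alt alleles.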

@meganshand
Contributor Author

#carrot(HaplotypeCaller CARROT Regression Tests, VariantCallingCarrotOrchestrated.gatk_docker, )

@CarrotBroadBot

🥕CARROT🥕 run started

Test: HaplotypeCaller CARROT Regression Tests | Status: building

Run: HaplotypeCaller CARROT Regression Tests_run_2024-06-05 12:54:29.044258506 UTC

Full details
 
 {
  "run_id": "c9ffb306-e25b-4843-bb68-0b6e5ca70b15",
  "test_id": "c3de522b-7ce5-4a51-8b57-1cea628dd93a",
  "run_group_ids": [],
  "name": "HaplotypeCaller CARROT Regression Tests_run_2024-06-05 12:54:29.044258506 UTC",
  "status": "building",
  "test_wdl": "gs://dsp-methods-carrot-data/wdl-prod/8fce9006-acbf-48ed-984a-2ec988d82eea/test.wdl",
  "test_wdl_hash": "272dc41890e3710cc96c32af03df25065cc4aa9dc389e3c2473bddba7be237db3e0698c15ef287c4619cff83e9b2e8e5e0a486eb4534658604e4bb312f308611",
  "test_wdl_dependencies": null,
  "test_wdl_dependencies_hash": null,
  "eval_wdl": "gs://dsp-methods-carrot-data/wdl-prod/7e3704ce-f26c-4465-a6ab-f64faca619f4/eval.wdl",
  "eval_wdl_hash": "8cecc1e6a3ade904ed3bfaae834df6aeac9b50fbc08966557f9e0c1628058b2c64d080f78d0cad222b4b02400db47d540d3a1bcdb8275c475b49a027bf330605",
  "eval_wdl_dependencies": null,
  "eval_wdl_dependencies_hash": null,
  "test_input": {
    "VariantCallingCarrotOrchestrated.CHM_base_file_name": "CHM113",
    "VariantCallingCarrotOrchestrated.CHM_calling_interval_list": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/wgs_calling_regions.hg38.interval_list",
    "VariantCallingCarrotOrchestrated.CHM_contamination": 0.0,
    "VariantCallingCarrotOrchestrated.CHM_input_bam": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/chm1_chm13_hiseqx_sm_hf3mo.bam",
    "VariantCallingCarrotOrchestrated.CHM_input_bam_index": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/NA24385_NA24385_O1D1_SM-G947H_v1_NS.bai",
    "VariantCallingCarrotOrchestrated.NIST_base_file_name": "NA24385",
    "VariantCallingCarrotOrchestrated.NIST_calling_interval_list": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/wgs_calling_regions.hg38.interval_list",
    "VariantCallingCarrotOrchestrated.NIST_contamination": 0.0383312,
    "VariantCallingCarrotOrchestrated.NIST_input_bam": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/NA24385_NA24385_O1D1_SM-G947H_v1_NS.bam",
    "VariantCallingCarrotOrchestrated.NIST_input_bam_index": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/NA24385_NA24385_O1D1_SM-G947H_v1_NS.bai",
    "VariantCallingCarrotOrchestrated.agg_preemptible_tries": 3,
    "VariantCallingCarrotOrchestrated.break_bands_at_multiples_of": 100000,
    "VariantCallingCarrotOrchestrated.contamination": 0.0,
    "VariantCallingCarrotOrchestrated.exome1_base_file_name": "NA12878Exome1",
    "VariantCallingCarrotOrchestrated.exome1_calling_interval_list": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/exome_calling_regions.v1.interval_list",
    "VariantCallingCarrotOrchestrated.exome1_input_bam": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/NA12878_forPCRplus_1.cram",
    "VariantCallingCarrotOrchestrated.exome1_input_bam_index": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/NA12878_forPCRplus_1.cram.crai",
    "VariantCallingCarrotOrchestrated.gatk_control_docker": "broadinstitute/gatk-nightly:latest",
    "VariantCallingCarrotOrchestrated.gatk_docker": "image_build:gatk|1c746d8268c9c07cf7344fcd5b5b8decb2ea9458",
    "VariantCallingCarrotOrchestrated.haplotype_scatter_count": 50,
    "VariantCallingCarrotOrchestrated.monitoring_script": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/cromwell_monitoring_script.sh",
    "VariantCallingCarrotOrchestrated.ref_dict": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/Homo_sapiens_assembly38.dict",
    "VariantCallingCarrotOrchestrated.ref_fasta": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/Homo_sapiens_assembly38.fasta",
    "VariantCallingCarrotOrchestrated.ref_fasta_index": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/Homo_sapiens_assembly38.fasta.fai",
    "VariantCallingCarrotOrchestrated.use_gatk3_haplotype_caller": true
  },
  "test_options": {
    "read_from_cache": false
  },
  "eval_input": {
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_confidenceInterval": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/chm.full.m38.interval_list",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_controlLabel": "CONTROLSNAPSHOT2018HG002",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_controlMonitoringExample": "test_output:VariantCallingCarrotOrchestrated.CHM_control_representative_benchmarking",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_controlRuntimeSummaries": "test_output:VariantCallingCarrotOrchestrated.CHM_control_output_runtimes",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_controlVcf": "test_output:VariantCallingCarrotOrchestrated.CHM_control_vcf",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_controlVcfIndex": "test_output:VariantCallingCarrotOrchestrated.CHM_control_vcf_index",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_evalLabel": "TESTSNAPSHOT2018HG002",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_evalMonitoringExample": "test_output:VariantCallingCarrotOrchestrated.CHM_representative_benchmarking",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_evalRuntimeSummaries": "test_output:VariantCallingCarrotOrchestrated.CHM_output_runtimes",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_evalVcf": "test_output:VariantCallingCarrotOrchestrated.CHM_output_vcf",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_evalVcfIndex": "test_output:VariantCallingCarrotOrchestrated.CHM_output_vcf_index",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_truthLabel": "CHM_GRCh38_SYNDIPv20180222",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_truthVcf": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/chm.full.m38.vcf.gz",
    "BenchmarkVCFsHeadToHeadOrchestrated.CHM_truthVcfIndex": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/chm.full.m38.vcf.gz.tbi",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_confidenceInterval": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/GIAB_v3.3.2_NA12878_hg38.twist_exome.interval_list",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_controlLabel": "CONTROLSNAPSHOT2018HG002",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_controlMonitoringExample": "test_output:VariantCallingCarrotOrchestrated.EXOME1_control_representative_benchmarking",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_controlRuntimeSummaries": "test_output:VariantCallingCarrotOrchestrated.EXOME1_control_output_runtimes",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_controlVcf": "test_output:VariantCallingCarrotOrchestrated.EXOME1_control_vcf",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_controlVcfIndex": "test_output:VariantCallingCarrotOrchestrated.EXOME1_control_vcf_index",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_evalLabel": "TESTSNAPSHOT2018HG002",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_evalMonitoringExample": "test_output:VariantCallingCarrotOrchestrated.EXOME1_representative_benchmarking",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_evalRuntimeSummaries": "test_output:VariantCallingCarrotOrchestrated.EXOME1_output_runtimes",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_evalVcf": "test_output:VariantCallingCarrotOrchestrated.EXOME1_output_vcf",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_evalVcfIndex": "test_output:VariantCallingCarrotOrchestrated.EXOME1_output_vcf_index",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_truthLabel": "NA12878_GRCh38_TWISTExome",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_truthVcf": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/GIAB_v3.3.2_NA12878_hg38.vcf.gz",
    "BenchmarkVCFsHeadToHeadOrchestrated.EXOME1_truthVcfIndex": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/GIAB_v3.3.2_NA12878_hg38.vcf.gz.tbi",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_confidenceInterval": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/HG002_GRCh38_GIAB_1_22_v4.2.1_benchmark_noinconsistent.bed",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_controlLabel": "CONTROLSNAPSHOT2018HG002",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_controlMonitoringExample": "test_output:VariantCallingCarrotOrchestrated.NIST_control_representative_benchmarking",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_controlRuntimeSummaries": "test_output:VariantCallingCarrotOrchestrated.NIST_control_output_runtimes",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_controlVcf": "test_output:VariantCallingCarrotOrchestrated.NIST_control_vcf",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_controlVcfIndex": "test_output:VariantCallingCarrotOrchestrated.NIST_control_vcf_index",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_evalLabel": "TESTSNAPSHOT2018HG002",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_evalMonitoringExample": "test_output:VariantCallingCarrotOrchestrated.NIST_representative_benchmarking",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_evalRuntimeSummaries": "test_output:VariantCallingCarrotOrchestrated.NIST_output_runtimes",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_evalVcf": "test_output:VariantCallingCarrotOrchestrated.NIST_output_vcf",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_evalVcfIndex": "test_output:VariantCallingCarrotOrchestrated.NIST_output_vcf_index",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_truthLabel": "HG002_GRCh38_GIAB",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_truthVcf": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/HG002_GRCh38_GIAB_1_22_v4.2.1_benchmark.broad-header.vcf.gz",
    "BenchmarkVCFsHeadToHeadOrchestrated.NIST_truthVcfIndex": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/HG002_GRCh38_GIAB_1_22_v4.2.1_benchmark.broad-header.vcf.gz.tbi",
    "BenchmarkVCFsHeadToHeadOrchestrated.hapMap": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/Homo_sapiens_assembly38.haplotype_database.txt",
    "BenchmarkVCFsHeadToHeadOrchestrated.refDict": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/Homo_sapiens_assembly38.dict",
    "BenchmarkVCFsHeadToHeadOrchestrated.refIndex": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/Homo_sapiens_assembly38.fasta.fai",
    "BenchmarkVCFsHeadToHeadOrchestrated.reference": "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/Homo_sapiens_assembly38.fasta",
    "BenchmarkVCFsHeadToHeadOrchestrated.referenceVersion": "HG38",
    "BenchmarkVCFsHeadToHeadOrchestrated.stratIntervals": [
      "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/HCR_hg38.bed",
      "gs://dsp-methods-carrot-data/test_data/haplotypecaller_tests/LCR_Hg38.interval_list"
    ],
    "BenchmarkVCFsHeadToHeadOrchestrated.stratLabels": [
      "HCR",
      "LCR"
    ]
  },
  "eval_options": {
    "read_from_cache": false
  },
  "test_cromwell_job_id": null,
  "eval_cromwell_job_id": null,
  "created_at": "2024-06-05T12:54:29.059343",
  "created_by": null,
  "finished_at": null,
  "results": null,
  "errors": null
} 
 

meganshand requested a review from ldgauthier June 5, 2024 14:01
@CarrotBroadBot

🥕CARROT🥕 run finished

Test: HaplotypeCaller CARROT Regression Tests | Status: eval_failed

Run: HaplotypeCaller CARROT Regression Tests_run_2024-06-05 12:54:29.044258506 UTC

Full details
 
 {
  "run_id": "c9ffb306-e25b-4843-bb68-0b6e5ca70b15",
  "test_id": "c3de522b-7ce5-4a51-8b57-1cea628dd93a",
  "name": "HaplotypeCaller CARROT Regression Tests_run_2024-06-05 12:54:29.044258506 UTC",
  "status": "evalfailed",
  "test_cromwell_job_id": "93118ea5-8762-47b0-b55a-547cc0ed867b",
  "eval_cromwell_job_id": "2aab3ed0-746b-451e-a7db-1c22fbb1bb29",
  "created_at": "2024-06-05T12:54:29.059343",
  "finished_at": "2024-06-05T21:39:55.026",
  "results": null,
  "errors": null
} 
 

@ldgauthier
Contributor

ldgauthier left a comment


First off, thanks for jumping in here! This is probably bus-factor-of-one code, so it's great to get another set of eyes on it.

Can you reformat the test? (There are some specifics inline.) I'm too lazy to check out the branch and dump the GenomicsDB, and I'd rather have human-readable test inputs anyway.

Also, why are you running the CARROT tests for HaplotypeCaller? I'm not convinced your changes will affect HaplotypeCaller. I think it makes more sense to run WARP on Jenkins.

-}else {
-    logger.warn("No genotype contained sufficient data to recalculate site and allele qualities. Site will be skipped at location "
+} else {
+    logger.warn("Some genotypes contained insufficient data to recalculate site and allele qualities. Site will be skipped at location "

Good catch!

final File output = createTempFile("MixHaploidDiploidHighAltSite", ".vcf");
final ArgumentsBuilder args = new ArgumentsBuilder();
args.addReference(hg38Reference)
        .addVCF("gendb://" + toolsTestDir + "/walkers/GenotypeGVCFs/mixHaploidDiploidHighAlt")

I'd rather the test you added start from VCFs than commit an opaque GenomicsDB workspace to the repo that might break if we update GenomicsDB (which, admittedly, doesn't happen that often). VCFs are also human-readable, whereas I have no idea what data you just added here. Take a look at testMaxAltsToCombineInGenomicsDB:
final File tempGenomicsDB2 = GenomicsDBTestUtils.createTempGenomicsDB(inputs, interval);
final String genomicsDBUri2 = GenomicsDBTestUtils.makeGenomicsDBUri(tempGenomicsDB2);

@droazen
Contributor

droazen commented Jun 6, 2024

By the way, the CARROT run failed with PAPI error code 9, not for any reason specific to this branch:

                "executionStatus": "Failed",
                                "message": "Task BenchmarkComparison.EVALRuntimeTask:NA:4 failed. Job exit code 1. Check gs://dsde-methods-carrot-prod-cromwell/BenchmarkVCFsHeadToHeadOrchestrated/2aab3ed0-746b-451e-a7db-1c22fbb1bb29/call-CHMSampleHeadToHead/BenchmarkComparison/82289acc-83e7-49c8-acd0-9b2277166e10/call-EVALRuntimeTask/attempt-4/stderr for more information. PAPI error code 9. Please check the log file for more details: gs://dsde-methods-carrot-prod-cromwell/BenchmarkVCFsHeadToHeadOrchestrated/2aab3ed0-746b-451e-a7db-1c22fbb1bb29/call-CHMSampleHeadToHead/BenchmarkComparison/82289acc-83e7-49c8-acd0-9b2277166e10/call-EVALRuntimeTask/attempt-4/EVALRuntimeTask.log.",
                        "message": "Workflow failed"
    "status": "Failed",
                            "message": "Task BenchmarkComparison.EVALRuntimeTask:NA:4 failed. Job exit code 1. Check gs://dsde-methods-carrot-prod-cromwell/BenchmarkVCFsHeadToHeadOrchestrated/2aab3ed0-746b-451e-a7db-1c22fbb1bb29/call-CHMSampleHeadToHead/BenchmarkComparison/82289acc-83e7-49c8-acd0-9b2277166e10/call-EVALRuntimeTask/attempt-4/stderr for more information. PAPI error code 9. Please check the log file for more details: gs://dsde-methods-carrot-prod-cromwell/BenchmarkVCFsHeadToHeadOrchestrated/2aab3ed0-746b-451e-a7db-1c22fbb1bb29/call-CHMSampleHeadToHead/BenchmarkComparison/82289acc-83e7-49c8-acd0-9b2277166e10/call-EVALRuntimeTask/attempt-4/EVALRuntimeTask.log.",
                    "message": "Workflow failed"
            "message": "Workflow failed"

Looks like the underlying cause is an R parsing issue:

Error in parse(text = text) : <text>:1:1: unexpected '*'
1: *
    ^
Calls: ldply ... llply -> structure -> lapply -> FUN -> eval -> parse

@jamesemery Have you seen that error before in the Carrot HC tests?

@meganshand
Contributor Author

@ldgauthier The test is much clearer now; thanks for pointing me to the example. This will be tested in WARP with the next GATK release, and I'm not sure how easy it is to test two GATK commits against each other in WARP. If that's possible without updating the official truth data, I could run it before we merge this. Otherwise, we'll catch any issues when we update WARP after the next GATK release (which I'm motivated to do when the time comes).

@gatk-bot

gatk-bot commented Jun 10, 2024

Github actions tests reported job failures from actions build 9454902078
Failures in the following jobs:

Test Type     JDK         Job ID           Logs
integration   17.0.6+10   9454902078.11    logs
integration   17.0.6+10   9454902078.0     logs

@meganshand
Contributor Author

After speaking with Laura, I'll test this commit in WARP against the previous commit to make sure this change doesn't alter the JointGenotyping outputs. I also added another edge-case test, but I want to run this version on a full-size cohort of ~60 DRAGEN-generated WGS samples over chrX to make sure the downstream code works with other haploid/diploid mixes of calls and no-calls.

// Make sure the first site was successfully removed and the second site exists
Assert.assertEquals(outputData.getRight().size(), 1);
final VariantContext vc = outputData.getRight().get(0);
Assert.assertEquals(vc.getStart(), 66780646);

I'm feeling a little paranoid after so many GenotypeGVCFs hiccups. If you've got the patience to wait for the tests again, can you check that the remaining VC has genotypes called as expected, with PLs? We do expect PLs, right?

@ldgauthier
Contributor

ldgauthier left a comment


Tests are much easier to read now, thanks. I'd like some genotype checks on the VC that does get emitted, but on the other hand, other tests should probably cover that as well.

@meganshand
Contributor Author

I updated the test to check the output genotype here, and I'm now running the WARP tests as well as a DRAGEN callset of 60 samples with mixed haploid/diploid calls.

@meganshand
Contributor Author

WARP tests for Exome single sample, WGS single sample, and JointGenotyping have all succeeded, so I'm going to merge this one.

meganshand merged commit 948cd4f into master Jun 24, 2024
20 of 21 checks passed
meganshand deleted the ms_mix_ploidy branch June 24, 2024 19:16
5 participants