Incomplete gVCF files with --target_bed #344

Closed
3 of 4 tasks
jtangrot opened this issue Feb 12, 2021 · 1 comment

Comments

@jtangrot

Check Documentation

I have checked the following places for your error:

Description of the bug

I used Sarek to generate gVCF files with HaplotypeCaller, planning to do the joint genotyping myself on all samples together. As this is exome sequencing, I first used the option --target_bed, but realised that this results in lots of missing genotypes. The reason is that "bcftools isec" is apparently run on the gVCF files, and this removes every non-variant block whose start position is not within the regions listed in the BED file. As a result, many of the reference blocks are removed from the file, even when parts of those blocks are covered by the BED file (bcftools does not look at the END tag). The VCF files generated for each sample are fine though.
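To illustrate the behaviour described above, here is a minimal Python sketch contrasting a start-position-only filter with an END-aware overlap filter. It is not bcftools or Sarek code: the `Block` and `Region` types, the function names, and the coordinates are all hypothetical and chosen only to show why a reference block that starts before a target but extends into it gets dropped.

```python
from typing import Callable, List, NamedTuple


class Block(NamedTuple):
    """One gVCF record (hypothetical model): 1-based POS and inclusive END."""
    chrom: str
    start: int
    end: int  # equals start for a single-base record


class Region(NamedTuple):
    """One BED interval: 0-based start, exclusive end."""
    chrom: str
    start: int
    end: int


def start_in_region(block: Block, region: Region) -> bool:
    # Only the record's POS is tested; the END tag is ignored.
    # BED [start, end) covers 1-based positions start+1 .. end.
    return block.chrom == region.chrom and region.start < block.start <= region.end


def overlaps_region(block: Block, region: Region) -> bool:
    # END-aware test: keep the block if any of its bases fall in the region.
    return (
        block.chrom == region.chrom
        and block.start <= region.end
        and block.end > region.start
    )


def keep(blocks: List[Block], regions: List[Region],
         test: Callable[[Block, Region], bool]) -> List[Block]:
    return [b for b in blocks if any(test(b, r) for r in regions)]


# Made-up example: one target covering 1-based positions 1001..1100.
targets = [Region("chr1", 1000, 1100)]
blocks = [
    Block("chr1", 900, 1050),   # reference block starting before the target
    Block("chr1", 1051, 1051),  # single-base record inside the target
    Block("chr1", 1052, 1200),  # reference block starting inside the target
    Block("chr1", 1201, 1300),  # reference block entirely outside the target
]

by_start = keep(blocks, targets, start_in_region)
by_overlap = keep(blocks, targets, overlaps_region)
```

With these made-up coordinates, `by_start` loses the first block (900 to 1050) even though it covers target positions 1001 to 1050, whereas `by_overlap` retains it. Both filters drop the block that lies entirely outside the target.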

Steps to reproduce

Command line:
nextflow run ~/sarek/main.nf -profile uppmax,singularity -with-singularity /sw/data/ToolBox/nf-core/nfcore-sarek-2.6.1.img --containerPath ~/sarek/containers --custom_config_base ~/configs-master/ --genome_base /sw/data/ToolBox/hg38bundle/ --project XXX --genome GRCh38 --step prepare_recalibration --target_bed Twist_Exome_RefSeq_targets_hg38.bed --input mapped_bam_files.tsv

Expected behaviour

Even if it might be better to do the joint genotyping on the full file anyway, I would expect the gVCF files generated with --target_bed to include (at least) the regions in the given BED file. Or maybe just add a note/warning about this to the description of --target_bed?

Log files

Have you provided the following extra information/files:

  • The command used to run the pipeline
  • The .nextflow.log file

System

  • Hardware: HPC
  • Executor: slurm
  • Sarek version: 2.6.1

Nextflow Installation

  • Version: 20.10.0.5431

Container engine

  • Engine: Singularity
@jtangrot jtangrot added the bug Something isn't working label Feb 12, 2021
@FriederikeHanssen FriederikeHanssen added this to the 3.0 milestone May 11, 2022
@FriederikeHanssen
Contributor

Joint germline calling has been partly rewritten and partly newly implemented in #595 to provide recalibrated joint germline files that take the target regions into account.
