I used Sarek to generate gVCF files with HaplotypeCaller, planning to run joint genotyping myself on all samples together. As this is exome sequencing, I first used the option --target_bed, but found that this results in many missing genotypes. The reason appears to be that "bcftools isec" is run on the gVCF files and removes every non-variant block whose start position is not within the regions listed in the BED file. As a result, many reference-allele blocks are dropped from the file even when parts of those blocks are covered by the BED regions (bcftools does not look at the END tag). The per-sample VCF files are fine, though.
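To make the described effect concrete, here is a minimal toy sketch in Python. It does not call bcftools or reproduce its code; it only contrasts a filter that tests the block start (POS) with one that tests the whole POS..END span. The function names, the target interval, and the block coordinates are all made up for illustration.

```python
# Toy model of the filtering described above (not bcftools' actual code).
# gVCF reference blocks are (pos, end) pairs, 1-based inclusive, where `end`
# stands in for the INFO/END value. BED intervals are 0-based, half-open.

def bed_to_one_based(bed_start, bed_end):
    """Convert a 0-based half-open BED interval to 1-based inclusive."""
    return bed_start + 1, bed_end

def start_in_target(pos, end, target):
    """POS-only test: keep the block only if its start lies inside the target."""
    t_start, t_end = target
    return t_start <= pos <= t_end

def block_overlaps_target(pos, end, target):
    """END-aware test: keep the block if any base of POS..END is in the target."""
    t_start, t_end = target
    return pos <= t_end and end >= t_start

# Hypothetical target: BED line "chr1 1000 1200" covers bases 1001..1200.
target = bed_to_one_based(1000, 1200)

# Hypothetical reference blocks on the same chromosome.
blocks = [
    (900, 1050),   # starts before the target, extends into it
    (1051, 1100),  # fully inside the target
    (1101, 1300),  # starts inside the target, ends after it
    (1301, 1400),  # fully outside the target
]

pos_only = [b for b in blocks if start_in_target(*b, target)]
end_aware = [b for b in blocks if block_overlaps_target(*b, target)]

print("POS-only :", pos_only)
print("END-aware:", end_aware)
```

In this toy input the POS-only filter drops the block 900..1050, so target bases 1001..1050 end up with no record at all, which is the kind of gap that later shows up as missing genotypes.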
Even if it might be better to do the joint genotyping on the full file anyway, I would expect the gVCF files generated with --target_bed to include at least the regions in the given BED file. Alternatively, a note or warning about this in the description of --target_bed would help.
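The expectation can be phrased as a simple check: every base in the BED targets should be covered by some record in the filtered gVCF. The sketch below is a hypothetical helper (the name `uncovered_target_bases` and the example coordinates are assumptions, not part of Sarek or bcftools) that counts target bases left without any block, for a single chromosome.

```python
def uncovered_target_bases(blocks, targets):
    """Count target bases not covered by any retained gVCF block.

    blocks:  list of (pos, end), 1-based inclusive, all on one chromosome
    targets: list of (start, end) BED intervals, 0-based half-open,
             on the same chromosome
    """
    missing = 0
    for bed_start, bed_end in targets:
        covered = set()
        for pos, end in blocks:
            # Clip the block to the target, in 1-based inclusive coordinates.
            lo = max(pos, bed_start + 1)
            hi = min(end, bed_end)
            if lo <= hi:
                covered.update(range(lo, hi + 1))
        # A BED interval spans (bed_end - bed_start) bases.
        missing += (bed_end - bed_start) - len(covered)
    return missing

# Hypothetical example: one 200-base target, blocks kept by a POS-only filter.
targets = [(1000, 1200)]
kept_by_pos_only = [(1051, 1100), (1101, 1300)]
kept_by_overlap = [(900, 1050), (1051, 1100), (1101, 1300)]

print(uncovered_target_bases(kept_by_pos_only, targets))
print(uncovered_target_bases(kept_by_overlap, targets))
```

The set-based bookkeeping is only meant for small illustrative inputs; on real exome data an interval-based tool would be the more practical choice.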
Log files
Have you provided the following extra information/files:
The command used to run the pipeline
The .nextflow.log file
System
Hardware: HPC
Executor: slurm
Sarek version: 2.6.1
Nextflow Installation
Version: 20.10.0.5431
Container engine
Engine: Singularity
Joint germline calling has been partly rewritten and partly newly implemented in #595 to provide recalibrated joint germline files that take the target regions into account.
Steps to reproduce
Command line:
nextflow run ~/sarek/main.nf -profile uppmax,singularity -with-singularity /sw/data/ToolBox/nf-core/nfcore-sarek-2.6.1.img --containerPath ~/sarek/containers --custom_config_base ~/configs-master/ --genome_base /sw/data/ToolBox/hg38bundle/ --project XXX --genome GRCh38 --step prepare_recalibration --target_bed Twist_Exome_RefSeq_targets_hg38.bed --input mapped_bam_files.tsv