Make -a a default argument for bcftools concat #1420

Merged (1 commit) on Feb 27, 2024


MatthiasZepper
Member

I am running Sarek 3.4 for the first time, so maybe this is a classic example of "Just because you can do it with the pipeline, doesn't mean you should" 😉, but I found that bcftools concat typically errors out for the samples when trying to merge the germline VCF files from each applied variant caller (concatenate_vcfs: 'true').

Error executing process > 'NFCORE_SAREK:SAREK:POST_VARIANTCALLING:CONCATENATE_GERMLINE_VCFS:GERMLINE_VCFS_CONCAT (sample)'

Caused by:
  Process `NFCORE_SAREK:SAREK:POST_VARIANTCALLING:CONCATENATE_GERMLINE_VCFS:GERMLINE_VCFS_CONCAT (sample)` terminated with an error exit status (255)

Command executed:

  bcftools concat \
      --output sample.vcf.gz \
       \
      --threads 1 \
      sample.strelka.variants.added_info.vcf.gz sample.manta.diploid_sv.added_info.vcf.gz sample.freebayes.added_info.vcf.gz sample.tiddit.added_info.vcf.gz
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SAREK:SAREK:POST_VARIANTCALLING:CONCATENATE_GERMLINE_VCFS:GERMLINE_VCFS_CONCAT":
      bcftools: $(bcftools --version 2>&1 | head -n1 | sed 's/^.*bcftools //; s/ .*$//')
  END_VERSIONS

Command exit status:
  255

Command output:
  (empty)

Command error:
  Checking the headers and starting positions of 4 files
  [W::bcf_hdr_merge] Trying to combine "FT" tag definitions of different lengths
  Concatenating sample.strelka.variants.added_info.vcf.gz	14.703256 seconds
  Concatenating sample.manta.diploid_sv.added_info.vcf.gz
  The chromosome block chr1 is not contiguous, consider running with -a.
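Without -a, bcftools concat reads the inputs one after another and, judging by the error message, expects all records of a chromosome to form one contiguous block. Here the Manta file presumably starts again at chr1 after the Strelka file has already moved past chr1, so chr1 is split into two blocks. The sketch below is a hypothetical helper (not part of Sarek or bcftools) that approximates this check by listing every chromosome that would be split when the given VCFs are read back to back. It assumes plain-text or gzip/bgzip-compressed VCFs and GNU gzip, which passes uncompressed input through unchanged with -cdf.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each chromosome whose records would form more
# than one block when the given VCFs are read one after another, which is
# the situation that makes `bcftools concat` (without -a) stop with
# "The chromosome block ... is not contiguous".
find_noncontiguous_chroms() {
    local f
    for f in "$@"; do
        # gzip -cdf decompresses gzip/bgzip input and copies plain text unchanged.
        gzip -cdf -- "$f"
    done | awk -F '\t' '
        /^#/ { next }                          # skip VCF header lines
        $1 != prev {                           # a new chromosome block starts here
            if (($1 in seen) && !($1 in reported)) {
                print $1                       # chromosome already had an earlier block
                reported[$1] = 1
            }
            seen[$1] = 1
            prev = $1
        }
    '
}
```

For example, running `find_noncontiguous_chroms sample.strelka.variants.added_info.vcf.gz sample.manta.diploid_sv.added_info.vcf.gz` on the files from the log above would be expected to list chr1 first, matching the error.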

I looked up the effects of the -a / --allow-overlaps parameter, and to me it seems that making it the default does no harm (except perhaps to performance?).

Therefore, I propose making this option the default, or perhaps even -a -D? However, since the latter is more intrusive (-D additionally outputs records that occur in more than one input file only once), that change might be over the top.
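Outside the pipeline, the same workaround can be tried by hand. The sketch below is a hypothetical helper (not Sarek code, and not necessarily how this PR passes the flag) that assumes bcftools is on the PATH. It relies on two points from the bcftools concat documentation: -a reads the inputs through their indexes, so every file must be bgzip-compressed and have a .tbi or .csi index, and -D (--remove-duplicates) is only valid together with -a. The environment variables DRY_RUN, REMOVE_DUPS and THREADS are illustrative names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: concatenate one sample's per-caller VCFs with
# overlapping coordinates allowed.
#   -a  --allow-overlaps: the next file may start before the current one ends
#   -D  --remove-duplicates (only with REMOVE_DUPS=1): emit records that
#       occur in several inputs only once; bcftools accepts it only with -a
# Usage: concat_with_overlaps OUT.vcf.gz IN1.vcf.gz IN2.vcf.gz ...
# With DRY_RUN=1 the command is printed instead of executed.
concat_with_overlaps() {
    local out=$1
    shift
    local -a cmd=(bcftools concat -a)
    if [[ ${REMOVE_DUPS:-0} == 1 ]]; then
        cmd+=(-D)
    fi
    cmd+=(--threads "${THREADS:-1}" -Oz -o "$out" "$@")

    if [[ ${DRY_RUN:-0} == 1 ]]; then
        printf '%s\n' "${cmd[*]}"
        return 0
    fi

    local f
    for f in "$@"; do
        # -a needs indexed input: build a tabix index only if none exists yet.
        if [[ ! -e $f.tbi && ! -e $f.csi ]]; then
            bcftools index -t "$f" || return 1
        fi
    done
    "${cmd[@]}"
}
```

For example, `DRY_RUN=1 concat_with_overlaps sample.vcf.gz sample.strelka.variants.added_info.vcf.gz sample.manta.diploid_sv.added_info.vcf.gz` only prints the bcftools command, which makes it easy to compare against the command executed by the failing process.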

PR checklist

  • This comment contains a description of changes (with reason).


github-actions bot commented Feb 22, 2024

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 4fa3628

+| ✅ 182 tests passed       |+
#| ❔  13 tests were ignored |#
!| ❗   4 tests had warnings |!

❗ Test warnings:

  • pipeline_todos - TODO string in WorkflowSarek.groovy: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!

❔ Tests ignored:

  • files_exist - File is ignored: .github/workflows/awsfulltest.yml
  • files_exist - File is ignored: .github/workflows/awstest.yml
  • files_exist - File is ignored: conf/modules.config
  • files_exist - File is ignored: lib/WorkflowMain.groovy
  • files_exist - File is ignored: lib/NfcoreTemplate.groovy
  • files_exist - File is ignored: lib/WorkflowSarek.groovy
  • files_unchanged - File ignored due to lint config: assets/nf-core-sarek_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-sarek_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-sarek_logo_dark.png
  • files_unchanged - File ignored due to lint config: .gitignore or .prettierignore or pyproject.toml
  • actions_ci - actions_ci
  • actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/sarek/sarek/.github/workflows/awstest.yml
  • template_strings - template_strings

✅ Tests passed:

Run details

  • nf-core/tools version 2.13
  • Run at 2024-02-23 13:05:59

@FriederikeHanssen
Contributor

This sounds reasonable. I would be curious about @asp8200's experience. You added this feature for your group, right? Any failures similar to this?

@asp8200
Contributor

asp8200 commented Feb 23, 2024

> This sounds reasonable. I would be curious about @asp8200's experience. You added this feature for your group, right? Any failures similar to this?

I have no experience with using option concatenate_vcfs. I just implemented the thing, and - boy - was that a headache. In general, I would refrain from merging different kinds of vcf-files as it just seems bound to go wrong.

@MatthiasZepper
Member Author

> I have no experience with using option concatenate_vcfs. I just implemented the thing, and - boy - was that a headache. In general, I would refrain from merging different kinds of vcf-files as it just seems bound to go wrong.

Thank you for the reminder; I will heed it. I would be happy to hear what your preferred variant callers (and perhaps custom config options for them) are for a germline run on 30x WGS data.

At the moment, however, I am not running the pipeline for any scientific purpose; I am configuring it on our Seqera Platform instance. So essentially, I am enabling as many tools as possible at once to make sure that everything works out of the box for the production team.

Still, I'd argue that if combining the variants is an option in the pipeline, it should work, even if it may not be advisable. In many cases, setting -a seems to be necessary for this step to succeed. Furthermore, a -resume after a failure at this point unfortunately reruns large parts of the pipeline, which makes this a particularly annoying step to error out at.

asp8200 (Contributor) left a comment


LGTM

@maxulysse
Member

LGTM

maxulysse merged commit 0fa5bcb into nf-core:dev on Feb 27, 2024
23 checks passed
MatthiasZepper deleted the vcfs_concat_ext_args branch on February 27, 2024, 13:14

4 participants