Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dealing with optional and mandatory GATK resource files #592

Merged
merged 22 commits into from
Jun 20, 2022

Conversation

FriederikeHanssen
Copy link
Contributor

@FriederikeHanssen FriederikeHanssen commented Jun 16, 2022

Closes #299 , #359 , #367

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
    • If you've added a new tool - add to the software_versions process and a regex to scrape_software_versions.py
    • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
    • If necessary, also make a PR on the nf-core/sarek branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint .).
  • Ensure the test suite passes (nextflow run . -profile test,docker).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@github-actions
Copy link

github-actions bot commented Jun 16, 2022

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 2aa1e2f

+| ✅ 144 tests passed       |+
#| ❔   4 tests were ignored |#
!| ❗   8 tests had warnings |!

❗ Test warnings:

  • readme - README did not have a Nextflow minimum version badge.
  • pipeline_todos - TODO string in test_full.config: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
  • pipeline_todos - TODO string in test_full.config: Give any required params for the test so that command line flags are not needed
  • pipeline_todos - TODO string in test_full.config: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
  • pipeline_todos - TODO string in test_full.config: Give any required params for the test so that command line flags are not needed
  • pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required
  • schema_description - No description provided in schema for parameter: umi_read_structure
  • schema_description - No description provided in schema for parameter: group_by_umi_strategy

❔ Tests ignored:

  • files_unchanged - File ignored due to lint config: assets/nf-core-sarek_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-sarek_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-sarek_logo_dark.png
  • files_unchanged - File ignored due to lint config: lib/NfcoreTemplate.groovy

✅ Tests passed:

Run details

  • nf-core/tools version 2.4.1
  • Run at 2022-06-20 12:58:46

@FriederikeHanssen
Copy link
Contributor Author

@maxulysse still unsure on how to deal with known_sites (see nextflow slack). For the rest, would love your opinion. I am not super happy with prepare_genome.nf, however I coudln't find another work around. If you know of some nextflow magic there, i would be all 👂

@FriederikeHanssen
Copy link
Contributor Author

@maxulysse carefully testing now all combinations again, but so far so good.

@FriederikeHanssen FriederikeHanssen marked this pull request as ready for review June 17, 2022 13:34
@FriederikeHanssen
Copy link
Contributor Author

Tested the following and all works/fails as expectd:

BQSR/FiltervariantTranches: with known_sites, just dbsnp, just known_indels
Haplotypecaller: with/without dbsnp
Mutect/Getpileupsummaries: with and without germline_resource
Getpileupsummaries: without intervals uses germline_resource

Copy link
Member

@maxulysse maxulysse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<3

@maxulysse
Copy link
Member

Can you do the channel.value magic for all of the reference files?

@FriederikeHanssen
Copy link
Contributor Author

yes should be able to, the only exception is germline_resource_tbi because I need it to check later whether getpileup summaries should be run. I couldn't get naything else to work withChannel.value([]), tried Channel.ifEmpty , germline_resource_tbi ?: etc.

@FriederikeHanssen
Copy link
Contributor Author

For the fasta also no, because it is mandatory, so it should be empty (however this is also caught by the input validation)

@FriederikeHanssen
Copy link
Contributor Author

Mutect needs fixing

@FriederikeHanssen
Copy link
Contributor Author

Not sure what is going on with mutect2. Locally results files are there:

results
├── csv
│   └── variantcalled.csv
├── multiqc
│   ├── multiqc_data
│   │   ├── mqc_bcftools-stats-subtypes_1.txt
│   │   ├── mqc_bcftools_stats_depth_1.txt
│   │   ├── mqc_bcftools_stats_indel-lengths_1.txt
│   │   ├── mqc_bcftools_stats_vqc_Count_Indels.txt
│   │   ├── mqc_bcftools_stats_vqc_Count_SNP.txt
│   │   ├── mqc_bcftools_stats_vqc_Count_Transitions.txt
│   │   ├── mqc_bcftools_stats_vqc_Count_Transversions.txt
│   │   ├── mqc_vcftools_tstv_by_count_1.txt
│   │   ├── mqc_vcftools_tstv_by_qual_1.txt
│   │   ├── multiqc.log
│   │   ├── multiqc_bcftools_stats.txt
│   │   ├── multiqc_citations.txt
│   │   ├── multiqc_data.json
│   │   ├── multiqc_general_stats.txt
│   │   ├── multiqc_sources.txt
│   │   ├── vcftools_tstv_by_count.txt
│   │   └── vcftools_tstv_by_qual.txt
│   ├── multiqc_plots
│   │   ├── pdf
│   │   │   ├── mqc_bcftools-stats-subtypes_1.pdf
│   │   │   ├── mqc_bcftools-stats-subtypes_1_pc.pdf
│   │   │   ├── mqc_bcftools_stats_depth_1.pdf
│   │   │   ├── mqc_bcftools_stats_indel-lengths_1.pdf
│   │   │   ├── mqc_bcftools_stats_vqc_Count_Indels.pdf
│   │   │   ├── mqc_bcftools_stats_vqc_Count_SNP.pdf
│   │   │   ├── mqc_bcftools_stats_vqc_Count_Transitions.pdf
│   │   │   ├── mqc_bcftools_stats_vqc_Count_Transversions.pdf
│   │   │   ├── mqc_vcftools_tstv_by_count_1.pdf
│   │   │   └── mqc_vcftools_tstv_by_qual_1.pdf
│   │   ├── png
│   │   │   ├── mqc_bcftools-stats-subtypes_1.png
│   │   │   ├── mqc_bcftools-stats-subtypes_1_pc.png
│   │   │   ├── mqc_bcftools_stats_depth_1.png
│   │   │   ├── mqc_bcftools_stats_indel-lengths_1.png
│   │   │   ├── mqc_bcftools_stats_vqc_Count_Indels.png
│   │   │   ├── mqc_bcftools_stats_vqc_Count_SNP.png
│   │   │   ├── mqc_bcftools_stats_vqc_Count_Transitions.png
│   │   │   ├── mqc_bcftools_stats_vqc_Count_Transversions.png
│   │   │   ├── mqc_vcftools_tstv_by_count_1.png
│   │   │   └── mqc_vcftools_tstv_by_qual_1.png
│   │   └── svg
│   │       ├── mqc_bcftools-stats-subtypes_1.svg
│   │       ├── mqc_bcftools-stats-subtypes_1_pc.svg
│   │       ├── mqc_bcftools_stats_depth_1.svg
│   │       ├── mqc_bcftools_stats_indel-lengths_1.svg
│   │       ├── mqc_bcftools_stats_vqc_Count_Indels.svg
│   │       ├── mqc_bcftools_stats_vqc_Count_SNP.svg
│   │       ├── mqc_bcftools_stats_vqc_Count_Transitions.svg
│   │       ├── mqc_bcftools_stats_vqc_Count_Transversions.svg
│   │       ├── mqc_vcftools_tstv_by_count_1.svg
│   │       └── mqc_vcftools_tstv_by_qual_1.svg
│   └── multiqc_report.html
├── pipeline_info
│   ├── execution_report_2022-06-20_08-48-17.html
│   ├── execution_timeline_2022-06-20_08-48-17.html
│   ├── execution_trace_2022-06-20_08-48-17.txt
│   ├── pipeline_dag_2022-06-20_08-48-17.html
│   └── software_versions.yml
├── reports
│   ├── bcftools
│   │   └── sample4_vs_sample3.filtered.bcftools_stats.txt
│   └── vcftools
│       ├── sample4_vs_sample3.filtered.FILTER.summary
│       ├── sample4_vs_sample3.filtered.TsTv.count
│       └── sample4_vs_sample3.filtered.TsTv.qual
├── untar
│   └── chromosomes
│       └── chr21.fasta
└── variant_calling
    └── sample4_vs_sample3
        └── mutect2
            ├── sample3.pileupsummaries.table
            ├── sample4.pileupsummaries.table
            ├── sample4_vs_sample3.artifactprior.tar.gz
            ├── sample4_vs_sample3.contamination.table
            ├── sample4_vs_sample3.filtered.vcf.gz
            ├── sample4_vs_sample3.filtered.vcf.gz.filteringStats.tsv
            ├── sample4_vs_sample3.filtered.vcf.gz.tbi
            ├── sample4_vs_sample3.segmentation.table
            ├── sample4_vs_sample3.vcf.gz
            ├── sample4_vs_sample3.vcf.gz.stats
            └── sample4_vs_sample3.vcf.gz.tbi

conf/modules.config Outdated Show resolved Hide resolved
Co-authored-by: Maxime U. Garcia <maxime.garcia@scilifelab.se>
@FriederikeHanssen
Copy link
Contributor Author

GetPileupSummariesNormal fails with a memory error. Wasn't an issue before because it wasn't run. Locally this works because the job gets automatically resubmitted with a higher memory request 🥲 .

@FriederikeHanssen
Copy link
Contributor Author

locally it didn't work with 7.5GB of memory. I am afraid that mutect test just won't run on GHA

@maxulysse
Copy link
Member

locally it didn't work with 7.5GB of memory. I am afraid that mutect test just won't run on GHA

Too bad, then, let's run that test locally only whenever needed.
I'm assuming we already have input data as small as possible?

@FriederikeHanssen
Copy link
Contributor Author

yep Gavin downsampled it as much as he could but the somatic GATK tools require a certain amount of SNPs to run at all. If we further reduce they'll fail for other reasons and much more upstream

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants