Adds bwa -k 23 and GenomeChronicler as tool #16

Merged: cgpu merged 2 commits into master on Jan 10, 2020

cgpu (Owner) commented Jan 10, 2020

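  • Adds -k 23 (bwa mem minimum seed length; the bwa mem default is 19)
  • Exposes bwa_cpus and sort_cpus as params (thread counts for bwa mem and samtools sort in MapReads)
  • Adds GenomeChronicler to the available tools (selected through Sarek's existing --tools logic)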
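As an illustration of the first two items, here is a minimal Python sketch of how a `bwa mem | samtools sort` mapping command could be assembled with the new seed length and separate thread counts. The helper names `split_cpus` and `map_reads_command`, and the "about a quarter of the cores go to sorting" rule, are hypothetical and are not taken from this PR. The flags themselves are real: `-k`, `-t`, `-R` for `bwa mem`, and `-@`, `-o` for `samtools sort`.

```python
import shlex
from typing import Tuple


def split_cpus(total_cpus: int, sort_fraction: float = 0.25) -> Tuple[int, int]:
    """Hypothetical rule: give roughly `sort_fraction` of the cores to
    samtools sort and the rest to bwa mem, with at least one core each."""
    if total_cpus < 2:
        raise ValueError("need at least 2 CPUs to run bwa mem and samtools sort together")
    sort_cpus = max(1, int(total_cpus * sort_fraction))
    bwa_cpus = total_cpus - sort_cpus
    return bwa_cpus, sort_cpus


def map_reads_command(fasta: str, read1: str, read2: str, read_group: str,
                      out_bam: str, bwa_cpus: int, sort_cpus: int,
                      seed_length: int = 23) -> str:
    """Build a `bwa mem | samtools sort` pipeline string (all names hypothetical)."""
    bwa = ["bwa", "mem",
           "-k", str(seed_length),   # minimum seed length (bwa mem default: 19)
           "-t", str(bwa_cpus),      # bwa mem threads
           "-R", read_group,         # read-group header line
           fasta, read1, read2]
    sort = ["samtools", "sort",
            "-@", str(sort_cpus),    # sorting/compression threads
            "-o", out_bam,
            "-"]                     # read the SAM stream from stdin
    # shlex.join quotes arguments that need it (e.g. the read-group string)
    return shlex.join(bwa) + " | " + shlex.join(sort)


if __name__ == "__main__":
    bwa_cpus, sort_cpus = split_cpus(16)
    print(map_reads_command("ref.fa", "s1_R1.fq.gz", "s1_R2.fq.gz",
                            r"@RG\tID:s1\tSM:s1", "s1.bam", bwa_cpus, sort_cpus))
```

Exposing the two counts as separate parameters lets the aligner, which is usually the bottleneck, receive most of the cores. Under the hypothetical split above, 16 CPUs become 12 for `bwa mem` and 4 for `samtools sort`.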

nf-core/sarek pull request

Many thanks for contributing to nf-core/sarek!

Please fill in the appropriate checklist below (delete whatever is not relevant).
These are the most common things requested on pull requests (PRs).

PR checklist

  • This comment contains a description of changes (with reason)
  • If you've fixed a bug or added code that should be tested, add tests!
  • If necessary, also make a PR on the nf-core/sarek branch on the nf-core/test-datasets repo
  • Ensure the test suite passes (nextflow run . -profile test,docker).
  • Make sure your code lints (nf-core lint .).
  • Documentation in docs is updated
  • CHANGELOG.md is updated
  • README.md is updated

Learn more about contributing: CONTRIBUTING.md

cgpu merged commit d1e60c2 into master on Jan 10, 2020
cgpu added a commit to PGP-UK/GenomeChronicler-Sarek-nf that referenced this pull request Jan 13, 2020
* nf-core bump-version . 2.5.1dev

* Remove PublishDirMode from test profile (nf-core#40)

* remove PublishDirMode from test profile

* update all tools

* minor updates + typo fix (nf-core#42)

* minor updates + typo fix

* fix VEP automated builds

* add location for abstracts

* remove reference to old buil.nf script

* update CHANGELOG

* Update docs/reference.md

Co-Authored-By: Szilveszter Juhos <szilveszter.juhos@scilifelab.se>

* Update docs/reference.md

* Update docs/reference.md

* Workflow (nf-core#45)

* Add workflow figure
* Include workflow figure in readme
* Update CHANGELOG

* add minimal genome and update some processes

* Start adding mouse data

* Update iGenomes.config

* Add tbi

* Drop ASCAT files

* apply changes from 2.5.1 to dev

* bump version to 2.5.2dev

* update CHANGELOG

* update tiddit to 2.8.1

* Use Version 98 of Mouse

* Add for grcm38

* Adjust mus musculus DB

* Annotation

* add smallerGRCh37 and minimalGRCh37

* use bwa aln when no knownIndels, otherwise use bwa mem; noIntervals currently being added everywhere

* don't use bwa aln

* add automatic generation of intervals file based on fastaFai file

* Adjusted genomes.config

* Should be list

* Set genomes_base to something

* Revert back

* enable CreateIntervalsBed for intervals_list from GATK Bundle

* Add proper calling list

* Use the bed file

* remove temp file

* update CHANGELOG

* Fix genome fa.fai

* Add in mgpv5

* Try short track

* Add in species handling

* Document new parameter species

* Add changelog

* Fix iGenomes stuff

* Add in note about GRCm38

* Fix small fai index issue

* Adjusted quotes in genomes.config

* And the same for igenomes

* Better folder structure for Mouse Genome Project data

* Minor adjustment to proper paths

* Apply suggestions from code review

Add changes by Maxime

Co-Authored-By: Maxime Garcia <max.u.garcia@gmail.com>

* Remove space

* Move it up

* Update CHANGELOG.md

Co-Authored-By: Maxime Garcia <max.u.garcia@gmail.com>

* add minimal tests

* fix processes with no intervals

* add comments

* params noIntervals -> no_intervals

* sort genomes + add news

* code polishing

* update CHANGELOG

* add split_fastq params to split the fastq files with the splitFastq() nf method

* add tests

* temporarily remove TIDDIT tests

* add Sentieon for bwa mem

* disable docker and singularity

* disable container

* add fastaFai for bwamem

* remove module samtools from label sentieon

* fix output from bwa mem

* fix output channel BamMapped from MapReads

* set params.sentieon to null by default

* add SentieonDedup process

* fix typo

* add fastaFai to SentieonDedup process

* fix bam indexing

* fix bam indexing

* fix bam indexing

* add SentieonBQSR

* add label sentieon to SentieonBQSR

* fix metrics output for SentieonBQSR

* increase cpus for Sentieon BQSR

* remove indexing

* add index for dedup

* bwa mem sentieon specific process

* TSV file for sentieon Dedup

* TSV for every step for Sentieon

* recal -> deduped

* fix input for TSV recalibrate

* enable restart from recalibrate with TSV with Sentieon

* fix Sentieon variant calling from mapping and recalibrate

* code polishing

* add dump tag for input sample

* add dump tag for bamDedupedSentieon

* code polishing

* code polishing

* code polishing

* code polishing

* remove when statement

* fix typo

* remove tsv for recalibrate with sentieon

* add dnascope dnaseq

* fix dnascope

* add TNscope process

* fix TNscope output

* add pon for TNscope

* add params.pon_index

* add annotation for Sentieon DNAseq, DNAscope, TNscope

* add default pon_index

* typo

* fix typo

* improve automatic annotation

* typo

* typo

* add condition on when statement on TNscope

* clean up

* code polish

* add CODEOWNERS file

* add when statement on all sentieon processes with params.sentieon

* remove munin sentieon specific configs from config

* load sarek specific config

* update path to specific config

* update docs

* remove Freebayes

* update workflow image

* remove old logo

* fix tests

* add docs about params split_fastq

* update CHANGELOG

* improve docs

* more tests but less NF versions

* actually run the tests

* typo

* simplify configs

* add test for mpileup

* go crazy with tests

* fix tests

* include test.config

* restore FreeBayes

* remove label memory_max from BaseRecalibrator process to fix nf-core#72

* add --skipQC all and --tools Manta,mpileup,Strelka to minimal genome tests

* update Nextflow version

* update Nextflow version

* update Nextflow

* add --step annotation to profile

* don't need to specify step here

* move params initialization

* add docs

* fix markdownlint

* more complete docs + sort genomes

* improve tests

* update docs

* update CHANGELOG

* improve script

* fix tests

* better comments

* better comments

* fix error on channel name

* fix output for MergeBamRecal

* fix MergeBamRecal output

* fix TSV file

* update comments and docs

* add warning for sentieon only processes

* nf-core bump-version . 2.5.2

* manual bump-version . 2.5.2

* update workflow image

* downgrade tools for release

* update CHANGELOG

* clean up and update workflow image

* allow a

* fix workflow image

* Apply suggestions from code review

* Apply suggestions from code review

* Update docs/output.md

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

* Reformats `bwa mem | samtools sort` command; WIP suboptimal resource usage

* Addresses #5 ;WIP

* Removes max_ resource alloc labels from MarkDuplicatesSpark

* Replaces .md.bam.bai->.md.bai (same as nf-core)

* Add ${markdup_java_options} to MarkDuplicatesSpark (same as nf-core MarkDuplicates)

* Changes MarkDuplicates --verbosity, DEBUG->INFO

* Changes intervalBed.simpleName->intervalBed.baseName; nf-cored

* Removes label cpus_1 from BaseRecalibratorSpark

* Remove cpus_2 labels from ApplyBQSRSpark; DEBUG->INFO

* Changes pseudo file "no_vepFile.txt" from https to s3 link

* Removes java options from ApplyBQSRSpark

* Removes java options from MarkDupesSpark

* Add java-options to MarkDupesSpark; verbosity INFO->ERROR

* Fixes dupe --java-options; 🤦

* Attempt to fix MarkDupesSpark; "--lower-case"->"-CAP"; Removed tmp

* Adds soft-coded allocation of resources to MapReads

* Initialise params for MapReads split resource alloc

* Adds neglected curlies around params

* Adds neglected \ to bash vars

* Adds neglected \ to bash vars

* WIP; MapReads optimisations

* Implement resource alloc between bwa and samtools

* Adds max, med soft coded resource alloc

* Re-labels processes (from hard coded resources to soft)

* Adds extra curlies to address priority of eval

* Add explicit declaration of maxForks/process

* Update med resource allocation function

* Add echo true and echo of  ${bwa_cpus} and ${sort_cpus}

* Hard code heap in MarkDuplicatesSpark at 8g

* Correct expected output bai in MarkDupes

* Removes Spark versions; Not stable with low resources

* Removes sorting; Picard might sort?

* Do not assume sorting in MarkDupes

* Adds explicit --ASSUME_SORT_ORDER unsorted

* Adds missing \\

* Omits -k 23

* Bringing sorted back

* Eliminating pipes in mapping step

* Adds bwa -k 23 and GenomeChronicler as tool (cgpu#16)

- [x] Adds -k 23 (bwa mem seed length)
- [x] Exposes as params bwa_cpus, sort_cpus
- [x] Adds GenomeChronicler in tools (sarek logic)

Co-authored-by: Alexander Peltzer <apeltzer@users.noreply.github.com>
Co-authored-by: Maxime Garcia <maxime.garcia@scilifelab.se>
Co-authored-by: Szilveszter Juhos <szilveszter.juhos@scilifelab.se>
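The commit "add automatic generation of intervals file based on fastaFai file" listed above generates an intervals file from the FASTA index. The function below is a minimal Python sketch of that idea. It assumes one whole-contig interval per sequence is wanted. It is not Sarek's implementation, and `fai_to_bed` is a hypothetical name. The column layout it relies on is the standard `samtools faidx` `.fai` format.

```python
from typing import Iterable, List


def fai_to_bed(fai_lines: Iterable[str]) -> List[str]:
    """Convert samtools faidx (.fai) index lines into whole-contig BED lines.

    A .fai line is tab-separated: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH.
    BED coordinates are 0-based and half-open, so each contig spans [0, LENGTH).
    """
    bed = []
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:  # skip blank or malformed lines
            continue
        name, length = fields[0], int(fields[1])
        bed.append(f"{name}\t0\t{length}")
    return bed
```

For example, `fai_to_bed(open("genome.fa.fai"))` yields one `chrom<TAB>0<TAB>length` line per contig. Because no known-sites or GATK intervals list is needed, this approach also works for genomes such as the minimal and mouse references added in this series.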
cgpu added a commit to PGP-UK/GenomeChronicler-Sarek-nf that referenced this pull request Jan 15, 2020
cgpu added a commit to PGP-UK/GenomeChronicler-Sarek-nf that referenced this pull request Jan 21, 2020
* Absorbs latest nfcore and cgpu fork changes (#6)

* Replaces intervals process to be flexible

* Updated nextflow.config with new intervals process

* Updates conf/base.config; Removes dynamic resource alloc

Co-authored-by: Alexander Peltzer <apeltzer@users.noreply.github.com>
Co-authored-by: Maxime Garcia <maxime.garcia@scilifelab.se>
Co-authored-by: Szilveszter Juhos <szilveszter.juhos@scilifelab.se>
cgpu added a commit to PGP-UK/GenomeChronicler-Sarek-nf that referenced this pull request Jan 22, 2020