Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validation of parameters #817

Closed
wants to merge 9 commits into from

Conversation

KevinMenden
Copy link
Contributor

PR checklist

  • This comment contains a description of changes (with reason)
  • CHANGELOG.md is updated

This PR adds the Validation class to the lib/ to validate parameters against the JSON schema.

  • checks whether parameters adhere to types specified in schema
  • where applicable checks the schema pattern
  • checks for core nextflow parameters and exits when accidentally specified (e.g. --profile)
  • issues warnings for unexpected parameters

Have tested with cageseq and rnaseq so far, but should test a bit more. Ideas for more fancy checks are welcome :-)
Will also add some tests for this function to tools.

@ewels
Copy link
Member

ewels commented Dec 16, 2020

Code review ongoing on the cageseq PR: nf-core/cageseq#38

@KevinMenden
Copy link
Contributor Author

putting back to draft for now

@KevinMenden KevinMenden marked this pull request as ready for review January 28, 2021 08:04
@KevinMenden
Copy link
Contributor Author

Would be nice to get some comments on this so we can get this over and done with soon :-)
It now ships with all the extra dependencies in lib/ which seemed to be the only sensible solution to allow offline usage.

So far I have created an extra Validation.groovy class. Considering that it's schema validation, we might stick this functionality to the Schema class though, what do you think?

@KevinMenden KevinMenden added DSL2 schema template nf-core pipeline/component template labels Jan 28, 2021
@apeltzer
Copy link
Member

apeltzer commented Feb 5, 2021

So far I have created an extra Validation.groovy class. Considering that it's schema validation, we might stick this functionality to the Schema class though, what do you think?

That seems indeed reasonable to me 👍🏻

@apeltzer
Copy link
Member

apeltzer commented Feb 8, 2021

Had a look at this and if you could push this over to Schema.groovy, I think this already looks quite fine. Happy to test this as well, as I'm having another pipeline open where I implemented the lib/ stuff now following the rnaseq example 👍🏻

(its still DSLv1, but hey it works!) nf-core/eager#680

@KevinMenden
Copy link
Contributor Author

Thanks @apeltzer ! I'll switch it over to Schema sometime today. Would be amazing if you could test it a bit :)

@KevinMenden
Copy link
Contributor Author

Did the refactoring now and resolved the merge conflicts - should be good for some testing! Will also do some testing with this in the coming days.

@KevinMenden
Copy link
Contributor Author

KevinMenden commented Feb 11, 2021

Okay I had to fix some bugs in the template. Took me longer than I'd like to admit to realize that I tried to import a Schema class into the definition of the Schema class, but aliasing solved this.

I tested it with nf-core create ... and then running with some test data and everything works. From the template side it should be fine now.

@apeltzer
Copy link
Member

Hi @KevinMenden !
gave this a go over at nf-core/eager#680 and added this to my (anyways open) PR there. First look it works fine, but I have some questions 😆 :

Nice catching these typical beginner mistakes:

nextflow run . -profile test_tsv,docker --fake_james -resume --profile 
N E X T F L O W  ~  version 20.10.0
Launching `./main.nf` [backstabbing_volta] - revision: 7882df0b76
ERROR: You used a core Nextflow option with two hyphens: --profile! Please resubmit with one.

But it doesn't warn me about the --fake_james parameter that I just made up and isn't in the schema. It correctly found an issue in our schema, that a default wasn't listed in the enum type:

https://github.com/nf-core/eager/blob/4c784e84450f308d8bd7e9b415984ea651be8e97/nextflow_schema.json#L644-L651

Wondering whether at least the non-detection of the --fake_james is anticipated or not?

@KevinMenden
Copy link
Contributor Author

Alright so could you run this again, and then just ctrl+c the workflow? We've decided to only warn about unexpected parameters if a workflow fails. So your --fake_james should be captured but you wouldn't be warned. If you break the workflow it should warn you 🤞

@apeltzer
Copy link
Member

Seems not to work that way:

 nextflow run . -profile test_tsv,docker --fake_james -resume
N E X T F L O W  ~  version 20.10.0
Launching `./main.nf` [astonishing_kalman] - revision: 7882df0b76

------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/eager v2.3.2dev
------------------------------------------------------

Core Nextflow options
    runName                   : astonishing_kalman
    containerEngine           : docker
    container                 : nfcore/eager:dev
    launchDir                 : /Users/alexanderpeltzer/IDEA/nf-core/eager
    workDir                   : /Users/alexanderpeltzer/IDEA/nf-core/eager/work
    projectDir                : /Users/alexanderpeltzer/IDEA/nf-core/eager
    userName                  : alexanderpeltzer
    profile                   : test_tsv,docker
    configFiles               : /Users/alexanderpeltzer/.nextflow/config, /Users/alexanderpeltzer/IDEA/nf-core/eager/nextflow.config

Input/output options
    input                     : https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Mammoth/mammoth_design_fastq.tsv

Reference genome options
    fasta                     : https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Mammoth/Mammoth_MT_Krause.fasta
    genome                    : false

Generic options
    name                      : false
    email                     : false
    email_on_fail             : false
    max_multiqc_email_size    : 25 MB
    multiqc_config            : false
    validate_params           : true

Max job request options
    max_cpus                  : 2
    max_memory                : 6 GB
    max_time                  : 2d

Institutional config options
    config_profile_description: Minimal test dataset to check pipeline function
    config_profile_contact    : false
    config_profile_url        : false

Genotyping
    gatk_ug_keep_realign_bam  : false

Metagenomic Screening
    metagenomic_tool          :

------------------------------------------------------

If you use nf-core/eager for your analysis please cite:

* The pipeline
  https://doi.org/10.1101/2020.06.11.145615

* The nf-core framework
  https://dx.doi.org/10.1038/s41587-020-0439-x
  https://rdcu.be/b1GjZ

* Software dependencies
  https://github.com/nf-core/eager/blob/master/CITATIONS.md

------------------------------------------------------
Schaffa, Schaffa, Genome Baua!
Monitor the execution with Nextflow Tower using this url https://tower.nf/watch/58X1ggHH97wQBc
Monitor the execution with Nextflow Tower using this url https://tower.nf/watch/58X1ggHH97wQBc
executor >  local (1)
Monitor the execution with Nextflow Tower using this url https://tower.nf/watch/58X1ggHH97wQBc
executor >  local (1)
[cd/d3a126] process > makeBWAIndex (Mammoth_MT_Kr... [100%] 1 of 1, cached: 1 ✔
[b2/08ed61] process > makeFastaIndex (Mammoth_MT_... [100%] 1 of 1, cached: 1 ✔
[17/bd269e] process > makeSeqDict (Mammoth_MT_Kra... [100%] 1 of 1, cached: 1 ✔
[-        ] process > convertBam                     -
[-        ] process > indexinputbam                  -
[60/04a558] process > fastqc (JK2802_L2)             [100%] 2 of 2, cached: 2 ✔
[-        ] process > fastp                          -
[9b/408cc0] process > adapter_removal (JK2782_L1)    [100%] 2 of 2, cached: 2 ✔
[-        ] process > lanemerge                      -
[-        ] process > lanemerge_hostremoval_fastq    -
[-        ] process > fastqc_after_clipping          -
[-        ] process > bwa                            -
[-        ] process > bwamem                         -
[-        ] process > circulargenerator              -
[-        ] process > circularmapper                 -
[-        ] process > bowtie2                        -
[-        ] process > hostremoval_input_fastq        -
[-        ] process > seqtype_merge                  -
[-        ] process > samtools_flagstat              -
[-        ] process > samtools_filter                -
[-        ] process > samtools_flagstat_after_filter -
[-        ] process > endorSpy                       -
[-        ] process > dedup                          -
[-        ] process > markduplicates                 -
[-        ] process > library_merge                  -
[-        ] process > preseq                         -
[-        ] process > bedtools                       -
[-        ] process > damageprofiler                 -
[-        ] process > mapdamage_rescaling            -
[-        ] process > pmdtools                       -
[-        ] process > bam_trim                       -
[-        ] process > additional_library_merge       -
[-        ] process > qualimap                       -
[-        ] process > genotyping_ug                  -
[-        ] process > genotyping_hc                  -
[-        ] process > genotyping_freebayes           -
[-        ] process > genotyping_pileupcaller        -
[-        ] process > eigenstrat_snp_coverage        -
[-        ] process > genotyping_angsd               -
[-        ] process > vcf2genome                     -
[-        ] process > multivcfanalyzer               -
[-        ] process > mtnucratio                     -
[-        ] process > sex_deterrmine                 -
[-        ] process > nuclear_contamination          -
[-        ] process > print_nuclear_contamination    -
[-        ] process > metagenomic_complexity_filter  -
[-        ] process > malt                           -
[-        ] process > maltextract                    -
[-        ] process > kraken                         -
[-        ] process > kraken_parse                   -
[-        ] process > kraken_merge                   -
[f3/c2d1ed] process > output_documentation           [100%] 1 of 1, cached: 1 ✔
[cc/05b1aa] process > get_software_versions          [  0%] 0 of 1
[-        ] process > multiqc                        -
-[nf-core/eager] Pipeline completed with errors-
WARN: Killing pending tasks (1)
WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.

only hit CTRL + C once, so not force killing it... this is on a mac with docker ?

@KevinMenden
Copy link
Contributor Author

KevinMenden commented Feb 12, 2021

No kill it for good!

Schaffa, Schaffa, Genome Baua!
😆

@apeltzer
Copy link
Member

I forgot a piece of code in main ;-)

@apeltzer
Copy link
Member

[-        ] process > multiqc                                  -
WARN: Unexpected parameter: genomes
WARN: Unexpected parameter: config_profile_name
WARN: Unexpected parameter: fake_james
WARN: Unexpected parameter: save-aligned-intermediates
WARN: Unexpected parameter: save-trimmed
-[nf-core/eager] Pipeline completed with errors-
[9b/408cc0] Cached process > adapter_removal (JK2782_L1)
[60/04a558] Cached process > fastqc (JK2802_L2)
[2b/44c3bc] Cached process > fastqc (JK2782_L1)
[92/4d3343] Cached process > adapter_removal (JK2802_L2)
WARN: Killing pending tasks (1)
One more CTRL+C to force exit

@KevinMenden
Copy link
Contributor Author

Ah - yes the one at the end I assume. Should've thought of that!
Awesome then it works. We can always discuss whether we want those warnings more prominent I guess, but for now this seems to work as intended.

@ewels
Copy link
Member

ewels commented Feb 12, 2021

Nice! I get this with a vanilla template:

image

I guess we should add these parameters to the schema (hidden) so that they don't trigger a warning..

@ewels
Copy link
Member

ewels commented Feb 12, 2021

Closing in favour of #852

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
schema template nf-core pipeline/component template
Projects
None yet
3 participants