JSON schema validation error with 21.10.0-SNAPSHOT #1304

Closed
drpatelh opened this issue Oct 28, 2021 · 12 comments
Labels
bug Something isn't working

Comments

@drpatelh
Member

Copied across description from nextflow-io/nextflow#2418

In the latest snapshot release, the parameter JSON schema validation we have in nf-core pipelines seems to be failing before the pipeline execution as a result of a mismatch between the observed and expected objects:

$ NXF_VER=21.10.0-SNAPSHOT nextflow run nf-core/rnaseq -profile test,singularity

ERROR: Validation of pipeline parameters failed!

* --hostnames: expected type: String, found: JSONObject ({"cfc":[".hpc.uni-tuebingen.de"],"utd_sysbio":["sysbio.utdallas.edu"],"utd_ganymede":["ganymede.utdallas.edu"],"genouest":[".genouest.org"],"cbe":[".cbe.vbc.ac.at"],"genotoul":[".genologin1.toulouse.inra.fr",".genologin2.toulouse.inra.fr"],"crick":[".thecrick.org"],"uppmax":[".uppmax.uu.se"],"icr_davros":[".davros.compute.estate"],"imperial":[".hpc.ic.ac.uk"],"binac":[".binac.uni-tuebingen.de"],"imperial_mb":[".hpc.ic.ac.uk"]})

Changing this line to "type": "object" in the parameter schema fixes it. However, this means that the latest version of NF will not be compatible with older versions of nf-core pipelines. This is very likely due to poor patch fixing on the nf-core side to get things working but it would be good to find a workaround.
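The mismatch can be illustrated with a minimal sketch of a type check. This is not the API of the JSON schema library used by NfcoreSchema.groovy; the `check_type` helper and its lookup tables are hypothetical and only mimic the shape of the error message above, with a shortened `hostnames` map.

```python
# Hypothetical, simplified type check. It only imitates the wording of the
# error reported above and is not the real validation library.
JSON_TYPE_NAMES = {
    str: "String",
    dict: "JSONObject",
    list: "JSONArray",
    bool: "Boolean",
    int: "Integer",
    float: "Number",
}

SCHEMA_TYPES = {
    "string": (str,),
    "object": (dict,),
    "array": (list,),
    "boolean": (bool,),
    "integer": (int,),
    "number": (int, float),
}


def check_type(name, value, schema_type):
    """Return None if value matches schema_type, otherwise an error string."""
    expected = SCHEMA_TYPES[schema_type]
    # bool is a subclass of int in Python, so it must not pass as a number.
    if isinstance(value, bool) and schema_type != "boolean":
        matches = False
    else:
        matches = isinstance(value, expected)
    if matches:
        return None
    found = JSON_TYPE_NAMES.get(type(value), type(value).__name__)
    return f"* --{name}: expected type: {schema_type.capitalize()}, found: {found}"


# Shortened version of the map shown in the error message.
hostnames = {"crick": [".thecrick.org"], "uppmax": [".uppmax.uu.se"]}

error_with_string_schema = check_type("hostnames", hostnames, "string")
error_with_object_schema = check_type("hostnames", hostnames, "object")
print(error_with_string_schema)
print(error_with_object_schema)
```

With the schema entry declared as a string the map is rejected; with it declared as an object the same value passes.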

The error is being raised from NfcoreSchema.groovy

@drpatelh added the bug label on Oct 28, 2021
@pditommaso

There's too little info. Can you modify this line

https://github.com/nf-core/rnaseq/blob/964425e3fd8bfc3dc7bce43279a98d17a874d3f7/lib/NfcoreSchema.groovy#L156

to

 log.error 'ERROR: Validation of pipeline parameters failed!', e

That should print the full error trace in the log file.

@drpatelh
Member Author

Oct-28 17:41:02.756 [main] DEBUG nextflow.cli.Launcher - $> nextflow run main.nf -profile test,singularity -c custom.config -resume
Oct-28 17:41:02.802 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 21.10.0-SNAPSHOT
Oct-28 17:41:02.817 [main] INFO  nextflow.cli.CmdRun - Launching `main.nf` [shrivelled_austin] - revision: bb0fa33a13
Oct-28 17:41:02.828 [main] DEBUG nextflow.config.ConfigBuilder - Found config local: /home/patelh/nf-core/rnaseq/nextflow.config
Oct-28 17:41:02.830 [main] DEBUG nextflow.config.ConfigBuilder - User config file: /home/patelh/nf-core/rnaseq/custom.config
Oct-28 17:41:02.831 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /home/patelh/nf-core/rnaseq/nextflow.config
Oct-28 17:41:02.831 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /home/patelh/nf-core/rnaseq/custom.config
Oct-28 17:41:02.846 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `test,singularity`
Oct-28 17:41:03.795 [main] DEBUG nextflow.plugin.PluginsFacade - Using Default plugins manager
Oct-28 17:41:03.801 [main] INFO  org.pf4j.DefaultPluginStatusProvider - Enabled plugins: []
Oct-28 17:41:03.802 [main] INFO  org.pf4j.DefaultPluginStatusProvider - Disabled plugins: []
Oct-28 17:41:03.804 [main] INFO  org.pf4j.DefaultPluginManager - PF4J version 3.4.1 in 'deployment' mode
Oct-28 17:41:04.168 [main] DEBUG nextflow.plugin.PluginsFacade - Using Default plugins manager
Oct-28 17:41:04.480 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `test,singularity`
Oct-28 17:41:04.508 [main] DEBUG nextflow.config.ConfigBuilder - Available config profiles: [cfc_dev, ifb_core, denbi_qbic, genotoul, alice, uppmax, abims, nu_genomics, imperial_mb, oist, mpcdf, lugh, cambridge, podman, czbiohub_aws, jax, ccga_med, test, google, computerome, seg_globe, sanger, pasteur, test_full, eddie, bi, bigpurple, docker, gis, eva, utd_ganymede, charliecloud, conda, singularity, icr_davros, munin, rosalind, prince, hasta, hebbe, cfc, utd_sysbio, uzh, debug, genouest, cbe, ebc, ccga_dx, crick, phoenix, biohpc_gen, shifter, awsbatch, uct_hpc, imperial, maestro, aws_tower, binac]
Oct-28 17:41:04.534 [main] DEBUG nextflow.plugin.PluginsFacade - Setting up plugin manager > mode=prod; plugins-dir=/home/patelh/.nextflow/plugins
Oct-28 17:41:04.536 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins default=[]
Oct-28 17:41:04.536 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins local root: .nextflow/plr/empty
Oct-28 17:41:04.540 [main] INFO  org.pf4j.DefaultPluginStatusProvider - Enabled plugins: []
Oct-28 17:41:04.540 [main] INFO  org.pf4j.DefaultPluginStatusProvider - Disabled plugins: []
Oct-28 17:41:04.543 [main] INFO  org.pf4j.DefaultPluginManager - PF4J version 3.4.1 in 'deployment' mode
Oct-28 17:41:04.552 [main] INFO  org.pf4j.AbstractPluginManager - No plugins
Oct-28 17:41:04.600 [main] DEBUG nextflow.Session - Session uuid: 259db345-16a7-4b2d-8b5a-ecc1f515b61d
Oct-28 17:41:04.600 [main] DEBUG nextflow.Session - Run name: shrivelled_austin
Oct-28 17:41:04.601 [main] DEBUG nextflow.Session - Executor pool size: 16
Oct-28 17:41:04.626 [main] DEBUG nextflow.cli.CmdRun -
  Version: 21.10.0-SNAPSHOT build 5634
  Created: 15-10-2021 08:49 UTC (09:49 BST)
  System: Linux 5.10.16.3-microsoft-standard-WSL2
  Runtime: Groovy 3.0.9 on OpenJDK 64-Bit Server VM 11.0.11+9-Ubuntu-0ubuntu2.20.04
  Encoding: UTF-8 (UTF-8)
  Process: 28657@DESKTOP-MSCR14U [127.0.1.1]
  CPUs: 16 - Mem: 15.5 GB (14 GB) - Swap: 4 GB (4 GB)
Oct-28 17:41:04.651 [main] DEBUG nextflow.Session - Work-dir: /home/patelh/nf-core/rnaseq/work [ext2/ext3]
Oct-28 17:41:04.668 [main] DEBUG nextflow.executor.ExecutorFactory - Extension executors providers=[]
Oct-28 17:41:04.682 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
Oct-28 17:41:04.769 [main] DEBUG nextflow.util.CustomThreadPool - Creating default thread pool > poolSize: 17; maxThreads: 1000
Oct-28 17:41:04.848 [main] DEBUG nextflow.Session - Session start invoked
Oct-28 17:41:04.852 [main] DEBUG nextflow.trace.TraceFileObserver - Flow starting -- trace file: /home/patelh/nf-core/rnaseq/results/pipeline_info/execution_trace_2021-10-28_17-41-04.txt
Oct-28 17:41:04.859 [main] DEBUG nextflow.Session - Using default localLib path: /home/patelh/nf-core/rnaseq/lib
Oct-28 17:41:04.861 [main] DEBUG nextflow.Session - Adding to the classpath library: /home/patelh/nf-core/rnaseq/lib
Oct-28 17:41:04.862 [main] DEBUG nextflow.Session - Adding to the classpath library: /home/patelh/nf-core/rnaseq/lib/nfcore_external_java_deps.jar
Oct-28 17:41:05.458 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Oct-28 17:41:05.613 [main] ERROR nextflow.Nextflow - ERROR: Validation of pipeline parameters failed!
org.everit.json.schema.ValidationException: #: #: only 1 subschema matches out of 2
        at org.everit.json.schema.ValidatingVisitor.visitCombinedSchema(ValidatingVisitor.java:172)
        at org.everit.json.schema.CombinedSchema.accept(CombinedSchema.java:184)
        at org.everit.json.schema.Visitor.visit(Visitor.java:43)
        at org.everit.json.schema.ValidatingVisitor.visit(ValidatingVisitor.java:52)
        at org.everit.json.schema.DefaultValidator.performValidation(Validator.java:69)
        at org.everit.json.schema.Schema.validate(Schema.java:140)
        at org.everit.json.schema.Schema$validate.call(Unknown Source)
        at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
        at NfcoreSchema.validateParameters(NfcoreSchema.groovy:153)
        at NfcoreSchema.validateParameters(NfcoreSchema.groovy)
        at NfcoreSchema$validateParameters.call(Unknown Source)
        at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:157)
        at WorkflowMain.initialise(WorkflowMain.groovy:57)
        at WorkflowMain$initialise$0.call(Unknown Source)
        at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:157)
        at Script_20d7ff23.runScript(Script_20d7ff23:36)
        at nextflow.script.BaseScript.runDsl2(BaseScript.groovy:169)
        at nextflow.script.BaseScript.run(BaseScript.groovy:199)
        at nextflow.script.ScriptParser.runScript(ScriptParser.groovy:221)
        at nextflow.script.ScriptRunner.run(ScriptRunner.groovy:212)
        at nextflow.script.ScriptRunner.execute(ScriptRunner.groovy:120)
        at nextflow.cli.CmdRun.run(CmdRun.groovy:309)
        at nextflow.cli.Launcher.run(Launcher.groovy:480)
        at nextflow.cli.Launcher.main(Launcher.groovy:639)
Oct-28 17:41:05.630 [main] ERROR nextflow.Nextflow - * --hostnames: expected type: String, found: JSONObject ({"cfc":[".hpc.uni-tuebingen.de"],"utd_sysbio":["sysbio.utdallas.edu"],"utd_ganymede":["ganymede.utdallas.edu"],"genouest":[".genouest.org"],"cbe":[".cbe.vbc.ac.at"],"genotoul":[".genologin1.toulouse.inra.fr",".genologin2.toulouse.inra.fr"],"crick":[".thecrick.org"],"uppmax":[".uppmax.uu.se"],"icr_davros":[".davros.compute.estate"],"imperial":[".hpc.ic.ac.uk"],"binac":[".binac.uni-tuebingen.de"],"imperial_mb":[".hpc.ic.ac.uk"]})

@drpatelh
Member Author

Still confused why this wasn't a problem in older versions of NF. Maybe the JSON schema libraries have been updated?

@ewels
Member

ewels commented Oct 28, 2021

I don't think it's the schema side of things - I think that something has changed in the way that Nextflow parses this particular param.

@ewels
Member

ewels commented Oct 28, 2021

The param in question is defined here:

https://github.com/nf-core/configs/blob/5afa43ad1534d526b4179b20adc598b1de947e63/nfcore_custom.config#L64-L83

The JSON schema library tells us that it's getting a string in the current version of Nextflow, but in the edge version it gets a JSON object.

Ok so could also be something to do with the preprocessing library that preps the params for validation I guess 🤔

It may also be possible to simply refactor how we use this parameter. It's a bit of an edge case - could probably be handled in a way that doesn't use a map.

@pditommaso

pditommaso commented Oct 30, 2021

I've dumped the content of params_json here and made a diff between 21.04.x and 21.10.x versions. There are some interesting changes

[Screenshot: diff of the dumped params_json between Nextflow 21.04.x and 21.10.x]

With NF 21.04, the genomes param, which is a map object, was rendered as a single string value (which looks wrong). With the latest version, genomes is instead handled correctly as a nested map.

I don't know why the stringified version of genomes was working; however, it looks to me like the latest version produces the correct genomes structure.
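The difference between the two dumps can be sketched as follows. This is only an illustration of the behaviour described in this comment, not how Nextflow serialises params internally; the flattening step is an assumption standing in for whatever 21.04.x did, and the map is shortened.

```python
import json

# Shortened example params; the real map has many more entries.
params = {"hostnames": {"crick": [".thecrick.org"], "uppmax": [".uppmax.uu.se"]}}

# Assumed older behaviour: a nested map is flattened to its string form
# before the params are serialised to JSON.
stringified = json.dumps(
    {key: str(value) if isinstance(value, dict) else value
     for key, value in params.items()}
)

# Assumed newer behaviour: the nested map is preserved as a JSON object.
nested = json.dumps(params)

old_value = json.loads(stringified)["hostnames"]
new_value = json.loads(nested)["hostnames"]

print(type(old_value).__name__)  # a JSON string, so "type": "string" matches
print(type(new_value).__name__)  # a JSON object, so "type": "string" fails
```

Under these assumptions, a schema entry declared as a string only ever matched because the map had already been collapsed into a string before validation.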

I've attached the json params for your convenience in this comment

params-json.zip

Update: the same applies to the hostnames param linked by Phil. Since it is a map, why does the validation pass when it arrives as a (broken) string value?

@drpatelh
Member Author

drpatelh commented Nov 2, 2021

"map object that was rendered as a sole string value (which looks wrong)"

Yep, this definitely looks wrong in the current implementation where those entries are somehow loaded and validated as a string instead of a map.

Assuming that we need to add a workaround on the nf-core side to correctly handle a map we only have 1 option as far as I can see. Changing this line to "type": "object" (and everywhere else a map is required) in the parameter schema fixes it. However, this means all currently released nf-core pipelines with a JSON schema will not work with the latest release of NF so the pipelines will have to be re-released with this fix.

We could try to update and add some hacks to the validation script but I don't think that is the problem and won't really solve anything. The problem is a mismatch between the defined and expected object types...

@pditommaso

I'm still not sure I understand what the hostname validation is supposed to validate.

That the user specified a hostname string? In that case it looks correct for it to be a string, no?

@drpatelh
Member Author

drpatelh commented Nov 2, 2021

In this case, params.hostnames is actually defined independently from the pipeline code on nf-core/configs here and it is defined as a Groovy Map.

We have basically been using it to warn users if they are not using the correct nf-core config for their institution. It may be easier to retire it at this point, as I have done here, to fix the pressing issue with NF compatibility, and then either leave it out entirely or add it back properly in the future.
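For context, a check of this kind could look roughly like the sketch below. It is not the code from the nf-core template; the `hostname_warnings` function, the substring matching rule and the warning text are all assumptions made for illustration.

```python
def hostname_warnings(hostnames, current_host, active_profiles):
    """Return warnings for institutional profiles that match the host but are not active.

    hostnames maps a profile name to a list of hostname fragments, as in the
    map shown in the error message. Matching by substring is an assumption.
    """
    warnings = []
    for profile, fragments in hostnames.items():
        for fragment in fragments:
            if fragment in current_host and profile not in active_profiles:
                warnings.append(
                    f"Host '{current_host}' matches '{fragment}' "
                    f"but profile '{profile}' is not in use"
                )
    return warnings


# Shortened map and a made-up login node name.
hostnames = {"crick": [".thecrick.org"], "uppmax": [".uppmax.uu.se"]}

missing_profile = hostname_warnings(hostnames, "login01.thecrick.org", ["test", "singularity"])
correct_profile = hostname_warnings(hostnames, "login01.thecrick.org", ["crick"])
print(missing_profile)
print(correct_profile)
```

A check like this needs the value as a map of lists, which is why the param was defined as a Groovy Map even though the schema declared it as a string.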

@pditommaso

Seems like a good plan to me. I think genomes may be affected by the same problem; I may have seen a similar validation rule somewhere that uses string instead of object.

@drpatelh
Member Author

drpatelh commented Nov 2, 2021

Luckily, we are ignoring the validation for genomes at the moment via a parameter called params.schema_ignore_params which is why the nf-core pipeline validation won't fail for this particular parameter. We can update this quietly without telling anyone in the pipeline template 🤫
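The effect of an ignore list can be sketched as below. The `params_to_validate` helper is hypothetical, and treating params.schema_ignore_params as a comma-separated string of parameter names is an assumption, not something confirmed in this thread.

```python
def params_to_validate(params, schema_ignore_params):
    """Drop every param named in the ignore list before validation.

    schema_ignore_params is assumed to be a comma-separated string.
    """
    ignored = {name.strip() for name in schema_ignore_params.split(",") if name.strip()}
    return {key: value for key, value in params.items() if key not in ignored}


# Made-up params: genomes is a nested map, input is a plain string.
params = {
    "genomes": {"GRCh37": {"fasta": "genome.fa"}},
    "input": "samplesheet.csv",
}

remaining = params_to_validate(params, "genomes")
print(remaining)
```

Because genomes never reaches the validator in this sketch, a wrong type declaration for it in the schema goes unnoticed, which matches the situation described above.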

@drpatelh
Member Author

drpatelh commented Nov 2, 2021

Fixed in nf-core/configs#295

We will need to update the pipeline template accordingly to remove params.hostnames too. Will link the PR here once I have done that.
