Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A model is invalid according to model.json but valid according to defs.json #1360

Closed
2 of 3 tasks
lhenkelm opened this issue Mar 8, 2021 · 6 comments
Closed
2 of 3 tasks
Labels
question Further information is requested schema and spec JSON schema

Comments

@lhenkelm
Copy link
Contributor

lhenkelm commented Mar 8, 2021

Description

Writing tests, I discovered a model configuration that triggered buggy behavior of the model.json schema.
The model.json schema claims the model is invalid, yet the defs.json one accepts it as valid.

Expected Behavior

Either boths schemas should find the model invalid, or neither. So model.json should not raise a validation error (its not a realistic example, but it seems to me to comply with what is in the schemas).

Actual Behavior

The model is valid according to defs.json, but checking the validity using model.json raises:

jsonschema.exceptions.ValidationError: 'histosys' was expected

Failed validating 'const' in schema[0]['properties']['type']:
    {'const': 'histosys'}

On instance['type']:
    'normfactor'

Steps to Reproduce

package versions (PyHF is 0.5.0 from pip with xmlio and tensorflow, but it is not needed to reproduce the issue):

>>> json.__version__
'2.0.9'
>>> jsonschema.__version__
'3.2.0'
>>> requests.__version__
'2.25.1'

Example:

import json
import jsonschema
import requests

j = ('{"channels": ['
          '{"name": "test_channel_1", '
          ' "samples": ['
            '{"name": "test_sample", '
              '"data": [0.5, 3.33, 666.666], '
              '"modifiers": ['
                '{"name": "lalala", '
                  '"type": "normfactor"}]}]}, '
          '{"name": "test_channel_2", '
          '"samples": ['
            '{"name": "test_sample_2", '
            '"data": [0, 0, 1], '
            '"modifiers": []}]}], '
      '"parameters": ['
          '{"name": "p1"}, '
          '{"name": "parameter of minor interest", '
            '"factors": [0.0072992700729927005]}, '
          '{"name": "very curious parameter", '
          '"fixed": true}]}')

d = json.loads(j)

jsonschema.validate(
    instance = d,
    schema = requests.get(
        'https://scikit-hep.org/pyhf/schemas/1.0.0/defs.json').json()) # works
jsonschema.validate(
    instance = d,
    schema = requests.get(
        'https://scikit-hep.org/pyhf/schemas/1.0.0/model.json').json()) # raises

Checklist

  • Run git fetch to get the most up to date version of master
  • Searched through existing Issues to confirm this is not a duplicate issue
  • Filled out the Description, Expected Behavior, Actual Behavior, and Steps to Reproduce sections above or have edited/removed them in a way that fully describes the issue
@matthewfeickert matthewfeickert added schema and spec JSON schema question Further information is requested labels Mar 8, 2021
@matthewfeickert
Copy link
Member

Thanks for the Issue @lhenkelm.

Either boths schemas should find the model invalid, or neither. So model.json should not raise a validation error

This is the crux of you issue, as this assumption isn't correct. defs.json has no validation and so will validate anything you give it. You are currently giving a broken spec, and so model.json rightly fails to validate it.

@kratsg
Copy link
Contributor

kratsg commented Mar 8, 2021

@lhenkelm there's a couple of problems at play here. Fundamentally, the issue is that your jsonschema implementation isn't quite correct as it will not resolve references for you on the fly. See https://python-jsonschema.readthedocs.io/en/stable/references/ . You can see python-jsonschema/jsonschema#274 for more details -- and in particular, my gist which shows how you can write up a ref resolver to correctly resolve this for you.

Of course, this is pretty annoying to do (as I realized when I was fleshing out the JSON schema) and support for ref resolver has still waned a bit even to this day... This is why we provide a (undocumented, sorry!) validation utility that wraps jsonschema for you and handles all of this annoying boilerplate code.

Here's your example that correctly gives the right error for your spec:

$ cat issue1360.py 
import json
import pyhf

j = """{
  "channels": [
    {
      "name": "test_channel_1",
      "samples": [
        {
          "name": "test_sample",
          "data": [
            0.5,
            3.33,
            666.666
          ],
          "modifiers": [
            {
              "name": "lalala",
              "type": "normfactor"
            }
          ]
        }
      ]
    },
    {
      "name": "test_channel_2",
      "samples": [
        {
          "name": "test_sample_2",
          "data": [
            0,
            0,
            1
          ],
          "modifiers": []
        }
      ]
    }
  ],
  "parameters": [
    {
      "name": "p1"
    },
    {
      "name": "parameter of minor interest",
      "factors": [
        0.0072992700729927005
      ]
    },
    {
      "name": "very curious parameter",
      "fixed": true
    }
  ]
}"""

d = json.loads(j)

pyhf.utils.validate(d, 'defs.json')
pyhf.utils.validate(d, 'model.json')

which complains with

$ python issue1360.py 
Traceback (most recent call last):
  File "/Users/kratsg/.pyenv/versions/pyhf-dev/lib/python3.8/site-packages/pyhf/utils.py", line 49, in validate
    return validator.validate(spec)
  File "/Users/kratsg/.pyenv/versions/pyhf-dev/lib/python3.8/site-packages/jsonschema/validators.py", line 353, in validate
    raise error
jsonschema.exceptions.ValidationError: {'name': 'lalala', 'type': 'normfactor'} is not valid under any of the given schemas

Failed validating 'anyOf' in schema['properties']['channels']['items']['properties']['samples']['items']['properties']['modifiers']['items']:
    {'anyOf': [{'$ref': '#/definitions/modifier/histosys'},
               {'$ref': '#/definitions/modifier/lumi'},
               {'$ref': '#/definitions/modifier/normfactor'},
               {'$ref': '#/definitions/modifier/normsys'},
               {'$ref': '#/definitions/modifier/shapefactor'},
               {'$ref': '#/definitions/modifier/shapesys'},
               {'$ref': '#/definitions/modifier/staterror'}]}

On instance['channels'][0]['samples'][0]['modifiers'][0]:
    {'name': 'lalala', 'type': 'normfactor'}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "issue1360.py", line 60, in <module>
    pyhf.utils.validate(d, 'model.json')
  File "/Users/kratsg/.pyenv/versions/pyhf-dev/lib/python3.8/site-packages/pyhf/utils.py", line 51, in validate
    raise InvalidSpecification(err, schema_name)
pyhf.exceptions.InvalidSpecification: {'name': 'lalala', 'type': 'normfactor'} is not valid under any of the given schemas.
	Path: channels[0].samples[0].modifiers[0]
	Instance: {'name': 'lalala', 'type': 'normfactor'} Schema: model.json

and this is because you neglected to put data: null in for the normfactor. We require that all modifiers have a data parameter, even if they're not used. The reason is to keep the structure as highly consistent as possible.

In your case, you will want

{
    "name": "lalala",
    "type": "normfactor",
    "data": null
}

@matthewfeickert
Copy link
Member

@lhenkelm if you have further questions here we're happy to address them. If not I'll close this Issue at the end of the day if you haven't already. 👍

Thanks for using pyhf too! 🚀

@lhenkelm
Copy link
Contributor Author

lhenkelm commented Mar 8, 2021

Hi @matthewfeickert, @kratsg, thanks for the quick response!
The validation util seems easy enough to use, so I will give it a go.

This is less a pyhf question, and more about jsonschema, but what does it mean for defs.json to "not have validation"?
Is it because the definitions are all wrapped inside the outer "definitions":{...} mapping?
Also, is there any way to validate against component definitions (like "modifier" "config" etc.) without embedding them in either a model, a workspace, or a measurement?

@kratsg
Copy link
Contributor

kratsg commented Mar 8, 2021

what does it mean for defs.json to "not have validation"?
Is it because the definitions are all wrapped inside the outer "definitions":{...} mapping?

Yes. That's what it means. Look at the documentation here (https://json-schema.org/understanding-json-schema/structuring.html) for details on re-use. The main reason for this structure is to allow extending, but also make it easy to build smaller pieces.

Also, is there any way to validate against component definitions (like "modifier" "config" etc.) without embedding them in either a model, a workspace, or a measurement?

You can make your own custom schema based on ours and use pyhf.utils.validate. Something like

/path/to/modifiers.json

{
  "$schema": "http://json-schema.org/draft-06/schema#",
  "$id": "modifier.json",
   "type":"array",
   "items":{
      "anyOf":[
         {
            "$ref":"defs.json#/definitions/modifier/histosys"
         },
         {
            "$ref":"defs.json#/definitions/modifier/lumi"
         },
         {
            "$ref":"defs.json#/definitions/modifier/normfactor"
         },
         {
            "$ref":"defs.json#/definitions/modifier/normsys"
         },
         {
            "$ref":"defs.json#/definitions/modifier/shapefactor"
         },
         {
            "$ref":"defs.json#/definitions/modifier/shapesys"
         },
         {
            "$ref":"defs.json#/definitions/modifier/staterror"
         }
      ]
   }
}

or similar. defs.json is composable / modular.

@lhenkelm
Copy link
Contributor Author

lhenkelm commented Mar 8, 2021

Thanks a lot for the fast and comprehensive answers! I'll go and write some schemas then :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested schema and spec JSON schema
Projects
None yet
Development

No branches or pull requests

3 participants