Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to skip validation in Workspace.get_measurement and Workspace.model? #1745

Closed
1 task done
lhenkelm opened this issue Jan 14, 2022 · 16 comments
Closed
1 task done
Assignees
Labels
feat/enhancement New feature or request needs-triage Needs a maintainer to categorize and assign schema and spec JSON schema

Comments

@lhenkelm
Copy link
Contributor

lhenkelm commented Jan 14, 2022

Summary

Dear pyhf maintainers,
Would you be open to making validation skippable in Workspace.get_measurement and Workspace.model?

I am trying to integrate a user-defined modifier into an existing workspace.
To still be able to validate the standard parts of the spec, I wrote a slightly relaxed set of schemas that validate custom modifiers.
To switch between the standard and relaxed schemas I gave the relaxed schemas a fake version number.
However the Workspace.get_measurement validation tries to use that fake version number in its validation to look up the measurement.json schema in the pyhf package directory.
That of course raises a FileNotFoundError since there is no such schema version in pyhf, preventing me from getting a model out of my workspace.

It would be very convenient if Workspace.get_measurement would accept the validate kwarg like Model and Workspace already do on master, and if Workspace.model would forward that option to Workspace.get_measurement.

Additional Information

No response

Code of Conduct

  • I agree to follow the Code of Conduct
@lhenkelm lhenkelm added feat/enhancement New feature or request needs-triage Needs a maintainer to categorize and assign labels Jan 14, 2022
@kratsg
Copy link
Contributor

kratsg commented Jan 14, 2022

What does your python calls look like right now for specifying the schema and version? That is pyhf.Workspace(spec, schema='...', version='...') ?

@lhenkelm
Copy link
Contributor Author

Thanks for the quick reaction! The calls look like this: pyhf.Workspace(spec, validate=False).

@lhenkelm
Copy link
Contributor Author

PS: to be more clear, the version is only specified in the spec for the workspace, i.e. spec={'version': '1.1.0', 'channels': ..., 'measurements': ..., 'observations': ...}.
And for the schema, the implicit name is still 'workspace.json', but I am using a replicated utils.validate prior to initialising the Workspace instance, and in the replica I am looking up schemas in a completely different path in my source, where I have both a schemas/1.0.0/<bla>.json path for the usual schemas,
and a separate schemas/1.1.0/<bla>.json path for the relaxed schemas, including one for workspace.json (which gets its guts from a modified defs.json)

@kratsg
Copy link
Contributor

kratsg commented Jan 14, 2022

What is the file that it tries to open? If you put everything in schemas/ in a separate path that you're looking up, the rest of the code should still work with that overridden load_schema functionality. So I'm confused by where the crash occurs? get_measurement uses Workspace.version to figure out the version number to load as it needs to be self-consistent

pyhf/src/pyhf/workspace.py

Lines 389 to 390 in abde607

utils.validate(measurement, 'measurement.json', self.version)
return measurement

@kratsg
Copy link
Contributor

kratsg commented Jan 14, 2022

(For what it's worth, the #1609 is technically related to this anyway)

@kratsg kratsg added the schema and spec JSON schema label Jan 14, 2022
@kratsg kratsg self-assigned this Jan 14, 2022
@lhenkelm
Copy link
Contributor Author

lhenkelm commented Jan 14, 2022

The file it tries to look up is venv/lib64/python3.8/site-packages/pyhf/schemas/1.1.0/measurement.json.
(so inside the pyhf package installation of my virtual environment)
The separate paths with relaxed schemas etc,. don't exist there, but in some version-controlled area alongside the rest of my project source.

@kratsg
Copy link
Contributor

kratsg commented Jan 14, 2022

but I thought you modified utils.validate to look in a different path

but I am using a replicated utils.validate prior to initialising the Workspace instance, and in the replica I am looking up schemas in a completely different path in my source, where I have both a schemas/1.0.0/<bla>.json path for the usual schemas, and a separate schemas/1.1.0/<bla>.json path for the relaxed schemas, including one for workspace.json (which gets its guts from a modified defs.json)

so I'm a little confused about why this would go back to the pkg_resources variation here, if you swapped out utils.validate.

e.g.

def _myvalidate(....):
  ...

import pyhf
pyhf.utils.validate = _myvalidate

...

@lhenkelm
Copy link
Contributor Author

Oh, sorry, I don't think I made that clear.
That function is also outside pyhf (not a monkey-patch or anything). It's just an adapated copy of the source from pyhf.utils.validate, where I try to look up the schema in the pyhf schemas, and if that fails, fall back to those separate files.

@lhenkelm
Copy link
Contributor Author

So Workspace goes looking in pkg_resources because it is still using the standard pyhf.validate implementation.
I am just ensuring the validity of the specs by calling the home-brew validate first (by hand in the script)

@lhenkelm
Copy link
Contributor Author

i.e. in effect the code does the equivalent of this:

import pyhf

def my_validate(...):
    ...
    try:
        pyhf.utils.validate(....) # ok if Model or Measurement or Workspace and version == '1.0.0'
    except FileNotFoundError as e: # either the schema is not exposed, or the version is fake
        # re-implementation of validate that refers to my custom paths
spec = make_a_spec(...)
my_validate(spec) # now I trust it
ws = pyhf.Workspace(spec, validate = False) # works
model = ws.model(validate = False) # FileNotFoundError

@kratsg
Copy link
Contributor

kratsg commented Jan 14, 2022

Can you just try instead to do something like

import pyhf

_validate = pyhf.utils.validate
def my_validate(...):
  ...
  try:
    _validate(....)
  except:
    ...
    
spec = make_a_spec(...)
my_validate(spec) # now I trust it
ws = pyhf.Workspace(spec, validate = False) # works
model = ws.model(validate = False) # FileNotFoundError

@lhenkelm
Copy link
Contributor Author

That still gives the same exception. Is there something missing in the code example? The only difference is it re-binds pyhf.utils.validate to a new name.

@kratsg
Copy link
Contributor

kratsg commented Jan 14, 2022

It should rebind pyhf.utils.validate so that get_measurement() still calls your internal function correctly. The other option is just to do

import pyhf

pyhf.utils.validate = lambda *args, **kwargs: True

at least until the linked issue gets implemented and we don't need the sort of hacky solutions like this.

@lhenkelm
Copy link
Contributor Author

oooh I see what you meant. Then its just the final re-bind of my_validate back to pyhf.utils.validate that is missing.
That does work (although it means I will have to be very careful about remembering the state of pyhf.utils.validate).
But at least I can keep going in the meantime :)

@kratsg
Copy link
Contributor

kratsg commented Jan 14, 2022

How about for now I close this issue since technically, this will get solved when we have utils.validate supporting user-defined schemas and this is trackable via #1609?

@lhenkelm
Copy link
Contributor Author

Absolutely, feel free to close this.
Implementing #1609 would be at least as good a solution in my book. (and let me write less code ^^).
Also, thank you very much for the prompt and in-depth feedback!

@kratsg kratsg closed this as completed Jan 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat/enhancement New feature or request needs-triage Needs a maintainer to categorize and assign schema and spec JSON schema
Projects
None yet
Development

No branches or pull requests

2 participants