Option to skip validation in `Workspace.get_measurement` and `Workspace.model`? #1745

lhenkelm · 2022-01-14T11:43:29Z

Summary

Dear pyhf maintainers,
Would you be open to making validation skippable in Workspace.get_measurement and Workspace.model?

I am trying to integrate a user-defined modifier into an existing workspace.
To still be able to validate the standard parts of the spec, I wrote a slightly relaxed set of schemas that validate custom modifiers.
To switch between the standard and relaxed schemas I gave the relaxed schemas a fake version number.
However the Workspace.get_measurement validation tries to use that fake version number in its validation to look up the measurement.json schema in the pyhf package directory.
That of course raises a FileNotFoundError since there is no such schema version in pyhf, preventing me from getting a model out of my workspace.

It would be very convenient if Workspace.get_measurement would accept the validate kwarg like Model and Workspace already do on master, and if Workspace.model would forward that option to Workspace.get_measurement.

Additional Information

No response

Code of Conduct

I agree to follow the Code of Conduct

The text was updated successfully, but these errors were encountered:

kratsg · 2022-01-14T15:04:09Z

What does your python calls look like right now for specifying the schema and version? That is pyhf.Workspace(spec, schema='...', version='...') ?

lhenkelm · 2022-01-14T15:12:48Z

Thanks for the quick reaction! The calls look like this: pyhf.Workspace(spec, validate=False).

lhenkelm · 2022-01-14T15:36:57Z

PS: to be more clear, the version is only specified in the spec for the workspace, i.e. spec={'version': '1.1.0', 'channels': ..., 'measurements': ..., 'observations': ...}.
And for the schema, the implicit name is still 'workspace.json', but I am using a replicated utils.validate prior to initialising the Workspace instance, and in the replica I am looking up schemas in a completely different path in my source, where I have both a schemas/1.0.0/<bla>.json path for the usual schemas,
and a separate schemas/1.1.0/<bla>.json path for the relaxed schemas, including one for workspace.json (which gets its guts from a modified defs.json)

kratsg · 2022-01-14T15:45:48Z

What is the file that it tries to open? If you put everything in schemas/ in a separate path that you're looking up, the rest of the code should still work with that overridden load_schema functionality. So I'm confused by where the crash occurs? get_measurement uses Workspace.version to figure out the version number to load as it needs to be self-consistent

pyhf/src/pyhf/workspace.py

Lines 389 to 390 in abde607

    
           utils.validate(measurement, 'measurement.json', self.version) 
        
           return measurement

kratsg · 2022-01-14T15:47:13Z

(For what it's worth, the #1609 is technically related to this anyway)

lhenkelm · 2022-01-14T15:52:40Z

The file it tries to look up is venv/lib64/python3.8/site-packages/pyhf/schemas/1.1.0/measurement.json.
(so inside the pyhf package installation of my virtual environment)
The separate paths with relaxed schemas etc,. don't exist there, but in some version-controlled area alongside the rest of my project source.

kratsg · 2022-01-14T15:54:41Z

but I thought you modified utils.validate to look in a different path

but I am using a replicated utils.validate prior to initialising the Workspace instance, and in the replica I am looking up schemas in a completely different path in my source, where I have both a schemas/1.0.0/<bla>.json path for the usual schemas, and a separate schemas/1.1.0/<bla>.json path for the relaxed schemas, including one for workspace.json (which gets its guts from a modified defs.json)

so I'm a little confused about why this would go back to the pkg_resources variation here, if you swapped out utils.validate.

e.g.

def _myvalidate(....):
  ...

import pyhf
pyhf.utils.validate = _myvalidate

...

lhenkelm · 2022-01-14T15:58:35Z

Oh, sorry, I don't think I made that clear.
That function is also outside pyhf (not a monkey-patch or anything). It's just an adapated copy of the source from pyhf.utils.validate, where I try to look up the schema in the pyhf schemas, and if that fails, fall back to those separate files.

lhenkelm · 2022-01-14T16:01:02Z

So Workspace goes looking in pkg_resources because it is still using the standard pyhf.validate implementation.
I am just ensuring the validity of the specs by calling the home-brew validate first (by hand in the script)

lhenkelm · 2022-01-14T16:23:10Z

i.e. in effect the code does the equivalent of this:

import pyhf

def my_validate(...):
    ...
    try:
        pyhf.utils.validate(....) # ok if Model or Measurement or Workspace and version == '1.0.0'
    except FileNotFoundError as e: # either the schema is not exposed, or the version is fake
        # re-implementation of validate that refers to my custom paths
spec = make_a_spec(...)
my_validate(spec) # now I trust it
ws = pyhf.Workspace(spec, validate = False) # works
model = ws.model(validate = False) # FileNotFoundError

kratsg · 2022-01-14T16:28:20Z

Can you just try instead to do something like

import pyhf

_validate = pyhf.utils.validate
def my_validate(...):
  ...
  try:
    _validate(....)
  except:
    ...
    
spec = make_a_spec(...)
my_validate(spec) # now I trust it
ws = pyhf.Workspace(spec, validate = False) # works
model = ws.model(validate = False) # FileNotFoundError

lhenkelm · 2022-01-14T16:41:52Z

That still gives the same exception. Is there something missing in the code example? The only difference is it re-binds pyhf.utils.validate to a new name.

kratsg · 2022-01-14T16:46:19Z

It should rebind pyhf.utils.validate so that get_measurement() still calls your internal function correctly. The other option is just to do

import pyhf

pyhf.utils.validate = lambda *args, **kwargs: True

at least until the linked issue gets implemented and we don't need the sort of hacky solutions like this.

lhenkelm · 2022-01-14T16:52:16Z

oooh I see what you meant. Then its just the final re-bind of my_validate back to pyhf.utils.validate that is missing.
That does work (although it means I will have to be very careful about remembering the state of pyhf.utils.validate).
But at least I can keep going in the meantime :)

kratsg · 2022-01-14T16:59:30Z

How about for now I close this issue since technically, this will get solved when we have utils.validate supporting user-defined schemas and this is trackable via #1609?

lhenkelm · 2022-01-14T17:02:14Z

Absolutely, feel free to close this.
Implementing #1609 would be at least as good a solution in my book. (and let me write less code ^^).
Also, thank you very much for the prompt and in-depth feedback!

lhenkelm added feat/enhancement New feature or request needs-triage Needs a maintainer to categorize and assign labels Jan 14, 2022

kratsg added the schema and spec JSON schema label Jan 14, 2022

kratsg self-assigned this Jan 14, 2022

kratsg closed this as completed Jan 14, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Option to skip validation in `Workspace.get_measurement` and `Workspace.model`? #1745

Option to skip validation in `Workspace.get_measurement` and `Workspace.model`? #1745

lhenkelm commented Jan 14, 2022 •

edited

Loading

kratsg commented Jan 14, 2022

lhenkelm commented Jan 14, 2022

lhenkelm commented Jan 14, 2022

kratsg commented Jan 14, 2022

kratsg commented Jan 14, 2022

lhenkelm commented Jan 14, 2022 •

edited

Loading

kratsg commented Jan 14, 2022 •

edited

Loading

lhenkelm commented Jan 14, 2022

lhenkelm commented Jan 14, 2022

lhenkelm commented Jan 14, 2022

kratsg commented Jan 14, 2022

lhenkelm commented Jan 14, 2022

kratsg commented Jan 14, 2022 •

edited

Loading

lhenkelm commented Jan 14, 2022

kratsg commented Jan 14, 2022

lhenkelm commented Jan 14, 2022

Option to skip validation in Workspace.get_measurement and Workspace.model? #1745

Option to skip validation in Workspace.get_measurement and Workspace.model? #1745

Comments

lhenkelm commented Jan 14, 2022 • edited Loading

Summary

Additional Information

Code of Conduct

kratsg commented Jan 14, 2022

lhenkelm commented Jan 14, 2022

lhenkelm commented Jan 14, 2022

kratsg commented Jan 14, 2022

kratsg commented Jan 14, 2022

lhenkelm commented Jan 14, 2022 • edited Loading

kratsg commented Jan 14, 2022 • edited Loading

lhenkelm commented Jan 14, 2022

lhenkelm commented Jan 14, 2022

lhenkelm commented Jan 14, 2022

kratsg commented Jan 14, 2022

lhenkelm commented Jan 14, 2022

kratsg commented Jan 14, 2022 • edited Loading

lhenkelm commented Jan 14, 2022

kratsg commented Jan 14, 2022

lhenkelm commented Jan 14, 2022

Option to skip validation in `Workspace.get_measurement` and `Workspace.model`? #1745

Option to skip validation in `Workspace.get_measurement` and `Workspace.model`? #1745

lhenkelm commented Jan 14, 2022 •

edited

Loading

lhenkelm commented Jan 14, 2022 •

edited

Loading

kratsg commented Jan 14, 2022 •

edited

Loading

kratsg commented Jan 14, 2022 •

edited

Loading