fix: Check parameter shapes for pdf API calls #1461

Status: Open. Merges 5 commits into base: main.
src/pyhf/pdf.py (24 additions, 0 deletions)

@@ -607,6 +607,12 @@ def expected_data(self, pars, return_by_sample=False):
"""
tensorlib, _ = get_backend()
pars = tensorlib.astensor(pars)
# Verify parameter shapes
if pars.shape[-1] != self.config.npars:
raise exceptions.InvalidPdfParameters(
f"Evaluation failed as parameters have length {pars.shape[-1]} but model requires {self.config.npars}."
)

deltas, factors = self._modifications(pars)

allsum = tensorlib.concatenate(deltas + [self.nominal_rates])
@@ -713,6 +719,12 @@ def expected_auxdata(self, pars):
"""
tensorlib, _ = get_backend()
pars = tensorlib.astensor(pars)
# Verify parameter shapes
if pars.shape[-1] != self.config.npars:
raise exceptions.InvalidPdfParameters(
f"Evaluation failed as parameters have length {pars.shape[-1]} but model requires {self.config.npars}."
)

return self.make_pdf(pars)[1].expected_data()

def _modifications(self, pars):
Contributor:
Given that we do input validation here, it suggests that this has become a publicly consumable API. Maybe we should add a model.modifications, do the input checks there, and call _modifications if the inputs are OK.
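
A minimal sketch of that suggestion, reusing the same check this PR adds elsewhere; the public modifications name is the proposal above, not an existing pyhf method:

    def modifications(self, pars):
        # Hypothetical public wrapper: validate parameter shapes, then
        # delegate to the existing private implementation.
        tensorlib, _ = get_backend()
        pars = tensorlib.astensor(pars)
        if pars.shape[-1] != self.config.npars:
            raise exceptions.InvalidPdfParameters(
                f"Evaluation failed as parameters have length {pars.shape[-1]} "
                f"but model requires {self.config.npars}."
            )
        return self._modifications(pars)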

Contributor Author:
_modifications isn't currently a public API. It certainly could become one, although I might argue that we remove it from Model and keep it on MainModel unless there's a reason to pass it through; e.g. pdf.main_model.modifications is just as clear to me. Unless the suggestion here is to remove the checks from main_model and constraint_model and keep all checks on Model, which is also possible but feels like a mess.

Member:
main_model is needed for return_by_sample, so it feels similarly "public" to Model.
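
For context, a sketch of the call in question, which is what makes main_model part of the de facto public surface (the pdf/pars names are assumed from the tests below):

    # expected_data(pars, return_by_sample=True) lives on main_model, so
    # user code has to reach through Model to get per-sample rates.
    by_sample = pdf.main_model.expected_data(pars, return_by_sample=True)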

Contributor Author:
Sure, but main_model.expected_actualdata(..., return_by_sample) is fine, since that's public. However, main_model._modifications isn't necessarily public, although at the moment it's only used by main_model.expected_data, so I'm fine either way.

Member:
> Unless the suggestion here is to remove the checks from main_model and constraint_model

My comment was meant as an example of why that may not be desirable; sorry, I should have made that clear. Promoting _modifications to public in the longer term is a nice idea; it looks very useful for model debugging.

Member:
I created an issue regarding the proposal of making _modifications public: #1652.

Contributor:
As it is now, this will be called on each logpdf call, so it's performance critical. Should we have some kind of split between "pdf.method" and "pdf._method_unsafe", or do we think it doesn't make a difference?
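
A rough sketch of the split being proposed, using hypothetical _unsafe names; the public method pays the validation cost, while internal hot paths (e.g. logpdf inside an optimizer loop) could call the unchecked version:

    def expected_actualdata(self, pars):
        # Safe public entry point (sketch): validates on every call.
        tensorlib, _ = get_backend()
        pars = tensorlib.astensor(pars)
        if pars.shape[-1] != self.config.npars:
            raise exceptions.InvalidPdfParameters(
                f"Evaluation failed as parameters have length {pars.shape[-1]} "
                f"but model requires {self.config.npars}."
            )
        return self._expected_actualdata_unsafe(pars)

    def _expected_actualdata_unsafe(self, pars):
        # Hypothetical hot path: no checks, for internal callers that have
        # already validated their inputs.
        return self.make_pdf(pars)[0].expected_data()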

Member:
@lukasheinrich @kratsg If we can revisit the performance impact here soon, it would be nice to have this get into v0.7.0.

@@ -736,6 +748,12 @@ def expected_actualdata(self, pars):
"""
tensorlib, _ = get_backend()
pars = tensorlib.astensor(pars)
# Verify parameter shapes
if pars.shape[-1] != self.config.npars:
raise exceptions.InvalidPdfParameters(
f"Evaluation failed as parameters have length {pars.shape[-1]} but model requires {self.config.npars}."
)

return self.make_pdf(pars)[0].expected_data()

def expected_data(self, pars, include_auxdata=True):
@@ -751,6 +769,12 @@ def expected_data(self, pars, include_auxdata=True):
"""
tensorlib, _ = get_backend()
pars = tensorlib.astensor(pars)
# Verify parameter shapes
if pars.shape[-1] != self.config.npars:
raise exceptions.InvalidPdfParameters(
f"Evaluation failed as parameters have length {pars.shape[-1]} but model requires {self.config.npars}."
)

if not include_auxdata:
return self.make_pdf(pars)[0].expected_data()
return self.make_pdf(pars).expected_data()
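
Taken together, these checks make a mis-shaped parameter vector fail fast with a clear message instead of an opaque backend broadcasting error. A small usage sketch, assuming this PR's checks are in place, against the simplemodels API:

    import pyhf

    model = pyhf.simplemodels.uncorrelated_background(
        signal=[12.0, 11.0], bkg=[50.0, 52.0], bkg_uncertainty=[3.0, 7.0]
    )
    # model.config.npars == 3 here: mu plus one shapesys parameter per bin.
    try:
        model.expected_data([1.0, 1.0])  # length 2, but the model needs 3
    except pyhf.exceptions.InvalidPdfParameters as err:
        print(err)
        # Evaluation failed as parameters have length 2 but model requires 3.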
tests/test_pdf.py (44 additions, 0 deletions)

@@ -1070,3 +1070,47 @@ def make_model(
pyhf.tensorlib.astensor(1.05),
pyhf.tensorlib.astensor([5.0, 5.0]),
)


def test_pdf_invalid_parameter_shapes(backend):
nbins = np.random.randint(2, 10)

signal = np.random.random(nbins).tolist()
bkg = np.random.random(nbins).tolist()
bkg_uncrt = np.sqrt(bkg).tolist()
pdf = pyhf.simplemodels.uncorrelated_background(signal, bkg, bkg_uncrt)

pars = pdf.config.suggested_init()

# NB:
# - nbins is shape of actual data
# - b/c uncorrelated, we have 1 par per bin for shapesys (so nbins)
# - actual data + aux data = 2 * nbins
assert pdf.expected_actualdata(pars).shape[-1] == nbins
assert pdf.main_model.expected_data(pars).shape[-1] == nbins
assert pdf.expected_auxdata(pars).shape[-1] == nbins
assert pdf.expected_data(pars).shape[-1] == 2 * nbins

with pytest.raises(pyhf.exceptions.InvalidPdfParameters):
pdf.expected_actualdata(pars[: nbins - 1])

with pytest.raises(pyhf.exceptions.InvalidPdfParameters):
pdf.main_model.expected_data(pars[: nbins - 1])

with pytest.raises(pyhf.exceptions.InvalidPdfParameters):
pdf.expected_auxdata(pars[: nbins - 1])

with pytest.raises(pyhf.exceptions.InvalidPdfParameters):
pdf.expected_data(pars[: nbins - 1])

with pytest.raises(pyhf.exceptions.InvalidPdfParameters):
pdf.expected_actualdata(pars + [pars[-1]])

with pytest.raises(pyhf.exceptions.InvalidPdfParameters):
pdf.main_model.expected_data(pars + [pars[-1]])

with pytest.raises(pyhf.exceptions.InvalidPdfParameters):
pdf.expected_auxdata(pars + [pars[-1]])

with pytest.raises(pyhf.exceptions.InvalidPdfParameters):
pdf.expected_data(pars + [pars[-1]])