feat: Add Cramer-rao uncertainties + covariance using autodiff to non-minuit fits by default #2269

phinate · 2023-08-09T11:33:42Z

Description

One of the main reasons minuit is the default optimizer is that it packages up fit results with uncertainties. We can do that for any fitting procedure using the Fisher information matrix!

This PR adds this behaviour by default for the case that the optimizer isn't minuit. The scipy.optimize.fit_result is packaged up with the uncert info as if it was calculated natively.

(still needs tests!)

Would close #2268.

Checklist Before Requesting Reviewer

Tests are passing
"WIP" removed from the title of the pull request
Selected an Assignee for the PR to be responsible for the log summary

Before Merging

For the PR Assignees:

Summarize commit messages into a comprehensive review of the PR

src/pyhf/tensor/tensorflow_backend.py

alexander-held

Right now pyhf.infer.mle.fit(..., return_uncertainties=True) silently ignores that kwarg if the backend isn't autodiff-enabled as far as I can see. I think we'd need to raise at least a warning, if not just error out in that case. Ignoring the kwarg also means that the returned shape is not as expected.

edit: it seems that the silent ignore was the default behavior previously already, which is not ideal in my opinion.

phinate · 2023-08-09T14:57:17Z

Right now pyhf.infer.mle.fit(..., return_uncertainties=True) silently ignores that kwarg if the backend isn't autodiff-enabled as far as I can see.

can you point to the specific line(s) here? i can't see that behaviour on first pass

(edit: as Alex said, this is just already in main somewhere, not part of this PR!)

matthewfeickert · 2023-08-09T22:37:55Z

Thanks @phinate! This would be a welcome addition I think. I probably won't have time to review this until later this week, but from a quick scan looks like there are still some development things to do and so I've put it in draft mode. There might be enough work here to split things out across maybe 2 PRs to try to keep things more atomic, but we can see.

This would close Issue #2268, correct? If so, can you please note that in the PR body?

phinate · 2023-08-11T08:54:12Z

Thanks @phinate! This would be a welcome addition I think. I probably won't have time to review this until later this week, but from a quick scan looks like there are still some development things to do and so I've put it in draft mode. There might be enough work here to split things out across maybe 2 PRs to try to keep things more atomic, but we can see.

This would close Issue #2268, correct? If so, can you please note that in the PR body?

Indeed, sorry I should have made this a draft! Wasn't sure if the draft would trigger the CI or not (was curious if it would pass before local testing).

I've updated the PR with that issue number, thanks for that! That gist was mainly for the errors on yields so i got a bit mixed up, but the title is clear :)

codecov · 2023-08-14T09:49:14Z

Codecov Report

Attention: Patch coverage is 96.87500% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 97.88%. Comparing base (64ab264) to head (d3142fc).

❗ Current head d3142fc differs from pull request most recent head 5d3d3d6. Consider uploading reports for the commit 5d3d3d6 to get more accurate results

Files	Patch %	Lines
src/pyhf/tensor/numpy_backend.py	60.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2269      +/-   ##
==========================================
- Coverage   98.21%   97.88%   -0.33%     
==========================================
  Files          69       69              
  Lines        4543     4593      +50     
  Branches      804      814      +10     
==========================================
+ Hits         4462     4496      +34     
- Misses         48       59      +11     
- Partials       33       38       +5

Flag	Coverage Δ
contrib	`97.84% <96.87%> (+0.04%)`	⬆️
doctest	`?`
unittests-3.10	`?`
unittests-3.11	`?`
unittests-3.12	`?`
unittests-3.8	`?`
unittests-3.9	`96.34% <96.87%> (+0.06%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

phinate · 2023-08-14T11:01:27Z

Worked some more on this, and expanded the scope of the uncertainty and correlation tests to encompass non-minuit options.

Importantly, I've addressed an oversight, where the covariance matrix (hess_inv) was not treated correctly during fits with fixed parameters -- it's right to zero out the uncertainties, but then the covariance matrix also needs the relevant row/column zeroed out too (noticed this when calculating correlations). I don't think this was used actively downstream, but it's been fixed for minuit + scipy now.

alexander-held · 2023-08-14T11:24:38Z

Importantly, I've addressed an oversight, where the covariance matrix (hess_inv) was not treated correctly during fits with fixed parameters -- it's right to zero out the uncertainties, but then the covariance matrix also needs the relevant row/column zeroed out too (noticed this when calculating correlations). I don't think this was used actively downstream, but it's been fixed for minuit + scipy now.

I got worried that this also would probably imply a bug in cabinetry and had a look, but I cannot reproduce what you are describing, or I am misunderstanding this perhaps.

import pyhf

model = pyhf.simplemodels.uncorrelated_background(
    signal=[12.0, 11.0], bkg=[50.0, 52.0], bkg_uncertainty=[3.0, 7.0]
)

data = [62, 63] + model.config.auxdata

pyhf.set_backend("numpy", "minuit")
fix_pars = model.config.suggested_fixed()
fix_pars[2] = True
res, corr, res_obj = pyhf.infer.mle.fit(
    data, model, fixed_params=fix_pars, return_correlations=True, return_result_obj=True
)

print(corr)
print(res_obj.minuit.covariance.correlation())

prints

[[ 1.         -0.26340464  0.        ]
 [-0.26340464  1.          0.        ]
 [ 0.          0.          0.        ]]
┌────────────────────┬──────────────────────────────────────────────────────────┐
│                    │                 mu uncorr_bkguncrt[0] uncorr_bkguncrt[1] │
├────────────────────┼──────────────────────────────────────────────────────────┤
│                 mu │                  1               -0.3                  0 │
│ uncorr_bkguncrt[0] │               -0.3                  1                  0 │
│ uncorr_bkguncrt[1] │                  0                  0                  0 │
└────────────────────┴──────────────────────────────────────────────────────────┘

which correctly zeros out those columns, both from the explicit return pyhf gives me and also in the iminuit data structure. Same holds for the covariance (res_obj.minuit.covariance).

phinate · 2023-08-14T11:47:28Z

which correctly zeros out those columns, both from the explicit return pyhf gives me and also in the iminuit data structure. Same holds for the covariance (res_obj.minuit.covariance).

ah okay, that's great to know -- have just located the reasons for this after looking through the minuit issue:

scikit-hep/iminuit#762 (comment)

For covariance matrix, where no such restrictions apply, the elements that correspond to fixed parameters are set to zero.

sorry for flagging this with the wrong wording! i was just confused by the fact that the uncertainties need to be manually set to zero within pyhf, but the covariance wasn't edited in-turn (but it is needed for autodiff at least, so has been useful to look at!)

phinate · 2023-08-14T13:20:03Z

I've marked this ready for review, because it now seems to do the correct thing in the cases i've tried, and the tests are passing. I've not added any new tests -- just updated the uncert-using tests in test_optim.py to also include autodiff backends (and no minuit).

This is something I'd like your thoughts on as part of double-checking this:

pyhf/tests/test_optim.py

Lines 425 to 428 in 2787b92

    
           # TODO: this does not match minuit at all (0.26418431) -- is that correct here? 
        
           assert pytest.approx([0.6693815171034548, 0.0]) == pyhf.tensorlib.tolist( 
        
               result[:, 1] 
        
           )

It seems like for the model that's being tested here, there's quite a big difference in the minuit/autodiff uncertainties (0.6 compared to 0.2). The autodiff result is consistent across backends. @alexander-held made a plot for this pre-fit:

Is it important that we keep this model in particular? It seems like we can tweak this a little to make the uncertainties agree more (e.g. by adding more data in the first bin as Alex suggested), but i'm not 100% sure of the scenarios that I'd expect minuit/autodiff uncertainties to stop agreeing. Open to your thoughts here!

alexander-held · 2023-08-14T13:41:05Z

What's the central value in this case? I'm guessing the issue might be that the lower default POI bound of 0 is close to the best-fit value and therefore the uncertainty estimates don't make too much sense. Easiest way to avoid that would be relaxing the POI bound (or changing the model / data).

phinate · 2023-08-14T15:37:22Z

What's the central value in this case? I'm guessing the issue might be that the lower default POI bound of 0 is close to the best-fit value and therefore the uncertainty estimates don't make too much sense. Easiest way to avoid that would be relaxing the POI bound (or changing the model / data).

So the test where that line is fixes the POI to 1 in the fit -- even so, seems the bounds are also (-5,5) in suggested_bounds in either case. It's just the histosys that's contributing here it would seem.

alexander-held · 2023-08-14T17:09:18Z

If it is just the histosys, I am not really sure, but given the big difference you could propagate this through to the post-fit yield uncertainties, they should be comparable to sqrt(N). Given the large difference, that should show which of the numbers is correct.

phinate · 2023-08-15T09:03:42Z

If it is just the histosys, I am not really sure, but given the big difference you could propagate this through to the post-fit yield uncertainties, they should be comparable to sqrt(N). Given the large difference, that should show which of the numbers is correct.

Doing the fixed POI fit again with mu=1 and propagating:

yields from fit: [127.56220536 184.05513402]
minuit uncert propagated to yields: [ 0.52863025 13.21575613]
autodiff uncert propagated to yields: : [ 9.0680227  13.33975763]
sqrt N: [11.29434395 13.56669208]
fitted nuisance par: both ~= --1.218

Looks like autodiff is much more like sqrt(N), with the difference being in the first bin. Still trying to work out if this kind of severe overconstraint on the NP makes sense given the tight sys band pre-fit. I think this is at least evidence that the autodiff uncerts are not malfunctioning in some way, so good news!

For the sake of scoping, should we raise this in a minuit issue to see if it's expected or not -- i.e. double check what's an artefact of the model or of the optimizer? Then we can patch the tests accordingly here in a later PR if needed, leaving them as-is for now.

(edit: slight update in numbers, accidentally used slightly modified model from original post)

alexander-held · 2023-08-15T10:24:19Z

How does MINOS compare for the parameter uncertainties?

phinate · 2023-08-16T12:06:30Z

How does MINOS compare for the parameter uncertainties?

about the same:

It's worth noting that this difference between the two methods disappears when doing MLE fits instead of fixing the POI in the fit. I think these tests in test_optim.py are mainly to check the validity of having one entry as zero, and the other as whatever it turns out to be in the fit.

…orrelation matricies for fixed parameters

…ndles this

… that references this behaviour being done

for more information, see https://pre-commit.ci

phinate · 2023-10-20T15:51:52Z

Just been and wiped the dust of this PR in the Hacktoberfest spirit! Tagging @alexander-held @matthewfeickert when you have a moment.

To recap the behaviour here:

result = pyhf.optimizer.minimize(..., return_uncertainties=True)

will return uncertainties depending on the choice of optimizer:

pyhf.optimizer = Minuit: returns uncertainties as computed by minuit.
pyhf.optimizer = anything else (e.g. scipy): returns uncertainties computed using the inverse Hessian evaluated at the MLE point, as dictated by the Cramér–Rao bound.

I have not added a keyword argument to force autodiff uncertainties if using minuit, so if that was desired, that's missing right now.

Also, the discussion above has been incorporated into the tests by just allowing minuit and autodiff uncerts to drastically disagree, with a link to this discussion added as a comment. I think we verified that it's a result of either the model or doing a fixed poi fit to eval the uncertainties (differences disappear doing a global fit).

Let me know if there's more I can do with this PR before it's good to go!

alexander-held reviewed Aug 9, 2023

View reviewed changes

src/pyhf/tensor/tensorflow_backend.py Outdated Show resolved Hide resolved

alexander-held reviewed Aug 9, 2023

View reviewed changes

matthewfeickert assigned phinate Aug 9, 2023

matthewfeickert added the feat/enhancement New feature or request label Aug 9, 2023

matthewfeickert marked this pull request as draft August 9, 2023 22:35

phinate changed the title ~~Add Cramer-rao uncertainties + covariance using autodiff to non-minuit fits by default~~ feat: Add Cramer-rao uncertainties + covariance using autodiff to non-minuit fits by default Aug 11, 2023

phinate marked this pull request as ready for review August 14, 2023 12:25

phinate added 9 commits October 20, 2023 15:24

add autodiff machinery to jax and pytorch backends

a3abc9a

run linting

eee1b3f

fix torch by casting res to tensor

123f2a0

add tensorflow

d95c433

stitch in fixed pars before calculating uncerts

9b99a5f

patch in fisher_cov as identity for tests

e6281ee

lint

11f0a0f

add correlation calculations, and correctly zero out covariance and c…

d7f3e9d

…orrelation matricies for fixed parameters

fix docstrings

474bd1a

phinate added 3 commits October 20, 2023 15:26

adjust to ignore zeroing out corr/cov for minuit, since it already ha…

ee56ec6

…ndles this

quickly patch in skipping minuit for autodiff tests (silently passing)

5537ad9

ignore venv in gitignore

b1c9b3d

phinate force-pushed the autodiff-uncerts branch from cd48d7d to b1c9b3d Compare October 20, 2023 14:27

phinate and others added 3 commits October 20, 2023 16:05

add poi_name='mu'

3a22760

document setting things to zero, and add pointers to the minuit issue…

8f286ba

… that references this behaviour being done

[pre-commit.ci] auto fixes from pre-commit.com hooks

d3142fc

for more information, see https://pre-commit.ci

mhauru mentioned this pull request Nov 3, 2023

Recurring OSS hacksessions alan-turing-institute/hut23-open-source-sa#18

Open

phinate added 2 commits April 23, 2024 13:50

Merge branch 'main' into autodiff-uncerts

5d3d3d6

Merge branch 'main' into autodiff-uncerts

b24b591

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add Cramer-rao uncertainties + covariance using autodiff to non-minuit fits by default #2269

feat: Add Cramer-rao uncertainties + covariance using autodiff to non-minuit fits by default #2269

phinate commented Aug 9, 2023 •

edited

Loading

alexander-held left a comment •

edited

Loading

phinate commented Aug 9, 2023 •

edited

Loading

matthewfeickert commented Aug 9, 2023

phinate commented Aug 11, 2023

codecov bot commented Aug 14, 2023 •

edited

Loading

phinate commented Aug 14, 2023

alexander-held commented Aug 14, 2023

phinate commented Aug 14, 2023

phinate commented Aug 14, 2023

alexander-held commented Aug 14, 2023 •

edited

Loading

phinate commented Aug 14, 2023

alexander-held commented Aug 14, 2023

phinate commented Aug 15, 2023 •

edited

Loading

alexander-held commented Aug 15, 2023

phinate commented Aug 16, 2023

phinate commented Oct 20, 2023

feat: Add Cramer-rao uncertainties + covariance using autodiff to non-minuit fits by default #2269

Are you sure you want to change the base?

feat: Add Cramer-rao uncertainties + covariance using autodiff to non-minuit fits by default #2269

Conversation

phinate commented Aug 9, 2023 • edited Loading

Description

Checklist Before Requesting Reviewer

Before Merging

alexander-held left a comment • edited Loading

Choose a reason for hiding this comment

phinate commented Aug 9, 2023 • edited Loading

matthewfeickert commented Aug 9, 2023

phinate commented Aug 11, 2023

codecov bot commented Aug 14, 2023 • edited Loading

Codecov Report

phinate commented Aug 14, 2023

alexander-held commented Aug 14, 2023

phinate commented Aug 14, 2023

phinate commented Aug 14, 2023

alexander-held commented Aug 14, 2023 • edited Loading

phinate commented Aug 14, 2023

alexander-held commented Aug 14, 2023

phinate commented Aug 15, 2023 • edited Loading

alexander-held commented Aug 15, 2023

phinate commented Aug 16, 2023

phinate commented Oct 20, 2023

phinate commented Aug 9, 2023 •

edited

Loading

alexander-held left a comment •

edited

Loading

phinate commented Aug 9, 2023 •

edited

Loading

codecov bot commented Aug 14, 2023 •

edited

Loading

alexander-held commented Aug 14, 2023 •

edited

Loading

phinate commented Aug 15, 2023 •

edited

Loading