Calculate, store correlated uncertainties for variable parameters for easier downstream calcs #888


Merged
merged 18 commits into master from calc_upars on Jul 5, 2023

Conversation

newville (Member) commented May 17, 2023

Description

This adds a uvars attribute to ModelResult that contains correlated uncertainties for the fitting variables, using the uncertainties package. That allows post-fit calculations using fit results to properly propagate uncertainties, taking correlations into account.

This is a work in progress; docs and tests are still needed, and maybe a bit of discussion. See #887 for some of the rationale.
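
For illustration, here is a minimal sketch of the kind of post-fit calculation this enables (hypothetical data and model; the uvars attribute is as described above, but the exact API may differ from what is finally merged):

```python
import numpy as np
from math import log, sqrt
from lmfit.models import GaussianModel

# synthetic data for a single Gaussian peak (illustration only)
x = np.linspace(-10, 10, 201)
y = 3.0 * np.exp(-(x - 0.5)**2 / (2 * 1.2**2)) + np.random.normal(0, 0.05, x.size)

model = GaussianModel()
result = model.fit(y, model.guess(y, x=x), x=x)

# result.uvars holds correlated uncertainties.ufloat values, one per variable,
# so arithmetic on them propagates both uncertainties and correlations
sigma = result.uvars['sigma']
fwhm = 2 * sqrt(2 * log(2)) * sigma
print('fwhm =', fwhm)   # nominal value +/- propagated standard deviation
```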

Type of Changes
  • Bug fix
  • New feature
  • Refactoring / maintenance
  • Documentation / examples
Tested on

Python: 3.10.10 | packaged by conda-forge | (main, Mar 24 2023, 20:17:34) [Clang 14.0.6 ]

lmfit: 1.2.1.post3+g502cc9b.d20230517, scipy: 1.10.1, numpy: 1.24.3, asteval: 0.9.29, uncertainties: 3.1.7

Verification

Have you

  • included docstrings that follow PEP 257?
  • referenced existing Issue and/or provided relevant link to mailing list?
  • verified that existing tests pass locally?
  • verified that the documentation builds locally?
  • squashed/minimized your commits and written descriptive commit messages?
  • added or updated existing tests to cover the changes?
  • updated the documentation and/or added an entry to the release notes (doc/whatsnew.rst)?
  • added an example?

Many of these should be done before merging, but are not done yet.

codecov bot commented May 17, 2023

Codecov Report

Merging #888 (54b9af1) into master (115113c) will decrease coverage by 0.07%.
The diff coverage is 91.80%.

@@            Coverage Diff             @@
##           master     #888      +/-   ##
==========================================
- Coverage   93.32%   93.25%   -0.07%     
==========================================
  Files          10       10              
  Lines        3653     3679      +26     
==========================================
+ Hits         3409     3431      +22     
- Misses        244      248       +4     
Impacted Files       Coverage              Δ
lmfit/models.py      91.38%  <ø>        (ø)
lmfit/jsonutils.py   91.73%  <60.00%>   (-1.37%) ⬇️
lmfit/parameter.py   98.47%  <93.18%>   (-0.57%) ⬇️
lmfit/minimizer.py   90.04%  <100.00%>  (-0.23%) ⬇️
lmfit/model.py       91.20%  <100.00%>  (+0.09%) ⬆️

newville (Member Author) commented:

This has now morphed a bit, but for the better, I think.

All of the code to propagate uncertainties for parameters has been moved out of minimizer.py (which no longer even imports the uncertainties module) and into parameter.py, where Parameters now has a create_uvars method that both propagates correlated uncertainties to constraint expressions and creates a uvars dict of correlated uncertainties.ufloat values. This dict is then put into MinimizerResult and ModelResult, so the user can do any calculation with the Parameters and have uncertainties propagate properly.

An example of this is given.
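
To make that concrete, here is a minimal sketch of the intended create_uvars() usage (hypothetical objective function and data; I'm assuming the method accepts the covariance matrix from the fit, so the exact signature may differ):

```python
import numpy as np
from math import pi, sqrt
from lmfit import Parameters, minimize

def residual(params, x, data):
    # simple Gaussian model for illustration
    amp, cen, wid = params['amp'], params['cen'], params['wid']
    return data - amp * np.exp(-(x - cen)**2 / (2 * wid**2))

x = np.linspace(-5, 5, 101)
data = 2.0 * np.exp(-(x - 0.3)**2 / (2 * 0.8**2)) + np.random.normal(0, 0.02, x.size)

params = Parameters()
params.add('amp', value=1.0, min=0)
params.add('cen', value=0.0)
params.add('wid', value=1.0, min=0)

result = minimize(residual, params, args=(x, data))

# build a dict of correlated ufloats from the best-fit values and covariance
uvars = result.params.create_uvars(covar=result.covar)

# any expression of these ufloats carries correlated uncertainties
area = uvars['amp'] * uvars['wid'] * sqrt(2 * pi)
print('area =', area)
```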

As hinted at above, one way this might be used is to not have "derived parameters" (like fwhm, height) specified as parameter hints when the model makes parameters. Doing it as we currently do means that these parameters are evaluated on every function call even though they are not used during the fit - a mild performance hit. The advantage of keeping the current approach is that our existing tests more or less assume we do it this way.

It would instead be possible to define these or other derived parameters in a Model.post_fit() function, which means they would be evaluated once, after the fit is complete. The added example includes doing this too.
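
A hypothetical sketch of what that could look like (I'm assuming here that post_fit is a callable attribute that receives the fit result and can add constrained parameters to result.params; the actual interface in this PR may differ):

```python
import numpy as np
from lmfit.models import GaussianModel

def add_derived_params(result):
    # runs once, after the fit: derived values defined as constraint
    # expressions so that uncertainties can be propagated to them
    result.params.add('fwhm_derived', expr='2.3548200*sigma')
    result.params.add('area_derived', expr='amplitude')  # Gaussian amplitude is the area

x = np.linspace(-10, 10, 201)
y = 4.0 * np.exp(-(x - 1.0)**2 / 2.0) + np.random.normal(0, 0.05, x.size)

model = GaussianModel()
model.post_fit = add_derived_params   # assumed hook name from this discussion
result = model.fit(y, model.guess(y, x=x), x=x)
print(result.params['fwhm_derived'])
```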

While we could change models.py to use this approach, it would break many tests. For now, I've added commented-out proof-of-concept code for several models in models.py. I do not have a strong opinion here, but I think it is useful to show how this might be used for other derived parameters (say, the total area under two peaks ignoring a background, or the centroid of two peaks), which the added example does.
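
For instance, such "other derived parameters" for a two-peak composite might be sketched like this (hypothetical g1_/g2_ prefixes, and again assuming a post_fit-style hook):

```python
from lmfit.models import GaussianModel

model = GaussianModel(prefix='g1_') + GaussianModel(prefix='g2_')

def add_two_peak_summaries(result):
    # total area of the two peaks (ignoring any background) and their centroid,
    # both defined as constraint expressions so uncertainties propagate
    result.params.add('total_area', expr='g1_amplitude + g2_amplitude')
    result.params.add('centroid',
                      expr='(g1_amplitude*g1_center + g2_amplitude*g2_center)'
                           '/(g1_amplitude + g2_amplitude)')

model.post_fit = add_two_peak_summaries   # assumed hook from this PR's discussion
```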

Docs and tests are still needed.

Tillsten (Contributor) commented:

Funny, I was going to suggest exactly the same thing for the derived parameters when I saw the PR.

reneeotten (Contributor) commented:

@newville I was out of the country for a bit and am catching up with work right now, but I'll take a look at this in the (hopefully) near future and comment here.

newville (Member Author) commented Jun 9, 2023

@reneeotten Thanks, and no worries -- I got partway through this and sort of stalled, focusing on other things too. I think the "add basic uvars" part could be called nearly complete, though the docs/examples could be improved. I hope to get back to that next week. There is no rush!

The idea of Model.post_fit() - where one might add parameters after the fit purely to evaluate more complex results - might really want more discussion. That is, we could move the evaluation of "fwhm" and "height" for many models to happen after the fit is completed. We don't use these during the fit, and they are a small performance hit (I seem to recall an actual complaint about that at one point).

I think it would be worth adding Model.post_fit() for downstream users. Whether we use it for the built-in models might be a separate question. I might lean toward Yes, just to "use our own code", but it is sort of a big conceptual change and means small changes to lots of code.

It would be perfectly fine with me to say that "adding uvars" is a complete PR unit, and that Model.post_fit() could be a separate topic/PR.

newville (Member Author) commented:

@reneeotten I added tests, so I think this is ready to discuss / review. No rush, no real commitment to everything here.

newville (Member Author) commented:

I would like to merge this and then work on other issues, including the whole business of "array-like inputs for Model", the coercion of data to float64, and cleaning up many of the tests. It would be helpful to have reviews or comments on PRs, but I think we do have to be able to make progress and not let issues or PRs go too long without resolution.

@@ -12,8 +12,7 @@
 @pytest.mark.parametrize(
     "value, expected_result",
-    [(1, 1.0), (-1, -1.0), (0, tiny), (-0, -tiny), (np.array([1]), 1.0)],
-)
+    [(1, 1.0), (-1, -1.0), (0, tiny), (-0, -tiny)])

Contributor commented:

This doesn't fix the deprecation warning in the not_zero function; it merely removes the test that triggers the warning. So unless the test is suddenly no longer necessary because we can never get into this situation, it doesn't really seem to fix the underlying problem - correct?

I've looked a bit at the release notes of the recent SciPy and NumPy versions and can probably spend some time getting some of this sorted out. But as I said in previous PRs as well, just removing tests because they no longer pass or because they produce warnings doesn't look like the best approach.

newville (Member Author) commented:

I don't view this as removing a test because it fails, but as removing a test that tests something we don't do. That is, none of the 3 uses of not_zero() pass in an array. We're not checking the data type of the value passed to not_zero() within the function: a string, dict, tuple, or list would also fail. It turns out that, weirdly, arrays with length 1 used to work.

There are probably 25 or more places in lineshapes.py alone where not_zero() could be used but that currently use val = max(tiny, val) instead: none of those would work with anything but floats (or ints, I suppose) either.
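
For reference, a rough sketch of the scalar-only helper being discussed, based on the description above (the tiny constant is a stand-in; the actual lmfit implementation may differ slightly):

```python
from math import copysign

tiny = 1.e-15   # stand-in for lmfit's small-value constant

def not_zero(value):
    """Return value with a minimum absolute magnitude of tiny, keeping its sign.

    Works for floats (and ints); a string, dict, or tuple would fail inside
    abs/max/copysign, while a length-1 numpy array happens to go through.
    """
    return float(copysign(max(tiny, abs(value)), value))
```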

Contributor commented:

Fair enough, I see that the dho lineshape code is now indeed such that not_zero is not used on arrays anymore. I think initially it (or some variant thereof) was used, and that is why the test was there. Anyway, if the conclusion is that the test is not needed anymore, that's okay with me - at the very least, the commit message is then rather misleading ;)

If you consider this PR finalized, I am happy to take another look at it if you think that's helpful. However, if you're not interested in opinions and/or suggestions and just want to keep merging PRs, that's also fine, but then I'll not waste any time on it. Just let me know.

newville (Member Author) commented:

OK, thanks -- yeah, sometimes single-line commit messages are not enough, and good commit messages are hard enough to write anyway...

Yes, I consider this PR ready to merge.

newville (Member Author) commented Jul 5, 2023

@reneeotten OK, I'm going to merge this, and then try to resolve #900 and #875. I would like to see if #883 can get resolved too.

newville merged commit 875c2f7 into master on Jul 5, 2023
newville deleted the calc_upars branch on July 6, 2023 at 01:57