Conversion of input to float64 #850
Comments
@parejkoj Yes, I agree that there is a problem here and that we should revisit the changes made related to #727 -- I'm totally willing to own all the blame. I think the right approach would be to place no expectations on, and do no coercion of, the input independent variables -- datatypes, data shapes, whether they are actually Pandas Series, xarrays, regular dicts, HDF5 groups, user-defined objects, URLs, whatever. The user should get the independent variables in their objective/model function in whatever form they sent them. It is even reasonable to assert that they are identical (by reference, not just by value). But we should assert and check that the residual array really is a numpy array of dtype='float64'. I don't think that would be a hard change to make, but I don't have time to work on this at the moment. I think this issue should be fixed before the 1.2.0 release, but probably we should merge #844 first. A test case that currently fails would be particularly helpful. As an aside, #844 will change the default value of epsfcn.
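A minimal sketch of the kind of check being proposed here (the helper name `check_residual` is hypothetical, not lmfit API): the independent variables would pass through untouched, and only the value returned by the objective/model function is validated.

```python
import numpy as np

def check_residual(out):
    """Hypothetical helper: insist that the value returned by the
    objective/model function is a numpy array of dtype float64,
    while the independent data are never touched or coerced."""
    if not isinstance(out, np.ndarray) or out.dtype != np.float64:
        raise TypeError(
            "objective function must return a float64 numpy array, got "
            f"{type(out).__name__} with dtype {getattr(out, 'dtype', 'n/a')}")
    return out.ravel()
```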
It seems to me that this is being addressed in PR #844, where epsfcn is changed. I personally see no reason to revert or change any of the coercing to float64 -- in the end that is what we want. If an end-user finds that a fit doesn't converge, then sure, he/she can change parameters of the optimizer, similar to how one would have needed to do this when using SciPy directly. Or am I missing something here?
@reneeotten Yeah, I think what is missing is the reasonable case in which independent parameters are large 32-bit (or 16-bit) arrays or images, and the user wants to do some calculations with those, probably dramatically reducing the data size (taking projections, etc.) to arrive at an array of many fewer elements that they really use for the fit. We are currently forcing those input independent data to be float64. That up-conversion might saturate memory, or lead to a much slower calculation. We should coerce the output to float64, but I think that should happen as late as possible, and we should undo the coercion of the independent data. FWIW, I think the "coerce the output to float64" part could be the shortest possible PR (add 1 char!), changing line 600 (as of commit d2ba8e9) to `return _nan_policy(np.asfarray(out).ravel(), ...)`.
A consequence of not coercing the input is that the user might run into problems (as we've seen with pandas Series) with converting to and using ndarrays, or with actually trying to do fits in float32, which might be not-so-stable. But if a user wants to have Series or uint16 images as independent data, at least all the problems would actually be in their own code, so at least they can own the problem. Yes, our forcing such coercion can help protect them from themselves, but it can also lead to other hard-to-undo problems. Increasing epsfcn (as #844 does) would also help. We could fold all that into #844, but it could be separate too. I'm hoping to submit a different, orthogonal PR related to #848 tonight.
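As an illustration of that "coerce only the output, as late as possible" idea, here is a hedged sketch; `wrap_objective` is a hypothetical helper, not lmfit's actual internals.

```python
import numpy as np

def wrap_objective(userfcn):
    """Sketch: the user's objective receives its independent data exactly as
    supplied (float32 images, pandas Series, ...), and only its return value
    is promoted to float64 at the boundary with the solver."""
    def wrapped(params, *args, **kws):
        out = userfcn(params, *args, **kws)
        # np.asfarray(out) is equivalent to np.asarray(out, dtype=np.float64)
        return np.asfarray(out).ravel()
    return wrapped
```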
- allow for better handling of data/models that are less precise than float64 (see #850)
@parejkoj I think this is addressed in 1.2.0, so closing.
First Time Issue Code
Yes, I read the instructions and I am sure this is a GitHub Issue.
Description
This is a followup to this discussion on the mailing list, where it was recommended to start an issue for further discussion:
https://groups.google.com/g/lmfit-py/c/nrecDPD8VT0/m/I6P990EgAwAJ
In this commit, @reneeotten forced coercion of input data, including independent variables and the data to be fit, to numpy float64 arrays. In scipy's leastsq, sqrt(epsfcn) times the parameter values determines the forward-difference step size used to estimate the Jacobian. The default, None, assumes the objective function is accurate to double (float64) precision. If a custom model with internal precision less than this is used, the fitter may not be able to converge, because the initial steps change the residual by less than the model's resolution.

In our case, with the internal model and data both being float32, setting epsfcn to roughly 1e-13 or larger results in the fits succeeding. If lmfit is going to coerce inputs to float64, it may be worth also forcing epsfcn to a larger value than the default, possibly conditional on whether the data had to be upcast to float64. On the other hand, it may be better to only coerce in some cases, or to only coerce the output, not the input. Finally, a note in the docs that the input data are coerced to float64 and that epsfcn might need to be increased for models with less internal precision might be enough. There are a variety of tradeoffs here (see the mailing list discussion).

A couple of related past issues: #727 and #723.
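For illustration, a hedged sketch of the epsfcn workaround discussed above, assuming a toy float32 exponential model (all names and values here are made up) and relying on the documented behavior that extra keyword arguments to lmfit.minimize (or fit_kws in Model.fit) are passed through to scipy.optimize.leastsq:

```python
import numpy as np
import lmfit

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 501, dtype=np.float32)
data = np.exp(-x / np.float32(2.5)) + rng.normal(0, 0.01, x.size).astype(np.float32)

def residual(params, x, data):
    # model evaluated in float32, so its resolution is much coarser than float64
    model = np.exp(-x / np.float32(params["tau"].value))
    return model - data

params = lmfit.Parameters()
params.add("tau", value=1.0)

# forward epsfcn to scipy.optimize.leastsq so the finite-difference step
# stays above float32 resolution (the value 1e-10 is illustrative)
result = lmfit.minimize(residual, params, args=(x, data),
                        method="leastsq", epsfcn=1e-10)
print(lmfit.fit_report(result))
```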
A Minimal, Complete, and Verifiable example
I don't have an MCVE for this at present, but I can try to come up with one if a more specific example is required.
Error message:
This doesn't necessarily result in an error message, just a failure of the fit to converge. In our case, lmfit>=1.0.3 failed to converge with our model until we set epsfcn=1e-10.
Version information
Python: 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:20:04) [GCC 11.3.0]
lmfit: 0.0.post2673+g729491d, scipy: 1.10.0, numpy: 1.21.6, asteval: 0.9.28, uncertainties: 3.1.7
Link(s)
https://groups.google.com/g/lmfit-py/c/nrecDPD8VT0/m/I6P990EgAwAJ