[BUG]: Implement Kahan summation for rolling().mean() to avoid numerical issues #36348
Conflicts: doc/source/whatsnew/v1.2.0.rst, pandas/core/arrays/datetimelike.py
i believe this is shared code with mean; so pls test that as well
Done
@mroeschke
sorry, I meant rolling.sum. yeah, ideally we could share code, though these functions are simple enough that it may be easier to just duplicate what is needed rather than introduce new functions. ok to do rolling.sum in this PR or in a follow-on.
for this PR, or more likely for the kahan sum, I think there are a couple of issues on the tracker. if you could have a look, you can link them all (and use examples from them).
pandas/_libs/window/aggregations.pyx
    # Not NaN
    if notnan(val):
        nobs[0] = nobs[0] + 1
        sum_x[0] = sum_x[0] + val
        y = val - c[0]
add a comment that this is using Kahan summation
I already added a comment in the function header. Is this sufficient?
@jreback Implemented Kahan summation for […]. The functions add_mean, remove_mean, add_sum and remove_sum are quite similar now. We could share this part of the code between these functions.
Will look into the test failures.
Tests should be fixed now. I somehow mixed up plus and minus.
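To make the plus/minus point concrete, here is a minimal pure-Python sketch of a rolling mean that uses a Kahan-compensated running sum. It is not the Cython code from this PR: `kahan_step` and `rolling_mean_kahan` are hypothetical names, a single compensation term is assumed, and removing a value is modelled as adding its negation, which is the step where the sign is easy to get wrong.

```python
import math


def kahan_step(total, comp, val):
    """Add val to total using Kahan compensation; return (new_total, new_comp)."""
    y = val - comp            # apply the correction carried over from earlier steps
    t = total + y             # low-order bits of y may be lost in this addition
    comp = (t - total) - y    # recover (the negation of) what was lost
    return t, comp


def rolling_mean_kahan(values, window):
    """Fixed-size rolling mean over a list of floats (hypothetical helper)."""
    total, comp, nobs = 0.0, 0.0, 0
    out = []
    for i, val in enumerate(values):
        # Add the value entering the window.
        total, comp = kahan_step(total, comp, val)
        nobs += 1
        if i >= window:
            # Remove the value leaving the window: add its negation.
            total, comp = kahan_step(total, comp, -values[i - window])
            nobs -= 1
        out.append(total / nobs if nobs == window else math.nan)
    return out


result = rolling_mean_kahan([1e16, 1.0, 1.0, 1.0], window=2)
print(result[2:])
```

In this example the two trailing windows contain only 1.0 values, and the compensated sum keeps them after 1e16 leaves the window. A plain running sum would absorb both 1.0 additions into 1e16 and then report 0.0 for the first of those windows.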
One more comment, otherwise LGTM
pandas/tests/window/test_rolling.py
result = (
    df.resample("1s").ffill().rolling("3s", closed="left", min_periods=3).mean()
)
assert result.values[-1] == 0.0
Could we compare the entire DataFrame result for this test as well?
Done
ok looks good, except for @mroeschke's comments.
can you run all of the rolling asv's and report the results. I would expect a small slowdown.
I ran asv for rolling. Is the output within the range you expected?
thanks @phofl, very nice! yeah, this is a slight perf hit, but worth the tradeoff
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
I implemented the Kahan summation as suggested by @jreback. I used the variable names from https://en.wikipedia.org/wiki/Kahan_summation_algorithm. If there is a naming convention I am not aware of, I am happy to rename the variables.
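For readers unfamiliar with the algorithm, the following is a minimal standalone sketch of Kahan summation in plain Python, using the `c`, `y` and `t` names from the Wikipedia article. The function name `kahan_sum` is hypothetical, and `total` stands in for the article's `sum` to avoid shadowing the builtin. It illustrates the technique only and is not the code added to aggregations.pyx.

```python
import math


def kahan_sum(values):
    """Sum an iterable of floats with Kahan compensation."""
    total = 0.0  # running sum ("sum" in the Wikipedia pseudocode)
    c = 0.0      # running compensation for lost low-order bits
    for x in values:
        y = x - c            # correct the input by the error carried so far
        t = total + y        # big + small: low-order bits of y may be lost
        c = (t - total) - y  # algebraically zero; numerically the lost part
        total = t
    return total


# One large value followed by many values too small to register individually.
values = [1.0] + [1e-16] * 1000

naive = 0.0
for v in values:
    naive += v

print(naive, kahan_sum(values), math.fsum(values))
```

Here the naive loop never moves away from 1.0, because each 1e-16 is below half the spacing between floats near 1.0, while the compensated sum stays close to the `math.fsum` reference.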