data frame rolling std return wrong result with large elements #28244
Seems like the issue is in Not sure how you'd debug this, but it looks like the function behaves as though all the remaining elements are zero:
import numpy as np
import pandas as pd
from pandas._libs.window import roll_var
lst = [10**10, 2, 3, 4, 5, 6, 7, 8, 9]
df = pd.DataFrame(lst)
df.rolling(5, 3).var()
# 0
# 0 NaN
# 1 NaN
# 2 3.333333e+19
# 3 2.500000e+19
# 4 2.000000e+19
# 5 0.000000e+00
# 6 0.000000e+00
# 7 0.000000e+00
# 8 0.000000e+00
roll_var(np.asarray(lst, dtype=np.float64), 5, 3, None, None, 1)
# array([ nan, nan, 3.33333333e+19, 2.50000000e+19,
# 2.00000000e+19, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
# 0.00000000e+00])
lst = [10**10, 0, 0, 0, 0, 0, 0, 0, 0]
df = pd.DataFrame(lst)
df.rolling(5, 3).var()
# 0
# 0 NaN
# 1 NaN
# 2 3.333333e+19
# 3 2.500000e+19
# 4 2.000000e+19
# 5 0.000000e+00
# 6 0.000000e+00
# 7 0.000000e+00
# 8 0.000000e+00
Seems like the issue is indeed in rolling_var, I change to
I looked into this a bit. pandas/pandas/_libs/window/aggregations.pyx Line 440 in 0d4a1c1
The first value here equals delta during the first iteration, and delta stays equally large through the following iterations. The square of delta is so large that we run into floating-point precision problems: the small values that come later are lost here. I can't see a way to fix this. Should we add something to the docs?
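For what it's worth, the loss of the small values can be reproduced outside pandas. The sketch below is not the code in aggregations.pyx. It is a plain-Python add/remove (Welford-style) update that I am assuming is close in spirit, and the function names `streaming_rolling_var` and `two_pass_rolling_var` are made up for illustration. The running sum of squared deviations reaches roughly 8e19 while 10**10 is in the window, and adjacent doubles at that magnitude are 16384 apart, so the contributions of 2, 3, 4, ... are rounded away and cannot be recovered once 10**10 leaves the window. Recomputing each window from scratch does not have this problem, at the cost of more work per window.

```python
from statistics import variance


def streaming_rolling_var(values, window, min_periods):
    """Rolling sample variance using add/remove (Welford-style) updates.

    Illustrative only: an assumed approximation of a streaming update,
    not the pandas implementation.
    """
    nobs, mean, ssqdm = 0, 0.0, 0.0  # ssqdm: sum of squared deviations from the mean
    out = []
    for i, val in enumerate(values):
        # Add the incoming value.
        nobs += 1
        delta = val - mean
        mean += delta / nobs
        ssqdm += delta * (val - mean)
        # Remove the value that just left the window.
        if i >= window:
            old = values[i - window]
            nobs -= 1
            delta = old - mean
            mean -= delta / nobs
            ssqdm -= delta * (old - mean)
        # A sample variance needs at least two observations.
        enough = nobs >= max(min_periods, 2)
        out.append(ssqdm / (nobs - 1) if enough else float("nan"))
    return out


def two_pass_rolling_var(values, window, min_periods):
    """Rolling sample variance recomputed from scratch for every window."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        enough = len(chunk) >= max(min_periods, 2)
        out.append(variance(chunk) if enough else float("nan"))
    return out


data = [1e10, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
streamed = streaming_rolling_var(data, 5, 3)
exact = two_pass_rolling_var(data, 5, 3)

# A small term added to a huge float64 is absorbed completely.
print(1e20 + 10.0 == 1e20)

# Windows that no longer contain 1e10 should have variance 2.5.
for i in range(5, len(data)):
    print(i, streamed[i], exact[i])
```

The two-pass version is only there as a reference value. Whether something like it is acceptable performance-wise for the real rolling code is a separate question.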