BUG: numerical inconsistency in rolling std when the same data is sliced from different starting points #60053

Closed
@tunkill

Description

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd
test = pd.read_csv('std_problem.csv', index_col=0, parse_dates=True)

print(test.rolling(1000).std().iloc[-1])
data    0.0
Name: 2018-01-03 08:45:00, dtype: float64
print(test.iloc[-35785:].rolling(1000).std().iloc[-1])
data    0.0
Name: 2018-01-03 08:45:00, dtype: float64
print(test.iloc[-35784:].rolling(1000).std().iloc[-1])
data    1.230596
Name: 2018-01-03 08:45:00, dtype: float64
print(test.iloc[-35781:].rolling(1000).std().iloc[-1])
data    0.959358
Name: 2018-01-03 08:45:00, dtype: float64
print(test.iloc[-1000:].rolling(1000).std().iloc[-1])
data    0.701844
Name: 2018-01-03 08:45:00, dtype: float64
print(np.std(test.iloc[-1000:], ddof=1))
data    0.701844
dtype: float64

Issue Description

I have a data Series of length 93230 and want to calculate its rolling std, but I get 0 for the last window, which is almost impossible. When I compute the rolling std on only the last 1000 values, the result matches numpy.std. It seems that the last rolling std changes depending on where the input data begins.
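
The comparison described above can be sketched as a small helper. This is only an illustration: `last_std_by_start` is a hypothetical name, and the data is a synthetic stand-in because the attached CSV is not reproduced here, so it is not expected to trigger the discrepancy itself.

```python
import numpy as np
import pandas as pd


def last_std_by_start(series: pd.Series, window: int, starts: list[int]) -> pd.DataFrame:
    """Compare the last rolling std against a direct computation.

    `starts` holds negative iloc offsets (e.g. -35785), mirroring the
    slices used in the report.
    """
    # Reference value: sample std (ddof=1) of the final `window` observations only.
    direct = float(np.std(series.iloc[-window:].to_numpy(), ddof=1))
    rows = []
    for start in starts:
        # Same final window, but the rolling computation begins at a different row.
        rolled = float(series.iloc[start:].rolling(window).std().iloc[-1])
        rows.append(
            {
                "start": start,
                "rolling": rolled,
                "direct": direct,
                "abs_diff": abs(rolled - direct),
            }
        )
    return pd.DataFrame(rows)


# Synthetic stand-in for std_problem.csv (assumption: well-behaved random data).
rng = np.random.default_rng(0)
demo = pd.Series(rng.normal(size=5_000))
report = last_std_by_start(demo, window=100, starts=[-5_000, -2_500, -100])
print(report)
```

Running the same helper on the real data with `window=1000` and the offsets from the example (`-35785`, `-35784`, `-35781`, `-1000`) should show which starting points disagree with the direct value.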

Expected Behavior

I expect all of these to give the same result, 0.701844, regardless of where the data begins, because rolling(1000) should use only the latest 1000 values for the last std.
std_problem.csv
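
As a cross-check of that expectation, each window can be recomputed from scratch so that the result cannot depend on earlier rows. The sketch below uses NumPy's `sliding_window_view`; `windowed_std` is a hypothetical helper for verification, not the pandas implementation or a proposed fix, and the data is synthetic.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def windowed_std(values, window: int, ddof: int = 1) -> np.ndarray:
    """Std of every full window, recomputed independently for each window.

    The output is NaN-padded at the front so it lines up with the input.
    """
    arr = np.asarray(values, dtype=float)
    out = np.full(arr.shape[0], np.nan)
    if arr.shape[0] >= window:
        # View of shape (n - window + 1, window); the view itself copies no data,
        # but std() allocates temporaries of that shape.
        windows = sliding_window_view(arr, window)
        out[window - 1:] = windows.std(axis=1, ddof=ddof)
    return out


# Synthetic data (assumption: stands in for the attached CSV).
rng = np.random.default_rng(1)
data = rng.normal(size=2_000)
full = windowed_std(data, window=50)       # all windows over the whole array
tail = windowed_std(data[-50:], window=50)  # only the final window
print(full[-1], tail[-1])
```

For a series of 93230 rows with a window of 1000, the temporaries are large, so in practice it may be preferable to process the array in chunks or to check only the windows of interest.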

Installed Versions

I tested with pandas 2.1.2 and pandas 2.2.3; both have the same problem.

pd.show_versions()
INSTALLED VERSIONS

commit : a60ad39
python : 3.10.13.final.0
python-bits : 64
OS : Linux
OS-release : 6.8.0-45-generic
Version : #45-Ubuntu SMP PREEMPT_DYNAMIC Fri Aug 30 12:02:04 UTC 2024
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.1.2
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0
setuptools : 69.5.1
pip : 24.0
lxml.etree : 5.2.2
jinja2 : 3.1.4
IPython : 8.20.0
pandas_datareader : 0.10.0
bs4 : 4.12.3
bottleneck : 1.3.7
fsspec : 2024.6.1
matplotlib : 3.8.4
numba : 0.59.1
numexpr : 2.10.1
pyarrow : 16.1.0
s3fs : 2024.6.1
scipy : 1.12.0
sqlalchemy : 2.0.31
tables : 3.9.2
tabulate : 0.9.0
xarray : 2024.6.0
xlrd : 2.0.1
zstandard : 0.22.0
tzdata : 2024.1

INSTALLED VERSIONS

commit : 0691c5c
python : 3.11.10
python-bits : 64
OS : Linux
OS-release : 6.8.0-45-generic
Version : #45-Ubuntu SMP PREEMPT_DYNAMIC Fri Aug 30 12:02:04 UTC 2024
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.3
numpy : 2.1.2
pytz : 2024.2
dateutil : 2.9.0.post0
pip : 24.2
IPython : 8.28.0
tzdata : 2024.2

Metadata

Labels

Bug, Window (rolling, ewma, expanding)
