PERF: DataFrame.resample is very slow in versions 1.4 and 1.4.1 #46066

Closed
@dovsay

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this issue exists on the latest version of pandas.

  • I have confirmed this issue exists on the main branch of pandas.

Reproducible Example

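import time

# df_con is the DataFrame read from the CSV file (loading code not shown in the report)
df_con = df_con.reindex(columns=['datetime', 'open', 'high', 'low', 'close', 'volume', 'amount'])

## Resample from 5-minute to 15-minute bars
df_con.drop(columns=['amount'], inplace=True)
df_con.set_index('datetime', inplace=True)
# Aggregation applied to each column within every 15-minute bin
ohlc_dict = {
    'open': 'first',
    'high': 'max',
    'low': 'min',
    'close': 'last',
    'volume': 'sum',
}
o = time.time()
df_con = df_con.resample('15min', closed='right', label='right').apply(ohlc_dict)
print(time.time() - o)  # elapsed seconds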

#----------------------

df_con is the DataFrame read from the CSV file; it contains 14690 rows.

The code prints 8.608731031417847 (about 9 seconds on average).
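
Since the CSV file is not attached, the timing can be checked with a self-contained sketch like the one below. It is only an approximation of the report: the data is synthetic (random values on an unbroken 5-minute index, generated with a fixed seed), so it may or may not show the same slowdown as the original file, which could contain gaps or different dtypes. The row count, column names, and resample arguments are taken from the example above; everything else is an assumption.

```python
import time

import numpy as np
import pandas as pd

# Synthetic stand-in for the CSV data (assumption): 14690 rows of 5-minute bars
# with random values. Only the row count and column names come from the report.
n_rows = 14690
rng = np.random.default_rng(0)
index = pd.date_range("2022-01-03 09:00", periods=n_rows, freq="5min", name="datetime")
df_con = pd.DataFrame(
    rng.random((n_rows, 4)) + 100.0,
    index=index,
    columns=["open", "high", "low", "close"],
)
df_con["volume"] = rng.integers(1, 1000, size=n_rows)

# Same per-column aggregation as in the report
ohlc_dict = {
    "open": "first",
    "high": "max",
    "low": "min",
    "close": "last",
    "volume": "sum",
}

# Time only the resample call, as the report does
start = time.perf_counter()
result = df_con.resample("15min", closed="right", label="right").apply(ohlc_dict)
elapsed = time.perf_counter() - start

print(f"pandas {pd.__version__}: {elapsed:.4f} s for {len(df_con)} rows")
```

Running the same script under pandas 1.3.5 and 1.4.1 and comparing the printed times gives a like-for-like comparison that does not depend on the original file.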

Installed Versions

INSTALLED VERSIONS

commit : 06d2301
python : 3.9.10.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19044
machine : AMD64
processor : Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder : little
LC_ALL : None
LANG : zh_CN
LOCALE : Chinese (Simplified)_China.936

pandas : 1.4.1
numpy : 1.22.2
pytz : 2021.3
dateutil : 2.8.2
pip : 22.0.3
setuptools : 60.9.1
Cython : None
pytest : None
hypothesis : None
sphinx : 4.4.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 7.31.1
pandas_datareader: None
bs4 : None
bottleneck : 1.3.2
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
None

Prior Performance

Same code as above.
With pandas 1.3.5, it prints 0.015625476837158203 (about 0.015 seconds on average).

    Labels

    Bug; Dtype Conversions (Unexpected or buggy dtype conversions); Performance (Memory or execution speed performance); Regression (Functionality that used to work in a prior pandas version); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
