Skip to content

PERF: enable caching on expensive offsets #17914

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
chris-b1 opened this issue Oct 18, 2017 · 3 comments
Closed

PERF: enable caching on expensive offsets #17914

chris-b1 opened this issue Oct 18, 2017 · 3 comments
Labels
Datetime Datetime data dtype Frequency DateOffsets Performance Memory or execution speed performance

Comments

@chris-b1
Copy link
Contributor

xref #16463

We have a set of seemingly unused but also seemingly functional caching logic for generating ranges of offsets. Example

In [63]: BM = pd.offsets.BusinessMonthEnd()

In [64]: %timeit pd.date_range('1990-01-01', '2020-01-01', freq=BM)
18.1 ms ± 366 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [65]: BM._cacheable
Out[65]: False

In [66]: BM._cacheable = True

In [67]: %timeit pd.date_range('1990-01-01', '2020-01-01', freq=BM)
686 µs ± 17.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [68]: a = pd.date_range('1990-01-01', '2020-01-01', freq='BM')

In [69]: b = pd.date_range('1990-01-01', '2020-01-01', freq=BM)

In [70]: (a == b).all()
Out[70]: True

Possibly should enable or expose an API for this? Especially for slower offsets.

cc @jbrockmendel

@chris-b1 chris-b1 added Performance Memory or execution speed performance Datetime Datetime data dtype labels Oct 18, 2017
@jbrockmendel
Copy link
Member

The issue with caching DateOffset objects is that they are mutable. Most users will never tinker with them, but for the few who do, this could cause a lot of headaches. The medium-long term solution is to make DateOffsets immutable, then cache all of them. (Related: immutability will let us improve __hash__ 5-10x and __eq__/__ne__ by ~50x.)

There are a few more steps before we can implement immutability, but we recently got past a big roadblock with #17458. The current plan is to move the guts of tseries.offsets to _libs.tslibs.offsets and declare the relevant attributes cdef readonly. The attendant yak-shaving is moving forward with #17830, #17827, and #17746.

BTW there's also a CacheableOffset that AFAICT is unused.

I suppose in the interim enabling _cacheable for users who know what they're doing (or possibly making it the default and requiring users who want to tinker to change it?) is a viable option. Is that what you had in mind?

@jbrockmendel
Copy link
Member

@chris-b1 offsets are now immutable; want to take a swing at this?

@jbrockmendel
Copy link
Member

Closing as no-action since these are now all in cython, so should not be expensive. Can re-open if you want to revist.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Frequency DateOffsets Performance Memory or execution speed performance
Projects
None yet
Development

No branches or pull requests

2 participants