QST: Roadmap for deprecations of Period types #56588
Comments
I don't think there is a clear roadmap for deprecating Period entirely; not much interest in #54235. We did recently deprecate support for PeriodIndex in resample. @ChadFulton can you elaborate a bit on how the deprecation of Period[B] is a pain point? |
Thanks @jbrockmendel. Currently I use So one practical example of how I use the information in
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

ix = pd.period_range(start='2000-01-01', end='2000-01-31', freq='B')
dta = pd.Series(np.arange(len(ix)), index=ix)
# Plot each value at the midpoint of its business-day period
center = ix.start_time + (ix.end_time - ix.start_time) / 2
plt.plot(center, dta)
Overall, my hope is that we don't lose Periods, because I think that they represent a distinct and useful concept that is not captured by non-ns units for datetime. But I understand that it's a maintenance burden. So more specifically here, my general feeling was just that because |
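For comparison, here is a minimal sketch of how the same midpoints might be computed from a DatetimeIndex alone. It assumes each business day spans a full calendar day, so the midpoint is 12 hours after the start; that assumption is not part of the original comment, and it is exactly the span information that Period[B] carries implicitly.

```python
import numpy as np
import pandas as pd

# Business-day starts as plain timestamps (no Period[B] involved).
starts = pd.date_range(start="2000-01-01", end="2000-01-31", freq="B")

# Assumption: each business day covers the whole calendar day, so its
# midpoint is 12 hours after the start.
center = starts + pd.Timedelta(hours=12)

# Index the data by the midpoints, ready to hand to a plotting call.
dta = pd.Series(np.arange(len(starts)), index=center)
print(dta.head())
```

The span has to be supplied by hand here, whereas with Periods it follows from the frequency.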
Thanks for fleshing this out. The plotting example is useful. In fact, ATM the dt64 plotting code converts to Period internally, which we need to change before the Period[B] deprecation can be enforced. That is a non-trivial rewrite that isn't likely to happen without funding, so it may not happen for 3.0.
"B" is an outlier in the Period code; getting rid of it will allow non-trivial (though not-huge) code simplifications. More importantly from my perspective, getting rid of it is a blocker to moving Period from using
The immediate question is whether to revert that deprecation. Plotting is one use case that is inconvenienced here. Are there others?
(The rest here is not super-relevant, just things that came to mind while reading your comment)
This wording reminds me of a years-old disagreement I used to have with jreback. My position was that for units where we have both,
With the introduction of non-nano, there is now a question of what datetime64/Timestamp unit to return for Period properties. Keeping it as nanos is fine in most cases, but ATM that will raise for very-large Periods, which can be alleviated by returning a lower-resolution unit. This is a rare enough corner case that I haven't been interested in bothering with it. |
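The corner case above can be probed with a short sketch. Whether the property raises depends on the installed pandas version, so the code records the outcome instead of assuming one; the choice of the year 3000 is only an illustrative value outside the nanosecond range.

```python
import numpy as np
import pandas as pd

# A daily Period well beyond the nanosecond datetime64 range (which ends in 2262).
p = pd.Period("3000-01-01", freq="D")

try:
    ts = p.start_time
    outcome = "returned"
    print("start_time returned:", ts)
except (pd.errors.OutOfBoundsDatetime, OverflowError) as exc:
    outcome = "raised"
    print("start_time raised:", type(exc).__name__)

# A lower-resolution unit represents the same instant without trouble.
low_res = np.datetime64("3000-01-01", "s")
print(low_res)
```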
Thanks again for your thoughts on this. Following up with a second specific use case that may help the conversation:
Another use case that shows up as a problem is that This leads to several secondary problems related to calling
import pandas as pd

index = pd.date_range('2000', periods=10, freq='B', name='B_index')
x = pd.DataFrame(0, columns=['a', 'b'], index=index)
print(x.index.freq)
y = x.reset_index().set_index('B_index')
print(y.index.freq)
yields
<BusinessDay>
None
while when using Periods:
index = pd.period_range('2000', periods=10, freq='B', name='B_index')
x = pd.DataFrame(0, columns=['a', 'b'], index=index)
print(x.index.freq)
y = x.reset_index().set_index('B_index')
print(y.index.freq)
yields
<BusinessDay>
<BusinessDay>
Of course this same behavior would happen with any frequency, not just |
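One possible workaround for the DatetimeIndex case is to re-attach the frequency after the round trip. The sketch below uses `DataFrame.asfreq`; it is offered as an illustration only, and it assumes the data really does sit on a regular business-day grid (otherwise `asfreq` would introduce missing rows).

```python
import pandas as pd

index = pd.date_range("2000", periods=10, freq="B", name="B_index")
x = pd.DataFrame(0, columns=["a", "b"], index=index)

# The round trip through a column drops the frequency.
y = x.reset_index().set_index("B_index")
print(y.index.freq)

# asfreq reindexes onto a regular business-day grid and attaches the freq.
restored = y.asfreq("B")
print(restored.index.freq)
```

This has to be done explicitly after every such operation, whereas a PeriodIndex keeps its frequency because it is part of the dtype.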
Also, related to the eventual For example:
import pandas as pd
ix = pd.date_range(start='2000', periods=10, freq='YE')
ix.to_period()
# ^ works and gives Y-DEC
pd.period_range(start='2000', periods=10, freq=ix.freq)
# ^ works and gives Y-DEC
pd.period_range(start='2000', periods=10, freq=ix.freqstr)
# ^ raises ValueError: Invalid frequency: YE-DEC, failed to parse with error
# message: ValueError("for Period, please use 'Y-DEC' instead of 'YE-DEC'") |
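A sketch of one way around the string mismatch: take the frequency string from the converted PeriodIndex, not from the DatetimeIndex. The offset object `pd.offsets.YearEnd()` is used here in place of the `'YE'` alias only so that the snippet does not depend on which alias spelling the installed pandas accepts.

```python
import pandas as pd

# Year-end timestamps, built from the offset object instead of an alias string.
ix = pd.date_range(start="2000", periods=10, freq=pd.offsets.YearEnd())

# The period-side frequency string comes from the converted index.
period_freqstr = ix.to_period().freqstr
print(period_freqstr)

# This string is valid for period_range, unlike ix.freqstr.
periods = pd.period_range(start="2000", periods=10, freq=period_freqstr)
print(periods)
```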
Are period and period_index being deprecated or not? I've gone through the various issues on this topic and do not have a good idea on where the core devs stand on this issue. If I start a new project today should I avoid period? |
Period and PeriodIndex are not deprecated and unlikely to be. |
Research
I have searched the [pandas] tag on StackOverflow for similar questions.
I have asked my usage related question on StackOverflow.
Link to question on StackOverflow
NA
Question about pandas
Background
There have been several issues raised related to the Period types, such as:
And the latter deprecation of the business-day period has already been implemented.
This is of course related to the enhancement allowing non-ns units to datetimes, see e.g.:
Questions
Given the fact that deprecations of Period types have already begun, it would be useful to understand what the expected roadmap for Period is. Is it expected that it will be removed as suggested in DEPR: Period, PeriodFoo #54235? Is there a plan for how to move forward on the "sticking points" listed there (especially the missing units: Week-with-anchor, quarter, Year-with-anchor)?
The current deprecation of business day periods in DEPR: Period[B] #53446 now requires bifurcation of code from Period to Timestamp in the special case of business days - if you are using Periods, you can't wholesale switch to Timestamps yet (at least if you have e.g. Quarters), but you also can no longer stick with just Periods.
Tentative request
To me, it would make sense to revert the deprecation of the BDay Period dtype until there is a more comprehensive roadmap for Period types and a path to make a more wholesale switch. But my apologies if I missed something fundamental here about why that deprecation is important in itself.
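To make the "missing units" mentioned in the question concrete, here is a small sketch of spans that Period expresses directly: a quarter and a week anchored to end on Sunday. The specific dates are illustrative choices, not taken from the issue.

```python
import pandas as pd

# A calendar quarter: the span runs from 1 January through 31 March.
q = pd.Period("2000Q1", freq="Q")
print(q.start_time, q.end_time)

# A week anchored to end on Sunday: the span runs Monday through Sunday.
w = pd.Period("2000-01-05", freq="W-SUN")
print(w.start_time, w.end_time)

# Arithmetic moves by whole spans, whatever their length in days.
print((q + 1).start_time)
```

A fixed-width datetime unit cannot describe these spans, since quarters vary in length and anchored weeks carry an offset.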