API: Default to stdlib timezone objects instead of pytz #34916
Comments
I don't think we can easily change this, as it's a fairly large API change; we would need to have an option for this instead. |
Might be a reasonable version 2.0 break. But +1 to moving the default timezone to the stdlib version |
Since there is also the discussion of using ( |
+1 on incorporating |
Thoughts on doing this without deprecation in 2.0? |
+1. Might be even better if we make |
Is this still scheduled for pandas 2.0? |
Under discussion, not scheduled. |
Here's the related conversation from Django, which led to the deprecation of |
For UTC, it's already not using pytz:
The rest didn't happen in time, but it can happen for 3.0 (which should come out in a year's time, since releases are expected to be yearly now). TBH I'd be OK with the switch happening in a minor release as well, but we can discuss that.
Um, thanks for the encouragement, mate? Not sure what you're trying to achieve here @Zac-HD |
We are mainly waiting to drop 3.8 sometime in the next few months so that we can jump straight to ZoneInfo without having to take a temporary dependency on backports.zoneinfo |
@jbrockmendel did you already have something in progress for removing pytz? I'd be all for just removing pytz in the next minor release without warning - it's wrong beyond 2038, so this could just be considered a bug fix:
# The following two don't match
In [33]: datetime(2038, 6, 1, tzinfo=ZoneInfo('Europe/Paris')).astimezone(timezone.utc)
Out[33]: datetime.datetime(2038, 5, 31, 22, 0, tzinfo=datetime.timezone.utc)
In [34]: pd.Timestamp('2038-06-01').tz_localize('Europe/Paris').tz_convert('UTC')
Out[34]: Timestamp('2038-05-31 23:00:00+0000', tz='UTC')
# But for 2037, they do
In [35]: datetime(2037, 6, 1, tzinfo=ZoneInfo('Europe/Paris')).astimezone(timezone.utc)
Out[35]: datetime.datetime(2037, 5, 31, 22, 0, tzinfo=datetime.timezone.utc)
In [36]: pd.Timestamp('2037-06-01').tz_localize('Europe/Paris').tz_convert('UTC')
Out[36]: Timestamp('2037-05-31 22:00:00+0000', tz='UTC') |
Nope.
Hadn't realized that, but not too surprised. I'd be OK with removing it sooner than planned, but won't lose sleep if we wait until 3.0. |
Paul Ganssle has written a package to help deprecate pytz usage (given that the timezone objects have a different interface, switching to zoneinfo can also have breaking changes for people accessing our
So another option in the short term is to start using this shim for pytz when zoneinfo is available, as a way to properly deprecate the usage of pytz. This could then pave the way to dropping pytz as the default (or dropping support altogether) and switching to zoneinfo in 3.0.
(This was also brought up in the Django discussion mentioned above (https://groups.google.com/g/django-developers/c/PtIyadoC-fI), although as far as I understand from a quick read, they did not end up using it out of the box and only recommend it as a transitional fallback to use manually.) |
Is the suggestion to disallow pytz (raise?), detect it and swap in non-pytz, or just not default to it when we get a tz=somestr? The latter seems by far the easiest change on our end. |
I was under the impression it would be the latter; we would still support pytz timezones but just change our default (and pytz would become an optional dependency) |
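To make the "just change the default" option concrete, here is a minimal sketch of how a string tz argument could resolve to stdlib objects while existing tzinfo objects (pytz included) pass through untouched. The helper name resolve_tz is hypothetical and is not pandas' actual implementation; it assumes Python 3.9+ so that zoneinfo is available.

```python
from datetime import timezone, tzinfo
from zoneinfo import ZoneInfo


def resolve_tz(tz):
    """Hypothetical helper: map a user-supplied tz argument to a tzinfo."""
    # tzinfo instances (stdlib, dateutil, or pytz) are kept as given, so
    # users who pass pytz objects explicitly keep getting pytz objects.
    if tz is None or isinstance(tz, tzinfo):
        return tz
    if isinstance(tz, str):
        # Only the default for *strings* changes: stdlib instead of pytz.
        if tz.upper() == "UTC":
            return timezone.utc
        return ZoneInfo(tz)  # needs system tz data or the tzdata package
    raise TypeError(f"unsupported tz argument: {tz!r}")
```

With this shape, pytz would only be needed when a user passes a pytz object themselves, which is what would allow it to become an optional dependency.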
I think it could make sense to eventually also drop support for pytz entirely, but that certainly doesn't have to happen initially, only changing the default. Adapting the migration plan that @pganssle initially proposed in the above mentioned django thread for our situation, possible stages could look like:
Step 1 is already done (#37654, PR #46425), and step 2 is possible but less convenient in pandas' case (in most cases users either get a timezone automatically from importing some data (and here we have to choose a default tzinfo impl to use), or specify the timezone with a string in something like
So I think the current questions we have to answer are mostly: do we first want to do step 3 (the shim for deprecating pytz-specific features), or do we go directly for step 4 (switching the default)? And in either case, when do we do this (already in 2.1, or only in 3.0)? Whether we also want to do step 5 (fully deprecating/removing pytz support) is something we can still decide later.
There is also another issue about deprecating pytz: #46463
Personally, I don't think that many users will access the |
One other aspect is also the question if we think our current zoneinfo support is ready to be used by default, as this comes with quite a performance impact:
(for this example, around a 20x slowdown for converting naive to aware timestamps) I am not familiar enough with our timezone conversion code in cython to have a good idea of how easy it would be to improve this. But I assume this slowdown is mostly because with zoneinfo, we actually use zoneinfo to do the conversion? That means converting each value in a DatetimeArray to a datetime.datetime object, using zoneinfo to get the offset / localize, and converting back to our internal int representation; so even though it is implemented in cython, this is a for loop with python calls. With pytz, on the other hand, we don't use pytz element-by-element, but get the offsets and transitions info from the pytz timezone, and then do the conversions ourselves in an efficient cython loop. @jbrockmendel is that a somewhat correct assumption? |
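As a rough illustration of the "offsets and transitions" approach described above, the sketch below looks up the UTC offset for each integer timestamp in a sorted transition table instead of calling into a tzinfo object per element. The table values are made up for illustration (not real tz data), the function name is hypothetical, and it shows the simpler UTC-to-local direction; localizing naive values additionally has to handle ambiguous and nonexistent times.

```python
from bisect import bisect_right

# Hypothetical transition table, for illustration only:
# UTC epoch seconds at which a new offset takes effect, and the offset
# (in seconds) that applies from that instant onwards.
TRANSITIONS_UTC = [0, 1_000_000, 2_000_000]
OFFSETS = [3600, 7200, 3600]


def utc_to_local(utc_seconds):
    """Shift each UTC timestamp by the offset in force at that instant."""
    out = []
    for ts in utc_seconds:
        # Index of the last transition at or before ts (clamped to the
        # first entry for timestamps that precede the table).
        idx = max(bisect_right(TRANSITIONS_UTC, ts) - 1, 0)
        out.append(ts + OFFSETS[idx])
    return out
```

The loop only touches integers and a precomputed table, which is what makes the equivalent cython loop fast; the zoneinfo path cannot do this because the transition data is not exposed.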
Yes. We do the same with dateutil. zoneinfo intentionally doesn't expose them (and IIUC in the future dateutil won't either), so we'd have to get them from somewhere else if we want to avoid making the python calls. IIRC (big "if") when implementing zoneinfo support I was surprised at how good the performance was compared to the pytz/dateutil cases (not better, just not as bad as I had expected). I think |
maybe temporarily, but longer term (say, 3.0) I'm suggesting removing pytz completely |
I see a quite similar factor in performance difference for the default of
|
thanks for looking into this - that looks like a bit of a blocker, will see what can be done |
FWIW my recommendation for The only issue is where to get the data, but you can just take it from the same place(s) that |
I've been hesitant to go down this road because I don't want to be in the business of finding and parsing system timezone files. Is there a path where we can re-use the existing implementation in zoneinfo or something else? |
Any updates on this? |
i.e. when we get pd.Timestamp.now("UTC") we should return a Timestamp with datetime.timezone.utc rather than pytz.UTC. Similarly, when we parse an ISO8601 datetime we should use a tzinfo of timezone(timedelta(seconds=val*60)) instead of pytz.FixedOffset(val).
This isn't that hard to implement, but doing it breaks a couple dozen tests where we are currently checking for pytz objects. This would technically be an API change, so putting it up for discussion before implementing it.
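A minimal sketch of the stdlib objects proposed here, using only the datetime module. The helper name fixed_offset is hypothetical, and val is assumed to be an offset in minutes, matching the argument of pytz.FixedOffset.

```python
from datetime import datetime, timedelta, timezone


def fixed_offset(val):
    """Hypothetical helper: stdlib tzinfo for an offset of `val` minutes."""
    return timezone(timedelta(seconds=val * 60))


# The stdlib ISO 8601 parser already attaches a datetime.timezone for a
# fixed-offset string such as +05:30 (330 minutes).
parsed = datetime.fromisoformat("2020-01-01T12:00:00+05:30")
same_offset = fixed_offset(330) == parsed.tzinfo
```

Because datetime.timezone instances compare equal when their offsets match, tests that currently check for specific pytz objects could compare against these stdlib objects instead.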