ENH: support zoneinfo tzinfos #46425
Looks like one of the code checks doesn't like "backports.zoneinfo" in the deps file. Guessing it's the "." causing trouble.
It's certainly "dangerous" in the sense that your UTC-detection logic will fail even for UTC-like zones. I think the whole endeavor is a bit foolhardy, as there is no reliable, guaranteed interface for determining whether a given zone is UTC. Your heuristics will work in a great many cases, so "user cleared the cache and people are using [...] That said, [...]
pandas/_libs/tslibs/timezones.pyx
@@ -210,6 +217,8 @@ cdef inline bint is_fixed_offset(tzinfo tz):
            return 1
        else:
            return 0
    elif is_zoneinfo(tz):
        return False
Why False when the other return values are all 0 or 1?
Also, this isn't quite right. You can tell if a ZoneInfo has a fixed offset by passing None to utcoffset; if it returns a value, that's the fixed offset.
This also works for pytz, dateutil.tz.tzoffset and datetime.timezone. It won't work for dateutil.tz.gettz("UTC") right now, but once the zoneinfo backport is merged, it will.
This whole function can probably be simplified to: return bool(tz.utcoffset(None) is not None), or (to handle the fixed offset tzfile cases for dateutil <= 2.8.2):
    if treat_tz_as_dateutil(tz):
        if len(tz._trans_idx) == 0 and len(tz._trans_list) == 0:
            return 1
        else:
            return 0
    return 0 if tz.utcoffset(None) is None else 1
Of course, even that will break when the next version of dateutil comes out, so you probably want some version pinning in place (like for python-dateutil < 3.0).
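For illustration, here is a minimal pure-Python sketch of the utcoffset(None) check described in this comment. The helper name has_fixed_offset is made up for this example and is not the pandas function; the zoneinfo part is guarded because the module (Python >= 3.9) or the time zone database may not be available.

```python
from datetime import timedelta, timezone, tzinfo


def has_fixed_offset(tz: tzinfo) -> bool:
    """Return True if tz reports an offset without being given a datetime.

    Hypothetical helper: a zone with a single fixed offset can answer
    utcoffset(None); a zone with transitions returns None instead.
    """
    return tz.utcoffset(None) is not None


# datetime.timezone instances always have a fixed offset.
print(has_fixed_offset(timezone.utc))                   # True
print(has_fixed_offset(timezone(timedelta(hours=-5))))  # True

try:
    from zoneinfo import ZoneInfo

    # A zone with DST transitions cannot report an offset for None.
    print(has_fixed_offset(ZoneInfo("America/New_York")))
except (ImportError, LookupError):
    # zoneinfo is missing, or no tz database is installed.
    pass
```

This sketch does not include the dateutil <= 2.8.2 special case mentioned above.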
Why False when the other return values are all 0 or 1?
No good reason; I'll change it to match.
This whole function can probably be simplified to [...]
Thanks. I'll probably make a dedicated branch to both Do This Right and audit our usages of is_fixed_offset, which I don't think we're very consistent about.
pandas/_libs/tslibs/timezones.pyx
@@ -41,7 +42,7 @@ cdef int64_t NPY_NAT = get_nat()
cdef tzinfo utc_stdlib = timezone.utc
cdef tzinfo utc_pytz = UTC
cdef tzinfo utc_dateutil_str = dateutil_gettz("UTC")  # NB: *not* the same as tzutc()

cdef tzinfo utc_zoneinfo = ZoneInfo("UTC")
You probably don't want this to be called unconditionally. Practically speaking, this will always exist, but it's not guaranteed by the ZoneInfo API. It should probably be possible to import this file without this succeeding.
OK. Is the failure mode something like "UTC" not being present in /usr/share/zoneinfo/?
The user-facing downside of not doing this here is things going through slightly slower code paths. Not the end of the world, but worth avoiding if it doesn't require too much gymnastics.
Yes, or if the user is on Windows and doesn't have tzdata installed, which is actually probably a reasonably common failure mode: the user doesn't care about tzdata because they're not using zoneinfo, then they import pandas and this constructor fails when trying to import the timezones module.
I imagine it's pretty easy to make the impact of this minimal. One way to do it:
    cdef tzinfo utc_zoneinfo = None

    cdef bool is_utc_zoneinfo(tzinfo tz):
        global utc_zoneinfo
        if utc_zoneinfo is None:
            try:
                utc_zoneinfo = ZoneInfo("UTC")
            except ZoneInfoNotFoundError:
                return False
        return tz is utc_zoneinfo
Presumably this function will get inlined wherever you call it, and in the "common case" where ZoneInfo("UTC") is easily imported it's going to be two identity checks instead of one.
If you are very concerned with performance I'd probably be trying to lazy-import zoneinfo in general anyway, in which case you have more "off-roads" to improve performance for people who don't use zoneinfo, though hopefully in the not-too-distant future pandas will switch to using zoneinfo or pytz-deprecation-shim anyway, at which point you'll be fine with eagerly importing.
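A rough pure-Python rendering of the lazy approach suggested in this comment might look like the following. It is a sketch, not the code that was merged: the module-level name _utc_zoneinfo is invented here, and the import of zoneinfo is deferred into the function so that the module still imports on Python 3.8 or without a tz database.

```python
from datetime import timezone, tzinfo
from typing import Optional

# Cached ZoneInfo("UTC") object; stays None until it can be constructed.
_utc_zoneinfo: Optional[tzinfo] = None


def is_utc_zoneinfo(tz: object) -> bool:
    """Identity check against ZoneInfo("UTC"), built lazily on first use."""
    global _utc_zoneinfo
    if _utc_zoneinfo is None:
        try:
            from zoneinfo import ZoneInfo, ZoneInfoNotFoundError
        except ImportError:
            # Python < 3.9 without the stdlib zoneinfo module.
            return False
        try:
            _utc_zoneinfo = ZoneInfo("UTC")
        except ZoneInfoNotFoundError:
            # No system tz database and no tzdata package installed.
            return False
    return tz is _utc_zoneinfo


# Other UTC objects are not the ZoneInfo("UTC") singleton.
print(is_utc_zoneinfo(timezone.utc))  # False
```

Because the check is by identity, it depends on the ZoneInfo cache returning the same object for "UTC" each time, which is the concern about cache clearing raised elsewhere in this thread.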
pandas/_libs/tslibs/tzconversion.pyx
    elif is_tzlocal(tz):
        return _tz_convert_tzlocal_utc(val, tz, to_utc=True)
    elif is_tzlocal(tz) or is_zoneinfo(tz):
        return _tz_localize_using_tzinfo_api(val, tz, to_utc=True)
I think this should be the default behavior, rather than something triggered only for zoneinfo objects.
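To make "going through the tzinfo API" concrete, here is a small sketch in plain Python of a wall-time-to-UTC conversion that relies only on the standard tzinfo/datetime interface, so it works for any conforming tzinfo object. The function name wall_to_utc is hypothetical and unrelated to the pandas internals in the diff.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def wall_to_utc(naive: datetime, tz: tzinfo) -> datetime:
    """Interpret a naive wall time in tz and return the naive UTC equivalent.

    Only the public tzinfo API is used (via replace/astimezone), so no
    knowledge of the concrete time zone implementation is needed.
    """
    aware = naive.replace(tzinfo=tz)
    return aware.astimezone(timezone.utc).replace(tzinfo=None)


plus_two = timezone(timedelta(hours=2))
print(wall_to_utc(datetime(2022, 3, 18, 12, 0), plus_two))  # 2022-03-18 10:00:00
```

The trade-off discussed in the thread is speed: a generic path like this is slower than implementation-specific fast paths, but it does not depend on private attributes.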
pandas/_libs/tslibs/timezones.pyx
@@ -2,6 +2,7 @@ from datetime import (
    timedelta,
    timezone,
)
from zoneinfo import ZoneInfo
This is an unconditional import; do you not require support for Python < 3.8?
If so, you should presumably have some more complicated logic here and in the is_zoneinfo logic.
Something like this works well for "is this an instance of X" without forcing the user to import the module that contains X:
Since obviously some object isn't going to be an instance of zoneinfo.ZoneInfo if the interpreter has never imported the zoneinfo module.
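The snippet the reviewer attached to this comment is not preserved in this copy of the thread. One way such a check could be written, assuming the idea is to look the module up in sys.modules instead of importing it, is sketched below; the function name and the handling of backports.zoneinfo are illustrative guesses, not the reviewer's code.

```python
import sys
from datetime import timezone


def is_zoneinfo(tz: object) -> bool:
    """Check for a ZoneInfo instance without importing zoneinfo ourselves.

    If neither module has been imported by anyone, no ZoneInfo object can
    exist in this interpreter, so the answer is False.
    """
    for name in ("zoneinfo", "backports.zoneinfo"):
        module = sys.modules.get(name)
        if module is None:
            continue
        cls = getattr(module, "ZoneInfo", None)
        if cls is not None and isinstance(tz, cls):
            return True
    return False


print(is_zoneinfo(timezone.utc))  # False
```

The check costs a couple of dictionary lookups per call and never triggers an import.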
This is an unconditional import — do you not require support for Python < 3.8?
I think our minimum version is 3.8, but I'm on 3.9 locally and forgot when writing this branch. Will change to try/except.
No, leave this; we support >= 3.8.
zoneinfo isn't in the stdlib until 3.9, so importing unconditionally would require making backports.zoneinfo a hard dependency (and having that in the CI dep file is causing problems AFAICT).
Even with backports.zoneinfo it needs to be conditional since they're in different namespaces, plus I really like the approach I took in the pytz deprecation shims for checking if a time zone is a pytz zone, where my checks never actually import pytz unless it's already been imported (since obviously if it's never been imported then I know the object isn't a pytz type). That may be overkill, but it's also not that hard to implement, so ⚖️ 🤷
I'll need to take a closer look at the pytz deprecation shim. We're getting rid of a lot of pytz usage in 2.0 (#34916) and these days I keep finding ways we could simplify the code even more if we dropped pytz support altogether.
Is the pre-commit failure real or a false positive?
I think it's a false positive as pre-commit.ci is being set up. Looks pretty good; just needs a whatsnew entry. Also, are these tests running okay on the CI on all platforms? I see from the zoneinfo docs that tzdata might need to be installed if a platform doesn't have system tz data: https://docs.python.org/3/library/zoneinfo.html#data-sources
Looks OK. I don't know how common the no-tzdata case is. I think we're OK punting on that and can revisit if/when someone opens an issue about it.
@@ -481,7 +481,7 @@ Timedelta

Time Zones
^^^^^^^^^^
-
- Bug in :class:`Timestamp` constructor raising when passed a ``ZoneInfo`` tzinfo object (:issue:`46425`)
Could it be more of an enhancement, in that Timestamp/DTI/etc. can all accept ZoneInfo objects now?
It was more a question of whether, IMO, our users should be responsible for installing tzdata if they need it for their own use cases.
lgtm. See @mroeschke's comments on the whatsnew, though; either way is ok.
Thanks, LGTM. Yeah, a follow-up calling this out as an enhancement, that Timestamp/DTI/etc. can all accept ZoneInfo objects now, would be good.
* ENH: support zoneinfo tzinfos
* add backports.zoneinfo to ci deps
* add backports.zoneinfo to pypy file
* py38 compat
* fix zoneinfo check
* fix check on py38
* mypy fixup
* mypy fixup
* fix tznaive acse
* fix fold
* whatnsew
* flesh out comment
The tzlocal-handling code is effectively using the tzinfo/datetime API "correctly", so this adapts zoneinfo cases to go through that path.
We may be able to eke out some extra perf by combining tzlocal/zoneinfo checks or something. Just did the simplest thing here.
cc @pganssle note the comment about the cache in timezones.pyx L55. Is this dangerous and if so can you suggest an alternative?
Potentially closes #43516, haven't checked.