Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: support zoneinfo tzinfos #46425

Merged
merged 21 commits into from
Apr 18, 2022
Merged

Conversation

jbrockmendel
Copy link
Member

@jbrockmendel jbrockmendel commented Mar 18, 2022

The tzlocal-handling code is effectively using the tzinfo/datetime API "correctly", so this adapts zoneinfo cases to go through that path.

We may be able to eek out some extra perf by combining tzlocal/zoneinfo checks or something. Just did the simplest thing here.

cc @pganssle note the comment about the cache in timezones.pyx L55. Is this dangerous and if so can you suggest an alternative?

Potentially closes #43516, haven't checked.

pandas/conftest.py Outdated Show resolved Hide resolved
@jbrockmendel
Copy link
Member Author

looks like one of the code checks doesn't like "backports.zoneinfo" in the deps file. guessing its the "." causing trouble

@pganssle
Copy link
Contributor

cc @pganssle note the comment about the cache in timezones.pyx L55. Is this dangerous and if so can you suggest an alternative?

It's certainly "dangerous" in the sense that your UTC-detection logic will fail even for UTC-like zones. I think the whole endeavor is a bit foolhardy, as there is no reliable, guaranteed interface for determining whether a given zone is UTC. Your heuristics will work in a great many cases, so "user cleared the cache and people are using ZoneInfo("UTC")" is just another one of those.

That said, str(some_zoneinfo_zone) returns the key parameter if it exists, so you could presumably detect this with isinstance(zone, ZoneInfo) and str(zone) == "UTC".

@@ -210,6 +217,8 @@ cdef inline bint is_fixed_offset(tzinfo tz):
return 1
else:
return 0
elif is_zoneinfo(tz):
return False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why False when the other return values are all 0 or 1?

Also, this isn't quite right. You can tell if a ZoneInfo has a fixed offset by passing None to utcoffset; if it returns a value, that's the fixed offset.

This also works for pytz, dateutil.tz.tzoffset and datetime.timezone. It won't work for dateutil.tz.gettz("UTC") right now, but once the zoneinfo backport is merged, it will.

This whole function can probably be simplified to: return bool(tz.utcoffset(None) is not None), or (to handle the fixed offset tzfile cases for dateutil <= 2.8.2):

if treat_tz_as_dateutil(tz):
    if len(tz._trans_idx) == 0 and len(tz._trans_list) == 0:
        return 1
    else:
        return 0

return 0 if tz.utcoffset(None) is None else 1

Of course, even that will break when the next version of dateutil comes out, so you probably want some version pinning in place (like for python-dateutil < 3.0).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why False when the other return values are all 0 or 1?

No good reason; I'll change it to match.

This whole function can probably be simplified to [...]

Thanks. I'll probably make a dedicated branch to both Do This Right and audit our usages of is_fixed_offset, which I don't think we're very consistent about.

@@ -41,7 +42,7 @@ cdef int64_t NPY_NAT = get_nat()
cdef tzinfo utc_stdlib = timezone.utc
cdef tzinfo utc_pytz = UTC
cdef tzinfo utc_dateutil_str = dateutil_gettz("UTC") # NB: *not* the same as tzutc()

cdef tzinfo utc_zoneinfo = ZoneInfo("UTC")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably don't want this to be called unconditionally. Practically speaking, this will always exist, but it's not guaranteed by the ZoneInfo API. It should probably be possible to import this file without this succeeding.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Is the failure mode something like "UTC" not being present in /usr/share/zoneinfo/?

The user-facing downside of not doing this here is things going through slightly slower code paths. Not the end of the world, but worth avoiding if it doesn't require too much gymnastics.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, or if the user is on Windows and doesn't have tzdata installed, which is actually probably a reasonably common failure mode — user doesn't care about tzdata because they're not using zoneinfo, then they import pandas and this constructor fails when trying to import the timezones module.

I imagine it's pretty easy to make the impact of this minimal. One way to do it:

cdef tzinfo utc_zoneinfo = None

cdef bool is_utc_zoneinfo(tzinfo tz):
    global utc_zoneinfo
    if utc_zoneinfo is None:
        try:
            utc_zoneinfo = ZoneInfo("UTC")
        except ZoneInfoNotFoundError:
            return False

    return tz is utc_zoneinfo

Presumably this function will get inlined wherever you call it, and in the "common case" where ZoneInfo("UTC") is easily imported it's going to be two identity checks instead of 1.

If you are very concerned with performance I'd probably be trying to lazy-import zoneinfo in general anyway, in which case you have more "off-roads" to improve performance for people who don't use zoneinfo, though hopefully in the not-too-distant future pandas will switch to using zoneinfo or pytz-deprecation-shim anyway, at which point you'll be fine with eagerly importing.

elif is_tzlocal(tz):
return _tz_convert_tzlocal_utc(val, tz, to_utc=True)
elif is_tzlocal(tz) or is_zoneinfo(tz):
return _tz_localize_using_tzinfo_api(val, tz, to_utc=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be the default behavior, rather than something triggered only for zoneinfo objects.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im coming around to a similar opinion. Probably for a separate PR since that would involve a lot of code shuffling (attempts to unify the many places we do these checks have been stymied xref #46397, #46246)

@@ -2,6 +2,7 @@ from datetime import (
timedelta,
timezone,
)
from zoneinfo import ZoneInfo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an unconditional import — do you not require support for Python < 3.8?

If so, you should presumably have some more complicated logic here and in the is_zoneinfo logic.

Something like this works well for "is this an instance of X" without forcing the user to import the module that contains X:

https://github.com/pganssle/pytz-deprecation-shim/blob/bf0174adfce698e280c623d17fe8e82961de0c48/src/pytz_deprecation_shim/helpers.py#L11-L30

https://github.com/pganssle/pytz-deprecation-shim/blob/bf0174adfce698e280c623d17fe8e82961de0c48/src/pytz_deprecation_shim/_common.py#L6-L13

Since obviously some object isn't going to be an instance of zoneinfo.ZoneInfo if the interpreter has never imported the zoneinfo module.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an unconditional import — do you not require support for Python < 3.8?

I think our minimum version is 3.8, but I'm on 3.9 locally and forgot when writing this branch. Will change to try/except.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no leave this

we support >= 3.8

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zoneinfo isnt stdlib until 3.9, so importing unconditionally would require making backports.zoneinfo a hard dependency (and having that in the CI dep file is causing problems AFAICT)

Copy link
Contributor

@pganssle pganssle Mar 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even with backports.zoneinfo it needs to be conditional since they're in different namespaces, plus I really like the approach I took in the pytz deprecation shims for checking if a time zone is a pytz zone, where my checks never actually import pytz unless it's already been imported (since obviously if it's never been imported then I know the object isn't a pytz type). That may be overkill, but it's also not that hard to implement, so ⚖️ 🤷

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll need to take a closer look at the pytz deprecation shim. We're getting rid of a lot of pytz usage in 2.0 (#34916) and these days I keep finding ways we could simplify the code even more if we dropped pytz support altogether.

@jreback jreback added the Timezones Timezone data dtype label Apr 13, 2022
@jreback jreback added this to the 1.5 milestone Apr 13, 2022
@jbrockmendel
Copy link
Member Author

is the pre-commit failure real or a false-positive?

@mroeschke
Copy link
Member

is the pre-commit failure real or a false-positive?

I think it's a false positive as pre-commit.ci is being set up.

Looks pretty good; just needs a whatsnew entry.

Also these tests are running okay on the CI on all platforms? I see from the zoneinfo docs that tzdata might need to be installed if a platform doesn't have system tz data: https://docs.python.org/3/library/zoneinfo.html#data-sources

@jbrockmendel
Copy link
Member Author

Also these tests are running okay on the CI on all platforms?

Looks OK. I dont know how common the no-tzdata case is. i think we're OK punting on that and can revisit if/when someone opens an issue about it

@@ -481,7 +481,7 @@ Timedelta

Time Zones
^^^^^^^^^^
-
- Bug in :class:`Timestamp` constructor raising when passed a ``ZoneInfo`` tzinfo object (:issue:`46425`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be more of a enhancement that all Timstamp/DTI/etc can all accept ZoneInfo objects now?

@mroeschke
Copy link
Member

Looks OK. I dont know how common the no-tzdata case is. i think we're OK punting on that and can revisit if/when someone opens an issue about it

It was more of a question if our ci/deps/ files need to install tzdata to be sure these tests are being run, but it sounds like they are.

IMO users should be responsible for installing tzdata if they need for personal use cases.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm @mroeschke comments though on whatsnew either way ok

@mroeschke mroeschke merged commit 4ffdab3 into pandas-dev:main Apr 18, 2022
@mroeschke
Copy link
Member

Thanks LGTM.

Yeah a follow up calling this out as an enhancement that all Timstamp/DTI/etc can all accept ZoneInfo objects now would be good.

@jbrockmendel jbrockmendel deleted the enh-zoneinfo branch April 18, 2022 14:24
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
* ENH: support zoneinfo tzinfos

* add backports.zoneinfo to ci deps

* add backports.zoneinfo to pypy file

* py38 compat

* fix zoneinfo check

* fix check on py38

* mypy fixup

* mypy fixup

* fix tznaive acse

* fix fold

* whatnsew

* flesh out comment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Error "no attribute 'total_seconds'" with tzlocal >= 3.0 ENH: support zoneinfo tzinfo objects
4 participants