ENH: support zoneinfo tzinfo objects #37654
Comments
Congratulations to @pganssle for passing the PEP for this. Implementing the support in Timestamp and covering it in tests is going to be a PITA, I think, but it definitely needs to be done. |
Just pointing out - you shouldn't have to "support" it, and you wouldn't have to do a single thing if y'all had listened to me for the past 5 years when I said you shouldn't be digging into the internals of your time zone providers. Whatever work is necessary for you to use the public |
Yeah, I might be wrong, but I also don't think the current approach of querying the internals is going to work well here. I don't think there is any decent way forward except ditch that and move to only using public interfaces. |
There's a lot of upside to this. The question is if we can do it without a major perf loss. |
I guess it doesn't matter? It's harder to maintain what you have now than it would be to re-implement time zone handling from scratch in a performant way (also something I've suggested in the past as a workable solution). Won't be long before people start complaining about the deprecation warnings from |
It would be nice if the long overdue timezone refactor could coincide with defaulting to the standard library timezone objects instead of pytz: #34916 |
@mroeschke I would keep those separate. There are backwards-compatibility concerns with removing Best to do the refactor first, then drop |
@jorisvandenbossche said in #34916 that refactoring |
The breaking change in #34916 is about what type of tzinfo objects we use when presented with a string like "US/Eastern" or "UTC". ATM we default to pytz; I opened #34916 because I think we should default to stdlib/zoneinfo instead. This issue is about supporting zoneinfo objects, e.g. ATM:
Most or all of the relevant code is going to be in tslibs.timezones and tslibs.tzconversion. We should be able to support zoneinfo using only the public API, then can measure perf and evaluate options. |
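As a rough illustration of what "only the public API" could look like for a single scalar, here is a minimal sketch. It relies only on the documented `tzinfo.utcoffset` and `datetime.astimezone` behaviour, so it works the same way for zoneinfo, dateutil, pytz-free fixed offsets, or any other `tzinfo`. The helper names `utc_offset_at` and `utc_to_wall` are hypothetical and are not pandas functions; this says nothing about how tslibs.timezones or tslibs.tzconversion are actually written, and it does not address the vectorized performance question.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from zoneinfo import ZoneInfo


def utc_offset_at(wall: datetime, tz: tzinfo, fold: int = 0) -> timedelta:
    """Return tz's UTC offset for a naive wall-clock time (hypothetical helper).

    Only tzinfo.utcoffset is consulted; no provider internals are touched.
    `fold` selects the first (0) or second (1) occurrence of an ambiguous time.
    """
    if wall.tzinfo is not None:
        raise ValueError("expected a naive datetime")
    offset = wall.replace(tzinfo=tz, fold=fold).utcoffset()
    if offset is None:
        raise ValueError("tzinfo returned no offset")
    return offset


def utc_to_wall(utc_naive: datetime, tz: tzinfo) -> datetime:
    """Convert a naive UTC datetime to naive wall time in tz (hypothetical helper)."""
    aware = utc_naive.replace(tzinfo=timezone.utc).astimezone(tz)
    return aware.replace(tzinfo=None)


if __name__ == "__main__":
    # Requires IANA time zone data to be available on the system.
    eastern = ZoneInfo("America/New_York")
    print(utc_offset_at(datetime(2020, 7, 1, 12), eastern))
    print(utc_to_wall(datetime(2020, 7, 1, 16), eastern))
```

Calling these per element is the slow path; the point is only that nothing here depends on the concrete tzinfo class, which is what would make measuring the perf cost possible in the first place.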
@jbrockmendel UPDATE: Timestamp is too tightly coupled with timezone tslibs code to refactor the latter without the former (at least, I think so after a few attempts to do that). I'll try to tackle both at the same time and post a PR if my solution works, but it will probably take a couple weeks. Will post here anyway. |
Apologies for the long absence: I had some health issues. Nothing too serious, but productivity was shot to hell for months. Now that I'm feeling better and should fully recover, I'm self-assigning this. Hope to have something for review in two weeks. |
FYI, I suspect that this will become more important as time goes on. I am relatively close to finishing the integration of The sooner |
@GF-Huang you or anyone in the community are welcome to contribute to the 3000+ open issues |
@jreback To be fair, this is not exactly a trivial issue for a new contributor to address (I have spent many hours trying and I still don't have anything presentable), and it's also a seriously critical issue. In the not-too-distant future pandas will break when dateutil 3.0 is released. When pytz adopts zoneinfo, pandas itself will break. Obviously this is a volunteer project and no one owes anyone patches, but it's worth considering that a big part of the problem with getting this working comes from the undocumented and complicated way that pandas' internal time zone handling logic works. I've tried asking in the gitter for more details about the intended semantics of some of the relevant functions and no one answered (hopefully this doesn't mean that no one knows...). Of course, I don't know what that translates to, but if there's any way to raise the profile of this issue among the people with the relevant knowledge to fix it, that would be great. |
@pganssle I appreciate that it's non-trivial, but we have very limited dedicated resources. We need someone to step up to do this. |
I think it would be useful as well to post a summary of the issues you ran into in this topic, since not all devs and contributors are on gitter. Maybe someone can pick it up from there or give useful insights to make progress. |
cc @pandas-dev/pandas-core |
I'm planning to address this, likely as part of the same push that implements non-nano support. 4 things I need/want to get done first: #43930 (WIP), #41493 (want to get this in for 1.4 so deprecation can be enforced in 2.0; this is proving hard), using cython3.0 (it will include a cython implementation of the stdlib So best case unless someone gets to it before I do is Nov or Dec. |
@jbrockmendel If / when you do get around to it, feel free to send me an e-mail or something, I'd be happy to coordinate on this and share what I've done so far. |
For reference, the questions that @pganssle mentioned he asked on gitter were:
Not fully sure what you mean with "in its own local time" (since it's not known what the local time zone is), but the values for naive timestamps can be interpreted "as if" the timezone was UTC (so indeed an offset from 1970-01-01T00:00), but disregarding the timezone. For example, the naive datetime "January 1st 1970, 00h00" would be encoded as timestamp value 0.
|
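The encoding described above can be sketched with the standard library alone. This assumes the integer unit is nanoseconds (the unit pandas used at the time of this discussion, before non-nano support); `naive_to_epoch_ns` and `epoch_ns_to_naive` are hypothetical names for illustration, not pandas functions.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # naive, matching the naive inputs below


def naive_to_epoch_ns(dt: datetime) -> int:
    """Encode a naive datetime as integer nanoseconds since 1970-01-01T00:00."""
    if dt.tzinfo is not None:
        raise ValueError("expected a naive datetime")
    # Integer division avoids float rounding; datetime resolution is 1 microsecond.
    micros = (dt - EPOCH) // timedelta(microseconds=1)
    return micros * 1_000


def epoch_ns_to_naive(value: int) -> datetime:
    """Decode integer nanoseconds to a naive datetime (sub-microsecond part is dropped)."""
    return EPOCH + timedelta(microseconds=value // 1_000)
```

No time zone is consulted at any point: the naive value is measured against an equally naive epoch, which is why "its own local time" never enters the calculation.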
Big picture: this is a vectorized analogue to a pytz tzinfo's
This is half "docstrings need improvement" and half "cython doesn't allow ndarray[datetime64]"
Not exactly. It's treated as an offset from 1970-01-01T00:00 that is also naive, so "its own local time" isn't relevant/meaningful.
The DatetimeArray/DatetimeIndex
Yah let's punt on this for now. Longer term, the more logic we can outsource to dateutil/zoneinfo/whatever the better. |
By converting input_datetime (string or datetime object) to a string before applying pandas' to_datetime, compatibility issues between pandas and zoneinfo are circumvented (pandas-dev/pandas#37654).
By converting input_datetime (string or datetime object) to a string before applying pandas' to_datetime, compatibility issues between pandas and zoneinfo are circumvented (pandas-dev/pandas#37654). + fix typos and remove duplicate code
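The referencing commit's code is not shown here, so the following is only a guess at the shape of that workaround: normalize the input to an ISO 8601 string first, then hand the string to pandas' to_datetime so that pandas never sees the zoneinfo object. The helper name `to_iso_string` is hypothetical, and the pandas call itself is left out to keep the sketch standard-library only.

```python
from datetime import datetime, timedelta, timezone
from typing import Union


def to_iso_string(input_datetime: Union[str, datetime]) -> str:
    """Return an ISO 8601 string; strings pass through unchanged (hypothetical helper)."""
    if isinstance(input_datetime, datetime):
        # For aware datetimes, isoformat() appends the numeric UTC offset
        # (e.g. "+01:00"); the zone name itself is not part of the string.
        return input_datetime.isoformat()
    return input_datetime


if __name__ == "__main__":
    aware = datetime(2021, 1, 1, 12, 0, tzinfo=timezone(timedelta(hours=1)))
    # The resulting string would then be passed to pandas.to_datetime.
    print(to_iso_string(aware))  # 2021-01-01T12:00:00+01:00
```

One trade-off worth noting: the string keeps only the UTC offset in effect at that instant, so the zone's name and DST rules are lost in the round trip.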
ATM we support pytz, dateutil, and datetime.timezone tzinfos. Now that zoneinfo is part of the stdlib (as of py39), we should also support zoneinfo tzinfos.