-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DEPS: Make pytz an optional dependency #57045
Conversation
might be easier to break into steps e.g. one PR just changing which UTC, one just changing FixedOffset, then the hard one for the general case. |
These are already changed (I think you changed it for 2.0), so now this is tackling the general case. |
return tz.localize(dt, is_dst=None) | ||
except (pytz.AmbiguousTimeError, pytz.NonExistentTimeError) as err: | ||
if isinstance(err, pytz.AmbiguousTimeError): | ||
raise AmbiguousTimeError(str(err)) from err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does raise...from even do here? Can't we just not catch them to begin with and get the same behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking here is that we needed to replace raising pytz
exceptions with our own versions elsewhere (when we try to infer DST) so here I'm also re-raising a pandas exception instead of a pytz one. Do you think that's overkill here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm yea I am a little wary of doing this; by providing compat here we then are forcing pandas to add these exceptions to its API no? Over the long run I'm not sure that is worth it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to make decisions about how to handle fold
before this. IIUC zoneinfo doesn't have its own analogues to these exceptions bc it expects fold to always be there to resolve potential ambiguity (not sure about NonExistent). So if we are changing over to zoneinfo, then i expect that our scalar Timestamp (which has fold) to never face ambiguity, while DatetimeIndex (which doesnt have fold) will still have ambiguity. Ideally we'd get the scalar and vector ops to behave the same, but adding fold to DTI/DTA wont be trivial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by providing compat here we then are forcing pandas to add these exceptions to its API no?
Yeah, I guess since this is for 3.0 we can just change the exception type.
pandas/_libs/tslibs/timezones.pyx
Outdated
import pytz | ||
from pytz.tzinfo import BaseTzInfo as _pytz_BaseTzInfo | ||
|
||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use import_optional_dependency
here instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure changed
pandas/_libs/tslibs/timezones.pyx
Outdated
@@ -45,7 +41,7 @@ from pandas._libs.tslibs.util cimport ( | |||
|
|||
cdef int64_t NPY_NAT = get_nat() | |||
cdef tzinfo utc_stdlib = timezone.utc | |||
cdef tzinfo utc_pytz = pytz.utc | |||
cdef tzinfo utc_pytz = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be pytz.utc if pytz else None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. Changed
pandas/_libs/tslibs/timezones.pyx
Outdated
if tz is None: | ||
return False | ||
|
||
global utc_pytz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this have any performance impacts? Referencing a global from a cdef function like this feels like it would really slow things down
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I was copying this pattern from zoneinfo above but I don't think we need this since pytz.UTC
is just a singleton that's always there
return tz.localize(dt, is_dst=None) | ||
except (pytz.AmbiguousTimeError, pytz.NonExistentTimeError) as err: | ||
if isinstance(err, pytz.AmbiguousTimeError): | ||
raise AmbiguousTimeError(str(err)) from err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm yea I am a little wary of doing this; by providing compat here we then are forcing pandas to add these exceptions to its API no? Over the long run I'm not sure that is worth it
except zoneinfo.ZoneInfoNotFoundError: | ||
zoneinfo = None # type: ignore[assignment] | ||
|
||
import pytz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor but I think still want import_optional_dependency here
@@ -147,8 +147,7 @@ def convert_pandas_type_to_json_field(arr) -> dict[str, JSONSerializable]: | |||
# timezone.utc has no "zone" attr | |||
field["tz"] = "UTC" | |||
else: | |||
# error: "tzinfo" has no attribute "zone" | |||
field["tz"] = dtype.tz.zone # type: ignore[attr-defined] | |||
field["tz"] = str(dtype.tz) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably don't have good test coverage for this but are you sure that the string representation of dtype.tz
is the same as the .zone
attribute prior?
@@ -1003,7 +1010,7 @@ def test_dti_ambiguous_matches_timestamp(self, tz, use_str, box_cls, request): | |||
mark = pytest.mark.xfail(reason="We implicitly get fold=0.") | |||
request.applymarker(mark) | |||
|
|||
with pytest.raises(pytz.AmbiguousTimeError, match=dtstr): | |||
with pytest.raises(errors.AmbiguousTimeError, match=dtstr): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the comment above about not providing error compatability here would make testing more challenging. Maybe its something we can include as part of the tz
fixture? May also be worth doing as a follow up instead of adding to this PR
Closing in favor of #59089 |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.This makes pandas return
zoneinfo.ZoneInfo
objects by default from string inputs