Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPS: Make pytz an optional dependency #57045

Closed
wants to merge 34 commits into from

Conversation

mroeschke
Copy link
Member

This makes pandas return zoneinfo.ZoneInfo objects by default from string inputs

@mroeschke mroeschke added Timezones Timezone data dtype Dependencies Required and optional dependencies labels Jan 24, 2024
@jbrockmendel
Copy link
Member

might be easier to break into steps e.g. one PR just changing which UTC, one just changing FixedOffset, then the hard one for the general case.

@mroeschke
Copy link
Member Author

one PR just changing which UTC, one just changing FixedOffset

These are already changed (I think you changed it for 2.0), so now this is tackling the general case.

@mroeschke mroeschke added this to the 3.0 milestone Feb 1, 2024
return tz.localize(dt, is_dst=None)
except (pytz.AmbiguousTimeError, pytz.NonExistentTimeError) as err:
if isinstance(err, pytz.AmbiguousTimeError):
raise AmbiguousTimeError(str(err)) from err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does raise...from even do here? Can't we just not catch them to begin with and get the same behavior?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking here is that we needed to replace raising pytz exceptions with our own versions elsewhere (when we try to infer DST) so here I'm also re-raising a pandas exception instead of a pytz one. Do you think that's overkill here?

Copy link
Member

@WillAyd WillAyd Feb 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yea I am a little wary of doing this; by providing compat here we then are forcing pandas to add these exceptions to its API no? Over the long run I'm not sure that is worth it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to make decisions about how to handle fold before this. IIUC zoneinfo doesn't have its own analogues to these exceptions bc it expects fold to always be there to resolve potential ambiguity (not sure about NonExistent). So if we are changing over to zoneinfo, then i expect that our scalar Timestamp (which has fold) to never face ambiguity, while DatetimeIndex (which doesnt have fold) will still have ambiguity. Ideally we'd get the scalar and vector ops to behave the same, but adding fold to DTI/DTA wont be trivial.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by providing compat here we then are forcing pandas to add these exceptions to its API no?

Yeah, I guess since this is for 3.0 we can just change the exception type.

import pytz
from pytz.tzinfo import BaseTzInfo as _pytz_BaseTzInfo

try:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use import_optional_dependency here instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure changed

@@ -45,7 +41,7 @@ from pandas._libs.tslibs.util cimport (

cdef int64_t NPY_NAT = get_nat()
cdef tzinfo utc_stdlib = timezone.utc
cdef tzinfo utc_pytz = pytz.utc
cdef tzinfo utc_pytz = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be pytz.utc if pytz else None?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. Changed

if tz is None:
return False

global utc_pytz
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have any performance impacts? Referencing a global from a cdef function like this feels like it would really slow things down

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was copying this pattern from zoneinfo above but I don't think we need this since pytz.UTC is just a singleton that's always there

return tz.localize(dt, is_dst=None)
except (pytz.AmbiguousTimeError, pytz.NonExistentTimeError) as err:
if isinstance(err, pytz.AmbiguousTimeError):
raise AmbiguousTimeError(str(err)) from err
Copy link
Member

@WillAyd WillAyd Feb 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yea I am a little wary of doing this; by providing compat here we then are forcing pandas to add these exceptions to its API no? Over the long run I'm not sure that is worth it

except zoneinfo.ZoneInfoNotFoundError:
zoneinfo = None # type: ignore[assignment]

import pytz
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor but I think still want import_optional_dependency here

@@ -147,8 +147,7 @@ def convert_pandas_type_to_json_field(arr) -> dict[str, JSONSerializable]:
# timezone.utc has no "zone" attr
field["tz"] = "UTC"
else:
# error: "tzinfo" has no attribute "zone"
field["tz"] = dtype.tz.zone # type: ignore[attr-defined]
field["tz"] = str(dtype.tz)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably don't have good test coverage for this but are you sure that the string representation of dtype.tz is the same as the .zone attribute prior?

@@ -1003,7 +1010,7 @@ def test_dti_ambiguous_matches_timestamp(self, tz, use_str, box_cls, request):
mark = pytest.mark.xfail(reason="We implicitly get fold=0.")
request.applymarker(mark)

with pytest.raises(pytz.AmbiguousTimeError, match=dtstr):
with pytest.raises(errors.AmbiguousTimeError, match=dtstr):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the comment above about not providing error compatability here would make testing more challenging. Maybe its something we can include as part of the tz fixture? May also be worth doing as a follow up instead of adding to this PR

@mroeschke
Copy link
Member Author

Closing in favor of #59089

@mroeschke mroeschke closed this Jun 25, 2024
@mroeschke mroeschke deleted the ref/tz/zoneinfo branch June 25, 2024 00:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dependencies Required and optional dependencies Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API: Default to stdlib timezone objects instead of pytz
3 participants