Address time zone localization issue #368

Merged
hashhar merged 4 commits into trinodb:master on May 8, 2023


john-bodley (Contributor) commented Apr 30, 2023

Description

This PR addresses #366.

Per this article, there seemed to be merit in removing pytz in favor of dateutil.tz, given that in Python 3.6 the datetime module resolved the ambiguity of wall times around daylight saving time transitions (via the `fold` attribute introduced by PEP 495).
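For reference, a minimal sketch of the Python 3.6 `fold` attribute resolving an ambiguous wall time. It assumes Python 3.9+ so that `zoneinfo` supplies the zone data (`backports.zoneinfo` offers the same `ZoneInfo` class on older versions); the date is simply the 2023 fall-back transition in America/Los_Angeles.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

LA = ZoneInfo("America/Los_Angeles")

# 01:30 on 2023-11-05 occurs twice in Los Angeles: clocks fall back
# from 02:00 PDT to 01:00 PST, so the wall time alone is ambiguous.
first = datetime(2023, 11, 5, 1, 30, tzinfo=LA)           # fold=0: earlier instant (PDT)
second = datetime(2023, 11, 5, 1, 30, fold=1, tzinfo=LA)  # fold=1: later instant (PST)

print(first.isoformat())   # 2023-11-05T01:30:00-07:00
print(second.isoformat())  # 2023-11-05T01:30:00-08:00

# The fold value selects the offset, so the two map to different UTC instants.
print(first.astimezone(timezone.utc).isoformat())   # 2023-11-05T08:30:00+00:00
print(second.astimezone(timezone.utc).isoformat())  # 2023-11-05T09:30:00+00:00
```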

Regrettably, though, one cannot obtain the IANA name from a dateutil.tz time zone, and that name is required when transpiling datetime.time/datetime.datetime objects to Trino SQL when they are provided as operation parameters. Instead we use the zoneinfo package (added to Python’s standard library in Python 3.9 and backported as backports.zoneinfo), which does provide a mechanism for obtaining the specified IANA name from a datetime.datetime/datetime.time object, i.e.,

>>> from datetime import time
>>> from zoneinfo import ZoneInfo
>>>
>>> midnight = time(0, tzinfo=ZoneInfo("America/Los_Angeles"))
>>> midnight.tzinfo.key 
'America/Los_Angeles'
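To illustrate why the IANA name matters, here is a sketch of rendering a zone-aware `datetime.time` in the `TIME '<hh:mm:ss> <zone>'` literal form used by the tests in this PR. `to_trino_time_literal` is a hypothetical helper for illustration only, not the client's actual formatting code.

```python
from datetime import time
from zoneinfo import ZoneInfo


def to_trino_time_literal(value: time) -> str:
    """Render a zone-aware time as a TIME WITH TIME ZONE literal.

    Hypothetical helper for illustration; not the trino client's real code.
    """
    if not isinstance(value.tzinfo, ZoneInfo):
        raise TypeError("expected a time whose tzinfo is a zoneinfo.ZoneInfo")
    # ZoneInfo.key holds the IANA name the zone was constructed with,
    # which is exactly what the SQL literal needs.
    return f"TIME '{value.strftime('%H:%M:%S')} {value.tzinfo.key}'"


midnight = time(0, tzinfo=ZoneInfo("America/Los_Angeles"))
print(to_trino_time_literal(midnight))  # TIME '00:00:00 America/Los_Angeles'
```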

The fix is simple: switching out the pytz package for the zoneinfo package remedies the issue:

Before

>>> from datetime import datetime
>>> import pytz
>>>
>>> datetime(2023, 1, 1, tzinfo=pytz.timezone("America/Los_Angeles")).isoformat()
'2023-01-01T00:00:00-07:53' 

After

>>> from datetime import datetime
>>> import zoneinfo
>>>
>>> datetime(2023, 1, 1, tzinfo=zoneinfo.ZoneInfo("America/Los_Angeles")).isoformat()
'2023-01-01T00:00:00-08:00'
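The difference comes from `ZoneInfo` resolving the offset from each datetime's own value rather than fixing one offset when the zone object is created. A minimal check, assuming Python 3.9+:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

LA = ZoneInfo("America/Los_Angeles")

winter = datetime(2023, 1, 1, tzinfo=LA)
summer = datetime(2023, 7, 1, tzinfo=LA)

# Passing the ZoneInfo directly as tzinfo picks the rule in effect on each date.
print(winter.isoformat())  # 2023-01-01T00:00:00-08:00
print(summer.isoformat())  # 2023-07-01T00:00:00-07:00
```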

Non-technical explanation

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

* Fixes time zone localization. The `pytz` package for encoding/decoding time zones has been deprecated in favor of `zoneinfo`. Time zones should now be encoded (when specifying execution parameters) using the `zoneinfo.ZoneInfo` class rather than the `pytz.timezone` function. ([#366](https://github.com/trinodb/trino-python-client/issues/366))



def query_time_with_timezone(trino_connection, tz_str: str):
@pytest.mark.parametrize(
Contributor Author:

Same logic as before, but more in line with how pytest is intended to be used (via `@pytest.mark.parametrize`).

Contributor:

Please make preparatory commits for optimisations not related to the change.

try:
    from zoneinfo import ZoneInfo  # type: ignore
except ModuleNotFoundError:
    from backports.zoneinfo import ZoneInfo  # type: ignore
Contributor Author:

Previously there was no requirement in setup.py for the backports.zoneinfo package.

john-bodley changed the title from "fix: Address timezone localization issue" to "fix: Address time zone localization issue" on May 1, 2023
@@ -300,114 +306,114 @@ def query_time_with_timezone(trino_connection, tz_str: str):
 # min supported time(3)
 .add_field(
     sql=f"TIME '00:00:00 {tz_str}'",
-    python=time(0, 0, 0).replace(tzinfo=tz))
+    python=time(0, 0, 0, tzinfo=tz))
Contributor Author:

It is clearer and cleaner to specify the time zone when creating the object.
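A quick sketch showing that the two styles produce equal `time` objects, assuming `tz` is a `ZoneInfo` as in the updated tests:

```python
from datetime import time
from zoneinfo import ZoneInfo

tz = ZoneInfo("America/Los_Angeles")

# Old test style: build a naive time, then attach the zone.
old_style = time(0, 0, 0).replace(tzinfo=tz)
# New test style: pass tzinfo to the constructor directly.
new_style = time(0, 0, 0, tzinfo=tz)

print(old_style == new_style)  # True
print(new_style.tzinfo is tz)  # True
```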

Member:

Are there any other occurrences of this in production code as well?

mdesmet (Contributor) left a comment:

LGTM




john-bodley (Contributor Author):

@mdesmet I've addressed your comments.

john-bodley (Contributor Author):

@aalbu or @hashhar, would either of you mind merging this, given that @mdesmet has approved the change?

Additionally, what would be a rough estimate (days, weeks, or months) before this change is released? The reason I'm asking is that we likely need to give our Superset users (Superset leverages the Trino DB-API) a heads-up about the issue, and would ideally like to provide them with a rough ETA as to when it will be fixed.

hashhar (Member) commented May 3, 2023

I'll take a look tomorrow. I'm leaving some quick editorial comments for now.

Since this is a correctness issue I'd like to do a release as soon as we have this merged.

hashhar (Member) left a comment:

Commit "[preparatory] Use @pytest.mark.parameterize".

The project avoids prefixing commit messages, so something like "Parameterize time and timestamp with time zone tests" is good enough.

hashhar (Member) left a comment:

commit "[preparatory] Specify tzinfo during datetime/time object creation" -> "Specify tzinfo during datetime/time creation in tests"

hashhar (Member) left a comment:

commit " [fix] Address timezone localization issue ".

Can you add a short explanation of the issue in the commit message body? Alternatively, I think it'd also be self-explanatory if you add a test which showcases the issue and then, in the next commit, add the fix and adjust the test.

@@ -78,7 +78,12 @@
     "Topic :: Database :: Front-Ends",
 ],
 python_requires='>=3.7',
-install_requires=["pytz", "requests", "tzlocal"],
+install_requires=[
+    "backports.zoneinfo;python_version<'3.9'",
Member:

The missing zoneinfo requirement should be a separate commit; it's not related to the fix itself.

@@ -1041,7 +1037,7 @@ def new_instance(self, value: datetime, fraction: Decimal) -> TimestampWithTimeZ
         return TimestampWithTimeZone(value, fraction)
 
     def normalize(self, value: datetime) -> datetime:
-        if isinstance(self._whole_python_temporal_value.tzinfo, BaseTzInfo):
+        if tz.datetime_ambiguous(value):
Member:

nice.
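For context, a stdlib-only sketch of the kind of check `tz.datetime_ambiguous` performs, assuming a PEP 495-aware `tzinfo` such as `ZoneInfo`. `is_ambiguous` is a hypothetical name; it treats a wall time as ambiguous when the `fold=0` offset is larger than the `fold=1` offset, which happens only on fall-back transitions (inside a spring-forward gap the order is reversed, so gap times are not reported as ambiguous).

```python
from datetime import datetime
from zoneinfo import ZoneInfo

LA = ZoneInfo("America/Los_Angeles")


def is_ambiguous(value: datetime) -> bool:
    """Return True if the wall time occurs twice in its zone.

    Hypothetical stdlib-only sketch; assumes a PEP 495-aware tzinfo.
    """
    if value.tzinfo is None:
        return False  # naive datetimes carry no zone rules at all
    earlier = value.replace(fold=0).utcoffset()
    later = value.replace(fold=1).utcoffset()
    # On a fall-back transition the first occurrence has the larger offset.
    return earlier > later


print(is_ambiguous(datetime(2023, 11, 5, 1, 30, tzinfo=LA)))  # True (fall back)
print(is_ambiguous(datetime(2023, 3, 12, 2, 30, tzinfo=LA)))  # False (gap, not ambiguous)
```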

john-bodley changed the title from "fix: Address time zone localization issue" to "Address time zone localization issue" on May 3, 2023
john-bodley (Contributor Author):

@hashhar I've addressed your comments.

Consider the following example using the canonical way to add zones to
datetime objects:

    >>> import pytz
    >>> import datetime
    >>> import zoneinfo
    >>> datetime.datetime(2023, 1, 1, tzinfo=pytz.timezone("America/Los_Angeles")).isoformat()
    '2023-01-01T00:00:00-07:53'
    >>> datetime.datetime(2023, 1, 1, tzinfo=zoneinfo.ZoneInfo("America/Los_Angeles")).isoformat()
    '2023-01-01T00:00:00-08:00'

pytz evaluates the time zone eagerly and, since the instant in time is not
known, falls back to local mean time (LMT). It requires an additional
`localize` call to get the correct offset, like so:

    >>> pytz.timezone("America/Los_Angeles").localize(datetime.datetime(2023, 1, 1)).isoformat()
    '2023-01-01T00:00:00-08:00'

This increases chances of introducing bugs when writing idiomatic
Python.

The only reason to use pytz was that it allowed control over what
happens with ambiguous datetimes, but the standard library also
provides control over that since 3.9 (and it is available as
backports.zoneinfo for older versions).
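A minimal sketch of the zoneinfo counterparts of the pytz idioms described in the commit message, assuming Python 3.9+: attaching the zone with `replace` takes the place of `localize`, and wall-clock arithmetic needs no pytz-style `normalize` call afterwards.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

LA = ZoneInfo("America/Los_Angeles")

# zoneinfo counterpart of pytz's tz.localize(naive): attach the zone directly.
naive = datetime(2023, 1, 1)
aware = naive.replace(tzinfo=LA)
print(aware.isoformat())  # 2023-01-01T00:00:00-08:00

# Conversion to another zone uses the standard astimezone().
print(aware.astimezone(timezone.utc).isoformat())  # 2023-01-01T08:00:00+00:00

# Arithmetic keeps the wall time and recomputes the offset for the new value,
# so crossing the 2023-03-12 DST start needs no extra normalization step.
later = datetime(2023, 3, 11, 12, tzinfo=LA) + timedelta(days=1)
print(later.isoformat())  # 2023-03-12T12:00:00-07:00
```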
hashhar force-pushed the john-bodley--fix-issue-366 branch from cbd4c6b to 7ff12c7 on May 8, 2023 13:03
hashhar merged commit 2b9ca0c into trinodb:master on May 8, 2023