Added timezone type to dfs when the corresponding pd Df also has timezones #1954
base: main
The overall change looks good to me. I just need elaboration on one change:
tz_columns = [
    str(c).replace('"', '""') for c in df.columns if pandas.api.types.is_datetime64tz_dtype(df[c])
]
Can you explain why this change is necessary?
Same reason as explained in the comment starting at line 352:
# if the column name contains a double quote, we need to escape it by replacing with two double quotes
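For readers unfamiliar with the escaping rule, here is a minimal sketch of what the quoted comment describes. The helper name `quote_identifier` and the sample column names are hypothetical and not taken from the connector; the only part mirrored from the diff is the `str(c).replace('"', '""')` step.

```python
def quote_identifier(name) -> str:
    """Wrap a column name in double quotes, doubling any embedded double quote.

    Hypothetical helper for illustration only; it is not part of the connector.
    """
    # Same escaping step as in the diff: " becomes ""
    escaped = str(name).replace('"', '""')
    return '"' + escaped + '"'


# Assumed sample column names, one of which contains double quotes.
columns = ["DT", 'my "odd" col']
quoted = [quote_identifier(c) for c in columns]
print(quoted)
```

Without the escaping, the embedded quote would end the quoted identifier early, so a name collected into tz_columns would no longer match the name used when the statement is built.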
I didn't manage to install tox on my device, so some checks might be failing :/
I retested this with your changes and I'm afraid this is not enough to fix the issue. For example, if you create a DataFrame like so:
from datetime import datetime
import pandas as pd
import pytz

df = pd.DataFrame({"DT": [
    datetime.now(tz=pytz.timezone("Europe/Amsterdam")),
    datetime.now(tz=pytz.timezone("UTC")),
]})
print("is tz type =", pd.api.types.is_datetime64tz_dtype(df["DT"]))
The result here is False, so we will miss this case. Even if this is fixed, I notice that we are not correctly reading the timezone information from the parquet file. I'll check with the internal team and update this.
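To make the mixed-timezone case concrete, here is a rough sketch of a check that also catches it. It rests on several assumptions: `is_tz_aware` is a hypothetical helper, not the connector's implementation; fixed UTC offsets from the standard library stand in for the pytz zones above; and the dtype test uses `isinstance(..., pd.DatetimeTZDtype)` rather than `is_datetime64tz_dtype`, which newer pandas releases deprecate.

```python
from datetime import datetime, timedelta, timezone

import pandas as pd


def is_tz_aware(series: pd.Series) -> bool:
    """Return True if the column holds timezone-aware datetimes (hypothetical helper)."""
    # A column with a single timezone gets a proper DatetimeTZDtype.
    if isinstance(series.dtype, pd.DatetimeTZDtype):
        return True
    # A column mixing timezones falls back to object dtype, so inspect the values.
    if series.dtype == object:
        values = series.dropna()
        return len(values) > 0 and all(
            isinstance(v, datetime) and v.tzinfo is not None for v in values
        )
    return False


utc = timezone.utc
plus_two = timezone(timedelta(hours=2))  # assumed fixed offset standing in for a second zone

single = pd.DataFrame({"DT": [datetime(2024, 1, 1, tzinfo=utc), datetime(2024, 1, 2, tzinfo=utc)]})
mixed = pd.DataFrame({"DT": [datetime(2024, 1, 1, tzinfo=plus_two), datetime(2024, 1, 1, tzinfo=utc)]})
naive = pd.DataFrame({"DT": [datetime(2024, 1, 1), datetime(2024, 1, 2)]})

for name, frame in [("single", single), ("mixed", mixed), ("naive", naive)]:
    print(name, frame["DT"].dtype, is_tz_aware(frame["DT"]))
```

Scanning the values of every object column costs time on large frames, so a real fix might prefer to sample or to rely on the parquet schema instead. This sketch only shows that the dtype check alone is not sufficient.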
Isn't that to be expected? Types are per column, so it cannot be done correctly anyway. But I think it gets handled correctly when saving to parquet:
This gives the following output:
When I add an example similar to yours to the unit test, everything works when setting use_logical_type=True, but not when it is None or False, since the check here fails in that case. Hence this check probably needs adaptation.
… adapted test such that we have a case with two different timezones
Please answer these questions before submitting your pull requests. Thanks!
What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-1444940: Write to pandas with datetime with timezone has no timezone in the resulting dataframe #1952
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
The change checks whether the DataFrame has any timezone-aware columns (tz_columns) and, if so, changes the column mapping so that the resulting type is a timezone type.