BUG: datetime64 column type fails parquet round trip #60774
Comments
Thanks for the report! Further investigations and PRs to fix are welcome.
This seems to be a known issue in pyarrow. TL;DR: Arrow internally uses only milli-, micro-, and nanosecond units for timestamps. We could always re-cast to the right unit downstream in pd.read_parquet using the pandas metadata, but it's up for debate whether this fix should be made here or on the pyarrow side.
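As an illustration of that downstream re-cast idea, here is a minimal sketch (assuming a pyarrow-written file and the standard "pandas" metadata stored in the parquet schema; this is not the actual pandas or pyarrow code path, and the file path is hypothetical):

```python
import json

import pyarrow.parquet as pq

# Sketch: read the file, pull the original numpy dtypes out of the "pandas"
# metadata that pandas/pyarrow store in the parquet schema, and cast the
# datetime columns back to the unit they had when the frame was written.
table = pq.read_table("df.parquet")  # hypothetical path
pandas_meta = json.loads(table.schema.metadata[b"pandas"])

df = table.to_pandas()
for col in pandas_meta["columns"]:
    name, numpy_type = col["name"], col["numpy_type"]
    if name in df.columns and str(numpy_type).startswith("datetime64"):
        df[name] = df[name].astype(numpy_type)  # e.g. restore datetime64[s]
```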
I'm not too familiar with datetime64, but is it true that there can be a loss of precision, i.e. that second resolution can represent a wider span of datetimes than microsecond? If that is the case, I would stay away from casting here.
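For context, a quick NumPy illustration of the span point (not tied to the parquet code path): with a 64-bit integer payload, a coarser unit covers a wider absolute range of datetimes, so a blanket cast from seconds to a finer unit can overflow for extreme values.

```python
import numpy as np

# The same int64 payload interpreted at different resolutions: seconds reach
# years on the order of 1e11, while microseconds top out around year 294000.
big = np.iinfo(np.int64).max - 1
print(np.datetime64(big, "s"))   # far beyond the datetime64[us] range
print(np.datetime64(big, "us"))  # much smaller maximum representable date
```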
@rhshadrach From the pull request attached to the pyarrow issue mentioned by @snitish, the merged PR files show that pyarrow supports milli-, micro-, and nanosecond units. If a seconds unit is given, it gets converted to milliseconds, which is the behavior reported in this issue. I tested this with the code below from @bmwilly:

```python
df = pd.DataFrame(
    {
        "ts": pd.to_datetime(
            ["2021-01-01 00:00:00", "2021-01-01 00:00:01", "2021-01-01 00:00:02"]
        ).astype("datetime64[s]"),
    }
)
df.to_parquet("/tmp/df.parquet")
df2 = pd.read_parquet("/tmp/df.parquet")
```

The only problematic unit is `datetime64[s]`, which comes back as `datetime64[ms]`; with milli-, micro-, or nanosecond resolution the round trip preserves the dtype.
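A quick per-unit check along these lines (a sketch, assuming a pyarrow-backed `to_parquet`/`read_parquet` and a writable `/tmp` path) shows that only the seconds unit changes on the round trip:

```python
import pandas as pd

# Compare the "ts" dtype before and after the parquet round trip for each unit.
for unit in ["s", "ms", "us", "ns"]:
    df = pd.DataFrame(
        {"ts": pd.to_datetime(["2021-01-01 00:00:00"]).astype(f"datetime64[{unit}]")}
    )
    df.to_parquet("/tmp/df.parquet")
    df2 = pd.read_parquet("/tmp/df.parquet")
    print(unit, "->", df2["ts"].dtype)  # only "s" is expected to come back as "ms"
```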
Thanks @Anurag-Varma. My guess is that if you're using …
Hi, I'd like to work on this issue. Please let me know if I can be assigned to it, or if there are any guidelines I should follow before proceeding. Thanks!
@alherrera-cs - thanks for the interest, you're welcome to pick this up. We have contributor documentation: https://pandas.pydata.org/pandas-docs/dev/development/contributing.html. I'd recommend reviewing this section in particular.
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
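The snippet below is the reproducer quoted in the comments above (the same code tested by @Anurag-Varma; the exact file path is incidental):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ts": pd.to_datetime(
            ["2021-01-01 00:00:00", "2021-01-01 00:00:01", "2021-01-01 00:00:02"]
        ).astype("datetime64[s]"),
    }
)
df.to_parquet("/tmp/df.parquet")
df2 = pd.read_parquet("/tmp/df.parquet")
```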
Issue Description
Then `df.dtypes` gives `ts    datetime64[s]`, but `df2.dtypes` gives `ts    datetime64[ms]`.
Expected Behavior
I would expect `df2.dtypes` to give `ts    datetime64[s]`, matching the dtype of the original frame.
Installed Versions