fix(python): make sure we always write microsecond precision timestamps #1467
Comments
The goal is to get apache/arrow#35746 merged in by 13.0.0!
@wjones127 I can give this a go. I ran into this issue while writing with Polars; it should be straightforward and similar to the delta_arrow_schema_from_pandas logic, right?
@wjones127 I was thinking this could work: main...ion-elgreco:delta-rs:fix/cast-timestamp-always-to-us-precision. But after checking the PyArrow docs and testing what a RecordBatchReader does, it apparently doesn't cast the data when you pass a different schema. It throws an error that the schema you passed does not match the expected one, so this only works on pa.Table. Do you have any idea how to cast/read a RecordBatch to a new schema? Also, one thing that won't be caught: fields or structs that contain nested date-times with a different precision. That may also be the case for the method where the precision is fixed for pd.DataFrame.
…iter/merge (#1820)

# Description

This ports some functionality that @stinodego and I had worked on in Polars, where we converted a PyArrow schema to a compatible Delta schema. It converts the following:

- uint -> int
- timestamp(any timeunit) -> timestamp(us)

I adjusted the functionality to do schema conversion from large to normal types when necessary, which is still needed in MERGE as a workaround for #1753.

Additional things I've added:

- Schema conversion for every input in write_deltalake/merge
- Pandas DataFrame conversion
- Pandas DataFrame as an input in merge

# Related Issue(s)

- closes #686
- closes #1467

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
Description
Right now I think we do by accident. But PyArrow will change its default to nanoseconds, so we should specify microsecond precision explicitly.
See: apache/arrow#35746
Use Case
Related Issue(s)