-
Notifications
You must be signed in to change notification settings - Fork 416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix checkpoint and timestamp bugs #351
Conversation
e684571
to
46a5729
Compare
f8f077a
to
162d013
Compare
@xianwill I had to edit your PR because you were picking a bug that is fixed in recent main. |
It's a PR to |
…point_bug # Conflicts: # Cargo.lock # python/Cargo.toml # rust/Cargo.toml # rust/src/delta_arrow.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I've messed up with a PR. But bug fixes are looking good!
* Bump arrow deps and bring map support to schema * Fix datafustion deps * Fix checkpoint and timestamp bugs (#351) * post merge fixes * Add tests for new checkpoint API * Post merge from main * Reverse integrate main to writer-map-support * post merge fixes * cargo fmt * Fix checkpoint compatibility for remove fields (#427) * Add datafusion PR link Co-authored-by: Christian Williams <christianw@scribd.com> Co-authored-by: xianwill <christianwilliams79@gmail.com>
Description
This PR fixes several blocking issues related to checkpoint data and schema correctness.
One especially important note is that I'm switching from nanoseconds to microseconds for the default arrow timestamp unit. Neither actually works well for read scenarios because the parquet timestamp could be one of various types (delta-io/delta#643), but for writes, microseconds definitely looks like the best bet for broad support and good precision.
Note: I'm merging this to the writer-map-support branch rather than main because the required arrow and parquet crate refs aren't ref'ed by datafusion yet.
Additional note: this also picks up @mosyp's fix to the fs backend that was merged to main in #376