Fix checkpoint and timestamp bugs #351

Merged: mosyp merged 3 commits into delta-io:writer-map-support on Aug 12, 2021

@xianwill (Collaborator) commented Aug 5, 2021

Description

This PR fixes several blocking issues related to checkpoint data and schema correctness.

One especially important note: I'm switching the default Arrow timestamp unit from nanoseconds to microseconds. Neither unit works well for reads, because a Parquet timestamp column may be stored as any of several types (delta-io/delta#643). For writes, however, microseconds look like the best bet for broad support with good precision.
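A minimal, std-only Rust sketch of the precision/range trade-off behind the switch. This is not code from the PR: `nanos_to_micros` and `i64_range_years` are hypothetical helpers, and the 365.2425-day average year used for the range estimate is an assumption. Arrow timestamps are signed 64-bit counts of ticks since the Unix epoch, so a coarser unit gives up sub-microsecond digits in exchange for a much wider representable range.

```rust
/// Hypothetical helper (not part of delta-rs): convert an epoch timestamp in
/// nanoseconds to microseconds. `div_euclid` with a positive divisor rounds
/// toward negative infinity, so pre-epoch instants land on the right microsecond.
fn nanos_to_micros(nanos: i64) -> i64 {
    nanos.div_euclid(1_000)
}

/// Hypothetical helper: approximate number of years an i64 tick count can span
/// on either side of the Unix epoch, given the number of ticks per second.
fn i64_range_years(ticks_per_second: i64) -> i64 {
    // Assumption: average Gregorian year of 365.2425 days, in seconds.
    const SECONDS_PER_YEAR: i64 = 31_556_952;
    i64::MAX / ticks_per_second / SECONDS_PER_YEAR
}

fn main() {
    // 2021-08-05T00:00:00.123456789Z as nanoseconds since the epoch.
    let nanos: i64 = 1_628_121_600_123_456_789;
    let micros = nanos_to_micros(nanos);
    println!("{micros}"); // prints 1628121600123456

    println!("ns range: ~{} years", i64_range_years(1_000_000_000)); // ~292 years
    println!("us range: ~{} years", i64_range_years(1_000_000)); // ~292277 years
}
```

With nanoseconds an i64 only covers roughly the years 1677 to 2262, while microseconds reach hundreds of thousands of years in either direction, at the cost of dropping the last three (sub-microsecond) digits.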

Note: I'm merging this into the writer-map-support branch rather than main because datafusion does not yet reference the required versions of the arrow and parquet crates.

Additional note: this also picks up @mosyp's fix to the fs backend, which was merged to main in #376.

@xianwill changed the title from "WIP checkpoint bug" to "Fix checkpoint and timestamp bugs" Aug 11, 2021
@xianwill xianwill requested review from mosyp and houqp August 11, 2021 22:30
@xianwill xianwill marked this pull request as ready for review August 11, 2021 22:30
@mosyp (Contributor) commented Aug 12, 2021

@xianwill I had to edit your PR because you were picking up a bug that is fixed in recent main.
I don't know what's going on, but it seems like a broken PR. I've reset your branch to the latest upstream and added your changes on top of it in a single commit, but the GitHub PR shows diffs and commits that should not be here, as well as conflicts.

@mosyp (Contributor) commented Aug 12, 2021

It's a PR to delta-io:writer-map-support, ugh. Too early, need my coffee :D

…point_bug

# Conflicts:
#	Cargo.lock
#	python/Cargo.toml
#	rust/Cargo.toml
#	rust/src/delta_arrow.rs
@mosyp (Contributor) left a comment


Alright, I messed up the PR, but the bug fixes look good!

@mosyp mosyp merged commit d66239f into delta-io:writer-map-support Aug 12, 2021
mosyp added a commit that referenced this pull request Sep 14, 2021
* Bump arrow deps and bring map support to schema

* Fix datafusion deps

* Fix checkpoint and timestamp bugs (#351)

* post merge fixes

* Add tests for new checkpoint API

* Post merge from main

* Reverse integrate main to writer-map-support

* post merge fixes

* cargo fmt

* Fix checkpoint compatibility for remove fields (#427)

* Add datafusion PR link

Co-authored-by: Christian Williams <christianw@scribd.com>
Co-authored-by: xianwill <christianwilliams79@gmail.com>