RecordBatchWriter only creates stats for the first 32 columns; this prevents calling create_checkpoint. #2745
Comments
Maybe a duplicate of #2675. Please create an MRE.
"I would expect all columns to be included in column stats, or otherwise I would expect this behavior to be configurable."
Collecting stats for all columns will blow up the size of the logs and checkpoints. If you do want this behavior, you can set numIndexedCols to -1.
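To make the suggestion above concrete, here is a minimal sketch of how a column limit with a -1 "all columns" setting could select the columns that receive stats. The function `stats_columns` is hypothetical and is not part of delta-rs; it only models the behavior described in this comment, and treating every negative value like -1 is an assumption of the sketch.

```rust
/// Hypothetical helper (not a delta-rs API): returns the leading columns
/// that would receive column stats for a given limit.
/// Assumption: any negative limit means "collect stats for every column",
/// mirroring the -1 setting mentioned in the comment above.
fn stats_columns<'a>(columns: &'a [&'a str], num_indexed_cols: i32) -> &'a [&'a str] {
    if num_indexed_cols < 0 {
        // No limit: every column gets stats.
        return columns;
    }
    // Clamp the limit so that narrow tables do not cause an out-of-bounds slice.
    let limit = (num_indexed_cols as usize).min(columns.len());
    &columns[..limit]
}
```

With a 40-column table, a limit of 32 keeps only the first 32 columns, which matches the behavior reported in this issue, while -1 keeps all 40.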
I'll give this a shot, thanks for the suggestion. I don't see how this would work, though. If you look at the lines of code that I linked, you'll see that it uses a value of 32 regardless of any configuration. That is what this issue is about.
Sorry, what is an MRE?
Just to reiterate, I do not see anywhere in the code that we read this configuration from the |
My bad, I got confused here. I am wondering, though, why you are using the RecordBatchWriter and not the normal write operation?
Sorry @adamfaulkner-at for the confusion, I thought it was a direct MRE => |
Hey, I think my main problem with checkpointing was actually fixed in 0.18.2: #2627. I am sorry, I didn't realize that I wasn't on the most recent version. I'll give this a shot and report back or close this issue.
I did not realize that the normal write operation existed, and RecordBatchWriter seemed like what I wanted. I'll switch to using the write operation.
Thanks for your help, this is definitely fixed in 0.18.2! It would be beneficial if the docs pointed users towards |
Environment
Delta-rs version: 0.18.1
Binding: Rust (?)
Environment: MacOS, Linux
Bug
Hi,
RecordBatchWriter passes a constant DEFAULT_NUM_INDEX_COLS (which has value 32) to create_add ([here](https://github.com/delta-io/delta-rs/blob/main/crates/core/src/writer/record_batch.rs#L232)). This prevents us from later calling create_checkpoint, as this function seems to assert that all columns were indexed.

It feels like the number of indexed columns should be configurable at the table level. Alternatively, if we had some way to tell the RecordBatchWriter that we want to index all of the columns, we would be unblocked.
What happened:
RecordBatchWriter produces a commit that only includes the first 32 columns in the table in column stats.
What you expected to happen:
I would expect all columns to be included in column stats, or otherwise I would expect this behavior to be configurable.
At least, I would like to be able to create a checkpoint by calling create_checkpoint. Currently, this crashes with the following:
(userPrimaryKey is the name of one of our table's columns).
How to reproduce it:
Simply write to a Delta Lake table using RecordBatchWriter and attempt to create a checkpoint by calling create_checkpoint.
More details: