feat!: add batch_size option to merge_columns #2896
Closes #2893
Codecov Report
Attention: Patch coverage is

```diff
@@            Coverage Diff             @@
##             main    #2896      +/-   ##
==========================================
- Coverage   77.93%   77.92%   -0.02%
==========================================
  Files         231      231
  Lines       70613    70617       +4
  Branches    70613    70617       +4
==========================================
- Hits        55031    55026       -5
- Misses      12462    12466       +4
- Partials     3120     3125       +5
```
```diff
@@ -78,7 +89,7 @@ impl Updater {
             write_schema,
             final_schema,
             finished: false,
-            deletion_restorer: DeletionRestorer::new(deletion_vector, batch_size),
+            deletion_restorer: DeletionRestorer::new(deletion_vector, legacy_batch_size),
```
Why are we passing down the legacy batch size here instead of `batch_size`?
`DeletionRestorer` uses the presence of this field to trigger a "v1 compatibility mode" (to match row group sizes). Otherwise, the deletion restorer doesn't need to know the batch size.
I also renamed the internal variable to `legacy_batch_size` to emphasize this. Maybe `row_group_size`, or something else, would be a better name?
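To make the reply concrete, here is a minimal sketch of how an `Option` field can double as a mode switch. Everything in it is hypothetical: `DeletionRestorerSketch`, `restore`, and the row-offset representation are illustrative names, not Lance's actual `DeletionRestorer` API. The assumption that a v1 batch always spans exactly one row group's worth of physical rows is drawn only from the "match row group sizes" remark above.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified stand-in for `DeletionRestorer`; not Lance's code.
pub struct DeletionRestorerSketch {
    /// Physical offsets of deleted rows in the fragment.
    deleted: BTreeSet<u32>,
    /// `Some(n)` enables the (assumed) v1 compatibility mode; `None` means
    /// the restorer never needs to know the batch size.
    legacy_batch_size: Option<u32>,
    /// Total physical rows in the fragment, including deleted ones.
    total_rows: u32,
    /// Next physical row offset to emit.
    next_row: u32,
}

impl DeletionRestorerSketch {
    pub fn new(deleted: BTreeSet<u32>, legacy_batch_size: Option<u32>, total_rows: u32) -> Self {
        Self { deleted, legacy_batch_size, total_rows, next_row: 0 }
    }

    /// Maps a batch of `num_live` surviving rows back onto physical rows.
    /// `Some(offset)` is a live row; `None` is a deleted row restored as null.
    pub fn restore(&mut self, num_live: u32) -> Vec<Option<u32>> {
        // Assumed v1 mode: the batch covers a fixed window of physical rows
        // (one row group). Otherwise it may run up to the end of the fragment.
        let end = match self.legacy_batch_size {
            Some(batch_size) => self.next_row.saturating_add(batch_size).min(self.total_rows),
            None => self.total_rows,
        };
        let mut out = Vec::new();
        let mut live_seen = 0;
        while self.next_row < end {
            // Without a batch size, stop as soon as every live row is placed.
            if self.legacy_batch_size.is_none() && live_seen == num_live {
                break;
            }
            let row = self.next_row;
            self.next_row += 1;
            if self.deleted.contains(&row) {
                out.push(None);
            } else {
                out.push(Some(row));
                live_seen += 1;
            }
        }
        debug_assert_eq!(live_seen, num_live, "batch does not match the deletion vector");
        out
    }
}
```

With rows 2 and 3 deleted from a 6-row fragment, the legacy mode attaches the deleted rows to the first batch (its row group), while the non-legacy mode defers them to the start of the next batch. In non-legacy mode this sketch also never emits deletions at the very end of a fragment; a real implementation would need to handle that case.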
I see. That's fine as is. Thanks for the explanation.
There are a few other places where the updater is used (e.g. `merge_insert` / `add_columns`), and we may want to review those paths as well.
This is technically a breaking change to the Rust API, so I will mark it as such.
BREAKING CHANGE: `Fragment::updater` now accepts a new `batch_size` argument.
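One plausible way an updater could split the new `batch_size` argument from the `legacy_batch_size` passed to the deletion restorer is sketched below. It assumes, without confirmation from this PR, that v1 files must be read in row-group-sized batches and that newer files can honor the requested size. `FileVersion`, `UpdaterConfig`, `resolve_batch_sizes`, and `DEFAULT_BATCH_SIZE` are all hypothetical names, not part of Lance's API.

```rust
/// Hypothetical file-format marker used only in this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FileVersion {
    /// The v1 ("legacy") format, whose reads are assumed to follow row groups.
    Legacy,
    /// A newer format assumed to honor an arbitrary read batch size.
    V2,
}

/// Hypothetical fallback when the caller passes `None`; not Lance's default.
pub const DEFAULT_BATCH_SIZE: u32 = 1024;

#[derive(Debug, PartialEq, Eq)]
pub struct UpdaterConfig {
    /// Batch size used when reading the existing fragment.
    pub read_batch_size: u32,
    /// Forwarded to the deletion restorer; `Some` only for legacy files.
    pub legacy_batch_size: Option<u32>,
}

/// Resolves the caller's optional `batch_size` for a given file version.
pub fn resolve_batch_sizes(
    version: FileVersion,
    batch_size: Option<u32>,
    row_group_size: u32,
) -> UpdaterConfig {
    match version {
        // Legacy files (assumed): reads must line up with row groups, so the
        // requested size is ignored and the restorer gets the group size.
        FileVersion::Legacy => UpdaterConfig {
            read_batch_size: row_group_size,
            legacy_batch_size: Some(row_group_size),
        },
        // Newer files: honor the request (or the fallback) and keep the
        // restorer out of batch sizing entirely.
        FileVersion::V2 => UpdaterConfig {
            read_batch_size: batch_size.unwrap_or(DEFAULT_BATCH_SIZE),
            legacy_batch_size: None,
        },
    }
}
```

In this sketch, a caller with no preference passes `None` and gets the fallback size, while the restorer only ever sees a batch size for legacy files.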