-
Notifications
You must be signed in to change notification settings - Fork 1k
[Variant] VariantArrayBuilder uses MetadataBuilder and ValueBuilder #8206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
/// Builder for the in progress variant value, temporarily owns the buffers | ||
/// from `array_builder` | ||
variant_builder: VariantBuilder, | ||
parent_state: ParentState<'a>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOTE: I finally figured out why my pathfinding PR couldn't get the variant array builder to work correctly when storing end-offsets instead of start-offsets: It's because I wasn't using a top-level parent state here, so rollbacks were unreliable. Out of pure luck, the unit test that validates nested rollbacks wrote the same number of bytes for the rolled back and finalized nested builders, and by storing the starting offset we got lucky to observe the correct byte slice in spite of the bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated diagnosis: When storing starting offset, the extra bytes that failed to roll back were harmlessly "appended" to the previous row as padding after the "real" variant value. But when storing ending offset, the extra bytes were "prepended" to the next row, whose own value was thus wrongly ignored as padding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, the important thing is -- adding this parent_state
here makes rollbacks reliable, and so it no longer matters whether the builder stores starting or ending offsets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall! There are only two minor document typos. Thanks for the contribution!
🤖 |
🤖: Benchmark completed Details
|
🤖 |
🤖: Benchmark completed Details
|
🤖 |
🤖: Benchmark completed Details
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @scovich and @codephage2020 -- I think this looks very nice 👏
Which issue does this PR close?
VariantArrayBuilder
usesParentState
for simpler rollbacks #8205Rationale for this change
VariantArrayBuilder
had a very complex choreography with theVariantBuilder
API, that required lots of manual drop glue to deal with ownership transfers between it and theVariantArrayVariantBuilder
it delegates the actual work to. Rework the whole thing to use a (now-reusable)MetadataBuilder
andValueBuilder
, with rollbacks largely handled byParentState
-- just like the other builders in the parquet-variant crate.What changes are included in this PR?
Five changes (curated as five commits that reviewers may want to examine individually):
VariantArrayBuilder
can access it from the parquet-variant-compute crate.MetadataBuilder
reusable. Itsfinish
method appends the bytes of a new serialized metadata dictionary to the underlying buffer and resets the remaining builder state. The builder is thus ready to create a brand new metadata dictionary whose serialized bytes will also be appended to the underlying buffer once finished.VariantArrayBuilder
to useMetadataBuilder
andValueBuilder
, coordinated viaParentState
. This is the main feature of the PR and also the most complicated/subtle.VariantArrayBuilder
.Are these changes tested?
Existing variant array builder tests cover the change.
Are there any user-facing changes?
A lot of builder-related types and methods from the parquet-variant crate are now public.