-
Notifications
You must be signed in to change notification settings - Fork 847
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add validation logic for StructBuilder::finish #2413
Conversation
The validation should be done in each |
In the linked issue the expected behavior was for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thank you
} | ||
if !self.field_builders.iter().all(|x| x.len() == self.len) { | ||
panic!("Field_builders are of unequal lengths.") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also verify the length of the null buffer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added this logic in 29ff8a6
/// - the number of null items in individual field_builders are equal to null_count in self.null_buffer_builder. | ||
fn validate_and_construct(&mut self) -> (Vec<ArrayData>, Option<Buffer>) { | ||
if self.fields.len() != self.field_builders.len() { | ||
panic!("Number of fields is not equal to the number of field_builders."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we check this when constructing the struct builder, such as moving this check to StructBuilder::new()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I too was thinking the same. But I made my code according to issues expected behavior. If the team needs it, I shall move this check to new function.
} | ||
|
||
if child_null_count != self_null_count { | ||
panic!("StructBuilder and field_builders contain unequal null counts."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't struct builder and child builders have different null counts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we say a struct is null, its child elements should be null. Hence they should have same null counts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we say a struct is null, its child elements should be null.
Sorry, I'm afraid this is not the truth. Each array has its own null buffer, which means the validation of elements in different arrays are independent.
For example, let's say we have a struct array ["name": String, "age": Int]
. We could have elements whose fields could be None
:
[
["Name": Some("Bob"), "age": Some(32)],
["Name": Some("Andy"), "age": None],
["Name": None, "age": None],
]
Also, we could have a null
field in the Struct array:
[
Some(["Name": Some("Bob"), "age": Some(32)]),
None, // the underlying "Name" and "age" fields could be anything. Because they are masked by the null buffer.
Some(["Name": None, "age": None]),
]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In array::data::tests::test_try_new_sliced_struct , there was this below code and thats why I had understood it that way.
// struct = { a: int, b: bool }
// struct[1] = null
builder
.field_builder::<Int32Builder>(0)
.unwrap()
.append_option(None);
builder
.field_builder::<BooleanBuilder>(1)
.unwrap()
.append_option(None);
builder.append(false);
So I added this logic. My bad,I had understood it wrong. I shall remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check should be that the bit length of null_buffer_builder is the same as the length of the inner builders, the null counts can and likely will differ
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check if the latest commit looks good ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Leave some nits.
if !self | ||
.field_builders | ||
.iter() | ||
.all(|x| x.len() == self.null_buffer_builder.len()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we have checked
self.field_builders.iter().all(|x| x.len() == self.len)
before, we don't need to check
self .field_builders.iter().all(|x| x.len() == self.null_buffer_builder.len())
Based on the transitivity of integer equality, checking
self.len == self.null_buffer_builder.len()
is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, wait! We don't need to check the null buffer length at all. As we can make sure that the length of null buffer builder is always equal to self.len
. @tustvold
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aah yes, does beg the question why we even have self.len
as a separate variable...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed #2429 to track this.
|
||
let mut builder = StructBuilder::new(fields, field_builders); | ||
builder.finish(); | ||
} |
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
Could we merge this? And then we can do some clean-up. |
Benchmark runs are scheduled for baseline = 98eeb01 and contender = f0d7d0b. f0d7d0b is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #2252.
Rationale for this change
StructBuilder::finish
callsArrayDataBuilder::build_unchecked
without verifying the length of the child arrays. This allows forming invalidArrayData
What changes are included in this PR?
I have added a new private function
StructBuilder::validate_content
which is called first before running the finish logic. This validates contents in the struct to ensure that fields and field_builders are of equal length and the number of items in individual field_builders are equal to self.lenAre there any user-facing changes?
No, Since I have used panics to handle the errors.