-
Notifications
You must be signed in to change notification settings - Fork 819
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: implemented with_field() for FixedSizeListBuilder #5541
feat: implemented with_field() for FixedSizeListBuilder #5541
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @istvan-fodor -- this looks like a nice API improvement to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @istvan-fodor -- I think this looks good to me. I had a small test organization suggestion but I don't think it is required to merge this PR
I don't think the CI failures are related to changes in this PR -- I filed #5544 to fix.
Thanks again
cc @TimDiekmann
assert_eq!( | ||
f.data_type(), | ||
values_data.data_type(), | ||
"DataType of field ({}) should be the same as the values_builder DataType ({})", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
// [[0, 1, 2], null, [3, null, 5], [6, 7, null]] | ||
builder.values().append_value(0); | ||
builder.values().append_value(1); | ||
builder.values().append_value(2); | ||
builder.append(true); | ||
builder.values().append_null(); | ||
builder.values().append_null(); | ||
builder.values().append_null(); | ||
builder.append(false); | ||
builder.values().append_value(3); | ||
builder.values().append_null(); | ||
builder.values().append_value(5); | ||
builder.append(true); | ||
builder.values().append_value(6); | ||
builder.values().append_value(7); | ||
builder.values().append_null(); | ||
builder.append(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be nice to avoid repeating this same setup multiple times in the tests (and it would make it clear the tests share the same data)
for example
fn make_list_builder() -> FixedSizedListBuilder<i32> {
...
}
// [[0, 1, 2], null, [3, null, 5], [6, 7, null]] | |
builder.values().append_value(0); | |
builder.values().append_value(1); | |
builder.values().append_value(2); | |
builder.append(true); | |
builder.values().append_null(); | |
builder.values().append_null(); | |
builder.values().append_null(); | |
builder.append(false); | |
builder.values().append_value(3); | |
builder.values().append_null(); | |
builder.values().append_value(5); | |
builder.append(true); | |
builder.values().append_value(6); | |
builder.values().append_value(7); | |
builder.values().append_null(); | |
builder.append(true); | |
let builder = make_list_builder(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I'll go through the tests and rework the builders.
@alamb , not sure you saw my comment on line 328. Can you check? The new behavior of non-nullable fields raises an additional question about null fields and I wanted to make sure this is in line with the rest of the arrow builders. |
I could be missing something but I don't see a comment on line 328? Is it possible that it is pending and the review hasn't been submitted? |
@@ -248,9 +325,37 @@ mod tests { | |||
} | |||
|
|||
#[test] | |||
fn test_fixed_size_list_array_builder_finish_cloned() { | |||
fn test_fixed_size_list_array_builder_with_field() { | |||
let builder = make_list_builder(false, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb , I had a question about this test in particular. With the new capability to set the field as a non-nullable we have the situation where the builder has a null element built like this:
builder.values().append_null();
builder.values().append_null();
builder.values().append_null();
builder.append(false);
If the field is set to non-nullable, this element will fail the null check. I wanted to check with you if this is logically right or not? Are we making a distinction between nulls in the values builder, where let's say 1 of the 3 elements of a particular fixed size list is null (which would be a valid fail on non-nullability) or nulls in the main builder (which is built up as 3 nulls + a append(false)
in the values builder). To me the second one is questionable and comes down to how you want this to work. If this test should not fail with the second complete null case, then I have to go back and revise the null check of the values builder and somehow control for null elements of the main builder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic we apply for FixedSizeList and StructArray elsewhere is to permit "masked" nulls, this is because a null must always take up space. I am not sure if we want to replicate this logic here.
See https://docs.rs/arrow-array/latest/arrow_array/array/struct.FixedSizeListArray.html#method.try_new
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the field is set to non-nullable, this element will fail the null check. I wanted to check with you if this is logically right or not?
I think I would expect it to be consistent with whatever FixedSlizeListArray::try_new
did
Yes, that was the case. I just changed it to a regular comment |
Hello ! I'm very interested in this PR, I really need FixedSizeList columns and didn't find a workaround that makes the FixedSizeListBuilder create a column with an appropriate name. So I'm really looking forward for this to be accepted |
There is currently an ongoing discussion w.r.t null handling, in the short term you could possibly use the array constructors directly - https://docs.rs/arrow-array/latest/arrow_array/array/struct.FixedSizeListArray.html#method.try_new |
Thanks, but I really need a builder. Tried implementing it myself with no luck so far. I'm pretty surprised that the builder doesn't know its field name though. There should be a way to know right ? At least I'm hoping that with your pull request it will be fixed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks you @istvan-fodor -- I think this is getting close. Some final touchups:
- Update the test for zero length lists (should not fail null check)
- Add a test for building a 2 element array where one elemnt is NULL (with a null element)
- (maybe) Another test showing the null behavior you are asking about
|
||
#[test] | ||
#[should_panic( | ||
expected = "field is nullable = false, but the values_builder contains null values" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't seem right -- the builder doesn't have null values, instead it has no values at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @alamb. this is not a 0 sized list. I created the make_list_builder
to construct builders (you recommended something similar when I first opened the PR). The two boolean parameters control whether to include a null in the builder and to include a particular fixed size list that has a null element.
); | ||
if !f.is_nullable() { | ||
assert!( | ||
values_data.null_count() == 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is valid to have a zero length array (which would also have no nulls, but the null count would be zero)
values_data.null_count() == 0, | |
values_data.null_count() == 0 || values_data.len() == 0, |
(same for the check below)
@@ -248,9 +325,37 @@ mod tests { | |||
} | |||
|
|||
#[test] | |||
fn test_fixed_size_list_array_builder_finish_cloned() { | |||
fn test_fixed_size_list_array_builder_with_field() { | |||
let builder = make_list_builder(false, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the field is set to non-nullable, this element will fail the null check. I wanted to check with you if this is logically right or not?
I think I would expect it to be consistent with whatever FixedSlizeListArray::try_new
did
@alamb @tustvold , I added some changes.
|
…n-fodor/arrow-rs into feature/fixed-size-list-field-1353
Thanks you for sticking with this @istvan-fodor 🙏 |
Which issue does this PR close?
Closes #1353
Rationale for this change
FixedSizeListBuilder
doesn't allow users to specify the inner field of elements and defaults to the following FieldRef:Arc::new(Field::new("item", values_data.data_type().clone(), true))
We would like to keep the default behavior as-is, but also allow users to specify use their own FieldRef.
What changes are included in this PR?
with_field
toFixedSizeListBuilder
, copied over fromGenericListBuilder
field: Option<FieldRef>
toFixedSizeListBuilder
GenericListBuilder
infinish()
andfinish_build()
with_field
usedwith_field
used with non-nullable field, expect panicwith_field
used with wrong type, expect panicAre there any user-facing changes?
Yes.
The
finish_cloned()
andfinish()
methods no longer use thebuild_unchecked()
method, but rely onbuild()
. If the build fails, the thread panics.