-
Notifications
You must be signed in to change notification settings - Fork 1k
Support FixedSizeList
RowConverter
#7705
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support FixedSizeList
RowConverter
#7705
Conversation
5e19043
to
4551310
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should he relatively straightforward to avoid needing to cast and to encode/decode the array directly. This would avoid adding two very heavy dependencies in the form of arrow-cast and by extension arrow-select.
Further, it is actually wasteful to encode fixed size lists in this way, we don't need to var-encode them, we can simply encode the values directly one after each other, much like we do for StructArray.
FixedSizeList
RowConverter
@tustvold thanks for the feedback! |
The test data doesn't contain 42 value.
Validate that null values are correctly masked out.
Add `DataType::FixedSizeList` support to `RowConverter`. This is necessary to support DISTINCT and GROUP BY over fixed-sized arrays in DataFusion.
4551310
to
667cbdf
Compare
Updated to remove casts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @findepi -- I won't say I fully follow the logic but I did verify the tests and the validation performed and that this seems to follow the pattern of the other converters.
It might also be worth adding a benchmark in arrow/benches/row_format.rs
in case someone wants to try to optimize this code more in the future
} | ||
|
||
pub fn compute_lengths_fixed_size_list( | ||
tracker: &mut LengthTracker, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just pattern matching, I wonder why this doesn't use the same pattern as List
?
Why not
tracker: &mut LengthTracker, | |
lengths: &mut [usize], |
(I don't see anything wrong with this I am just curious)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the caller used to operate on lengths: &mut [usize]
and that's why list helper functions have this in their API
the caller has been migrated to tracker: &mut LengthTracker
, the helper functions for list hasn't been updated.
if this code was inline in the caller (as for eg structs), it would operate on LengthTracker
directly.
I didn't find an example with a normal list to follow. I'd suggest doing this as a follow-up. BTW with the row format so precisely documented, is the format itself set in stone, or subject to change? |
Makes sense
I don't know of any policy / discussion on this topic (so the answer is "I don't know"). Part of the rationale to document the Row Format was that it was a pretty tricky thing to make correct -- I don't remember any rationale about not changing it. I also don't know of any use as a long term interchange format |
Thanks again @findepi |
# Which issue does this PR close? none # Rationale for this change This is necessary to support DISTINCT and GROUP BY over fixed-sized arrays in DataFusion. # What changes are included in this PR? Add `DataType::FixedSizeList` support to `RowConverter`. # Are there any user-facing changes? No (cherry picked from commit d7fc416)
# Which issue does this PR close? none # Rationale for this change This is necessary to support DISTINCT and GROUP BY over fixed-sized arrays in DataFusion. # What changes are included in this PR? Add `DataType::FixedSizeList` support to `RowConverter`. # Are there any user-facing changes? No (cherry picked from commit d7fc416)
none This is necessary to support DISTINCT and GROUP BY over fixed-sized arrays in DataFusion. Add `DataType::FixedSizeList` support to `RowConverter`. No (cherry picked from commit d7fc416)
none This is necessary to support DISTINCT and GROUP BY over fixed-sized arrays in DataFusion. Add `DataType::FixedSizeList` support to `RowConverter`. No (cherry picked from commit d7fc416)
Which issue does this PR close?
none
Rationale for this change
This is necessary to support DISTINCT and GROUP BY over fixed-sized arrays in DataFusion.
What changes are included in this PR?
Add
DataType::FixedSizeList
support toRowConverter
.Are there any user-facing changes?
No