-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change ScalarValue::List to store ArrayRef #7629
Conversation
dc280bf
to
73e78c0
Compare
a87d75b
to
8ce6e93
Compare
I plan to give this a look later today or tomorrow. Thanks @jayzhan211 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jayzhan211 -- this is an epic PR. I am sorry for the delay in reviewing. It is hard for me to find enough contiguous time to review large PRs like this.
I left some comments. Other than the todo!()
in PartialOrd
(which would result in a panic) I think this PR is basically ready to go.
datafusion/common/src/scalar.rs
Outdated
@@ -1653,42 +1619,66 @@ impl ScalarValue { | |||
Ok(array) | |||
} | |||
|
|||
fn iter_to_array_list( | |||
scalars: impl IntoIterator<Item = ScalarValue>, | |||
// This function does not contains nulls but empty array instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does "function not contain nulls" mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NullBuffer for ListArray is None
in this function, you can't get NULL
in the array from this function, only an empty array is possible.
I tried to have one function for iter_to_array_list_without_nulls
and iter_to_array_list
but failed since one needs nulls, and another needs an empty array.
in_data.data_type() | ||
) | ||
array_agg: Vec<Vec<ScalarValue>>, | ||
) -> Result<Vec<Vec<Vec<ScalarValue>>>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to document what the three levels of Vec mean semantically ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I don't understand either 😢 Let me take a chance to learn more about aggregate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left the comment from merge_ordered_arrays for now. I think there might be some space to improve OrderSensitiveArrayAggAccumulator
, but it is better to do it in another PR.
@@ -424,59 +427,6 @@ fn scalar_values_error_serialization() { | |||
Some(vec![]), | |||
vec![Field::new("item", DataType::Int16, true)].into(), | |||
), | |||
// Should fail due to inconsistent types in the list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why were these tests removed? Do they pass now? If so, perhaps we can move them to the positive cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it is not possible to build inconsistent types with ListArray, therefore I just removed the whole test.
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
96f6f7d
to
577612b
Compare
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@jayzhan211 please mark this PR as ready for review when you are ready for me to give it another look. |
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
// in_data is Vec<ScalarValue> where ScalarValue does not include ScalarValue::List | ||
for in_data in array_agg.into_iter() { | ||
let ordering = in_data.into_iter().map(|struct_vals| { | ||
if let ScalarValue::Struct(Some(orderings), _) = struct_vals { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason we have 3 levels of Vec in orderings
is because we got Vec\<ScalarValue>
in Struct, but orderings
seems to have only one ScalarValue
, orderings.len() == 1
based on the test.
Is it reasonable to have >1 ScalarValue for orderings
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe @mustafasrepo or @metegenez can offer some guidance here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jayzhan211 -- this is an epic PR and reflects a lot of work. Thank you for the contribution
// in_data is Vec<ScalarValue> where ScalarValue does not include ScalarValue::List | ||
for in_data in array_agg.into_iter() { | ||
let ordering = in_data.into_iter().map(|struct_vals| { | ||
if let ScalarValue::Struct(Some(orderings), _) = struct_vals { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe @mustafasrepo or @metegenez can offer some guidance here
Thanks for your review! |
I merged this PR up from main and plan to merge it when it passes CI. I will also make a follow on PR to clean up the to/from proto serialization. |
Thanks again @jayzhan211 -- epic work. |
Awesome, thanks @jayzhan211 |
I believe we found another bug here: #7938 |
Which issue does this PR close?
Closes #7352
Closes #996
Rationale for this change
What changes are included in this PR?
Old ScalarValue::List is replaced with ListArray.
convert_array_to_scalar_vec
andscalars_to_list_array
are two core utilities function that transform between Vec and ListArray.Are these changes tested?
Are there any user-facing changes?