Change ScalarValue::Struct to ArrayRef #7893
Waiting on #7862.
let should_fail_on_seralize: Vec<ScalarValue> = vec![
    // Should fail due to empty values
    ScalarValue::Struct(
        Some(vec![]),
Move to round_trip_scalar_values, since it can be serialized:
ScalarValue::try_from(&DataType::Struct(Fields::from(vec![
Field::new("a", DataType::Int32, true),
Field::new("a", DataType::Boolean, false),
])))
.unwrap(),
I agree that there is no need to test serializing an empty array, as it isn't a valid input anyway.
@alamb Ready for review!
@@ -5292,8 +5557,7 @@ mod tests {
"| col |",
"+---------------------------+",
"| |",
"| {a: , b: } |",
"| {a: , b: {ba: , bb: }} |",
"| {a: 1, b: } |",
I think there is no way to construct a StructArray like the one on the left-hand side.
explain select struct(1, 2.3, 'abc');
----
logical_plan
Projection: Struct({c0:Int64(1),c1:Float64(2.3),c2:Utf8("abc")}) AS struct(Int64(1),Float64(2.3),Utf8("abc"))
test format
    .into_iter()
    .map(|(name, scalar)| (Field::new(name, scalar.data_type(), false), scalar))
    .unzip();
// Wrapper for ScalarValue::Struct that checks the length of the arrays, without nulls
TODO: Remove these two wrappers, no longer needed after changing to Scalar<T>
Are these still TODO?
Yes, we haven't changed to Scalar yet.
let struct_type = DataType::Struct(Fields::from(fields));
let mut column_wise_ordering_values = vec![];
let num_columns = fields.len();
for i in 0..num_columns {
I think there might be a better design for StructArray (the previous design is based on the old ScalarValue::Struct). I avoided changing the logic or data structure in this PR.
May benefit #8558?
@@ -2382,12 +2381,13 @@ mod tests {
Ok(())
}

/// Return a `null` literal representing a struct type like: `{ a: bool }`
// / Return a `null` literal representing a struct type like: `{ a: bool }`
nit:
// / Return a `null` literal representing a struct type like: `{ a: bool }`
/// Return a `null` literal representing a struct type like: `{ a: bool }`
        let sv = ScalarValue::try_from_array(column, 0)?;
        ordering_columns_per_row.push(sv);
    }

    Ok(ordering_columns_per_row)
} else {
    exec_err!(
TODO: use internal_err here.
Rebase
@jayzhan211 -- is this PR ready for a review?
Yes. It keeps getting conflicts, but I think you can take a first pass unless the conflicts are critical.
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Rebase
Sorry -- starting to look now
This is looking really good @jayzhan211 -- thank you both for the PR as well as for sticking with it for so long
I had a few comments about how to improve the implementation by using arrow kernels, but I also think we could merge this as is and then implement those improvements as a follow on PR if you prefer.
Again, thank you for your patience.
@@ -323,20 +335,32 @@ impl Accumulator for OrderSensitiveArrayAggAccumulator {
impl OrderSensitiveArrayAggAccumulator {
    fn evaluate_orderings(&self) -> Result<ScalarValue> {
Maybe we can file a follow on ticket to track this idea
    );
}

let mut valid = BooleanBufferBuilder::new(arrays.len());
What do you think about using https://docs.rs/arrow/latest/arrow/compute/kernels/concat/index.html here? I think you should be able to simply concat the arrays together without any special handling (and if concat doesn't support StructArray, we can potentially file a ticket upstream in arrow-rs).
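To illustrate the idea behind this suggestion, here is a minimal std-only sketch. It deliberately does not use arrow-rs: `ToyStructArray` and `concat_toy` are hypothetical names invented for this illustration, modeling a struct array as per-field child columns plus a struct-level validity vector. The point is only that concatenation can append each child column and the validity bits in order, so the caller needs no per-row special casing. Real code would presumably call the arrow concat kernel linked above instead.

```rust
/// Hypothetical toy model of a two-field struct array (NOT the arrow-rs type).
#[derive(Debug, Clone, PartialEq)]
struct ToyStructArray {
    /// Child column for field `a`; `None` marks a null child value.
    a: Vec<Option<i32>>,
    /// Child column for field `b`; `None` marks a null child value.
    b: Vec<Option<bool>>,
    /// Struct-level validity: `false` means the whole row is null.
    valid: Vec<bool>,
}

impl ToyStructArray {
    /// Number of rows, taken from the validity vector.
    fn len(&self) -> usize {
        self.valid.len()
    }
}

/// Concatenate toy struct arrays by appending child columns and validity in order.
fn concat_toy(arrays: &[ToyStructArray]) -> ToyStructArray {
    let total: usize = arrays.iter().map(|arr| arr.len()).sum();
    let mut out = ToyStructArray {
        a: Vec::with_capacity(total),
        b: Vec::with_capacity(total),
        valid: Vec::with_capacity(total),
    };
    for arr in arrays {
        out.a.extend_from_slice(&arr.a);
        out.b.extend_from_slice(&arr.b);
        out.valid.extend_from_slice(&arr.valid);
    }
    out
}

fn main() {
    let first = ToyStructArray {
        a: vec![Some(1)],
        b: vec![Some(true)],
        valid: vec![true],
    };
    // The first row of `second` is a null struct; its child slots are null too.
    let second = ToyStructArray {
        a: vec![None, Some(3)],
        b: vec![None, Some(false)],
        valid: vec![false, true],
    };
    let merged = concat_toy(&[first, second]);
    println!("{:?}", merged);
}
```

In this model the null row from the second input survives concatenation unchanged, which is the behavior a hand-written validity builder would otherwise have to reproduce.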
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Thanks @jayzhan211 -- this is looking great. There are a few more outstanding suggestions, but I think we could do them as follow on PRs -- shall I merge this one?
Sure!
Thanks again @jayzhan211
Which issue does this PR close?
Closes #7835
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?