Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change ScalarValue::Struct to ArrayRef #7893

Merged
merged 12 commits into from
Feb 7, 2024

Conversation

jayzhan211
Copy link
Contributor

Which issue does this PR close?

Closes #7835

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Oct 21, 2023
@jayzhan211 jayzhan211 force-pushed the arrayref-for-scalarvalue-struct branch from ac9ffc3 to e27d7f9 Compare November 12, 2023 02:27
@github-actions github-actions bot removed logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) labels Nov 12, 2023
@jayzhan211
Copy link
Contributor Author

wait on #7862

@jayzhan211 jayzhan211 force-pushed the arrayref-for-scalarvalue-struct branch from e27d7f9 to 5a5a88d Compare December 9, 2023 11:13
@jayzhan211 jayzhan211 force-pushed the arrayref-for-scalarvalue-struct branch from 5a5a88d to a006450 Compare December 14, 2023 12:37
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Dec 16, 2023
@jayzhan211 jayzhan211 force-pushed the arrayref-for-scalarvalue-struct branch 2 times, most recently from e57eca5 to 1c82d51 Compare December 18, 2023 00:21
@jayzhan211 jayzhan211 force-pushed the arrayref-for-scalarvalue-struct branch 2 times, most recently from addf685 to bd98b9a Compare January 6, 2024 06:19
let should_fail_on_seralize: Vec<ScalarValue> = vec![
// Should fail due to empty values
ScalarValue::Struct(
Some(vec![]),

This comment was marked as outdated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move to round_trip_scalar_values, since it is able to serialized

        ScalarValue::try_from(&DataType::Struct(Fields::from(vec![
            Field::new("a", DataType::Int32, true),
            Field::new("a", DataType::Boolean, false),
        ])))
        .unwrap(),

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that there is no need to test serializing an empty array as it isn't a valid input anyways

@jayzhan211 jayzhan211 force-pushed the arrayref-for-scalarvalue-struct branch 2 times, most recently from c786df7 to 60f4d2a Compare January 10, 2024 13:27
@jayzhan211 jayzhan211 marked this pull request as ready for review January 10, 2024 13:48
@jayzhan211
Copy link
Contributor Author

@alamb Ready for review!

@@ -5292,8 +5557,7 @@ mod tests {
"| col |",
"+---------------------------+",
"| |",
"| {a: , b: } |",
"| {a: , b: {ba: , bb: }} |",
"| {a: 1, b: } |",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is no way to construct StructArray like the left-hand side.

explain select struct(1, 2.3, 'abc');
----
logical_plan
Projection: Struct({c0:Int64(1),c1:Float64(2.3),c2:Utf8("abc")}) AS struct(Int64(1),Float64(2.3),Utf8("abc"))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test format

.into_iter()
.map(|(name, scalar)| (Field::new(name, scalar.data_type(), false), scalar))
.unzip();
// Wrapper for ScalarValue::Struct that checks the length of the arrays, without nulls
Copy link
Contributor Author

@jayzhan211 jayzhan211 Jan 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Remove these two wrappers, no longer needed after changing to Scalar<T>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these still TODO?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we haven't changed to Scalar yet.

let struct_type = DataType::Struct(Fields::from(fields));
let mut column_wise_ordering_values = vec![];
let num_columns = fields.len();
for i in 0..num_columns {
Copy link
Contributor Author

@jayzhan211 jayzhan211 Jan 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there might be a better design for StructArray (previous design is based on old ScalarValue::Struct). I avoid changing the logic or data structure in this PR.

May benefit #8558?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there might be a better design for StructArray (previous design is based on old ScalarValue::Struct). I avoid changing the logic or data structure in this PR.

May benefit #8558?

I don't think it will benefit #8558 (won't harm either). However it will be a better change anyway.

@@ -2382,12 +2381,13 @@ mod tests {
Ok(())
}

/// Return a `null` literal representing a struct type like: `{ a: bool }`
// / Return a `null` literal representing a struct type like: `{ a: bool }`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
// / Return a `null` literal representing a struct type like: `{ a: bool }`
/// Return a `null` literal representing a struct type like: `{ a: bool }`

let sv = ScalarValue::try_from_array(column, 0)?;
ordering_columns_per_row.push(sv);
}

Ok(ordering_columns_per_row)
} else {
exec_err!(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: internal_err

@jayzhan211 jayzhan211 requested a review from alamb January 13, 2024 02:10
@jayzhan211 jayzhan211 force-pushed the arrayref-for-scalarvalue-struct branch from 3f53ead to 98afb20 Compare January 29, 2024 11:56
@jayzhan211
Copy link
Contributor Author

Rebase

@jayzhan211 jayzhan211 force-pushed the arrayref-for-scalarvalue-struct branch from e431707 to a756a8b Compare January 30, 2024 14:14
@alamb
Copy link
Contributor

alamb commented Feb 2, 2024

@jayzhan211 -- is this PR ready for a review?

@jayzhan211
Copy link
Contributor Author

@jayzhan211 -- is this PR ready for a review?

Yes, it keeps getting conflicts, but I think you can take a first scan, unless the conflicts are critical

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@jayzhan211 jayzhan211 force-pushed the arrayref-for-scalarvalue-struct branch from a756a8b to 4088750 Compare February 5, 2024 13:15
@jayzhan211
Copy link
Contributor Author

Rebase

@alamb
Copy link
Contributor

alamb commented Feb 5, 2024

Sorry -- starting to look now

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really good @jayzhan211 -- thank you both for the PR as well as for sticking with it for so long

I had a few comments about how to improve the implementation by using arrow kernels, but I also think we could merge this as is and then implement those improvements as a follow on PR if you prefer.

Again, thank you for your patience.

@@ -323,20 +335,32 @@ impl Accumulator for OrderSensitiveArrayAggAccumulator {
impl OrderSensitiveArrayAggAccumulator {
fn evaluate_orderings(&self) -> Result<ScalarValue> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can file a follow on ticket to track this idea

datafusion/proto/src/logical_plan/to_proto.rs Show resolved Hide resolved
let should_fail_on_seralize: Vec<ScalarValue> = vec![
// Should fail due to empty values
ScalarValue::Struct(
Some(vec![]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that there is no need to test serializing an empty array as it isn't a valid input anyways

);
}

let mut valid = BooleanBufferBuilder::new(arrays.len());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about using https://docs.rs/arrow/latest/arrow/compute/kernels/concat/index.html here? I think you should be able to simply concat the arrays together without having to have special handling (and if concat doesn't support StructArray we can potentially file a ticket upstream in arrow-rs)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datafusion/common/src/scalar.rs Outdated Show resolved Hide resolved
datafusion/common/src/scalar.rs Show resolved Hide resolved
.into_iter()
.map(|(name, scalar)| (Field::new(name, scalar.data_type(), false), scalar))
.unzip();
// Wrapper for ScalarValue::Struct that checks the length of the arrays, without nulls
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these still TODO?

datafusion/common/src/scalar.rs Show resolved Hide resolved
datafusion/common/src/scalar.rs Show resolved Hide resolved
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@alamb
Copy link
Contributor

alamb commented Feb 6, 2024

Thanks @jayzhan211 -- this is looking great. There are a few more outstanding suggestions, but I think we could do them as follow on PRs -- shall I merge this one?

@jayzhan211
Copy link
Contributor Author

Thanks @jayzhan211 -- this is looking great. There are a few more outstanding suggestions, but I think we could do them as follow on PRs -- shall I merge this one?

Sure!

@alamb alamb merged commit 8413da8 into apache:main Feb 7, 2024
22 checks passed
@alamb
Copy link
Contributor

alamb commented Feb 7, 2024

Thanks again @jayzhan211

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Change ScalarValue::Struct to store ArrayRef
4 participants