Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert list array and non-list array to scalars #7862

Closed

Conversation

jayzhan211
Copy link
Contributor

@jayzhan211 jayzhan211 commented Oct 19, 2023

Which issue does this PR close?

Closes #.

Rationale for this change

convert_array_to_scalar_vec can convert well for list array, but we will get Vec<Vec<ScalarValue>> for non-list array (primitive array).
It would be nice to have another conversion that returns nicely Vec<ScalarValue> for the non-list array. I would need this for #7835 too.

What changes are included in this PR?

convert_list_array_to_scalar_vec for ListArray
convert_non_list_array_to_scalar_vec for non-ListArray

The nested array for DistinctArrayAggAccumulator (update_batch) is removed so as to test.

Are these changes tested?

Doc test

Are there any user-facing changes?

@github-actions github-actions bot added physical-expr Physical Expressions core Core DataFusion crate labels Oct 19, 2023
@jayzhan211 jayzhan211 marked this pull request as ready for review October 19, 2023 14:32
Copy link
Member

@Weijun-H Weijun-H left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @jayzhan211 👍

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jayzhan211

datafusion/common/src/scalar.rs Outdated Show resolved Hide resolved
datafusion/common/src/scalar.rs Outdated Show resolved Hide resolved
datafusion/common/src/scalar.rs Show resolved Hide resolved
@jayzhan211 jayzhan211 changed the title Minor: Convert list array and non-list array to scalars Convert list array and non-list array to scalars Oct 25, 2023
@jayzhan211 jayzhan211 marked this pull request as draft October 25, 2023 00:58
@jayzhan211 jayzhan211 force-pushed the convert-general-arr-to-scalar branch from e7fd4ca to c9f4ea1 Compare October 25, 2023 01:42
@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) and removed sqllogictest SQL Logic Tests (.slt) labels Oct 25, 2023
@jayzhan211
Copy link
Contributor Author

jayzhan211 commented Oct 25, 2023

Summary of changes

  1. Only accept ListArray for convert_list_array, only accept non-ListArray for convert_non_list
  2. Add nested list test for convert_list_array
  3. extend wrap_into_list for multiple arrays
  4. keep the nested list test for distinct array agg removed, since update_batch only convert-non-list cases.

@jayzhan211 jayzhan211 marked this pull request as ready for review October 25, 2023 12:44
@jayzhan211 jayzhan211 requested review from alamb and Weijun-H October 25, 2023 12:44
@alamb alamb marked this pull request as draft October 30, 2023 20:13
@alamb
Copy link
Contributor

alamb commented Oct 30, 2023

I think this is not waiting on any more feedback

@jayzhan211 jayzhan211 marked this pull request as ready for review November 11, 2023 08:01
@jayzhan211 jayzhan211 marked this pull request as draft November 17, 2023 13:37
@jayzhan211 jayzhan211 force-pushed the convert-general-arr-to-scalar branch from 542226d to 6b56441 Compare November 17, 2023 13:43
@jayzhan211
Copy link
Contributor Author

wait on #8253

@jayzhan211
Copy link
Contributor Author

wait on #8439

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@jayzhan211 jayzhan211 force-pushed the convert-general-arr-to-scalar branch from 29ac936 to aee4eff Compare December 13, 2023 14:22
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@jayzhan211 jayzhan211 marked this pull request as ready for review December 13, 2023 14:26
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@jayzhan211 jayzhan211 requested a review from Weijun-H December 13, 2023 15:08
@jayzhan211
Copy link
Contributor Author

@alamb I think this is ready to go, thanks

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jayzhan211 -- this is looking quite close I think

/// Retrieve ScalarValue for each row in `array`
/// Retrieve `ScalarValue` for each row in `array`
///
/// Convert `ListArray` to `Vec<Vec<ScalarValue>>`, first `Vec` is for rows, second `Vec` is for elements in the list
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 It might also help to explain why we need two different signatures. It took me a while to grok things too

Suggested change
/// Convert `ListArray` to `Vec<Vec<ScalarValue>>`, first `Vec` is for rows, second `Vec` is for elements in the list
/// Convert `ListArray` into a 2 dimensional to `Vec<Vec<ScalarValue>>`, first `Vec` is for rows,
/// second `Vec` is for elements in the list.
///
/// See [`Self::convert_non_list_array_to_scalars`] for converting non Lists
///
/// This method is an optimization to unwrap nested ListArrays to nested Rust structures without
/// converting them twice

Comment on lines +2042 to +2043
/// Retrieve `ScalarValue` for each row in `array`
///
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems left over

Suggested change
/// Retrieve `ScalarValue` for each row in `array`
///

@@ -2067,30 +2071,78 @@ impl ScalarValue {
///
/// assert_eq!(scalar_vec, expected);
/// ```
pub fn convert_array_to_scalar_vec(array: &dyn Array) -> Result<Vec<Vec<Self>>> {
let mut scalars = Vec::with_capacity(array.len());
pub fn convert_list_array_to_scalar_vec<O: OffsetSizeTrait>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be easier to use if it didn't mix generic and non generic code

Maybe something like

Suggested change
pub fn convert_list_array_to_scalar_vec<O: OffsetSizeTrait>(
pub fn convert_list_array_to_scalar_vec(
array: &dyn Array,
) -> Result<Vec<Vec<Self>>> {
if let Some(arr) = array.as_list_opt::<i32> {
Self::convert_list_array_to_scalar_vec_internal(arr)
} else if let Some(arr) = array.as_list_opt::<64> {
Self::convert_list_array_to_scalar_vec_internal(arr)
} else {
_internal_err!("Expected GenericListArray but found: {array:?}")
}
}

And then internally pass in the cast Array by changing

    fn convert_list_array_to_scalar_vec_internal<O: OffsetSizeTrait>(
        array: &dyn Array,
    ) -> Result<Vec<Vec<Self>>> {

to

    fn convert_list_array_to_scalar_vec_internal<O: OffsetSizeTrait>(
        array: &GenericListArray<O>,
    ) -> Result<Vec<Vec<Self>>> {

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@alamb
Copy link
Contributor

alamb commented Dec 17, 2023

(BTW sorry for being so nit picky on this PR -- I think ScalarValue is a key and important API of DataFusion so avoiding API churn (continually changing the same API over and over again) is important

@jayzhan211
Copy link
Contributor Author

jayzhan211 commented Dec 17, 2023

(BTW sorry for being so nit picky on this PR -- I think ScalarValue is a key and important API of DataFusion so avoiding API churn (continually changing the same API over and over again) is important

This comment is a bit too late...

And, this API is actually not breaking the previous one convert_array_to_scalar_vec, we can have three versions of them. convert_array_to_scalar_vec for the place that we can't ensure the type for array.

Anyway, this change is just improvement, not fixing a critical error, so I will close it.

@jayzhan211 jayzhan211 closed this Dec 17, 2023
@alamb
Copy link
Contributor

alamb commented Dec 18, 2023

I am sorry -- I feel really bad that we can't find more review bandwidth to match your contributions @jayzhan211 -- your efforts so far have been great and are really appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants