@vegarsti vegarsti commented Nov 1, 2025

Which issue does this PR close?

  • Closes #18350.

Rationale for this change

We want to be able to reverse a ListView.

What changes are included in this PR?

  • Downcast &dyn Array to ListView: as_list_view_array
  • Downcast &dyn Array to LargeListView: as_large_list_view_array
  • Branches in array_reverse_inner to reverse ListView and LargeListView
  • Main logic in list_view_reverse which materializes a new values array using take
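
As a rough illustration of the last bullet, here is a minimal std-only sketch of reversing each row of a ListView-like structure by building gather indices and applying a take step. `ListViewModel`, `take`, and `rows` are hypothetical stand-ins written for this example; they are not the arrow-rs types or the code in this PR, and null handling is left out.

```rust
/// Hypothetical plain-Rust stand-in for a ListView:
/// row i is values[offsets[i]..offsets[i] + sizes[i]].
/// Rows may appear out of order (or overlap) in `values`.
struct ListViewModel {
    offsets: Vec<usize>,
    sizes: Vec<usize>,
    values: Vec<i32>,
}

/// Stand-in for a take kernel: gather `values` at the given indices.
fn take(values: &[i32], indices: &[usize]) -> Vec<i32> {
    indices.iter().map(|&i| values[i]).collect()
}

/// Reverse the elements inside each row, materializing a new values buffer.
fn list_view_reverse(input: &ListViewModel) -> ListViewModel {
    let mut indices = Vec::with_capacity(input.values.len());
    let mut new_offsets = Vec::with_capacity(input.offsets.len());
    for (&offset, &size) in input.offsets.iter().zip(&input.sizes) {
        // The reversed row starts where the gathered values currently end.
        new_offsets.push(indices.len());
        // Walk the row back to front.
        indices.extend((offset..offset + size).rev());
    }
    ListViewModel {
        offsets: new_offsets,
        sizes: input.sizes.clone(),
        values: take(&input.values, &indices),
    }
}

/// Expand the rows for inspection.
fn rows(list: &ListViewModel) -> Vec<Vec<i32>> {
    list.offsets
        .iter()
        .zip(&list.sizes)
        .map(|(&o, &s)| list.values[o..o + s].to_vec())
        .collect()
}
```

In this model the output offsets are always contiguous and in row order, even when the input rows were stored out of order.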

Are these changes tested?

Yes

@github-actions github-actions bot added the common Related to common crate label Nov 1, 2025
}

// Take values from underlying array in the reversed order
let indices_array: ArrayRef = match size_of::<O>() {
Contributor Author

I'm not sure about this, but I wanted to 1) avoid unwraps and 2) not use 64-bit indices if 32-bit was sufficient.

Contributor Author

@vegarsti vegarsti Nov 1, 2025

Changed to use O::IS_LARGE but still wondering about the approach
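
For readers following along, a minimal std-only sketch of the idea: pick the index width from a compile-time flag on the offset type, so no fallible conversion (and no unwrap) is needed. The `OffsetSize` trait and `TakeIndices` enum below are hypothetical stand-ins made up for this example; the PR itself relies on arrow's `OffsetSizeTrait` and builds `UInt32Array` / `UInt64Array`.

```rust
/// Hypothetical stand-in for an offset type with a compile-time width flag.
trait OffsetSize: Copy {
    const IS_LARGE: bool;
    fn as_usize(self) -> usize;
}

impl OffsetSize for i32 {
    const IS_LARGE: bool = false;
    fn as_usize(self) -> usize {
        self as usize
    }
}

impl OffsetSize for i64 {
    const IS_LARGE: bool = true;
    fn as_usize(self) -> usize {
        self as usize
    }
}

/// Indices in the narrowest width that fits the offset type.
#[derive(Debug, PartialEq)]
enum TakeIndices {
    U32(Vec<u32>),
    U64(Vec<u64>),
}

/// Choose the index width at compile time based on `O::IS_LARGE`.
fn to_take_indices<O: OffsetSize>(indices: &[O]) -> TakeIndices {
    if O::IS_LARGE {
        TakeIndices::U64(indices.iter().map(|i| i.as_usize() as u64).collect())
    } else {
        TakeIndices::U32(indices.iter().map(|i| i.as_usize() as u32).collect())
    }
}
```

The sketch assumes the indices are non-negative, which holds for offsets into a values buffer.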

)?))
}

fn list_view_reverse<O: OffsetSizeTrait + TryFrom<i64>>(
Contributor Author

I have to admit I did not spend much time looking at general_array_reverse, maybe I should have... It constructs a MutableArrayData and operates on it, while this uses take. There might be a good reason why general_array_reverse doesn't use take?

Contributor

Using take makes sense to me here. If you have time you could try both approaches and run a benchmark?

Contributor Author

Good idea, I will try that

Contributor Author

@vegarsti vegarsti Nov 3, 2025

Alright, I tried the benchmark from #18425 with an array_len of 10k (it's 100k on the branch). Using MutableArrayData is way worse 🫨 The baseline is the code on this PR; the code from this snippet is here.

array_reverse           time:   [44.858 µs 44.865 µs 44.874 µs]
                        change: [+545.75% +547.46% +549.17%] (p = 0.00 < 0.05)
                        Performance has regressed.

This suggests it might be worth using take instead of MutableArrayData for the regular list reverse too.
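
To put the reported change in perspective, a small worked calculation, assuming the change figure is relative to the baseline, i.e. (new - baseline) / baseline. The helper names are made up for this example.

```rust
/// Recover the baseline time from the new time and the relative change
/// (e.g. 5.4746 for +547.46%), assuming change = (new - baseline) / baseline.
fn baseline_from_change(new_time: f64, relative_change: f64) -> f64 {
    new_time / (1.0 + relative_change)
}

/// How many times slower the new measurement is than the baseline.
fn slowdown_factor(relative_change: f64) -> f64 {
    1.0 + relative_change
}
```

Under that assumption, 44.865 µs at +547.46% corresponds to a baseline of roughly 6.9 µs, so the MutableArrayData variant is about 6.5x slower than take in this run.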

Contributor

Sounds good to go with take in that case

@vegarsti vegarsti marked this pull request as draft November 1, 2025 08:06
@vegarsti vegarsti marked this pull request as ready for review November 1, 2025 18:37
Contributor

@Jefffrey Jefffrey left a comment

Thanks for this, have some suggestions for refactoring.

I think it's unavoidable to reshuffle/materialize the underlying child array, given my understanding of ListView arrays.

Comment on lines +240 to +255
// Materialize values from underlying array with take
let indices_array: ArrayRef = if O::IS_LARGE {
    Arc::new(arrow::array::UInt64Array::from(
        indices
            .iter()
            .map(|i| i.as_usize() as u64)
            .collect::<Vec<_>>(),
    ))
} else {
    Arc::new(UInt32Array::from(
        indices
            .iter()
            .map(|i| i.as_usize() as u32)
            .collect::<Vec<_>>(),
    ))
};
Contributor

Hmm I do wonder if there is a better way to do this, will keep thinking 🤔

Contributor Author

Yeah, this feels cumbersome!

github-merge-queue bot pushed a commit that referenced this pull request Nov 3, 2025
## Which issue does this PR close?

N/A

## Rationale for this change

When reviewing #18424 I noticed some refactoring that could be applied
to the existing array reverse implementation.

## What changes are included in this PR?

## Are these changes tested?

See my comments for the refactors & justifications.

Existing tests.

## Are there any user-facing changes?

No.

// even if the original array had elements out of order.
let mut indices: Vec<O> = Vec::with_capacity(values.len());
let mut new_sizes = Vec::with_capacity(sizes.len());
let mut new_offsets: Vec<O> = Vec::with_capacity(offsets.len());
Contributor

FWIW you can convert the existing indices to a Vec for in place update as well -- I tried to document that a bit more in apache/arrow-rs#8771

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Nov 3, 2025
Contributor

@Jefffrey Jefffrey left a comment

Looks good to me 👍

I think @alamb might have misunderstood what this reverse function is doing

}

#[test]
fn test_reverse_list_view_empty() -> Result<()> {
Contributor

Could we also have a test case where it's all nulls?

Contributor Author

Good idea! Done in 7d39f69

mbrobbel pushed a commit to apache/arrow-rs that referenced this pull request Nov 4, 2025
# Which issue does this PR close?

- related to apache/datafusion#18424

# Rationale for this change

It may not be obvious how to convert certain Arrow arrays to/from Vec
without copying for manipulation, so let's add an example

# What changes are included in this PR?

1. Add note about zero copy arrays
2. Add examples of modifying a primitive array using zero-copy
conversion to/from Vec

# Are these changes tested?

By CI

# Are there any user-facing changes?

Docs only, no functional change

---------

Co-authored-by: Vegard Stikbakke <vegard.stikbakke@gmail.com>
@Jefffrey Jefffrey added this pull request to the merge queue Nov 5, 2025
Merged via the queue into apache:main with commit b52a81d Nov 5, 2025
28 checks passed
Contributor

Jefffrey commented Nov 5, 2025

Thanks @vegarsti

jizezhang pushed a commit to jizezhang/datafusion that referenced this pull request Nov 5, 2025
## Which issue does this PR close?
- Closes apache#18350.

## Rationale for this change
We want to be able to reverse a ListView.

## What changes are included in this PR?
- Downcast `&dyn Array` to `ListView`: `as_list_view_array`
- Downcast `&dyn Array` to `LargeListView`: `as_large_list_view_array`
- Branches in `array_reverse_inner` to reverse `ListView` and
`LargeListView`
- Main logic in `list_view_reverse` which materializes a new values
array using `take`

## Are these changes tested?
Yes
@vegarsti vegarsti deleted the list-view-reverse branch November 5, 2025 07:20
github-merge-queue bot pushed a commit that referenced this pull request Nov 5, 2025
There are no benchmarks for `array_reverse`. I used this one while working on
#18424 to confirm `take` was faster than `MutableArrayData` for ListView. That
might be the case for other List types as well, which currently
use `MutableArrayData`.

The benchmark can be run with `cargo bench --bench array_reverse`.
github-merge-queue bot pushed a commit that referenced this pull request Nov 9, 2025
## Rationale for this change

Noticed while doing #18424 that the list types `List` and
`FixedSizeList` use `MutableArrayData` to build the reversed array. Using
`take` turns out to be a lot faster: roughly 70% less time for `List` and
roughly 76% less for `FixedSizeList`. This PR also reworks the benchmark
added in #18425, and these are the results on that benchmark compared to
the implementation on main:

```
# cargo bench --bench array_reverse
   Compiling datafusion-functions-nested v50.3.0 (/Users/vegard/dev/datafusion/datafusion/functions-nested)
    Finished `bench` profile [optimized] target(s) in 42.08s
     Running benches/array_reverse.rs (target/release/deps/array_reverse-2c473eed34a53d0a)
Gnuplot not found, using plotters backend
Benchmarking array_reverse_list: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.3s, or reduce sample count to 70.
array_reverse_list      time:   [62.201 ms 62.551 ms 62.946 ms]
                        change: [−70.137% −69.965% −69.785%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

Benchmarking array_reverse_list_view: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.3s, or reduce sample count to 70.
array_reverse_list_view time:   [61.649 ms 61.905 ms 62.185 ms]
                        change: [−16.122% −15.623% −15.087%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

array_reverse_fixed_size_list
                        time:   [4.7936 ms 4.8292 ms 4.8741 ms]
                        change: [−76.435% −76.196% −75.951%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 20 outliers among 100 measurements (20.00%)
  8 (8.00%) low mild
  5 (5.00%) high mild
  7 (7.00%) high severe
```
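
As a reading aid for the numbers above, a small worked conversion from "percent less time" to a speedup factor, assuming the change is the relative difference in time against the previous run. The function name is made up for this example.

```rust
/// Convert a relative time reduction (e.g. 0.69965 for -69.965%)
/// into a speedup factor: old_time / new_time = 1 / (1 - reduction).
fn speedup_from_reduction(reduction: f64) -> f64 {
    1.0 / (1.0 - reduction)
}
```

Under that assumption, the point estimates correspond to roughly 3.3x for `array_reverse_list`, 4.2x for `array_reverse_fixed_size_list`, and 1.2x for `array_reverse_list_view`.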

## Are these changes tested?
Covered by existing sqllogic tests, and one new test for
`FixedSizeList`.
codetyri0n pushed a commit to codetyri0n/datafusion that referenced this pull request Nov 11, 2025
codetyri0n pushed a commit to codetyri0n/datafusion that referenced this pull request Nov 11, 2025
codetyri0n pushed a commit to codetyri0n/datafusion that referenced this pull request Nov 11, 2025
codetyri0n pushed a commit to codetyri0n/datafusion that referenced this pull request Nov 11, 2025
Labels

common Related to common crate sqllogictest SQL Logic Tests (.slt)

Development

Successfully merging this pull request may close these issues.

Support reverse for ListView

3 participants