perf: Fix NLJ slow join with condition `array_has` #18161

2010YOUY01 · 2025-10-20T04:54:13Z · edited by alamb

Which issue does this PR close?

Closes Datafusion 50 Performance Regression (array_has style filter/join for Parquet data set) #18070

Rationale for this change

See the above issue and its comment #18070 (comment)

What changes are included in this PR?

In nested loop join, when the join column includes `List(Utf8View)`, use `take()` instead of `to_array_of_size()` to avoid deep copying the utf8 buffers inside the `Utf8View` array.

This is a quick fix; avoiding the deep copy inside `to_array_of_size()` itself is a bit tricky.
Here is `ListArray`'s physical layout: https://arrow.apache.org/rust/arrow/array/struct.GenericListArray.html
If multiple elements point to the same list range, the underlying payload can't be reused. So a potential fix in `to_array_of_size()` could only avoid copying the inner-inner `Utf8View` array buffers; it can't avoid copying the inner array (i.e. the views are still copied), and deep copying for other primitive types can't be avoided either. It seems this can be solved better once the `ListView` type is ready 🤔
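The layout constraint described above can be sketched with plain `std` types. This is a simplified model, not Arrow's actual implementation: `ListLayout`, `ListViewLayout`, `repeat_as_list` and `repeat_as_list_view` are hypothetical names introduced only for illustration. It assumes the Arrow columnar format's definitions, where a list element `i` spans `offsets[i]..offsets[i + 1]`, while a list-view element carries its own offset and size and may therefore overlap with other elements.

```rust
/// Hypothetical, simplified stand-in for a list array's layout:
/// element i spans values[offsets[i]..offsets[i + 1]].
struct ListLayout {
    offsets: Vec<usize>,
    values: Vec<String>,
}

/// Hypothetical stand-in for a list-view layout: element i spans
/// values[offsets[i]..offsets[i] + sizes[i]], so ranges may overlap.
struct ListViewLayout {
    offsets: Vec<usize>,
    sizes: Vec<usize>,
    values: Vec<String>,
}

/// Repeating one list `n` times in the list layout: consecutive elements
/// share a boundary offset, so the child values must be written `n` times.
fn repeat_as_list(item: &[String], n: usize) -> ListLayout {
    let mut offsets = Vec::with_capacity(n + 1);
    offsets.push(0);
    let mut values = Vec::with_capacity(item.len() * n);
    for _ in 0..n {
        values.extend_from_slice(item);
        offsets.push(values.len());
    }
    ListLayout { offsets, values }
}

/// Repeating one list `n` times in the list-view layout: every element
/// points at the same range, so the child values are stored only once.
fn repeat_as_list_view(item: &[String], n: usize) -> ListViewLayout {
    ListViewLayout {
        offsets: vec![0; n],
        sizes: vec![item.len(); n],
        values: item.to_vec(),
    }
}
```

In this model, repeating a two-element list three times needs six child values in the list layout but only two in the list-view layout, which is why the inner array copy looks unavoidable until `ListView` is available.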

Benchmark

I tried query 1 in #18070, but used only 3 randomly sampled `places` parquet files.

49.0.0: 4s
50.0.0: stuck > 1 minute
PR: 4s

Now the performance is similar. I suspect most of the time is spent evaluating the expensive `array_has`, so the optimization in #16996 can't help much.

Are these changes tested?

Existing tests

Are there any user-facing changes?

No

alamb

I verified that with this fix the reproducer from @ianthetechie from this issue is fixed:

#18070

However, I am not sure this code has test coverage. I checked via

nice cargo llvm-cov --html test --test sqllogictests

nice cargo llvm-cov --html test -p datafusion -p datafusion-physical-plan

And both show no coverage

I will look into adding some coverage

> Now the performance is similar. I suspect most of the time is spent evaluating the expensive `array_has`, so the optimization in #16996 can't help much.

Yes, I looked at the `array_has` implementation and it is doing a lot of work. I will file a follow-on ticket.

Also, it seems to me that the fix / improvement for `ScalarValue::to_array_of_size()` is more general than just NLJ, so I will file a ticket about that as well.

alamb · 2025-10-20T13:59:58Z

datafusion/physical-plan/src/joins/nested_loop_join.rs

-            scalar_value.to_array_of_size(filtered_probe_batch.num_rows())?
+            // Avoid using `ScalarValue::to_array_of_size()` for `List(Utf8View)` to avoid
+            // deep copies for buffers inside `Utf8View` array. See below for details.
+            // https://github.com/apache/datafusion/issues/18159


The root cause is tracked in

Utf8View / BinaryView / StringViewArray::slice() and BinaryViewArray::slice() are slow (they allocate) arrow-rs#6408

alamb · 2025-10-20T14:00:37Z

datafusion/physical-plan/src/joins/nested_loop_join.rs

+                DataType::List(field) | DataType::LargeList(field)
+                    if field.data_type() == &DataType::Utf8View =>
+                {
+                    let indices_iter = std::iter::repeat_n(
+                        build_side_index as u64,
+                        filtered_probe_batch.num_rows(),
+                    );
+                    let indices_array = UInt64Array::from_iter_values(indices_iter);
+                    take(original_left_array.as_ref(), &indices_array, None)?
+                }


I think this approach could be ported into ScalarValue::to_array_of_size itself rather than special cased here -- which would improve performance in potentially other places

datafusion/datafusion/common/src/scalar/mod.rs

Lines 3238 to 3245 in 556eb9b

fn list_to_array_of_size(arr: &dyn Array, size: usize) -> Result<ArrayRef> {
    let arrays = repeat_n(arr, size).collect::<Vec<_>>();
    let ret = match !arrays.is_empty() {
        true => arrow::compute::concat(arrays.as_slice())?,
        false => arr.slice(0, 0),
    };
    Ok(ret)
}

That being said, I think this is a nice point fix that we can safely backport to the datafusion 50 branch, so I think we should merge this PR / backport it and I will file a follow on PR to further improve the code
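The difference between the two strategies in this thread (gathering with a repeated index versus repeating a value by copying it) can be sketched without Arrow. This is a minimal model under stated assumptions: `ViewColumn`, `take_shared` and `repeat_deep_copy` are hypothetical names, and the single reference-counted buffer only approximates how a view-style string array shares its data buffers. It is not the code in `nested_loop_join.rs` or in arrow-rs.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a view-style string column: each view is an
/// (offset, len) pair into a shared, reference-counted data buffer.
struct ViewColumn {
    views: Vec<(usize, usize)>,
    buffer: Arc<Vec<u8>>,
}

impl ViewColumn {
    /// Bytes of row `i`, resolved through its view.
    fn value(&self, i: usize) -> &[u8] {
        let (offset, len) = self.views[i];
        &self.buffer[offset..offset + len]
    }
}

/// Index-based gather: copies only the small views and shares the buffer.
fn take_shared(col: &ViewColumn, indices: &[usize]) -> ViewColumn {
    ViewColumn {
        views: indices.iter().map(|&i| col.views[i]).collect(),
        buffer: Arc::clone(&col.buffer),
    }
}

/// Deep-copy repeat: builds a new buffer holding the bytes of `row` n times.
fn repeat_deep_copy(col: &ViewColumn, row: usize, n: usize) -> ViewColumn {
    let bytes = col.value(row);
    let mut buffer = Vec::with_capacity(bytes.len() * n);
    let mut views = Vec::with_capacity(n);
    for _ in 0..n {
        views.push((buffer.len(), bytes.len()));
        buffer.extend_from_slice(bytes);
    }
    ViewColumn { views, buffer: Arc::new(buffer) }
}
```

In this sketch, `take_shared(&col, &vec![row; n])` grows only the list of views, while `repeat_deep_copy` grows the byte buffer linearly with `n`. That mirrors why the repeated-index `take()` is cheaper for the probe-side batch here.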

alamb · 2025-10-20T14:31:41Z

THANK YOU very much for this fix and diagnosis @2010YOUY01

2010YOUY01 · 2025-10-20T14:49:27Z

> THANK YOU very much for this fix and diagnosis @2010YOUY01

Thank you for the review. The feedback makes sense to me, but I can only address it tomorrow. If you're waiting on this patch to be included in the release, feel free to push changes directly to the PR.

alamb · 2025-10-20T15:00:11Z

Thank you @2010YOUY01

I just pushed a test that adds coverage for this case

I verified it is covered using

$ cargo llvm-cov test --html --profile=ci --test sqllogictests -- join_lists

alamb · 2025-10-20T15:00:31Z

datafusion/sqllogictest/test_files/join_lists.slt

@@ -0,0 +1,63 @@
+# Licensed to the Apache Software Foundation (ASF) under one


new test added here

alamb

Thank you @2010YOUY01

## Which issue does this PR close?

- Closes apache#18070

## Rationale for this change

See the above issue and its comment apache#18070 (comment)

## What changes are included in this PR?

In nested loop join, when the join column includes `List(Utf8View)`, use `take()` instead of `to_array_of_size()` to avoid deep copying the utf8 buffers inside the `Utf8View` array.

This is a quick fix; avoiding the deep copy inside `to_array_of_size()` itself is a bit tricky.
Here is `ListArray`'s physical layout: https://arrow.apache.org/rust/arrow/array/struct.GenericListArray.html
If multiple elements point to the same list range, the underlying payload can't be reused. So a potential fix in `to_array_of_size()` could only avoid copying the inner-inner `Utf8View` array buffers; it can't avoid copying the inner array (i.e. the views are still copied), and deep copying for other primitive types can't be avoided either. It seems this can be solved better once the `ListView` type is ready 🤔

### Benchmark

I tried query 1 in apache#18070, but used only 3 randomly sampled `places` parquet files.

49.0.0: 4s
50.0.0: stuck > 1 minute
PR: 4s

Now the performance is similar. I suspect most of the time is spent evaluating the expensive `array_has`, so the optimization in apache#16996 can't help much.

## Are these changes tested?

Existing tests

## Are there any user-facing changes?

No

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

alamb · 2025-10-20T15:29:41Z

Backport PR:

[branch-50] perf: Fix NLJ slow join with condition array_has (#18161) #18179

alamb · 2025-10-20T16:23:12Z

Filed a ticket to track making array_has faster:

Improve performance of array_has #18181

… (#18179)

## Which issue does this PR close?

- Related to #18070
- Part of #18072

## Rationale for this change

Fix performance regression in Datafusion 50

## What changes are included in this PR?

Backport #18161 to `branch-50`

## Are these changes tested?

Yes

## Are there any user-facing changes?

Fix performance regression

Co-authored-by: Yongting You <2010youy01@gmail.com>


fix nlj slow list join

e3894fe

github-actions bot added the physical-plan label (Changes to the physical-plan crate) Oct 20, 2025

2010YOUY01 mentioned this pull request Oct 20, 2025

Datafusion 50 Performance Regression (array_has style filter/join for Parquet data set) #18070

Closed

alamb reviewed Oct 20, 2025


Add slt test coverage

860c47e

This was referenced Oct 20, 2025

Improve performance of ScalarValue::to_array_of_size() for Lists #18177

Closed

Avoid deep copying Utf8View array buffers in ScalarValue::to_array_of_size() #18159

Open

github-actions bot added the sqllogictest label (SQL Logic Tests (.slt)) Oct 20, 2025

alamb reviewed Oct 20, 2025


alamb approved these changes Oct 20, 2025


alamb added this pull request to the merge queue Oct 20, 2025

Merged via the queue into apache:main with commit 7f75e58 Oct 20, 2025
32 checks passed

alamb mentioned this pull request Oct 20, 2025

[branch-50] perf: Fix NLJ slow join with condition array_has (#18161) #18179

Merged


