-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Sort Merge Join: Reduce batch concatenation, use BatchCoalescer, new benchmarks (TPC-H Q21 SMJ up to ~4000x faster)
#18875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…hes on vector of RecordBatches. Add benchmarks, update tests.
|
|
|
I have a bug somewhere the extended tests demonstrate. I'll try to track it down next week. |
# Conflicts: # Cargo.lock
|
I think I sorted out the corner case failures by refactoring a bit. I basically removed direct member access to |
BatchCoaleser in Sort Merge Join, new benchmarksBatchCoaleser in Sort Merge Join, new benchmarks (TPC-H Q21 SMJ 1000x faster)
BatchCoaleser in Sort Merge Join, new benchmarks (TPC-H Q21 SMJ 1000x faster)BatchCoaleser in Sort Merge Join, new benchmarks (TPC-H Q21 SMJ ~1000x faster)
BatchCoaleser in Sort Merge Join, new benchmarks (TPC-H Q21 SMJ ~1000x faster)BatchCoalescer, new benchmarks (TPC-H Q21 SMJ ~1000x faster)
|
Probably we can also test it with #18985 once it is merged |
|
🤖 |
|
I started the following on this branch I think that will effectively test the merge join performance of main with this branch |
|
This is what I get on my amd ryzen 9 machine: |
BatchCoalescer, new benchmarks (TPC-H Q21 SMJ ~1000x faster)BatchCoalescer, new benchmarks (TPC-H Q21 SMJ up to ~1000x faster)
|
FWIW the benchmarks are still running because Q21 took over an hour to run 🤯 |
|
some of the |
I might remove some. They were mostly to help me understand control flow as I was learning the SMJ state machine: I'd try to codify my understanding with |
|
🤖: Benchmark completed Details
|
My goodness. |
|
Small batches are evil, sorry for delay, I wanted to check the PR with TPCDS but because of recent regression #19075 cannot merge it right now |
|
could you please align with main, I just merged a PR that fixed bug in SMJ and updated fuzz tests |
BatchCoalescer, new benchmarks (TPC-H Q21 SMJ up to ~1000x faster)BatchCoalescer, new benchmarks (TPC-H Q21 SMJ up to ~4000x faster)
Which issue does this PR close?
Rationale for this change
DataFusion Comet often uses Sort Merge Joins because DataFusion does not have a larger-than-memory Hash Join operator. Performance on TPC-H Q21 is quite bad when run through native, and instead Comet falls back to Spark by default. If you force Comet to use DataFusion's SMJ operator, performance is:
Profiling showed most of the time spent in
concat_batchesof single-digit rows:What changes are included in this PR?
Use a
BatchCoalescerboth internally and to buffer final output. There was also some redundant concatenation of batches for filtered joins. One made the biggest difference, but I switched to two to be consistent. Here are Comet results with the changes based on 50.3 (which is where Comet is):TPC-H SF1 benchmark results are below (
PREFER_HASH_JOIN=false ./bench.sh run tpch). I tried to run SF10 TPC-H but it seemed like it was going to take hours on my machine. It ran successfully on this PR.Are these changes tested?
Existing Sort Merge Join unit tests, added a new benchmark.
Are there any user-facing changes?
There should not be.