Support to use Schedular in tpch benchmark #4361

xudong963 · 2022-11-24T16:34:00Z

Which issue does this PR close?

No

Rationale for this change

Support to test Schedular(the newest executor designed by @tustvold ) performance in tpch benchmark

What changes are included in this PR?

Support to use Schedular in tpch benchmark

Are these changes tested?

No need, I have checked in my local.

➜  benchmarks git:(master) ✗ cargo run --release --bin tpch -- benchmark datafusion --iterations 1 --path ~/parquet_data --format parquet --enable-scheduler --query 1
    Finished release [optimized] target(s) in 0.13s
     Running `/arrow-datafusion/target/release/tpch benchmark datafusion --iterations 1 --path /Users/xudong/parquet_data --format parquet --enable-scheduler --query 1`
Running benchmarks with the following options: DataFusionBenchmarkOpt { query: Some(1), debug: false, iterations: 1, partitions: 2, batch_size: 8192, path: "/Users/xudong/parquet_data", file_format: "parquet", mem_table: false, output_path: None, disable_statistics: false, enable_scheduler: true }
Query 1 iteration 0 took 3847.1 ms and returned 4 rows
Query 1 avg time: 3847.15 ms

Are there any user-facing changes?

No

xudong963 · 2022-11-24T16:36:05Z

benchmarks/src/bin/tpch.rs

@@ -337,7 +344,13 @@ async fn execute_query(
        );
    }
    let task_ctx = ctx.task_ctx();
-    let result = collect(physical_plan.clone(), task_ctx).await?;
+    let result = if enable_scheduler {
+        let scheduler = Scheduler::new(num_cpus::get());


I use num_cpus to initial the size of thread pool.

isidentical · 2022-11-24T21:08:21Z

This is a cool option! Just out of my own curiosity I've compared --enable-scheduler vs not to see if there are any clear wins and there are a few queries where the scheduler outperforms by %10. Obviously, there are still some others with major degradations but with this option, it should be much easier to identify those cases and see the underlying cause.

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     Baseline ┃   Comparison ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1           │     643.67ms │     592.49ms │ +1.09x faster │
│ Q2           │     291.85ms │     262.98ms │ +1.11x faster │
│ Q3           │     373.14ms │    1028.82ms │  2.76x slower │
│ Q4           │     103.54ms │     105.93ms │     no change │
│ Q5           │     363.02ms │     714.48ms │  1.97x slower │
│ Q6           │     370.04ms │     374.17ms │     no change │
│ Q7           │     745.62ms │    1316.95ms │  1.77x slower │
│ Q8           │     471.89ms │     867.03ms │  1.84x slower │
│ Q9           │     651.69ms │     998.41ms │  1.53x slower │
│ Q10          │     432.42ms │    1165.39ms │  2.70x slower │
│ Q11          │     204.11ms │     173.80ms │ +1.17x faster │
│ Q12          │     186.74ms │     194.05ms │     no change │
│ Q13          │     404.37ms │     379.72ms │ +1.06x faster │
│ Q14          │     337.40ms │     413.87ms │  1.23x slower │
│ Q15          │     325.42ms │     624.71ms │  1.92x slower │
│ Q16          │     153.15ms │     155.41ms │     no change │
│ Q17          │    1931.96ms │    2051.97ms │  1.06x slower │
│ Q18          │    1349.79ms │    1280.70ms │ +1.05x faster │
│ Q19          │     621.30ms │     683.34ms │  1.10x slower │
│ Q20          │     609.66ms │     642.90ms │  1.05x slower │
│ Q21          │     628.37ms │     710.87ms │  1.13x slower │
│ Q22          │      65.89ms │      65.55ms │     no change │
└──────────────┴──────────────┴──────────────┴───────────────┘

xudong963 · 2022-11-25T01:37:24Z

@isidentical It's expected.

There is a tracking issue: #2504

I add the Option because I plan to iterate our new executor with @tustvold, the option will help check some iterations quickly.

tustvold

Really cool to see this, I really hope to eventually get back to working on this scheduler - currently it can't really do all that much better than tokio as we first need to add proper push-based execution - see here.

One thought might be to test the scheduler based benchmark in CI, but don't feel very strongly

Perhaps when I finish up integrating the row format everywhere I'll have some more time to work on this 😄

xudong963 · 2022-11-25T09:33:13Z

Perhaps when I finish up integrating the row format everywhere I'll have some more time to work on this 😄

I'll start to do it after work and look forward to seeing you back!

ursabot · 2022-11-25T12:22:08Z

Benchmark runs are scheduled for baseline = 58b43f5 and contender = 010aded. 010aded is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

xudong963 requested review from tustvold and alamb November 24, 2022 16:34

xudong963 commented Nov 24, 2022

View reviewed changes

xudong963 force-pushed the enable_schedular_for_tpch_benchmark branch from 274eb55 to a3c010d Compare November 24, 2022 16:37

Support to use Schedular in tpch benchmark

52bd409

xudong963 force-pushed the enable_schedular_for_tpch_benchmark branch from a3c010d to 52bd409 Compare November 24, 2022 16:55

fix toml fmt

d7c27e9

tustvold approved these changes Nov 25, 2022

View reviewed changes

Merge branch 'master' into enable_schedular_for_tpch_benchmark

925a21a

xudong963 merged commit 010aded into apache:master Nov 25, 2022

xudong963 deleted the enable_schedular_for_tpch_benchmark branch November 25, 2022 12:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support to use Schedular in tpch benchmark #4361

Support to use Schedular in tpch benchmark #4361

xudong963 commented Nov 24, 2022

xudong963 Nov 24, 2022

isidentical commented Nov 24, 2022

xudong963 commented Nov 25, 2022

tustvold left a comment

xudong963 commented Nov 25, 2022

ursabot commented Nov 25, 2022

Support to use Schedular in tpch benchmark #4361

Support to use Schedular in tpch benchmark #4361

Conversation

xudong963 commented Nov 24, 2022

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

xudong963 Nov 24, 2022

Choose a reason for hiding this comment

isidentical commented Nov 24, 2022

xudong963 commented Nov 25, 2022

tustvold left a comment

Choose a reason for hiding this comment

xudong963 commented Nov 25, 2022

ursabot commented Nov 25, 2022