Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use repartition in window functions to speed up #569

Merged
merged 2 commits into from
Jun 30, 2021

Conversation

jimexist
Copy link
Member

@jimexist jimexist commented Jun 16, 2021

Which issue does this PR close?

this pull request is based on #571 so review that first

Closes #435 as this will make it stale

Rationale for this change

benchmark:


window empty over, aggregate functions
                        time:   [7.6349 ms 7.6997 ms 7.7675 ms]
                        change: [-78.341% -78.072% -77.798%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

window empty over, built-in functions
                        time:   [6.5445 ms 6.5839 ms 6.6256 ms]
                        change: [-78.729% -78.298% -77.872%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

Benchmarking window order by, aggregate functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 18.3s, or reduce sample count to 20.
window order by, aggregate functions
                        time:   [177.92 ms 179.66 ms 181.50 ms]
                        change: [-10.040% -8.7722% -7.3530%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

Benchmarking window order by, built-in functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 16.8s, or reduce sample count to 20.
window order by, built-in functions
                        time:   [165.02 ms 166.82 ms 168.68 ms]
                        change: [-12.674% -11.085% -9.5879%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

Benchmarking window partition by, u64_wide, aggregate functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 27.7s, or reduce sample count to 10.
window partition by, u64_wide, aggregate functions
                        time:   [268.22 ms 271.51 ms 274.78 ms]
                        change: [-8.0164% -6.5188% -5.0558%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking window partition by, u64_narrow, aggregate functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 12.2s, or reduce sample count to 40.
window partition by, u64_narrow, aggregate functions
                        time:   [116.73 ms 118.42 ms 120.14 ms]
                        change: [-22.910% -21.502% -20.118%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Benchmarking window partition by, u64_wide, built-in functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 24.2s, or reduce sample count to 20.
window partition by, u64_wide, built-in functions
                        time:   [229.51 ms 231.90 ms 234.33 ms]
                        change: [-7.9804% -6.7551% -5.5555%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Benchmarking window partition by, u64_narrow, built-in functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 11.0s, or reduce sample count to 40.
window partition by, u64_narrow, built-in functions
                        time:   [105.30 ms 106.75 ms 108.24 ms]
                        change: [-23.288% -21.785% -20.243%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Benchmarking window partition and order by, u64_wide, aggregate functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 47.0s, or reduce sample count to 10.
window partition and order by, u64_wide, aggregate functions
                        time:   [441.09 ms 444.76 ms 448.45 ms]
                        change: [-12.907% -11.959% -10.947%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking window partition and order by, u64_narrow, aggregate functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 57.2s, or reduce sample count to 10.
window partition and order by, u64_narrow, aggregate functions
                        time:   [563.99 ms 568.03 ms 572.04 ms]
                        change: [-15.042% -13.897% -12.764%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

Benchmarking window partition and order by, u64_wide, built-in functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 41.4s, or reduce sample count to 10.
window partition and order by, u64_wide, built-in functions
                        time:   [400.88 ms 403.92 ms 406.91 ms]
                        change: [-3.1220% -1.9172% -0.6254%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild

Benchmarking window partition and order by, u64_narrow, built-in functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 34.2s, or reduce sample count to 10.
window partition and order by, u64_narrow, built-in functions
                        time:   [334.65 ms 337.16 ms 339.72 ms]
                        change: [-35.596% -33.771% -31.982%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

What changes are included in this PR?

Are there any user-facing changes?

@jimexist jimexist force-pushed the partition-by-repartition branch from 0ac4a3a to 1658606 Compare June 21, 2021 03:22
@jimexist

This comment has been minimized.

@jimexist jimexist force-pushed the partition-by-repartition branch 2 times, most recently from a61d9ee to bb71637 Compare June 21, 2021 13:13
@jimexist jimexist force-pushed the partition-by-repartition branch from bb71637 to 0f7ce2f Compare June 22, 2021 10:13
@jimexist jimexist changed the title WIP Use repartition in window functions Use repartition in window functions to speed up Jun 22, 2021
@jimexist jimexist marked this pull request as ready for review June 22, 2021 11:43
@Dandandan
Copy link
Contributor

Nice, almost 5 times improvement for some queries!

But do those have to do with the changes wrt sort or with regards to partition by?

}

fn required_child_distribution(&self) -> Distribution {
Distribution::SinglePartition
Distribution::UnspecifiedDistribution
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be correct with a window without any partition by clause?

In that case I think the required partitions should be 1, as the aggregate function can not be computed only over a part of the data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! this is fixed.

@jimexist jimexist force-pushed the partition-by-repartition branch 3 times, most recently from aebff92 to 809cd80 Compare June 23, 2021 00:54
@jimexist
Copy link
Member Author

Nice, almost 5 times improvement for some queries!

But do those have to do with the changes wrt sort or with regards to partition by?

I don't expect that much of an improvement. let me redo the benchmark after #573 is merged.

@alamb alamb added the datafusion Changes in the datafusion crate label Jun 23, 2021
@jimexist jimexist force-pushed the partition-by-repartition branch 2 times, most recently from a2c90a1 to 9f18613 Compare June 24, 2021 14:27
@jimexist
Copy link
Member Author

latest against the master

window empty over, aggregate functions
                        time:   [38.093 ms 38.813 ms 39.557 ms]
                        change: [+7.9976% +10.144% +12.365%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

window empty over, built-in functions
                        time:   [28.492 ms 29.010 ms 29.571 ms]
                        change: [-9.6876% -6.5061% -3.1331%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

Benchmarking window order by, aggregate functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 18.2s, or reduce sample count to 20.
window order by, aggregate functions
                        time:   [180.14 ms 182.30 ms 184.63 ms]
                        change: [-3.2247% -1.5082% +0.1867%] (p = 0.10 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  8 (8.00%) high mild
  2 (2.00%) high severe

Benchmarking window order by, built-in functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 16.7s, or reduce sample count to 20.
window order by, built-in functions
                        time:   [164.02 ms 165.72 ms 167.55 ms]
                        change: [-3.0666% -1.6099% -0.2141%] (p = 0.04 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

Benchmarking window partition by, u64_wide, aggregate functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 26.9s, or reduce sample count to 10.
window partition by, u64_wide, aggregate functions
                        time:   [279.31 ms 283.97 ms 288.78 ms]
                        change: [+5.6067% +7.9223% +10.288%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

Benchmarking window partition by, u64_narrow, aggregate functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 13.2s, or reduce sample count to 30.
window partition by, u64_narrow, aggregate functions
                        time:   [119.82 ms 122.24 ms 124.91 ms]
                        change: [-10.403% -8.3916% -6.3109%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

Benchmarking window partition by, u64_wide, built-in functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 23.7s, or reduce sample count to 20.
window partition by, u64_wide, built-in functions
                        time:   [229.23 ms 232.16 ms 235.14 ms]
                        change: [+1.8582% +3.3754% +4.9304%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

Benchmarking window partition by, u64_narrow, built-in functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 10.9s, or reduce sample count to 40.
window partition by, u64_narrow, built-in functions
                        time:   [109.41 ms 111.44 ms 113.55 ms]
                        change: [-9.5580% -7.6470% -5.5888%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

Benchmarking window partition and order by, u64_wide, aggregate functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 46.0s, or reduce sample count to 10.
window partition and order by, u64_wide, aggregate functions
                        time:   [451.08 ms 454.83 ms 458.61 ms]
                        change: [-8.3071% -6.4767% -4.6734%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Benchmarking window partition and order by, u64_narrow, aggregate functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 60.7s, or reduce sample count to 10.
window partition and order by, u64_narrow, aggregate functions
                        time:   [564.17 ms 568.36 ms 572.64 ms]
                        change: [-17.391% -16.295% -15.200%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

Benchmarking window partition and order by, u64_wide, built-in functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 41.1s, or reduce sample count to 10.
window partition and order by, u64_wide, built-in functions
                        time:   [396.51 ms 401.62 ms 407.54 ms]
                        change: [-10.962% -8.9661% -6.8683%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

Benchmarking window partition and order by, u64_narrow, built-in functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 34.4s, or reduce sample count to 10.
window partition and order by, u64_narrow, built-in functions
                        time:   [331.09 ms 334.20 ms 337.40 ms]
                        change: [-16.892% -15.883% -14.890%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

@Dandandan turns out it's not that clear cut

Copy link
Contributor

@Dandandan Dandandan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jimexist jimexist force-pushed the partition-by-repartition branch 2 times, most recently from 9af8b64 to 93d2a91 Compare June 28, 2021 09:21
@jimexist jimexist force-pushed the partition-by-repartition branch from 93d2a91 to 723e7bc Compare June 28, 2021 23:38
@alamb alamb merged commit fddab22 into apache:master Jun 30, 2021
@houqp houqp added the performance Make DataFusion faster label Jul 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate performance Make DataFusion faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

create a test with more than one partition for window functions
4 participants