Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add hash_join_single_partition_threshold_rows config #8720

Merged
merged 2 commits into from
Jan 20, 2024

Conversation

maruschin
Copy link
Contributor

Which issue does this PR close?

Closes #8405.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Jan 2, 2024
@maruschin maruschin force-pushed the collect_fix branch 2 times, most recently from 9c2af7b to d9c9d34 Compare January 2, 2024 12:54
@alamb alamb requested a review from Dandandan January 2, 2024 15:50
@alamb
Copy link
Contributor

alamb commented Jan 2, 2024

@korowa may also be interested in this

@Dandandan
Copy link
Contributor

Change looks good to me in general, could you also update the tests @maruschin so we can check the new behavior?

@maruschin maruschin force-pushed the collect_fix branch 5 times, most recently from ece73a9 to 42028a9 Compare January 9, 2024 14:43
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jan 9, 2024
@maruschin
Copy link
Contributor Author

Hi @Dandandan, I fixed the tests and refactored variables, making the names more explicit.

In general, the tests have not changed.
We apparently do not test the case when there is no statistics in bytes and there are statistics in rows.

@alamb
Copy link
Contributor

alamb commented Jan 18, 2024

@Dandandan as the author of #8405 would you possibly have time to review this PR again? If not I will find time to do so

@Dandandan
Copy link
Contributor

To me the change looks good, I think it makes sense to run the (TPC-H) benchmarks once more to catch regressions.

@alamb
Copy link
Contributor

alamb commented Jan 19, 2024

To me the change looks good, I think it makes sense to run the (TPC-H) benchmarks once more to catch regressions.

I will do so

@alamb
Copy link
Contributor

alamb commented Jan 19, 2024

I merged this branch with main and am running tests on it now

@alamb
Copy link
Contributor

alamb commented Jan 19, 2024

Here is the result of the benchmarks when I ran them. My conclusion is that there is no measurable difference in performance due to this branch so it should be good to go

--------------------
Benchmark tpch.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ main_base ┃ collect_fix ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  327.85ms │    328.00ms │     no change │
│ QQuery 2     │   93.09ms │     94.13ms │     no change │
│ QQuery 3     │  172.44ms │    168.23ms │     no change │
│ QQuery 4     │  102.36ms │     98.83ms │     no change │
│ QQuery 5     │  199.92ms │    210.24ms │  1.05x slower │
│ QQuery 6     │   95.04ms │     97.14ms │     no change │
│ QQuery 7     │  269.17ms │    282.54ms │     no change │
│ QQuery 8     │  241.80ms │    225.48ms │ +1.07x faster │
│ QQuery 9     │  357.00ms │    286.67ms │ +1.25x faster │
│ QQuery 10    │  324.27ms │    278.82ms │ +1.16x faster │
│ QQuery 11    │   76.61ms │     79.25ms │     no change │
│ QQuery 12    │  155.77ms │    161.19ms │     no change │
│ QQuery 13    │  325.53ms │    346.48ms │  1.06x slower │
│ QQuery 14    │  127.94ms │    126.09ms │     no change │
│ QQuery 15    │  151.43ms │    167.27ms │  1.10x slower │
│ QQuery 16    │  103.59ms │    104.93ms │     no change │
│ QQuery 17    │  291.58ms │    306.44ms │  1.05x slower │
│ QQuery 18    │  469.92ms │    394.92ms │ +1.19x faster │
│ QQuery 19    │  201.67ms │    205.44ms │     no change │
│ QQuery 20    │  161.96ms │    159.41ms │     no change │
│ QQuery 21    │  323.94ms │    330.82ms │     no change │
│ QQuery 22    │   80.45ms │     82.82ms │     no change │
└──────────────┴───────────┴─────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary          ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (main_base)     │ 4653.33ms │
│ Total Time (collect_fix)   │ 4535.15ms │
│ Average Time (main_base)   │  211.51ms │
│ Average Time (collect_fix) │  206.14ms │
│ Queries Faster             │         4 │
│ Queries Slower             │         4 │
│ Queries with No Change     │        14 │
└────────────────────────────┴───────────┘

--------------------
Benchmark tpch.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ main_base ┃ collect_fix ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  327.85ms │    328.00ms │     no change │
│ QQuery 2     │   93.09ms │     94.13ms │     no change │
│ QQuery 3     │  172.44ms │    168.23ms │     no change │
│ QQuery 4     │  102.36ms │     98.83ms │     no change │
│ QQuery 5     │  199.92ms │    210.24ms │  1.05x slower │
│ QQuery 6     │   95.04ms │     97.14ms │     no change │
│ QQuery 7     │  269.17ms │    282.54ms │     no change │
│ QQuery 8     │  241.80ms │    225.48ms │ +1.07x faster │
│ QQuery 9     │  357.00ms │    286.67ms │ +1.25x faster │
│ QQuery 10    │  324.27ms │    278.82ms │ +1.16x faster │
│ QQuery 11    │   76.61ms │     79.25ms │     no change │
│ QQuery 12    │  155.77ms │    161.19ms │     no change │
│ QQuery 13    │  325.53ms │    346.48ms │  1.06x slower │
│ QQuery 14    │  127.94ms │    126.09ms │     no change │
│ QQuery 15    │  151.43ms │    167.27ms │  1.10x slower │
│ QQuery 16    │  103.59ms │    104.93ms │     no change │
│ QQuery 17    │  291.58ms │    306.44ms │  1.05x slower │
│ QQuery 18    │  469.92ms │    394.92ms │ +1.19x faster │
│ QQuery 19    │  201.67ms │    205.44ms │     no change │
│ QQuery 20    │  161.96ms │    159.41ms │     no change │
│ QQuery 21    │  323.94ms │    330.82ms │     no change │
│ QQuery 22    │   80.45ms │     82.82ms │     no change │
└──────────────┴───────────┴─────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary          ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (main_base)     │ 4653.33ms │
│ Total Time (collect_fix)   │ 4535.15ms │
│ Average Time (main_base)   │  211.51ms │
│ Average Time (collect_fix) │  206.14ms │
│ Queries Faster             │         4 │
│ Queries Slower             │         4 │
│ Queries with No Change     │        14 │
└────────────────────────────┴───────────┘

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks agian @maruschin

@alamb alamb merged commit f5a97d5 into apache:main Jan 20, 2024
23 checks passed
@Dandandan
Copy link
Contributor

Thanks for checking it @alamb and for the PR @maruschin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

collection_size_threshold incorrectly used in supports_collect_by_size
3 participants