feat(query): improve cardinality estimate #11394

Merged
merged 7 commits into from
May 19, 2023

Conversation

Dousir9
Member

@Dousir9 Dousir9 commented May 10, 2023

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary


1. Adjust ndv

  • If the data type is int and max - min + 1 < ndv, adjust ndv to max - min + 1, since an integer column cannot hold more distinct values than its range contains.
  • If the data type is string, convert min and max to base-128 integers and use the distance between them as the bound. This applies only to strings of length at most 4: ASCII has 128 characters, and 128^4 = 268435456 < 2^32 < 128^5, so longer strings would not fit in 32 bits.
  • If the data type is boolean, ndv is either 1 or 2.

2. Improve histogram buckets

3. Update statistics by selectivity

4. Return the updated column_stats upwards

5. Improve join cardinality computation

6. Enable enable_dphyp by default

After this PR, TPC-H Q2, Q3, Q5, Q8, Q10, Q18, and Q21 get a better join order.
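
The ndv adjustment rules in item 1 can be sketched roughly as follows. This is an illustrative Python sketch, not the code in this PR: the function names `string_to_base128` and `adjust_ndv` are hypothetical, and three details are assumptions not stated above, namely that short strings are right-padded with zero digits, that the string bound is distance + 1 (by analogy with the int case), and that a boolean column gets ndv 1 when min equals max and 2 otherwise.

```python
from typing import Optional, Union

ASCII_BASE = 128   # number of ASCII code points
MAX_STR_LEN = 4    # 128**4 = 2**28, which stays below 2**32

Value = Union[bool, int, str]


def string_to_base128(s: str, width: int = MAX_STR_LEN) -> Optional[int]:
    """Read an ASCII string of at most `width` characters as a base-128 integer.

    Shorter strings are right-padded with zero digits (an assumption) so that
    the integer order matches the lexicographic order of the strings.
    Returns None if the string is too long or contains non-ASCII characters.
    """
    if len(s) > width or any(ord(c) >= ASCII_BASE for c in s):
        return None
    value = 0
    for i in range(width):
        digit = ord(s[i]) if i < len(s) else 0
        value = value * ASCII_BASE + digit
    return value


def adjust_ndv(ndv: int, min_value: Value, max_value: Value) -> int:
    """Cap an estimated number of distinct values using the column's min and max."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(min_value, bool) and isinstance(max_value, bool):
        # Assumption: one distinct value if min == max, otherwise two.
        return 1 if min_value == max_value else 2
    if isinstance(min_value, int) and isinstance(max_value, int):
        # An integer range [min, max] holds at most max - min + 1 values.
        return min(ndv, max_value - min_value + 1)
    if isinstance(min_value, str) and isinstance(max_value, str):
        lo = string_to_base128(min_value)
        hi = string_to_base128(max_value)
        if lo is None or hi is None:
            return ndv  # rule not applicable, keep the original estimate
        # Assumption: bound is distance + 1, mirroring the int rule.
        return min(ndv, hi - lo + 1)
    return ndv
```

For example, under these assumptions `adjust_ndv(1000, 1, 10)` is capped at 10, while `adjust_ndv(1000, "hello", "world")` is left at 1000 because both strings are longer than 4 characters.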

Closes #issue


@mergify mergify bot added the pr-feature this PR introduces a new feature to the codebase label May 10, 2023
@Dousir9 Dousir9 marked this pull request as ready for review May 16, 2023 11:28
@Dousir9 Dousir9 requested review from xudong963 and leiysky May 16, 2023 11:30
@Dousir9 Dousir9 added ci-benchmark Benchmark: run all test and removed ci-benchmark Benchmark: run all test labels May 18, 2023
@BohuTANG
Member

Great.

But a TPC-DS test is broken:

[Diff] (-expected|+actual)
    GA F M 2 1 2 2 2.0 0 1 0 0 0.0 0 1 0 0 0.0
    GA F U 0 1 0 0 0.0 0 1 0 0 0.0 0 1 0 0 0.0
    GA M D 0 1 0 0 0.0 0 1 0 0 0.0 0 1 0 0 0.0
    IN F D 3 1 3 3 3.0 0 1 0 0 0.0 0 1 0 0 0.0
    IN F S 2 1 2 2 2.0 0 1 0 0 0.0 0 1 0 0 0.0
    IN F W 3 1 3 3 3.0 0 1 0 0 0.0 0 1 0 0 0.0
-   KS F M 1 1 1 1 1.0 0 1 0 0 0.0 0 1 0 0 0.0
-   MI M S 3 1 3 3 3.0 0 1 0 0 0.0 0 1 0 0 0.0
    NV F S 0 1 0 0 0.0 0 1 0 0 0.0 0 1 0 0 0.0
    TX F D 1 1 1 1 1.0 0 1 0 0 0.0 0 1 0 0 0.0
-   WI F U 1 1 1 1 1.0 0 1 0 0 0.0 0 1 0 0 0.0
+   WI F U 1 2 1 1 1.0 0 2 0 0 0.0 0 2 0 0 0.0
    WV M M 2 1 2 2 2.0 0 1 0 0 0.0 0 1 0 0 0.0
at tests/sqllogictests/suites/tpcds/queries.test:4915

@Dousir9 Dousir9 removed the ci-benchmark Benchmark: run all test label May 18, 2023
@databendlabs databendlabs deleted a comment from github-actions bot May 18, 2023
@Dousir9 Dousir9 added the ci-benchmark Benchmark: run all test label May 18, 2023
@databendlabs databendlabs deleted a comment from github-actions bot May 18, 2023
@Dousir9 Dousir9 removed the ci-benchmark Benchmark: run all test label May 18, 2023
@Dousir9 Dousir9 force-pushed the improve_cardinality_estimate branch from c88b9b6 to e698a4a Compare May 19, 2023 01:34
@BohuTANG BohuTANG merged commit ece4437 into databendlabs:main May 19, 2023
@databendlabs databendlabs deleted a comment from github-actions bot May 26, 2023
andylokandy pushed a commit to andylokandy/databend that referenced this pull request Nov 27, 2023
* improve cardinality estimate

* update other statistic by selectivity

* dphyp

* fix dphyp subquery

* remove tpcds q64