Allow users to specify a custom range for the range filter when using parallel replicas with dynamic shards #64604
Merged: antonio2368 merged 14 commits into ClickHouse:master from ClibMouse:parallel_replicas_custom_key_range_min_max_setting on Jun 7, 2024
alexey-milovidov added the can be tested label (Allows running workflows for external contributors) on May 29, 2024
robot-ch-test-poll3 added the pr-improvement label (Pull request with some product improvements) on May 29, 2024
This is an automated comment for commit 2129401 with a description of the existing statuses. It's updated for the latest CI run ❌
antonio2368 reviewed on Jun 3, 2024
antonio2368 requested changes on Jun 4, 2024
tests/queries/0_stateless/03164_parallel_replicas_range_filter_min_max.sql (outdated, resolved)
tests/queries/0_stateless/03164_parallel_replicas_range_filter_min_max.sql (outdated, resolved)
antonio2368 reviewed on Jun 5, 2024
josh-hildred force-pushed the parallel_replicas_custom_key_range_min_max_setting branch from 8162ee4 to b5815ec on June 6, 2024 12:25
josh-hildred
force-pushed
the
parallel_replicas_custom_key_range_min_max_setting
branch
from
June 6, 2024 12:25
8162ee4
to
b5815ec
Compare
with range filter to use a custom range
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
josh-hildred force-pushed the parallel_replicas_custom_key_range_min_max_setting branch from b5815ec to 76db904 on June 6, 2024 12:28
antonio2368 approved these changes on Jun 7, 2024
Thank you for working with me on this one @antonio2368!
Merged via the queue into ClickHouse:master with commit b1d6c73 on Jun 7, 2024. 240 of 247 checks passed.
@josh-hildred thanks for the contribution, please continue pushing good PRs 🙂
robot-ch-test-poll added the pr-synced-to-cloud label (The PR is synced to the cloud repo) on Jun 7, 2024
Allow users to specify a custom range for the range filter when using parallel replicas with dynamic shards. This change is particularly useful when the custom key expression involves the primary key and the relevant values of the primary key are known to be uniformly distributed over some range. For example, suppose a query has the form `WHERE k >= k1 AND k <= k2` and uses `parallel_replicas_custom_key = 'k'` with `parallel_replicas_custom_key_filter_type = 'range'`. By setting `parallel_replicas_custom_key_range_lower = k1` and `parallel_replicas_custom_key_range_upper = k2`, we can parallelize this query better over the replicas. This is because when a lower bound `k1` and an upper bound `k2` are provided, processing is split based on the range `[k1, k2]` rather than `[0, INT_MAX]` (i.e. the custom key only needs to be uniformly distributed over `[k1, k2]` rather than over `[0, INT_MAX]`). Furthermore, if `k` is the primary key, we also get better primary index usage, which leads to less "double loading" of data at replicas.

Note: these settings do not cause any additional data to be filtered during query processing; rather, they change the points at which the range filter breaks up the range `[0, INT_MAX]` when parallelizing the query.
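To make the "uniformly distributed over `[k1, k2]`" argument concrete, here is a small Python model. It is a hypothetical sketch, not ClickHouse's implementation: it assumes the range filter cuts `[range_lower, range_upper]` into equal-width slices, one per replica, with keys below the first cut going to replica 0 and keys at or above the last cut going to the last replica. The helper names (`boundaries`, `keys_per_replica`), the integer step arithmetic, and the sample values `k1 = 1000`, `k2 = 1999` are illustrative assumptions; the real boundary computation may differ.

```python
from bisect import bisect_right
from collections import Counter

INT_MAX = 2**31 - 1  # upper end of the default [0, INT_MAX] range from the description


def boundaries(replicas: int, range_lower: int, range_upper: int) -> list[int]:
    """Hypothetical cut points splitting [range_lower, range_upper] into equal-width slices."""
    step = (range_upper - range_lower) // replicas
    return [range_lower + i * step for i in range(1, replicas)]


def keys_per_replica(keys, replicas: int, range_lower: int, range_upper: int) -> list[int]:
    """Count how many keys each replica would receive under the assumed split.

    bisect_right maps a key to the number of cut points <= key, so keys below
    the first cut go to replica 0 and keys at or above the last cut go to the
    last replica; no key is dropped.
    """
    bounds = boundaries(replicas, range_lower, range_upper)
    counts = Counter(bisect_right(bounds, k) for k in keys)
    return [counts[r] for r in range(replicas)]


if __name__ == "__main__":
    k1, k2 = 1000, 1999           # hypothetical query bounds
    keys = range(k1, k2 + 1)      # key values uniformly distributed over [k1, k2]
    print(keys_per_replica(keys, 4, 0, INT_MAX))  # -> [1000, 0, 0, 0]
    print(keys_per_replica(keys, 4, k1, k2))      # -> [249, 249, 249, 253]
```

With the default-style range, every key in `[1000, 1999]` falls below the first cut point, so a single replica does all the work; narrowing the range to `[k1, k2]` spreads the same keys almost evenly across the four replicas.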
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add settings `parallel_replicas_custom_key_range_lower` and `parallel_replicas_custom_key_range_upper` to control how parallel replicas with dynamic shards parallelize queries when using a range filter.
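One way to picture what the two settings control is as choosing the cut points of per-replica filters. The sketch below is hypothetical: `range_filters` is not a ClickHouse function, and treating an upper bound of 0 as "use `INT_MAX`" is an assumption modeled on the `[0, INT_MAX]` default in the description (check the settings documentation for the exact semantics). The first and last filters are left open-ended, reflecting the note that the settings do not filter out any additional data.

```python
INT_MAX = 2**31 - 1  # upper end of the default [0, INT_MAX] range from the description


def range_filters(key: str, replicas: int, range_lower: int = 0, range_upper: int = 0) -> list[str]:
    """Return one hypothetical WHERE-clause fragment per replica.

    [range_lower, range_upper] is cut into `replicas` equal-width pieces. The
    first fragment has no lower bound and the last has no upper bound, so
    every key value is still covered by exactly one replica.
    """
    if range_upper == 0:
        range_upper = INT_MAX  # assumption: 0 means "no explicit upper bound"
    if replicas < 1 or range_upper - range_lower < replicas:
        raise ValueError("need at least one replica and a range wider than the replica count")
    step = (range_upper - range_lower) // replicas
    bounds = [range_lower + i * step for i in range(1, replicas)]

    filters = []
    for r in range(replicas):
        parts = []
        if r > 0:               # every replica except the first gets a lower bound
            parts.append(f"{key} >= {bounds[r - 1]}")
        if r < replicas - 1:    # every replica except the last gets an upper bound
            parts.append(f"{key} < {bounds[r]}")
        filters.append(" AND ".join(parts) or "1")  # "1": always-true filter for a single replica
    return filters


if __name__ == "__main__":
    for f in range_filters("k", 4, range_lower=1000, range_upper=1999):
        print(f)
```

For a hypothetical `[k1, k2] = [1000, 1999]` and four replicas, this prints `k < 1249`, `k >= 1249 AND k < 1498`, `k >= 1498 AND k < 1747` and `k >= 1747`: the cut points move into the queried range, while keys outside it would still be picked up by the first or last replica.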