[opt](hive) use binary search to prune hive partitions #58877
run buildall

TPC-H: Total hot run time: 36473 ms
TPC-DS: Total hot run time: 182268 ms
ClickBench: Total hot run time: 27.37 s

run buildall

TPC-H: Total hot run time: 36315 ms
TPC-DS: Total hot run time: 182033 ms
ClickBench: Total hot run time: 27.63 s

FE UT Coverage Report: Increment line coverage
FE Regression Coverage Report: Increment line coverage

run buildall
        });

        return new SortedPartitionRanges<>(sortedRanges, defaultPartitions);
    }
I think you should extract this code from `NereidsSortedPartitionsCacheManager.loadCache` into a utility function and reuse that same function here.
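One way to read this suggestion is a single static helper that both call sites invoke. The sketch below is only an illustration under stated assumptions: `SortedRangesBuilderSketch`, `Range`, `RangeEntry`, `Result` and `build` are hypothetical stand-ins, not Doris classes, and it assumes `defaultPartitions` holds the partitions that have no bounded range and therefore cannot take part in the sorted order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not the Doris implementation.
final class SortedRangesBuilderSketch {
    /** Hypothetical closed range over a single long-valued partition column. */
    static final class Range {
        final long lower;
        final long upper;

        Range(long lower, long upper) {
            this.lower = lower;
            this.upper = upper;
        }
    }

    /** A partition id paired with its value range. */
    static final class RangeEntry<K> {
        final K id;
        final Range range;

        RangeEntry(K id, Range range) {
            this.id = id;
            this.range = range;
        }
    }

    /** Ranges sorted for binary search, plus partitions that could not be sorted. */
    static final class Result<K> {
        final List<RangeEntry<K>> sortedRanges;
        final List<K> defaultPartitions;

        Result(List<RangeEntry<K>> sortedRanges, List<K> defaultPartitions) {
            this.sortedRanges = sortedRanges;
            this.defaultPartitions = defaultPartitions;
        }
    }

    /**
     * Shared helper: each caller supplies its own partition type P and a
     * converter. A converter result of null means "no bounded range"
     * (an assumption of this sketch).
     */
    static <K, P> Result<K> build(Map<K, P> partitions, Function<P, Range> toRange) {
        List<RangeEntry<K>> sorted = new ArrayList<>();
        List<K> defaults = new ArrayList<>();
        for (Map.Entry<K, P> e : partitions.entrySet()) {
            Range range = toRange.apply(e.getValue());
            if (range == null) {
                defaults.add(e.getKey());
            } else {
                sorted.add(new RangeEntry<>(e.getKey(), range));
            }
        }
        // Order by lower bound, then upper bound, so lookups can binary search.
        sorted.sort(Comparator
                .comparingLong((RangeEntry<K> en) -> en.range.lower)
                .thenComparingLong(en -> en.range.upper));
        return new Result<>(sorted, defaults);
    }
}
```

With a helper of this shape, the internal-table cache loader and the Hive path would differ only in the converter they pass in, which keeps the sorting rule in one place.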
TPC-H: Total hot run time: 36931 ms
TPC-DS: Total hot run time: 181153 ms
ClickBench: Total hot run time: 27.33 s

PR approved by at least one committer and no changes requested.

PR approved by anyone and no changes requested.

run buildall

TPC-H: Total hot run time: 36566 ms
TPC-DS: Total hot run time: 180827 ms
ClickBench: Total hot run time: 27.41 s

run buildall

TPC-H: Total hot run time: 35130 ms
TPC-DS: Total hot run time: 181501 ms
ClickBench: Total hot run time: 27.64 s

FE Regression Coverage Report: Increment line coverage
Followup #44586

Enable binary search partition pruning optimization for Hive external tables.

This PR adds binary search partition pruning support for Hive tables by:
- Adding a `getSortedPartitionRanges()` method to the `ExternalTable` base class
- Maintaining sorted partition ranges directly in `HivePartitionValues` for cache lifecycle consistency
- Overriding `getSortedPartitionRanges()` in `HMSExternalTable` to provide sorted ranges

**Performance improvement (20000 partitions, 1000 queries):**
- Binary search enabled: **4.548 seconds**
- Binary search disabled: **12.849 seconds**
- **~2.8x faster**
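To illustrate why sorted ranges help, here is a minimal sketch of pruning by binary search. It is not the code in this PR: `SortedRangePruneSketch`, `PartitionRange` and `prune` are hypothetical names, and it assumes a single long-valued partition column with non-overlapping ranges already sorted by lower bound.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the Doris implementation.
final class SortedRangePruneSketch {
    /** Hypothetical stand-in for one partition: a closed value range plus a name. */
    static final class PartitionRange {
        final long lower;
        final long upper;
        final String name;

        PartitionRange(long lower, long upper, String name) {
            this.lower = lower;
            this.upper = upper;
            this.name = name;
        }
    }

    /**
     * Returns the names of partitions whose range intersects
     * [queryLower, queryUpper]. Assumes `sorted` is ordered by lower bound
     * and the ranges do not overlap, so upper bounds are ordered as well.
     */
    static List<String> prune(List<PartitionRange> sorted, long queryLower, long queryUpper) {
        int lo = 0;
        int hi = sorted.size();
        // Binary search for the first range that ends at or after queryLower.
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted.get(mid).upper < queryLower) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        // Scan forward only while ranges still start inside the query range.
        List<String> selected = new ArrayList<>();
        for (int i = lo; i < sorted.size() && sorted.get(i).lower <= queryUpper; i++) {
            selected.add(sorted.get(i).name);
        }
        return selected;
    }
}
```

Under these assumptions, a lookup costs a logarithmic search plus the number of matching partitions, rather than evaluating the predicate against every partition.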