[feat](Nereids) after partition prune, output rows of scan node only contain rows from selected partitions #36760
Thank you for your contribution to Apache Doris. Since 2024-03-18, the documentation has been moved to doris-website.
run buildall

run buildall
TPC-H: Total hot run time: 40201 ms
TPC-DS: Total hot run time: 173361 ms
ClickBench: Total hot run time: 30.7 s

run p0

run buildall
TPC-H: Total hot run time: 39513 ms
TPC-DS: Total hot run time: 172096 ms
ClickBench: Total hot run time: 30.12 s

run p0

run feut

run buildall
TPC-H: Total hot run time: 40097 ms
TPC-DS: Total hot run time: 170427 ms
ClickBench: Total hot run time: 30.82 s

run buildall
TPC-H: Total hot run time: 39906 ms
TPC-DS: Total hot run time: 173587 ms
ClickBench: Total hot run time: 30.72 s
```java
/**
 * If this is an MV, return selectedIndexId; otherwise return -1.
 * @return -1 or selectedIndexId
 */
long getSelectedIndexIdForMV();
```
Is this only for sync MVs? If so, rename the `ForMV` suffix to `ForSyncMv`.
It is used for both sync and async MVs.
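The Javadoc contract above ("-1 or selectedIndexId") can be illustrated with a minimal sketch. It assumes, hypothetically, that a scan records a base index id and a selected index id, and that an MV index counts as selected when the two differ. `ScanSketch` and its fields are illustrative names, not the actual Doris `OlapScan` implementation.

```java
/**
 * Hypothetical stand-in for a scan node; not the Doris OlapScan class.
 */
final class ScanSketch {
    private final long baseIndexId;      // id of the table's base index (assumed field)
    private final long selectedIndexId;  // index chosen by the planner (assumed field)

    ScanSketch(long baseIndexId, long selectedIndexId) {
        this.baseIndexId = baseIndexId;
        this.selectedIndexId = selectedIndexId;
    }

    /**
     * Returns the selected index id when an MV index was chosen, otherwise -1.
     * Assumption: "an MV was chosen" means selectedIndexId differs from baseIndexId.
     */
    long getSelectedIndexIdForMV() {
        return selectedIndexId != baseIndexId ? selectedIndexId : -1L;
    }
}
```

In this sketch a caller looking up column statistics can pass the returned value straight through, with -1 meaning "no MV index selected".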
```java
public Set<Map.Entry<Expression, ColumnStatistic>> getExpressionColumnStatsEntries() {
    return expressionToColumnStats.entrySet();
}
```
Directly returning a map is more general.
We want to guarantee that every entry is added through the `StatisticsBuilder.put(expr, columnStats)` API. If the map itself is exposed, that guarantee could be broken.
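The encapsulation argument can be sketched in plain Java. The names below (`StatsSketch`, `StatsSketchBuilder`, `put`) are hypothetical, and `String`/`Double` stand in for `Expression`/`ColumnStatistic`. One caveat: the `entrySet()` view of a mutable `HashMap` still permits removal through its iterator and `Map.Entry.setValue`, so this sketch returns the entry set of a `Collections.unmodifiableMap` wrapper to keep `put` as the only write path.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical immutable stats holder; String/Double stand in for Expression/ColumnStatistic. */
final class StatsSketch {
    private final Map<String, Double> exprToStats;

    StatsSketch(Map<String, Double> exprToStats) {
        // Wrap once so no caller can mutate the backing map.
        this.exprToStats = Collections.unmodifiableMap(exprToStats);
    }

    /** Read-only view: iterator.remove() and Entry.setValue() throw UnsupportedOperationException. */
    Set<Map.Entry<String, Double>> getExpressionColumnStatsEntries() {
        return exprToStats.entrySet();
    }
}

/** Hypothetical builder: put() is the only way an entry gets in. */
final class StatsSketchBuilder {
    private final Map<String, Double> exprToStats = new LinkedHashMap<>();

    StatsSketchBuilder put(String expr, double stats) {
        // A real builder could validate or normalize the entry here.
        exprToStats.put(expr, stats);
        return this;
    }

    StatsSketch build() {
        // Copy so later put() calls do not leak into an already-built object.
        return new StatsSketch(new LinkedHashMap<>(exprToStats));
    }
}
```

Readers keep the generality of iterating entries, while writers are funneled through the builder.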
```java
for (Slot slot : olapScan.getOutput()) {
    if (isVisibleSlotReference(slot)) {
        ColumnStatistic cache = getColumnStatistic(olapTable, slot.getName(),
                olapScan.getSelectedIndexIdForMV());
        // ...
    }
}
```
I think `getSelectedIndexIdForMV()` is only used for statistics, so implement it in StatsCalculator.java rather than in OlapScan.
The analyze module also needs this function.
```java
double rowCount = getSelectedPartitionRowCount(olapScan);
// if partition row count is not available, fall back to table stats
if (rowCount > 0) {
    List<String> selectedPartitionNames = new ArrayList<>(olapScan.getSelectedPartitionIds().size());
    // ...
}
```
Use `Lists.newArrayListWithExpectedSize()` to avoid resizing, because the list resizes once its fill rate exceeds a threshold.
The size is fixed, so no resize is needed.
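The row-count logic discussed above (use the row counts of the selected partitions, and fall back to table-level stats when they are not available) can be sketched as follows. The per-partition count map, the "unknown" convention (absent or negative) and all method names are hypothetical; only the `rowCount > 0` fallback and the exact-capacity list mirror the reviewed excerpt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class PartitionRowCountSketch {
    /**
     * Sums row counts of the selected partitions. Returns -1 if any selected
     * partition has no known row count (assumption: unknown means absent or negative).
     */
    static double getSelectedPartitionRowCount(List<Long> selectedPartitionIds,
                                               Map<Long, Long> partitionRowCounts) {
        double total = 0;
        for (long id : selectedPartitionIds) {
            Long rows = partitionRowCounts.get(id);
            if (rows == null || rows < 0) {
                return -1; // partition stats unavailable
            }
            total += rows;
        }
        return total;
    }

    /** Uses partition-level counts when positive, otherwise table-level stats. */
    static double estimateScanRowCount(List<Long> selectedPartitionIds,
                                       Map<Long, Long> partitionRowCounts,
                                       double tableRowCount) {
        double rowCount = getSelectedPartitionRowCount(selectedPartitionIds, partitionRowCounts);
        // As in the reviewed code, only a positive count is trusted; otherwise fall back.
        return rowCount > 0 ? rowCount : tableRowCount;
    }

    /** Exact-capacity list: adding exactly size() elements never triggers a grow. */
    static List<String> partitionNames(List<Long> selectedPartitionIds, Map<Long, String> idToName) {
        List<String> names = new ArrayList<>(selectedPartitionIds.size());
        for (long id : selectedPartitionIds) {
            names.add(idToName.get(id));
        }
        return names;
    }
}
```

Because the list is filled with exactly as many elements as its initial capacity, a plain `ArrayList` constructor is enough here; a helper that adds slack capacity would only matter if the final size were uncertain.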
run external

PR approved by at least one committer and no changes requested.

PR approved by anyone and no changes requested.
…contains rows from selected partitions (#36760)

1. Update the row count when some partitions are pruned.
2. Refactor StatsCalculator for Scan.
### What problem does this PR solve?

Related PR: #36760 #57850

Problem Summary: Fix column statistics being unknown when calculating statistics for sync MV plans.

SQL statements that are related to statistics should not themselves collect or compute statistics. Previously this was decided by the `isInternal` flag, but `isInternal` is too broad: it covers not only statistics-related SQL but also SQL used to generate materialized view plans. Materialized view plan generation requires statistics, so we introduce a new flag, `isPlanWithUnKnownColumnStats`, to mark connections that are used only for statistics operations (column statistics are treated as unknown).
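The distinction drawn above can be sketched as a small decision function. `ConnectionFlags`, `ColumnStatsPolicy` and the method names are hypothetical; only `isInternal` and `isPlanWithUnKnownColumnStats` come from the description. The point is that MV plan generation may run on an internal connection yet still needs statistics, so the narrower flag, not `isInternal`, should decide.

```java
/** Hypothetical per-connection flags; only the two flag names come from the PR text. */
final class ConnectionFlags {
    final boolean isInternal;                    // broad: stats SQL and MV plan generation
    final boolean isPlanWithUnKnownColumnStats;  // narrow: statistics-only operations

    ConnectionFlags(boolean isInternal, boolean isPlanWithUnKnownColumnStats) {
        this.isInternal = isInternal;
        this.isPlanWithUnKnownColumnStats = isPlanWithUnKnownColumnStats;
    }
}

final class ColumnStatsPolicy {
    /** Old rule (as described): any internal connection skipped column statistics. */
    static boolean shouldUseColumnStatsOld(ConnectionFlags f) {
        return !f.isInternal;
    }

    /** New rule (as described): only statistics-only connections treat column stats as unknown. */
    static boolean shouldUseColumnStats(ConnectionFlags f) {
        return !f.isPlanWithUnKnownColumnStats;
    }
}
```

For example, an internal MV plan-generation connection, `new ConnectionFlags(true, false)`, loses column statistics under the old rule but keeps them under the new one.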
…#58426) ### What problem does this PR solve? Related PR: apache#36760 apache#57850 Problem Summary: Fix column statistics being unknown when calculating statistics for sync MV plans. SQL statements that are related to statistics should not themselves collect or compute statistics. Previously this was decided by the `isInternal` flag, but `isInternal` is too broad: it covers not only statistics-related SQL but also SQL used to generate materialized view plans. Materialized view plan generation requires statistics, so we introduce a new flag, `isPlanWithUnKnownColumnStats`, to mark connections that are used only for statistics operations (column statistics are treated as unknown).