-
Notifications
You must be signed in to change notification settings - Fork 410
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
planner, executor: refactor expand logic from analyzing grouping sets to level projections #7169
Conversation
[REVIEW NOTIFICATION] This pull request has not been approved. To complete the pull request process, please ask the reviewers in the list to review by filling The full list of commands accepted by this bot can be found here. Reviewer can indicate their review by submitting an approval review. |
Block ExpandBlockInputStream::getHeader() const | ||
{ | ||
Block res(names_and_types); | ||
return res; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ColumnPtr
is required for the Block returned by getHeader
.
ColumnPtr
is not needed only for getSampleBlock
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
like
tiflash/dbms/src/DataStreams/MockExchangeReceiverInputStream.h
Lines 31 to 33 in 1dc2d1a
Block getHeader() const override | |
{ | |
return Block(columns_vector[0]).cloneEmpty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed, 2f2f325
@@ -149,6 +149,14 @@ ExpressionAction ExpressionAction::expandSource(GroupingSets grouping_sets_) | |||
return a; | |||
} | |||
|
|||
ExpressionAction ExpressionAction::addNullable(const std::string & col_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be rename this as buildConvertToNullableAction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static ExpressionAction addColumn(const ColumnWithTypeAndName & added_column_);
static ExpressionAction removeColumn(const std::string & removed_name);
static ExpressionAction copyColumn(const std::string & from_name, const std::string & to_name);
static ExpressionAction project(const NamesWithAliases & projected_columns_);
static ExpressionAction project(const Names & projected_columns_);
static ExpressionAction ordinaryJoin(std::shared_ptr<const Join> join_, const NamesAndTypesList & columns_added_by_join_);
static ExpressionAction expandSource(GroupingSets grouping_sets);
this one:
static ExpressionAction convertToNullable(const std::string & col_name);
how about convertToNullable?maybe more regular here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static ExpressionAction addColumn(const ColumnWithTypeAndName & added_column_); static ExpressionAction removeColumn(const std::string & removed_name); static ExpressionAction copyColumn(const std::string & from_name, const std::string & to_name); static ExpressionAction project(const NamesWithAliases & projected_columns_); static ExpressionAction project(const Names & projected_columns_); static ExpressionAction ordinaryJoin(std::shared_ptr<const Join> join_, const NamesAndTypesList & columns_added_by_join_); static ExpressionAction expandSource(GroupingSets grouping_sets); this one: static ExpressionAction convertToNullable(const std::string & col_name);
how about convertToNullable?maybe more regular here
okk
MutableColumns res(num_columns); | ||
for (size_t i = 0; i < num_columns; ++i) | ||
{ | ||
res[i] = block_cache.getColumns()[i]->cloneResized(block_cache.rows()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the columns that must be null, I think we should not clone their full content,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sense.
we should dig out which col in this level is projected as null (varies from levels), then, do something elimination of copy work here. For now, multi_level_actions is seen as black box here, no too much analyzing work, just use level_action.execute(block)
to get more simplicity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sense. we should dig out which col in this level is projected as null (varies from levels), then, do something elimination of copy work here. For now, multi_level_actions is seen as black box here, no too much analyzing work, just use
level_action.execute(block)
to get more simplicity
projection action will add null attribute on some columns in L46. I think we can record which columns are null before and after executing. By comparing the difference we can know which columns are the columns that is projected to null and we can avoid the cloning on these columns. How about this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add todo for further performance optimization.
const std::string & name = one_alias.first; | ||
const std::string & alias = one_alias.second; | ||
ColumnWithTypeAndName column = cloned_block.getByName(name); | ||
if (!alias.empty()) | ||
column.name = alias; | ||
new_block.insert(std::move(column)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can defer the operation of cloning the original columns' content. We could know which columns are needed at here, and unnecessary cloning mentioned at the previous comments could be avoided, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now the case is:
clone a brand-new origin block, apply the projection actions on them, we don't take too much role into what the projection actions do in detail (and at last, pick the result columns and rename)
we have already got beforeExpandActions
, whch is quite like prependProjectionActions
to prune those unneeded columns as a whole, then doing each level projection individually.
suggestions indicates we may need a fine-granted copyFilter
before each level projection to find what column-ref in this level projection real need (for example, literal null didn't need a real col ref in current level projection), then copy those necessary source columns.
good suggestion, a bit complicated, I will left it to next pull request as an enhancement.
Any update? |
Any update? |
, log(Logger::get(req_id)) | ||
{ | ||
children.push_back(input); | ||
i_th_project = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
init i_th_project
in init list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not exactly, when pipeline fetch a new block from child below Expand OP, we should cache this new mother block and rewind the current projection offset for the convenience of later level projections (each time output a new projected block in function tryOutput(...))
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: guo-shaoge, xzhangxian1008 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[LGTM Timeline notifier]Timeline:
|
/merge |
/run-all-tests |
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
/run-all-tests |
/rebuild |
/run-all-tests |
/merge |
What problem does this PR solve?
Issue Number: close #6852
Problem Summary:
What is changed and how it works?
In previous PR #6545, we introduced Expand operator to analyze raw grouping-sets pb, and generating the implicit projection logic (basically expand and change the value in the block itself)
While for the better implementation of ROLL UP syntax, which means more dynamic generated columns and more complicated grouping sets definition, we choose to move the analyzing work to TiDB planner, and let the execution logic in tiflash side as clean and simple as possible.
In this PR, TiFlash should receive the level-projections pb structure from TiDB planner, and what it only need to concern is to just project it out (one level projection for outputting a individual block itself)
In the new implementation, we take expand as projection, so it will be also seen as a datasource node in query block division. Additionally, we introduced a ExpandBlockInputStream to control the block cache and current projected index position which is being done the same in ExpandTransformAction.
the new pb structure will be like below:
TiPB link: pingcap/tipb#300
Note: for now, since TiDB still relay on the old Expand logic (but not public to user yet, the default switch is
off
now), after we refactored the old logic in Multi Distinct Aggregate fromexpand
toexpand2
, I will remove all the old logic completely without concern about the compatibility in the rolling update.Check List
Tests
Side effects
Documentation
Release note