Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

planner, executor: refactor expand logic from analyzing grouping sets to level projections #7169

Merged
merged 18 commits into from
Jul 7, 2023

Conversation

AilinKid
Copy link
Contributor

@AilinKid AilinKid commented Mar 27, 2023

What problem does this PR solve?

Issue Number: close #6852

Problem Summary:

What is changed and how it works?

In previous PR #6545, we introduced Expand operator to analyze raw grouping-sets pb, and generating the implicit projection logic (basically expand and change the value in the block itself)

While for the better implementation of ROLL UP syntax, which means more dynamic generated columns and more complicated grouping sets definition, we choose to move the analyzing work to TiDB planner, and let the execution logic in tiflash side as clean and simple as possible.

In this PR, TiFlash should receive the level-projections pb structure from TiDB planner, and what it only need to concern is to just project it out (one level projection for outputting a individual block itself)

In the new implementation, we take expand as projection, so it will be also seen as a datasource node in query block division. Additionally, we introduced a ExpandBlockInputStream to control the block cache and current projected index position which is being done the same in ExpandTransformAction.

the new pb structure will be like below:

message ProjectExpr {
	repeated Expr exprs = 1;
}

message Expand2{
	repeated ProjectExpr proj_exprs = 1;         // for grouping sets generated projection levels, like [a, b, c, 1#gen_col1, 2#gen_col2]
	repeated string generated_output_names = 2;  // for output names for generated cols like grouping_id etc, like "gen_col1", "gen_col2" here
	optional Executor child = 3;
}

TiPB link: pingcap/tipb#300

Note: for now, since TiDB still relay on the old Expand logic (but not public to user yet, the default switch is off now), after we refactored the old logic in Multi Distinct Aggregate from expand to expand2, I will remove all the old logic completely without concern about the compatibility in the rolling update.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

planner, executor: refactor expand logic from analyzing grouping sets to level projections

@ti-chi-bot
Copy link
Member

[REVIEW NOTIFICATION]

This pull request has not been approved.

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added do-not-merge/needs-linked-issue release-note Denotes a PR that will be considered when it comes time to generate release notes. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 27, 2023
@sre-bot
Copy link
Collaborator

sre-bot commented Mar 27, 2023

CLA assistant check
All committers have signed the CLA.

@ti-chi-bot ti-chi-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 27, 2023
@SeaRise SeaRise self-requested a review March 30, 2023 07:14
Comment on lines 34 to 37
Block ExpandBlockInputStream::getHeader() const
{
Block res(names_and_types);
return res;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ColumnPtr is required for the Block returned by getHeader.
ColumnPtr is not needed only for getSampleBlock.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like

Block getHeader() const override
{
return Block(columns_vector[0]).cloneEmpty();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed, 2f2f325

@@ -149,6 +149,14 @@ ExpressionAction ExpressionAction::expandSource(GroupingSets grouping_sets_)
return a;
}

ExpressionAction ExpressionAction::addNullable(const std::string & col_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be rename this as buildConvertToNullableAction

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    static ExpressionAction addColumn(const ColumnWithTypeAndName & added_column_);
    static ExpressionAction removeColumn(const std::string & removed_name);
    static ExpressionAction copyColumn(const std::string & from_name, const std::string & to_name);
    static ExpressionAction project(const NamesWithAliases & projected_columns_);
    static ExpressionAction project(const Names & projected_columns_);
    static ExpressionAction ordinaryJoin(std::shared_ptr<const Join> join_, const NamesAndTypesList & columns_added_by_join_);
    static ExpressionAction expandSource(GroupingSets grouping_sets);

this one:
    static ExpressionAction convertToNullable(const std::string & col_name);

how about convertToNullable?maybe more regular here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    static ExpressionAction addColumn(const ColumnWithTypeAndName & added_column_);
    static ExpressionAction removeColumn(const std::string & removed_name);
    static ExpressionAction copyColumn(const std::string & from_name, const std::string & to_name);
    static ExpressionAction project(const NamesWithAliases & projected_columns_);
    static ExpressionAction project(const Names & projected_columns_);
    static ExpressionAction ordinaryJoin(std::shared_ptr<const Join> join_, const NamesAndTypesList & columns_added_by_join_);
    static ExpressionAction expandSource(GroupingSets grouping_sets);

this one:
    static ExpressionAction convertToNullable(const std::string & col_name);

how about convertToNullable?maybe more regular here

okk

MutableColumns res(num_columns);
for (size_t i = 0; i < num_columns; ++i)
{
res[i] = block_cache.getColumns()[i]->cloneResized(block_cache.rows());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the columns that must be null, I think we should not clone their full content,

Copy link
Contributor Author

@AilinKid AilinKid Apr 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense.
we should dig out which col in this level is projected as null (varies from levels), then, do something elimination of copy work here. For now, multi_level_actions is seen as black box here, no too much analyzing work, just use level_action.execute(block) to get more simplicity

Copy link
Contributor

@xzhangxian1008 xzhangxian1008 Jun 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense. we should dig out which col in this level is projected as null (varies from levels), then, do something elimination of copy work here. For now, multi_level_actions is seen as black box here, no too much analyzing work, just use level_action.execute(block) to get more simplicity

projection action will add null attribute on some columns in L46. I think we can record which columns are null before and after executing. By comparing the difference we can know which columns are the columns that is projected to null and we can avoid the cloning on these columns. How about this method?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add todo for further performance optimization.

@ti-chi-bot ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 21, 2023
@ti-chi-bot ti-chi-bot bot deleted a comment from ti-chi-bot Apr 24, 2023
@ti-chi-bot ti-chi-bot bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 24, 2023
@ti-chi-bot ti-chi-bot bot deleted a comment from ti-chi-bot Apr 24, 2023
Comment on lines +53 to +59
const std::string & name = one_alias.first;
const std::string & alias = one_alias.second;
ColumnWithTypeAndName column = cloned_block.getByName(name);
if (!alias.empty())
column.name = alias;
new_block.insert(std::move(column));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can defer the operation of cloning the original columns' content. We could know which columns are needed at here, and unnecessary cloning mentioned at the previous comments could be avoided, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now the case is:
clone a brand-new origin block, apply the projection actions on them, we don't take too much role into what the projection actions do in detail (and at last, pick the result columns and rename)

we have already got beforeExpandActions, whch is quite like prependProjectionActions to prune those unneeded columns as a whole, then doing each level projection individually.

suggestions indicates we may need a fine-granted copyFilter before each level projection to find what column-ref in this level projection real need (for example, literal null didn't need a real col ref in current level projection), then copy those necessary source columns.

good suggestion, a bit complicated, I will left it to next pull request as an enhancement.

@xzhangxian1008
Copy link
Contributor

Any update?

@ti-chi-bot ti-chi-bot bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 29, 2023
@ti-chi-bot ti-chi-bot bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 13, 2023
@xzhangxian1008
Copy link
Contributor

Any update?

, log(Logger::get(req_id))
{
children.push_back(input);
i_th_project = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

init i_th_project in init list

Copy link
Contributor Author

@AilinKid AilinKid Jul 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly, when pipeline fetch a new block from child below Expand OP, we should cache this new mother block and rewind the current projection offset for the convenience of later level projections (each time output a new projected block in function tryOutput(...))

@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Jul 6, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: guo-shaoge, xzhangxian1008

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the approved label Jul 6, 2023
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Jul 6, 2023

[LGTM Timeline notifier]

Timeline:

  • 2023-07-04 01:48:24.298382691 +0000 UTC m=+82136.232016110: ☑️ agreed by xzhangxian1008.
  • 2023-07-06 05:55:30.341690612 +0000 UTC m=+1009.223166555: ☑️ agreed by guo-shaoge.

@AilinKid
Copy link
Contributor Author

AilinKid commented Jul 6, 2023

/merge

@AilinKid
Copy link
Contributor Author

AilinKid commented Jul 6, 2023

/run-all-tests

AilinKid added 17 commits July 7, 2023 16:02
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
.
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
.
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
Signed-off-by: AilinKid <ailinsilence4@gmail.com>
@AilinKid
Copy link
Contributor Author

AilinKid commented Jul 7, 2023

/run-all-tests

@AilinKid
Copy link
Contributor Author

AilinKid commented Jul 7, 2023

/rebuild

@AilinKid
Copy link
Contributor Author

AilinKid commented Jul 7, 2023

/run-all-tests

@AilinKid
Copy link
Contributor Author

AilinKid commented Jul 7, 2023

/merge

@ti-chi-bot ti-chi-bot bot merged commit bab9685 into pingcap:master Jul 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved lgtm release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize count distinct to 3 stages of distributed agg
6 participants