auto pass through hashagg #9167

Merged
merged 130 commits into from
Aug 7, 2024

Conversation

@guo-shaoge guo-shaoge commented Jun 27, 2024

What problem does this PR solve?

Issue Number: ref #9196

Problem Summary:

What is changed and how it works?

This PR mainly includes four modifications:

  1. AutoPassThroughHashAggContext: Contains the state control logic, with five states: Init, Adjust, Selective, PreHashAgg, and PassThrough. State transitions are driven by the pre-aggregation effect observed on each block.
  2. AutoPassThroughHashAggBlockInputStream/AutoPassThroughAggregateTransform: Executor logic that calls AutoPassThroughHashAggContext. It supports sending blocks in advance.
  3. Aggregator: Supports only_lookup and collect_hit_rate.
  4. AutoPassThroughHashHelper: Helps generate the aggregate function column from the original column.
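
As a rough illustration of the state control in item 1, the sketch below models the five states and a per-block transition. Everything beyond the state names is an assumption made for illustration: `SketchContext`, `onBlockStats`, the two thresholds, and the transition rules themselves are hypothetical. The description above only says that transitions depend on the pre-aggregation effect of each block, not what the exact rules are.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch, not the TiFlash implementation.
enum class State { Init, Adjust, Selective, PreHashAgg, PassThrough };

struct SketchContext
{
    // Assumed thresholds on the per-block ratio (groups produced / rows consumed).
    double low_ratio = 0.2;  // at or below: pre-aggregation shrinks the data well
    double high_ratio = 0.9; // at or above: pre-aggregation barely helps
    State state = State::Init;

    // rows_in: rows in the block; groups_out: distinct groups the block produced.
    void onBlockStats(std::size_t rows_in, std::size_t groups_out)
    {
        assert(rows_in > 0 && groups_out <= rows_in);
        if (state == State::Init)
        {
            state = State::Adjust; // assumed: the first block only starts probing
            return;
        }
        const double ratio = static_cast<double>(groups_out) / static_cast<double>(rows_in);
        if (ratio <= low_ratio)
            state = State::PreHashAgg;  // keep aggregating locally
        else if (ratio >= high_ratio)
            state = State::PassThrough; // forward rows without aggregating
        else
            state = State::Selective;   // in between: mixed strategy
    }
};
```

The point of the sketch is only the shape of the control flow: each block yields a measurement, and that measurement selects the next state.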

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

support auto pass through hashagg

Signed-off-by: guo-shaoge <shaoge1994@163.com>
2. auto_pass: support context

del only_lookup template for HashMethodBase

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jun 27, 2024
@guo-shaoge guo-shaoge changed the title Auto passthrough hashagg auto pass through hashagg Jun 27, 2024
@ti-chi-bot ti-chi-bot bot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Jun 30, 2024
@@ -312,10 +313,20 @@ String PhysicalPlan::toString() const
    return PhysicalPlanVisitor::visitToString(root_node);
}

void recursiveSetBlockInputStreamParent(BlockInputStreamPtr self, const IBlockInputStream * parent)
{
Contributor

I think you need to first check whether parent is already set; otherwise it may run into performance issues like #4494.

Contributor Author

done
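
A minimal sketch of the kind of guard requested above, assuming a simplified node type. `Node`, `parent_set`, and `setParentRecursively` are hypothetical stand-ins and not the IBlockInputStream interface; the idea is only that a subtree whose parent is already set gets skipped instead of being walked again.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a stream node; not the TiFlash stream API.
struct Node
{
    const Node * parent = nullptr;
    bool parent_set = false;
    std::vector<std::shared_ptr<Node>> children;
};

// Sets parent pointers top-down and returns how many nodes were actually visited.
inline std::size_t setParentRecursively(const std::shared_ptr<Node> & self, const Node * parent)
{
    assert(self != nullptr);
    if (self->parent_set)
        return 0; // already handled: do not walk this subtree again
    self->parent = parent;
    self->parent_set = true;
    std::size_t visited = 1;
    for (const auto & child : self->children)
        visited += setParentRecursively(child, self.get());
    return visited;
}
```

With the guard in place, a node shared by several parents is visited once, and calling the function a second time on the same root does no work.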

@@ -12,12 +12,17 @@
// See the License for the specific language governing permissions and
// limitations under the License.

#include <AggregateFunctions/AggregateFunctionFactory.h>
Contributor

Is this todo finished?

{
void AutoPassThroughHashAggContext::onBlockAuto(Block & block)
{
    RUNTIME_CHECK_MSG(!already_start_to_get_data, "Shouldn't insert into HashMap if already start to get data");
Contributor

aggregator->executeOnBlock() may aggregate only part of the data in the block; is this case considered?

void AutoPassThroughHashAggContext::forceState()
{
    if (many_data[0]->need_spill)
        state = State::PassThrough;
Contributor

Will the hash table be converted to a block after need_spill is found to be true?

Contributor Author

When need_spill is true, the following happens:

  1. New data is no longer inserted into the hash table.
  2. The hash table is converted to blocks in advance (getDataInAdvance()) to help reduce memory usage.
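
The two steps above can be sketched roughly as follows. This is a toy model under stated assumptions: `SpillAwareAgg`, `SpillState`, `onRow`, and `flushInAdvance` are hypothetical names, a per-key sum stands in for the real aggregate state, and rows stand in for blocks. It is not how getDataInAdvance() is implemented.

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sketch; types and names are illustrative only.
enum class SpillState { PreHashAgg, PassThrough };

struct SpillAwareAgg
{
    SpillState state = SpillState::PreHashAgg;
    bool need_spill = false;
    std::unordered_map<int, long> table;       // key -> running sum
    std::vector<std::pair<int, long>> emitted; // rows handed downstream

    void forceState()
    {
        if (need_spill)
            state = SpillState::PassThrough;
    }

    // Step 2: drain the hash table early so its memory can be released.
    void flushInAdvance()
    {
        for (const auto & kv : table)
            emitted.emplace_back(kv.first, kv.second);
        table.clear();
        assert(table.empty());
    }

    void onRow(int key, long value)
    {
        forceState();
        if (state == SpillState::PassThrough)
        {
            flushInAdvance();
            emitted.emplace_back(key, value); // step 1: no new inserts into the table
            return;
        }
        table[key] += value;
    }
};
```

Once `need_spill` is set, later rows bypass the table entirely, and whatever was already aggregated is emitted before them.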

#undef DISPATCH_NUMBER_INNER
#undef DISPATCH_NUMBER_OUTER
#undef FOR_NUMBER_TYPES_INNER
#undef FOR_NUMBER_TYPES_OUTER
Contributor

Do you mean that when the user input is sum(long), TiDB will convert it to sum(cast(long to decimal))?

void PhysicalPlan::buildBlockInputStream(DAGPipeline & pipeline, Context & context, size_t max_streams)
{
    RUNTIME_CHECK(root_node);
    root_node->buildBlockInputStream(pipeline, context, max_streams);
    pipeline.transform([](auto & stream) { recursiveSetBlockInputStreamParent(stream, nullptr); });
Contributor

I think you need to call recursiveSetBlockInputStreamParent in BlockInputStreamPtr Planner::execute() as well.

Contributor Author

done

Contributor

@windtalker windtalker left a comment

LGTM

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Aug 7, 2024
Contributor

ti-chi-bot bot commented Aug 7, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: windtalker, yibin87

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

ti-chi-bot bot commented Aug 7, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-07-31 08:02:45.257739586 +0000 UTC m=+342881.537787657: ☑️ agreed by yibin87.
  • 2024-08-07 10:07:30.752590965 +0000 UTC m=+432980.619690053: ☑️ agreed by windtalker.

@ti-chi-bot ti-chi-bot bot merged commit 1289b43 into pingcap:master Aug 7, 2024
5 checks passed
@JaySon-Huang JaySon-Huang deleted the auto_passthrough_hashagg branch August 8, 2024 02:40
Labels
approved lgtm release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
4 participants