Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refine merge agg stream #6793

Merged
merged 18 commits into from
Feb 14, 2023
Merged

Conversation

SeaRise
Copy link
Contributor

@SeaRise SeaRise commented Feb 9, 2023

What problem does this PR solve?

Issue Number: ref #5900

Problem Summary:

What is changed and how it works?

  • Remove the constraint on two level merge: If the aggregation states are two-level, then it produces blocks strictly in order of 'bucket_num'., because tiflash does not need this.
  • Move most of one level and two level merge's logic from MergingAndConvertingBlockInputStream to MergingBuckets and don't create separate threads to execute two level merge.
  • Now MergingAndConvertingBlockInputStream only run in one thread, and use UnionBlockInputStream to do parallel two level merge.
  • Move MergingBuckets to Aggregator.h because of the c++ compile error...
  • Remove some useless code in Aggregator.h and Aggregator.cpp and remove useless class MergingAggregatedBlockInputStream

Check List

Tests

tsan unit tests pass.

[----------] 1 test from SpillAggregationTestRunner (301567 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (301570 ms total)
[  PASSED  ] 1 test.
[----------] 9 tests from AggExecutorTestRunner (58719 ms total)

[----------] Global test environment tear-down
[==========] 9 tests from 1 test case ran. (58722 ms total)
[  PASSED  ] 9 tests.
  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot
Copy link
Member

ti-chi-bot commented Feb 9, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • windtalker
  • ywqzzy

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 9, 2023
@ti-chi-bot ti-chi-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 9, 2023
@SeaRise SeaRise changed the title WIP: Refine merge agg stream Refine merge agg stream Feb 9, 2023
@ti-chi-bot ti-chi-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 9, 2023
@ti-chi-bot ti-chi-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 9, 2023
@SeaRise
Copy link
Contributor Author

SeaRise commented Feb 9, 2023

/run-all-tests

@SeaRise
Copy link
Contributor Author

SeaRise commented Feb 9, 2023

/run-unit-test

1 similar comment
@SeaRise
Copy link
Contributor Author

SeaRise commented Feb 9, 2023

/run-unit-test

@SeaRise
Copy link
Contributor Author

SeaRise commented Feb 9, 2023

tsan unit tests pass.

[----------] 1 test from SpillAggregationTestRunner (301567 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (301570 ms total)
[  PASSED  ] 1 test.
[----------] 9 tests from AggExecutorTestRunner (58719 ms total)

[----------] Global test environment tear-down
[==========] 9 tests from 1 test case ran. (58722 ms total)
[  PASSED  ] 9 tests.

@SeaRise
Copy link
Contributor Author

SeaRise commented Feb 9, 2023

/run-all-tests

1 similar comment
@SeaRise
Copy link
Contributor Author

SeaRise commented Feb 9, 2023

/run-all-tests

@SeaRise
Copy link
Contributor Author

SeaRise commented Feb 10, 2023

/cc @windtalker @ywqzzy

throw Exception("Unknown aggregated data variant.", ErrorCodes::UNKNOWN_AGGREGATED_DATA_VARIANT);
}
#undef M
single_level_blocks = aggregator.prepareBlocksAndFillSingleLevel(*first, final);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not move the merge code to L2201, together with the without_key case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you are right, updated.

const LoggerPtr log;
const Aggregator & aggregator;
ManyAggregatedDataVariants data;
bool final;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename it since it conflicts with the “final” keyword

Copy link
Contributor Author

@SeaRise SeaRise Feb 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, but I don't want to rename it in this pr..
Since final is used in many places such Aggregator, AggregatingBlockInputStream, and MergingAggregatedMemoryEfficientBlockInputStream and etc, it might be better to do it with another pr..


void thread(Int32 bucket_num)
{
try
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to handle the exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UnionBlockInputStream or other outside blockInputStreams will handle it.

BlockInputStreams merging_streams;
for (size_t i = 0; i < merging_buckets->getConcurrency(); ++i)
merging_streams.push_back(std::make_shared<MergingAndConvertingBlockInputStream>(merging_buckets, i, log->identifier()));
impl = std::make_unique<UnionBlockInputStream<>>(merging_streams, BlockInputStreams{}, max_threads, log->identifier());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about moving the unionBlockInputStream construction outside the readImpl function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's possible because Impl needs to be constructed inside readImpl.

@ywqzzy ywqzzy self-requested a review February 13, 2023 07:12
Copy link
Contributor

@windtalker windtalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Others LGTM

{
APPLY_FOR_VARIANTS_TWO_LEVEL(M)
default:
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not throw error for the default branch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, updated.

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Feb 14, 2023
@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Feb 14, 2023
@SeaRise
Copy link
Contributor Author

SeaRise commented Feb 14, 2023

/merge

@ti-chi-bot
Copy link
Member

@SeaRise: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

Commit hash: 4d694bb

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Feb 14, 2023
@ti-chi-bot ti-chi-bot merged commit 34fc70f into pingcap:master Feb 14, 2023
@SeaRise SeaRise deleted the refine_merge_agg_stream branch February 14, 2023 07:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants