Make performance of TPCH q15 stable #4570

Merged (4 commits) on Apr 19, 2022

Contributor

@windtalker windtalker commented Apr 2, 2022

Signed-off-by: xufei <xufeixw@mail.ustc.edu.cn>

What problem does this PR solve?

Issue Number: close #4451

Problem Summary:

What is changed and how it works?

There are two possible solutions:

  • Take the local delta memory usage into account when estimating result_size_bytes in the aggregator on each call to executeOnBlock.
  • In Aggregator::mergeAndConvertToBlocks, check result_size_bytes and, if it exceeds the threshold, convert all the hash tables into two-level hash tables.

With the first solution, each thread can convert its hash table to a two-level hash table during the first stage of ParallelAggregating. With the second solution, the conversion runs in a single thread.

I tested both solutions and found that the first is ~20% faster than the second for TPCH q15, so I chose the first solution.
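As a rough illustration of the first solution, the per-thread check could look like the sketch below. The helper name shouldConvertToTwoLevel, its parameters, and the "0 disables the check" convention are assumptions made for this example, not the actual Aggregator code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not the real Aggregator API.
// tracked_bytes:     memory already reported to the shared memory tracker.
// local_delta_bytes: memory allocated on this thread but not yet reported (may be negative).
// threshold_bytes:   size above which a two-level hash table is preferred;
//                    0 is assumed to disable the check.
bool shouldConvertToTwoLevel(std::int64_t tracked_bytes, std::int64_t local_delta_bytes, std::size_t threshold_bytes)
{
    if (threshold_bytes == 0)
        return false;
    // Include the unreported local delta so the estimate does not lag behind real usage.
    const std::int64_t estimated = tracked_bytes + local_delta_bytes;
    return estimated >= 0 && static_cast<std::uint64_t>(estimated) >= threshold_bytes;
}

// Usage: the same tracked size crosses the threshold only once the local delta is counted.
void demoThresholdCheck()
{
    assert(!shouldConvertToTwoLevel(900, 0, 1000));
    assert(shouldConvertToTwoLevel(900, 200, 1000));
}
```

Because each thread evaluates the check against its own estimate, the conversion can happen in parallel rather than in a single merging thread.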

Why not enable the two-level hash table by default in all cases:

  • Not all aggregation methods support a two-level hash table.
  • Test results show that always using a two-level hash table is slower when the table data is small and the aggregation result is small.

Testing query:
select count(*),id from test group by id;
The table test has 65536 rows, all with the same id, so the query above returns only 1 row.
The testing cluster has only 1 TiFlash, whose CPU has 36 cores. Each query is run 1000 times.

                        concurrency = 1    concurrency = 5
two-level hash table    22s                40s
normal hash table       16s                26s

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot
Member

ti-chi-bot commented Apr 2, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • SchrodingerZhu
  • fzhedu

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added do-not-merge/needs-triage-completed release-note-none Denotes a PR that doesn't merit a release note. labels Apr 2, 2022
@ti-chi-bot ti-chi-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Apr 2, 2022
@CLAassistant

CLAassistant commented Apr 2, 2022

CLA assistant check
All committers have signed the CLA.

@ti-chi-bot ti-chi-bot added status/LGT1 Indicates that a PR has LGTM 1. needs-cherry-pick-release-5.3 Type: Need cherry pick to release-5.3 needs-cherry-pick-release-5.4 Should cherry pick this PR to release-5.4 branch. needs-cherry-pick-release-6.0 Type: Need cherry pick to release-6.0 and removed do-not-merge/needs-triage-completed labels Apr 2, 2022
@@ -226,6 +226,11 @@ void submitLocalDeltaMemory()
local_delta = 0;
}

Int64 getLocalDeltaMemory()
{
return local_delta;
Contributor

local_delta is thread-local, so different threads will see different deltas. Is that what you expect?

Contributor Author

Yes

Signed-off-by: xufei <xufeixw@mail.ustc.edu.cn>
Signed-off-by: xufei <xufeixw@mail.ustc.edu.cn>
current_memory_usage = current_memory_tracker->get();
auto updated_local_delta_memory = CurrentMemoryTracker::getLocalDeltaMemory();
auto local_delta_memory_diff = updated_local_delta_memory - local_delta_memory;
current_memory_usage += (local_memory_usage.fetch_add(local_delta_memory_diff) + local_delta_memory_diff);
Contributor

Is local_delta_memory_diff added twice?

Contributor Author

No. fetch_add returns the value before the addition, so local_delta_memory_diff has to be added again to get the updated total.
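A minimal sketch of that behavior, using a stand-in global counter rather than the member from the PR (the helper addAndGetUpdated is hypothetical):

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Stand-in for the shared counter in the snippet under review.
std::atomic<std::int64_t> local_memory_usage{0};

// Adds diff to the shared counter and returns the counter's value after the addition.
// std::atomic::fetch_add returns the value held before the addition,
// so diff is added once more to the returned value.
std::int64_t addAndGetUpdated(std::int64_t diff)
{
    return local_memory_usage.fetch_add(diff) + diff;
}

// Usage: fetch_add reports the old value while the counter itself already holds the new one.
void demoFetchAdd()
{
    local_memory_usage.store(100);
    assert(local_memory_usage.fetch_add(20) == 100); // old value is returned
    assert(local_memory_usage.load() == 120);        // counter was updated once
    assert(addAndGetUpdated(5) == 125);              // old value (120) + diff (5)
}
```

The diff is therefore applied to the counter only once. The second addition only adjusts the returned value.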

Contributor

@fzhedu fzhedu left a comment

LGTM

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Apr 19, 2022
@fzhedu
Contributor

fzhedu commented Apr 19, 2022

/merge

@ti-chi-bot
Member

@fzhedu: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 0f8e8ab

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Apr 19, 2022
@windtalker
Contributor Author

/merge

@ti-chi-bot
Member

@windtalker: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

@ti-chi-bot
Member

@windtalker: Your PR was out of date, I have automatically updated it for you.

At the same time I will also trigger all tests for you:

/run-all-tests

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

@sre-bot
Collaborator

sre-bot commented Apr 19, 2022

Coverage for changed files

Filename                                                Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Common/MemoryTracker.cpp                                     90                30    66.67%          14                 4    71.43%         139                53    61.87%          60                31    48.33%
Common/MemoryTracker.h                                       12                10    16.67%          12                10    16.67%          15                13    13.33%           0                 0         -
DataStreams/ParallelAggregatingBlockInputStream.cpp         109               109     0.00%          12                12     0.00%         161               161     0.00%          70                70     0.00%
DataStreams/ParallelAggregatingBlockInputStream.h             6                 6     0.00%           6                 6     0.00%          14                14     0.00%           0                 0         -
Interpreters/Aggregator.cpp                                2670              2670     0.00%          74                74     0.00%        1607              1607     0.00%        1202              1202     0.00%
Interpreters/Aggregator.h                                   965               965     0.00%          39                39     0.00%         193               193     0.00%         398               398     0.00%
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                      3852              3790     1.61%         157               145     7.64%        2129              2041     4.13%        1730              1701     1.68%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
17132      9492             44.59%    192847  96265        50.08%

full coverage report (for internal network access only)

@ti-chi-bot ti-chi-bot merged commit feee96a into pingcap:master Apr 19, 2022
ti-chi-bot pushed a commit to ti-chi-bot/tiflash that referenced this pull request Apr 19, 2022
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #4707.

ti-chi-bot pushed a commit to ti-chi-bot/tiflash that referenced this pull request Apr 19, 2022
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #4709.

ti-chi-bot pushed a commit to ti-chi-bot/tiflash that referenced this pull request Apr 19, 2022
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #4710.

windtalker added a commit to ti-chi-bot/tiflash that referenced this pull request Apr 22, 2022
Signed-off-by: xufei <xufeixw@mail.ustc.edu.cn>
ti-chi-bot added a commit that referenced this pull request Apr 22, 2022
ti-chi-bot added a commit that referenced this pull request Jun 15, 2022
Labels
needs-cherry-pick-release-5.3 Type: Need cherry pick to release-5.3 needs-cherry-pick-release-5.4 Should cherry pick this PR to release-5.4 branch. needs-cherry-pick-release-6.0 Type: Need cherry pick to release-6.0 release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Development

Successfully merging this pull request may close these issues.

TPCH q15 performance regression after introduce local_delta in MemoryTracker
7 participants