executor: add an OOMAction for copIterator to adaptive control the memory usage #19246

Merged
merged 68 commits into from
Sep 22, 2020

Yisaer
Contributor

@Yisaer Yisaer commented Aug 17, 2020

What problem does this PR solve?

Currently, TableReader can consume too much memory and has no OOM action.

Issue Number: close #xxx
ref #16104

What is changed and how it works?

Add an OOM action for the coprocessor in the non-keep-order case. Because copIterator already supports sendRate in the non-keepOrder case, the strategy is to reduce the number of sendRate tickets and to drain the copResponses from the response channel while the copWorkers are suspended after the OOM action has been triggered.

If the number of sendRate tickets has already been reduced to 1, the OOM action delegates to the fallback action.
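
The ticket-reduction and fallback behavior described above can be sketched roughly as follows. This is a simplified model, not the code in this PR: the names `oomAction`, `fallbackAction`, `rateLimitAction`, and `remaining` are hypothetical, and the sketch deliberately leaves out suspending the copWorkers and draining the response channel.

```go
package main

import (
	"fmt"
	"sync"
)

// oomAction is a hypothetical stand-in for an action that runs
// when a memory quota is exceeded.
type oomAction interface {
	Action()
}

// fallbackAction is a hypothetical next action in the chain.
// It only records that it was invoked.
type fallbackAction struct{ triggered bool }

func (f *fallbackAction) Action() { f.triggered = true }

// rateLimitAction is a simplified, hypothetical model of the strategy:
// each trigger withdraws one ticket until a single ticket is left.
type rateLimitAction struct {
	mu       sync.Mutex
	total    int // tickets initially available to the workers
	removed  int // tickets withdrawn so far
	fallback oomAction
}

// remaining returns the number of tickets that are still usable.
func (a *rateLimitAction) remaining() int {
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.total - a.removed
}

// Action withdraws one ticket per trigger. Once only one ticket remains,
// it leaves the tickets alone and delegates to the fallback action.
func (a *rateLimitAction) Action() {
	a.mu.Lock()
	if a.total-a.removed <= 1 {
		a.mu.Unlock()
		if a.fallback != nil {
			a.fallback.Action()
		}
		return
	}
	a.removed++
	a.mu.Unlock()
}

func main() {
	fb := &fallbackAction{}
	act := &rateLimitAction{total: 3, fallback: fb}
	for i := 1; i <= 3; i++ {
		act.Action()
		fmt.Printf("trigger %d: remaining=%d fallback=%v\n", i, act.remaining(), fb.triggered)
	}
}
```

With three tickets, the first two triggers each withdraw a ticket, and the third trigger finds a single ticket left and hands over to the fallback.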

Here is the memory status while backing up 200 GB of data with v4.0.2 PD and TiKV:

image

Logs during the backup:

[root@172.16.4.4 gaosong]# kubectl logs -c tidb -n gaosong basic-tidb-0 -f | grep taskRateLimit
[2020/08/21 05:06:31.047 +00:00] [INFO] [coprocessor.go:1275] ["memory exceeds quota, mark taskRateLimitAction exceed signal."] [consumed=1179570309] [quota=1073741824] [maxConsumed=1179570309] [tearedTicket=0] [ticketTotal=15]
[2020/08/21 05:06:47.791 +00:00] [INFO] [coprocessor.go:693] ["taskRateLimitAction Broadcast"]
[2020/08/21 05:06:49.720 +00:00] [INFO] [coprocessor.go:1275] ["memory exceeds quota, mark taskRateLimitAction exceed signal."] [consumed=1083889923] [quota=1073741824] [maxConsumed=2480231489] [tearedTicket=1] [ticketTotal=15]
[2020/08/21 05:07:06.971 +00:00] [INFO] [coprocessor.go:693] ["taskRateLimitAction Broadcast"]
[2020/08/21 05:07:08.933 +00:00] [INFO] [coprocessor.go:1275] ["memory exceeds quota, mark taskRateLimitAction exceed signal."] [consumed=1083926344] [quota=1073741824] [maxConsumed=2480231489] [tearedTicket=2] [ticketTotal=15]
[2020/08/21 05:07:27.230 +00:00] [INFO] [coprocessor.go:693] ["taskRateLimitAction Broadcast"]
[2020/08/21 05:07:29.378 +00:00] [INFO] [coprocessor.go:1275] ["memory exceeds quota, mark taskRateLimitAction exceed signal."] [consumed=1083885830] [quota=1073741824] [maxConsumed=2480231489] [tearedTicket=3] [ticketTotal=15]
[2020/08/21 05:07:48.401 +00:00] [INFO] [coprocessor.go:693] ["taskRateLimitAction Broadcast"]
[2020/08/21 05:07:50.292 +00:00] [INFO] [coprocessor.go:1275] ["memory exceeds quota, mark taskRateLimitAction exceed signal."] [consumed=1083942369] [quota=1073741824] [maxConsumed=2492900197] [tearedTicket=4] [ticketTotal=15]
[2020/08/21 05:08:08.136 +00:00] [INFO] [coprocessor.go:693] ["taskRateLimitAction Broadcast"]
[2020/08/21 05:08:10.177 +00:00] [INFO] [coprocessor.go:1275] ["memory exceeds quota, mark taskRateLimitAction exceed signal."] [consumed=1083918969] [quota=1073741824] [maxConsumed=2492900197] [tearedTicket=5] [ticketTotal=15]
[2020/08/21 05:08:27.572 +00:00] [INFO] [coprocessor.go:693] ["taskRateLimitAction Broadcast"]
[2020/08/21 05:08:29.945 +00:00] [INFO] [coprocessor.go:1275] ["memory exceeds quota, mark taskRateLimitAction exceed signal."] [consumed=1083918330] [quota=1073741824] [maxConsumed=2492900197] [tearedTicket=6] [ticketTotal=15]
[2020/08/21 05:08:48.998 +00:00] [INFO] [coprocessor.go:693] ["taskRateLimitAction Broadcast"]
[2020/08/21 05:08:51.700 +00:00] [INFO] [coprocessor.go:1275] ["memory exceeds quota, mark taskRateLimitAction exceed signal."] [consumed=1083917546] [quota=1073741824] [maxConsumed=2492900197] [tearedTicket=7] [ticketTotal=15]
[2020/08/21 05:09:11.398 +00:00] [INFO] [coprocessor.go:693] ["taskRateLimitAction Broadcast"]
[2020/08/21 05:09:14.120 +00:00] [INFO] [coprocessor.go:1275] ["memory exceeds quota, mark taskRateLimitAction exceed signal."] [consumed=1083855143] [quota=1073741824] [maxConsumed=2492900197] [tearedTicket=8] [ticketTotal=15]
[2020/08/21 05:09:31.630 +00:00] [INFO] [coprocessor.go:693] ["taskRateLimitAction Broadcast"]
[2020/08/21 05:09:34.553 +00:00] [INFO] [coprocessor.go:1275] ["memory exceeds quota, mark taskRateLimitAction exceed signal."] [consumed=1083889738] [quota=1073741824] [maxConsumed=2492900197] [tearedTicket=9] [ticketTotal=15]
[2020/08/21 05:09:54.937 +00:00] [INFO] [coprocessor.go:693] ["taskRateLimitAction Broadcast"]
[2020/08/21 05:09:57.860 +00:00] [INFO] [coprocessor.go:1275] ["memory exceeds quota, mark taskRateLimitAction exceed signal."] [consumed=1083885191] [quota=1073741824] [maxConsumed=2492900197] [tearedTicket=10] [ticketTotal=15]
[2020/08/21 05:10:15.279 +00:00] [INFO] [coprocessor.go:693] ["taskRateLimitAction Broadcast"]
[2020/08/21 05:10:19.209 +00:00] [INFO] [coprocessor.go:1275] ["memory exceeds quota, mark taskRateLimitAction exceed signal."] [consumed=1083882653] [quota=1073741824] [maxConsumed=2492900197] [tearedTicket=11] [ticketTotal=15]
[2020/08/21 05:10:38.468 +00:00] [INFO] [coprocessor.go:693] ["taskRateLimitAction Broadcast"]
[2020/08/21 05:10:43.895 +00:00] [INFO] [coprocessor.go:1275] ["memory exceeds quota, mark taskRateLimitAction exceed signal."] [consumed=1083872351] [quota=1073741824] [maxConsumed=2492900197] [tearedTicket=12] [ticketTotal=15]
[2020/08/21 05:11:04.014 +00:00] [INFO] [coprocessor.go:693] ["taskRateLimitAction Broadcast"]
[2020/08/21 05:11:14.353 +00:00] [INFO] [coprocessor.go:1275] ["memory exceeds quota, mark taskRateLimitAction exceed signal."] [consumed=1083853584] [quota=1073741824] [maxConsumed=2492900197] [tearedTicket=13] [ticketTotal=15]
[2020/08/21 05:11:34.720 +00:00] [INFO] [coprocessor.go:693] ["taskRateLimitAction Broadcast"]
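
To read the numbers in these log lines more easily, the bracketed `key=value` fields can be extracted with a small helper. The sketch below is only an illustration: `parseFields` and `fieldRe` are hypothetical names and are not part of TiDB. It uses the first log line above, where the quota of 1073741824 bytes is exactly 1 GiB.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// fieldRe matches bracketed numeric fields such as [consumed=1179570309].
// It does not match entries like [INFO] or [coprocessor.go:1275].
var fieldRe = regexp.MustCompile(`\[(\w+)=(\d+)\]`)

// parseFields (hypothetical helper) extracts the numeric key=value
// fields from a single log line.
func parseFields(line string) map[string]int64 {
	fields := make(map[string]int64)
	for _, m := range fieldRe.FindAllStringSubmatch(line, -1) {
		v, err := strconv.ParseInt(m[2], 10, 64)
		if err != nil {
			continue // skip values that do not fit in an int64
		}
		fields[m[1]] = v
	}
	return fields
}

func main() {
	line := `[2020/08/21 05:06:31.047 +00:00] [INFO] [coprocessor.go:1275] ` +
		`["memory exceeds quota, mark taskRateLimitAction exceed signal."] ` +
		`[consumed=1179570309] [quota=1073741824] [maxConsumed=1179570309] ` +
		`[tearedTicket=0] [ticketTotal=15]`

	f := parseFields(line)
	fmt.Println("bytes over quota:", f["consumed"]-f["quota"])
	fmt.Println("tearedTicket:", f["tearedTicket"], "ticketTotal:", f["ticketTotal"])
}
```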

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test

Release note

  • No release note

@Yisaer Yisaer requested a review from a team as a code owner August 17, 2020 11:28
@Yisaer Yisaer requested review from lzmhhh123 and removed request for a team August 17, 2020 11:28
@Yisaer Yisaer marked this pull request as draft August 17, 2020 11:28
@github-actions github-actions bot added the sig/execution SIG execution label Aug 20, 2020
@Yisaer Yisaer changed the title [DNM] executor: Add coprocessor oom action executor: Add coprocessor oom action Aug 21, 2020
@Yisaer Yisaer marked this pull request as ready for review August 21, 2020 06:16
@XuHuaiyu XuHuaiyu requested review from fzhedu and wshwsh12 August 24, 2020 02:51
@zz-jason
Member

@Yisaer can we reduce the log size?

@Yisaer
Contributor Author

Yisaer commented Aug 24, 2020

@Yisaer can we reduce the log size?

Updated; the unnecessary log parameter has been removed.

@Yisaer Yisaer requested a review from wshwsh12 August 31, 2020 04:14
@Yisaer
Contributor Author

Yisaer commented Sep 1, 2020

/run-unit-test

@XuHuaiyu
Contributor

LGTM

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Sep 21, 2020
@XuHuaiyu
Contributor

/run-all-tests

@XuHuaiyu
Contributor

/run-all-tests

Contributor

@wshwsh12 wshwsh12 left a comment

Almost LGTM

@wshwsh12
Contributor

LGTM

@ti-srebot ti-srebot removed the status/LGT1 Indicates that a PR has LGTM 1. label Sep 22, 2020
@ti-srebot ti-srebot added the status/LGT2 Indicates that a PR has LGTM 2. label Sep 22, 2020
@wshwsh12
Contributor

/merge

@ti-srebot ti-srebot added the status/can-merge Indicates a PR has been approved by a committer. label Sep 22, 2020
@ti-srebot
Contributor

/run-all-tests

@ti-srebot
Contributor

@Yisaer merge failed.

@XuHuaiyu
Contributor

/run-tics-test
/run-unit-test

@wshwsh12 wshwsh12 merged commit 205f021 into pingcap:master Sep 22, 2020
@zz-jason
Member

@Yisaer Could you prepare a detailed test report for this PR? We may need to consider the following:

  • What is the performance impact on normal queries?
  • What is the average speed when dumping data of different sizes, for example, 100 GB, 200 GB, 500 GB, and 1 TB?
  • What is the average time it takes to shrink memory usage to satisfy the memory quota for a query?
  • When will this optimization fail to prevent OOM? Would it fail if the system's available memory and the memory quota for a query are both 1 GB? If so, what is the expected memory quota configuration for a given amount of system memory?

@Yisaer
Contributor Author

Yisaer commented Oct 9, 2020

@Yisaer Could you prepare a detailed test report for this PR? We may need to consider the following:

  • What is the performance impact on normal queries?
  • What is the average speed when dumping data of different sizes, for example, 100 GB, 200 GB, 500 GB, and 1 TB?
  • What is the average time it takes to shrink memory usage to satisfy the memory quota for a query?
  • When will this optimization fail to prevent OOM? Would it fail if the system's available memory and the memory quota for a query are both 1 GB? If so, what is the expected memory quota configuration for a given amount of system memory?

got it.

Labels
challenge-program sig/execution SIG execution status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. type/enhancement The issue or PR belongs to an enhancement.
8 participants