util: record sqls and heap profile when memory usage is higher than 80% system memory. (#18858) #20473

Merged
merged 8 commits into pingcap:release-4.0 on Nov 11, 2020

ti-srebot
Contributor

cherry-pick #18858 to release-4.0


What problem does this PR solve?

Issue Number: close #17095

Problem Summary:
When tidb-server is killed by the system OOM killer, we have no information for investigating which SQL statement consumed the unexpected memory and brought the process down.

What is changed and how it works?

Proposal: xxx

What's Changed:

  1. Add a config Performance.ServerMemoryAlert to set the threshold manually.
  2. If Performance.ServerMemoryQuota is set, compare the instance memory usage against ServerMemoryQuota * 80% to check for OOM risk.
  3. If Performance.ServerMemoryQuota is not set, compare the system memory usage against total memory * 80% to check for OOM risk.
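
The check described in items 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `oomRisk`, its parameters, and the convention that a quota of 0 means "not set" are all assumptions made for the example, and the 0.8 ratio is hard-coded here although item 1 makes the threshold configurable.

```go
package main

import "fmt"

// alertRatio is the fraction of the memory limit at which the alarm fires.
// 0.8 mirrors the 80% figure in the PR description.
const alertRatio = 0.8

// oomRisk reports whether memory usage has crossed the alarm threshold.
// Hypothetical helper: serverMemoryQuota == 0 is treated as "quota not set".
func oomRisk(serverMemoryQuota, instanceUsage, systemTotal, systemUsage uint64) bool {
	if serverMemoryQuota > 0 {
		// Quota set: compare the instance's own usage against the quota.
		return float64(instanceUsage) > float64(serverMemoryQuota)*alertRatio
	}
	// Quota not set: compare system-wide usage against total system memory.
	return float64(systemUsage) > float64(systemTotal)*alertRatio
}

func main() {
	const gib = 1 << 30
	// Quota of 10 GiB, instance uses 9 GiB: above the 8 GiB threshold.
	fmt.Println(oomRisk(10*gib, 9*gib, 64*gib, 20*gib))
	// No quota, system uses 20 of 64 GiB: below the threshold.
	fmt.Println(oomRisk(0, 9*gib, 64*gib, 20*gib))
}
```

When the check returns true, the PR records the running SQLs and a heap profile; that recording step is not shown in this sketch.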

How it Works:

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Manual test (add detailed scripts or steps below)

Side effects

  • Performance regression
    • Consumes more CPU
    • Consumes more MEM
  • Breaking backward compatibility

Release note

  • Record SQLs and heap profile when memUsage is more than 80% of memQuota

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
@ti-srebot
Contributor Author

/run-all-tests

@XuHuaiyu
Contributor

@wshwsh12 conflicts need to be resolved

@wshwsh12
Contributor

/run-all-tests

Contributor

@XuHuaiyu XuHuaiyu left a comment

LGTM

@ti-srebot added the status/LGT1 label on Oct 19, 2020
lzmhhh123 previously approved these changes Oct 19, 2020
Contributor

@lzmhhh123 lzmhhh123 left a comment

LGTM

@ti-srebot added the status/LGT2 label and removed the status/LGT1 label on Oct 19, 2020
@lzmhhh123
Contributor

/merge

@ti-srebot added the status/can-merge label on Oct 19, 2020
@ti-srebot
Contributor Author

Your auto merge job has been accepted, waiting for:

  • 20487

@ti-srebot
Contributor Author

/run-all-tests

@ti-srebot
Contributor Author

@ti-srebot merge failed.

@wshwsh12
Contributor

We may need to cherry-pick this PR first: #17532

Contributor

@XuHuaiyu XuHuaiyu left a comment

LGTM

	record.isServerMemoryQuotaSet = true
} else {
	// TODO: Get the memory info in container directly.
	record.serverMemoryQuota, record.err = memory.MemTotal()
Member

@breezewish breezewish Oct 20, 2020

What about container environments?

Contributor

For now, the only way to use it in a container is to set the server-memory-quota config.
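
For context on the TODO in the snippet above, one possible way to account for a container limit is sketched below. This PR does not do this, and the sketch is only an assumption about how it could look: the helper names `parseCgroupLimit` and `effectiveLimit` are hypothetical, and the input is assumed to be the text of a cgroup v1 `memory.limit_in_bytes` file (typically found under `/sys/fs/cgroup/memory/`), which the caller would read separately.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCgroupLimit parses the text of a cgroup v1 memory.limit_in_bytes file,
// which holds a single decimal number of bytes followed by a newline.
func parseCgroupLimit(content string) (uint64, error) {
	return strconv.ParseUint(strings.TrimSpace(content), 10, 64)
}

// effectiveLimit returns the cgroup limit when it is valid and smaller than
// the host total, and the host total otherwise. An unlimited cgroup reports
// a very large number, so it falls back to the host total as well.
func effectiveLimit(hostTotal uint64, cgroupContent string) uint64 {
	limit, err := parseCgroupLimit(cgroupContent)
	if err != nil || limit == 0 || limit > hostTotal {
		return hostTotal
	}
	return limit
}

func main() {
	// Host has 64 GiB, container is limited to 4 GiB.
	fmt.Println(effectiveLimit(64<<30, "4294967296\n"))
}
```

Until something like this exists, setting server-memory-quota explicitly remains the way to make the alarm reflect a container's limit.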

@zz-jason
Member

/merge

@ti-srebot
Contributor Author

Your auto merge job has been accepted, waiting for:

  • 20772
  • 20682
  • 20753

@ti-srebot
Contributor Author

/run-all-tests

@ti-srebot ti-srebot merged commit 60bd3d7 into pingcap:release-4.0 Nov 11, 2020
Labels
component/config, component/executor, sig/execution, status/can-merge, status/LGT2, type/4.0-cherry-pick
7 participants