executor/inspect: refactor current-load diagnosis rule to node-check (#15860) #17660

Merged into pingcap:release-4.0 on Jun 4, 2020

@sre-bot sre-bot commented Jun 4, 2020

cherry-pick #15860 to release-4.0


What problem does this PR solve?

Before this PR, TiDB had a current-load diagnosis rule, but it could only diagnose the load at the current time.

What is changed and how it works?

This PR changes the current-load diagnosis rule to node-load. Because the new rule reads metrics data, it can diagnose the load over any time range.

This diagnosis rule depends on the following metrics tables:

  • node_load1
  • node_load5
  • node_load15
  • node_virtual_cpus
  • node_memory_usage
  • node_memory_swap_used
  • node_disk_usage

Example:

>select /*+ time_range("2020-03-30 16:22:04", "2020-03-30 16:50:04") */ * from inspection_result where rule='node-load'
+-----------+----------------------+------+-------------------+-------+-----------+----------+-------------------------------------------------------+
| RULE      | ITEM                 | TYPE | INSTANCE          | VALUE | REFERENCE | SEVERITY | DETAILS                                               |
+-----------+----------------------+------+-------------------+-------+-----------+----------+-------------------------------------------------------+
| node-load | cpu-load1            | node | 172.16.5.40:19110 | 41.1  | < 28.0    | warning  | cpu-load1 should less than (cpu_logical_cores * 0.7)  |
| node-load | cpu-load15           | node | 172.16.5.40:19110 | 36.2  | < 28.0    | warning  | cpu-load15 should less than (cpu_logical_cores * 0.7) |
| node-load | cpu-load5            | node | 172.16.5.40:19110 | 39.5  | < 28.0    | warning  | cpu-load5 should less than (cpu_logical_cores * 0.7)  |
| node-load | disk-usage           | node | 172.16.5.40:19110 | 92.4% | < 70%     | warning  | the disk-usage of /dev/sda3 is too high               |
| node-load | disk-usage           | node | 172.16.5.40:19110 | 95.6% | < 70%     | warning  | the disk-usage of /dev/nvme0n1 is too high            |
| node-load | disk-usage           | node | 172.16.5.40:19110 | 99.8% | < 70%     | warning  | the disk-usage of /dev/sda1 is too high               |
| node-load | swap-memory-used     | node | 172.16.5.40:19110 | 586.0 | 0         | warning  |                                                       |
| node-load | virtual-memory-usage | node | 172.16.5.40:19110 | 83.9% | < 70%     | warning  | the memory-usage of 172.16.5.40:19110 is too high     |
+-----------+----------------------+------+-------------------+-------+-----------+----------+-------------------------------------------------------+
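The thresholds visible in the REFERENCE and DETAILS columns above can be sketched as follows. This is only an illustration in Python, not the code in this PR: the function and class names are hypothetical, the 0.7 factor and the 70% limit are read off the sample output, and the 40 logical cores is inferred from the "< 28.0" reference (28.0 / 0.7). How the real rule treats values exactly equal to a threshold is also an assumption here.

```python
from dataclasses import dataclass
from typing import Optional

# Values read off the sample output above, not taken from the TiDB source.
CPU_LOAD_FACTOR = 0.7    # DETAILS: "cpu_logical_cores * 0.7"
USAGE_LIMIT_PCT = 70.0   # REFERENCE: "< 70%"


@dataclass
class Finding:
    """One hypothetical row of inspection_result (subset of columns)."""
    item: str
    value: float
    reference: str
    severity: str = "warning"


def check_cpu_load(item: str, load: float, logical_cores: int) -> Optional[Finding]:
    """Warn when a load average is not below logical_cores * 0.7."""
    threshold = logical_cores * CPU_LOAD_FACTOR
    if load < threshold:
        return None
    return Finding(item, load, f"< {threshold:.1f}")


def check_usage(item: str, used_pct: float) -> Optional[Finding]:
    """Warn when disk or memory usage is not below 70%."""
    if used_pct < USAGE_LIMIT_PCT:
        return None
    return Finding(item, used_pct, f"< {USAGE_LIMIT_PCT:.0f}%")


def check_swap(used: float) -> Optional[Finding]:
    """Warn on any swap usage; the reference value in the output is 0."""
    if used == 0:
        return None
    return Finding("swap-memory-used", used, "0")


if __name__ == "__main__":
    # Sample values copied from the result table; 40 cores is an assumption.
    print(check_cpu_load("cpu-load1", 41.1, 40))
    print(check_usage("disk-usage", 92.4))
    print(check_swap(586.0))
```

Each check returns None when the value is within its reference range, which mirrors the fact that inspection_result only lists items that violate a rule.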

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:

Check List

Tests

  • Unit test
  • Manual test (add detailed scripts or steps below)

Side effects

Release note

  • refactor current-load diagnosis rule to node-check

Signed-off-by: sre-bot <sre-bot@pingcap.com>

sre-bot commented Jun 4, 2020

/run-all-tests

crazycs520 commented

/run-all-tests

@lonng lonng left a comment

LGTM

@AilinKid AilinKid left a comment

LGTM

@bb7133 bb7133 merged commit 4566942 into pingcap:release-4.0 Jun 4, 2020
@bb7133 bb7133 modified the milestones: v4.0.1, v4.0.2 Jun 6, 2020