tiflash, metric: add alert for TiFlash down #6590

Merged
merged 4 commits into from
Oct 9, 2021

22 changes: 21 additions & 1 deletion alert-rules.md
@@ -8,7 +8,7 @@ aliases: ['/docs/dev/alert-rules/','/docs/dev/reference/alert-rules/']

# TiDB Cluster Alert Rules

This document describes the alert rules for different components in a TiDB cluster, including the rule descriptions and solutions of the alert items in TiDB, TiKV, PD, TiDB Binlog, Node_exporter and Blackbox_exporter.
This document describes the alert rules for different components in a TiDB cluster, including the rule descriptions and solutions of the alert items in TiDB, TiKV, PD, TiFlash, TiDB Binlog, Node_exporter and Blackbox_exporter.

According to the severity level, alert rules are divided into three categories (from high to low): emergency-level, critical-level, and warning-level. This division of severity levels applies to all alert items of each component below.

@@ -781,6 +781,10 @@ This section gives the alert rules for the TiKV component.

The speed of splitting Regions is slower than the write speed. To alleviate this issue, upgrade TiDB to a version that supports batch-split (>= 2.1.0-rc1). If you cannot upgrade for now, you can use `pd-ctl operator add split-region <region_id> --policy=approximate` to split Regions manually.

## TiFlash alert rules

For the detailed descriptions of TiFlash alert rules, see [TiFlash Alert Rules](/tiflash/tiflash-alert-rules.md).

## TiDB Binlog alert rules

For the detailed descriptions of TiDB Binlog alert rules, see [TiDB Binlog monitoring document](/tidb-binlog/monitor-tidb-binlog-cluster.md#alert-rules).
@@ -954,6 +958,22 @@ This section gives the alert rules for the Blackbox_exporter TCP, ICMP, and HTTP
* Check whether the TiDB process exists.
* Check whether the network between the monitoring machine and the TiDB machine is normal.

#### `TiFlash_server_is_down`

* Alert rule:

`probe_success{group="tiflash"} == 0`

* Description:

Failure to probe the TiFlash service port.

* Solution:

* Check whether the machine that provides the TiFlash service is down.
* Check whether the TiFlash process exists.
* Check whether the network between the monitoring machine and the TiFlash machine is normal.
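
When working through the solution steps above, it can help to reproduce the probe by hand. The following is a minimal sketch, not part of this change, that approximates a TCP probe: it returns `1` if a connection to the service port succeeds and `0` otherwise, mirroring the `probe_success` values used in the alert rule. The function name `tcp_probe` and the host and port in the usage comment are hypothetical placeholders; substitute the address your monitoring configuration actually probes.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> int:
    """Return 1 if a TCP connection to (host, port) succeeds, else 0.

    This only approximates a TCP probe; it is an illustrative helper,
    not the exporter's implementation.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return 1
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return 0


# Hypothetical usage (replace host and port with your TiFlash address):
# tcp_probe("tiflash-host.example", 12345)
```

Running this from the monitoring machine and then from another host on the same network helps separate a down process or machine (fails everywhere) from a network problem between the monitoring machine and the TiFlash machine (fails only from the monitoring side).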

#### `Pump_server_is_down`

* Alert rule:
2 changes: 1 addition & 1 deletion tiflash/tiflash-alert-rules.md
@@ -34,7 +34,7 @@ This document introduces the alert rules of the TiFlash cluster.

- Solution:

It might be caused by the internal problems of the TiFlash TMT engine. Contact [TiFlash R&D](mailto:support@pingcap.com) for support.
It might be caused by the internal problems of the TiFlash storage engine. Contact [TiFlash R&D](mailto:support@pingcap.com) for support.

## `TiFlash_raft_read_index_duration`
