cherry pick pingcap#5291 to release-4.0
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
qiancai authored and ti-srebot committed Apr 16, 2021
1 parent 13b1806 commit 5371d66
Showing 3 changed files with 302 additions and 0 deletions.
216 changes: 216 additions & 0 deletions tiup/tiup-component-cluster-check.md
@@ -0,0 +1,216 @@
---
title: tiup cluster check
---

# tiup cluster check

Before a production environment goes live, you need to run a series of checks to make sure that the cluster can deliver its best performance. To simplify these manual checks, TiUP Cluster provides the `check` command, which checks whether the hardware and software environments of the target machines of a specified cluster meet the requirements for the cluster to work normally.

## List of check items

### Operating system version

Check the operating system distribution and version of the deployed machines. Currently, only CentOS 7 is supported for deployment. More system versions may be supported in later releases for compatibility improvement.

### CPU EPOLLEXCLUSIVE

Check whether the CPU of the target machine supports EPOLLEXCLUSIVE.

### numactl

Check whether numactl is installed on the target machine. If core binding is configured for the target machine, you must install numactl.

### System time

Check whether the system time of the target machine is synchronized. Compare the system time of the target machine with that of the central control machine, and report an error if the deviation exceeds a certain threshold (500ms).
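
The comparison described above can be sketched as a small shell function. This is only an illustration of the 500 ms rule, not how TiUP implements the check; the function name `time_in_sync` and the host name `target-host` are made up for this example.

```shell
# Returns 0 if two epoch timestamps in milliseconds differ by at most the
# threshold, and 1 otherwise.
#   $1: time on the central control machine (ms)
#   $2: time on the target machine (ms)
#   $3: threshold in ms (optional, defaults to 500)
time_in_sync() {
    tis_diff=$(( $1 - $2 ))
    if [ "$tis_diff" -lt 0 ]; then
        tis_diff=$(( 0 - tis_diff ))
    fi
    [ "$tis_diff" -le "${3:-500}" ]
}

# Hypothetical usage (requires GNU date and SSH access to "target-host"):
# control_ms=$(date +%s%3N)
# target_ms=$(ssh target-host date +%s%3N)
# time_in_sync "$control_ms" "$target_ms" || echo "clock deviation exceeds 500 ms"
```

Note that a measurement taken over SSH includes network latency, so a real check would need to account for the round-trip time.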

### Time synchronization service

Check whether the time synchronization service is configured on the target machine. Namely, check whether ntpd is running.

### Swap partitioning

Check whether swap partitioning is enabled on the target machine. It is recommended to disable swap partitioning.

### Kernel parameters

Check the values of the following kernel parameters:

- `net.ipv4.tcp_tw_recycle`: 0
- `net.ipv4.tcp_syncookies`: 0
- `net.core.somaxconn`: 32768
- `vm.swappiness`: 0
- `vm.overcommit_memory`: 0 or 1
- `fs.file-max`: 1000000
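
To inspect these values by hand on a target machine, a sketch like the following may help. It assumes the listed values are compared as exact matches, which might not be how TiUP evaluates them, and the helper names `value_allowed` and `check_param` are invented for this example.

```shell
# Returns 0 if the actual value ($1) equals any of the allowed values ($2...).
value_allowed() {
    va_actual=$1
    shift
    for va_want in "$@"; do
        [ "$va_actual" = "$va_want" ] && return 0
    done
    return 1
}

# Reads one kernel parameter with sysctl and reports whether it matches one
# of the allowed values. A parameter that cannot be read (for example,
# because the running kernel does not provide it) is reported as "unavailable".
check_param() {
    cp_name=$1
    shift
    cp_actual=$(sysctl -n "$cp_name" 2>/dev/null) || cp_actual="unavailable"
    if value_allowed "$cp_actual" "$@"; then
        echo "ok    $cp_name = $cp_actual"
    else
        echo "check $cp_name = $cp_actual (expected: $*)"
    fi
}

check_param net.ipv4.tcp_tw_recycle 0
check_param net.ipv4.tcp_syncookies 0
check_param net.core.somaxconn 32768
check_param vm.swappiness 0
check_param vm.overcommit_memory 0 1
check_param fs.file-max 1000000
```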

### Transparent Huge Pages (THP)

Check whether THP is enabled on the target machine. It is recommended to disable THP.
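
On Linux, the active THP mode is the bracketed word in `/sys/kernel/mm/transparent_hugepage/enabled` (for example, `always madvise [never]`). The following sketch reads it manually; the function name `thp_mode` is made up for this example, and the sketch does not claim to mirror TiUP's own check.

```shell
# Prints the active THP mode, that is, the word inside the square brackets.
thp_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\(.*\)\].*/\1/p'
}

thp_file=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$thp_file" ]; then
    mode=$(thp_mode "$(cat "$thp_file")")
    if [ "$mode" = "never" ]; then
        echo "THP is disabled"
    else
        echo "THP mode is '$mode'; disabling THP is recommended"
    fi
fi
```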

### System limits

Check the limit values in the `/etc/security/limits.conf` file:

```
<deploy-user> soft nofile 1000000
<deploy-user> hard nofile 1000000
<deploy-user> soft stack 10240
```

`<deploy-user>` is the user who deploys and runs the TiDB cluster, and the last column is the minimum value required for the system.
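
As a rough manual equivalent, the sketch below looks up the three entries for one user and compares them with the minimum values above. The user name `tidb` and the helper `limit_value` are assumptions for illustration, and the sketch reads only `/etc/security/limits.conf`, not any drop-in files.

```shell
# Prints the value configured in limits file $1 for user $2, type $3
# (soft or hard), and item $4. Prints nothing if no matching line exists.
# If several lines match, the last one wins.
limit_value() {
    awk -v u="$2" -v t="$3" -v i="$4" \
        '$1 == u && $2 == t && $3 == i { v = $4 } END { if (v != "") print v }' "$1"
}

deploy_user=${DEPLOY_USER:-tidb}   # hypothetical deploy user name
limits_file=/etc/security/limits.conf

if [ -r "$limits_file" ]; then
    for spec in "soft nofile 1000000" "hard nofile 1000000" "soft stack 10240"; do
        set -- $spec   # $1 = type, $2 = item, $3 = minimum value
        actual=$(limit_value "$limits_file" "$deploy_user" "$1" "$2")
        # Non-numeric values such as "unlimited" fail the numeric test and
        # are flagged for manual review.
        if [ -n "$actual" ] && [ "$actual" -ge "$3" ] 2>/dev/null; then
            echo "ok    $1 $2 = $actual"
        else
            echo "check $1 $2 = ${actual:-unset} (minimum: $3)"
        fi
    done
fi
```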

### SELinux

Check whether SELinux is enabled. It is recommended to disable SELinux.

### Firewall

Check whether the FirewallD service is enabled. It is recommended to either disable the FirewallD service or add permission rules for each service in the TiDB cluster.

### irqbalance

Check whether the irqbalance service is enabled. It is recommended to enable the irqbalance service.

### Disk mount options

Check the mount options for ext4 partitions. Make sure the mount options include the nodelalloc option and the noatime option.
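
The following sketch lists the ext4 mounts on a machine and flags those missing either option. It reads `/proc/mounts` and is meant as a manual aid only; the helper name `has_mount_opt` is invented for this example.

```shell
# Returns 0 if the comma-separated option list $1 contains option $2.
# Wrapping both sides in commas avoids partial matches such as
# "noatime" matching inside another option name.
has_mount_opt() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# /proc/mounts fields: device, mount point, file system type, options, ...
if [ -r /proc/mounts ]; then
    while read -r device mount_point fs_type options rest; do
        [ "$fs_type" = "ext4" ] || continue
        if has_mount_opt "$options" nodelalloc && has_mount_opt "$options" noatime; then
            echo "ok    $mount_point ($options)"
        else
            echo "check $mount_point: nodelalloc and noatime are both required ($options)"
        fi
    done < /proc/mounts
fi
```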

### Port usage

Check whether the ports defined in the topology (including the default ports that are filled in automatically) are already used by processes on the target machine.

> **Note:**
>
> The port usage check assumes that a cluster is not started yet. If a cluster is already deployed and started, the port usage check on the cluster fails because the ports must be in use in this case.

### CPU core number

Check the CPU information of the target machine. For a production cluster, it is recommended that the number of logical CPU cores is greater than or equal to 16.

> **Note:**
>
> CPU core number is not checked by default. To enable the check, you need to add the `--enable-cpu` option to the command.

### Memory size

Check the memory size of the target machine. For a production cluster, it is recommended that the total memory capacity is greater than or equal to 32 GB.

> **Note:**
>
> Memory size is not checked by default. To enable the check, you need to add the `--enable-mem` option to the command.

### Fio disk performance test

Use flexible I/O tester (fio) to test the performance of the disk where `data_dir` is located, including the following three test items:

- fio_randread_write_latency
- fio_randread_write
- fio_randread

> **Note:**
>
> The fio disk performance test is not performed by default. To perform the test, you need to add the `--enable-disk` option to the command.

## Syntax

```shell
tiup cluster check <topology.yml | cluster-name> [flags]
```

- If a cluster is not deployed yet, you need to pass the topology.yml <!--[topology.yml](/tiup/tiup-cluster-topology-reference.md)--> file that is used to deploy the cluster. According to the content in this file, tiup-cluster connects to the corresponding machine to perform the check.
- If a cluster is already deployed, you can use the `<cluster-name>` as the check object.
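
The two forms can be illustrated with a small helper that prints the command to run for a given target. Treating any argument that ends in `.yml` or `.yaml` as a topology file is an assumption of this sketch, and `check_command` and `prod-cluster` are made-up names.

```shell
# Prints the check command for a target. A topology file is passed as is,
# while the name of a deployed cluster also needs the --cluster option.
check_command() {
    case "$1" in
        *.yml|*.yaml) echo "tiup cluster check $1" ;;
        *)            echo "tiup cluster check $1 --cluster" ;;
    esac
}

check_command topology.yml   # prints: tiup cluster check topology.yml
check_command prod-cluster   # prints: tiup cluster check prod-cluster --cluster
```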

> **Note:**
>
> If `<cluster-name>` is used for the check, you need to add the `--cluster` option in the command.

## Options

### --apply

- Attempts to automatically repair the failed check items. Currently, tiup-cluster only attempts to repair the following check items:
    - SELinux
    - Firewall
    - irqbalance
    - Kernel parameters
    - System limits
    - THP (Transparent Huge Pages)
- Data type: `BOOLEAN`
- This option is disabled by default with the `false` value. To enable this option, add this option to the command, and either pass the `true` value or do not pass any value.

### --cluster

- Indicates that the check is for the deployed clusters.
- Data type: `BOOLEAN`
- This option is disabled by default with the `false` value. To enable this option, add this option to the command, and either pass the `true` value or do not pass any value.

> **Note:**
>
> tiup-cluster supports checking both un-deployed clusters and deployed clusters with the following command format:
>
> ```shell
> tiup cluster check <topology.yml | cluster-name> [flags]
> ```
>
> If the `tiup cluster check <cluster-name>` command is used, you must add the `--cluster` option: `tiup cluster check <cluster-name> --cluster`.

### --enable-cpu

- Enables the check of CPU core number.
- Data type: `BOOLEAN`
- This option is disabled by default with the `false` value. To enable this option, add this option to the command, and either pass the `true` value or do not pass any value.

### --enable-disk

- Enables the fio disk performance test.
- Data type: `BOOLEAN`
- This option is disabled by default with the `false` value. To enable this option, add this option to the command, and either pass the `true` value or do not pass any value.

### --enable-mem

- Enables the memory size check.
- Data type: `BOOLEAN`
- This option is disabled by default with the `false` value. To enable this option, add this option to the command, and either pass the `true` value or do not pass any value.

### -u, --user

- Specifies the user name used to connect to the target machine. The specified user must be able to run sudo as root without a password on the target machine.
- Data type: `STRING`
- If this option is not specified in the command, the user who executes the command is used as the default value.

> **Note:**
>
> This option is valid only if the `--cluster` option is false. Otherwise, the value of this option is fixed to the username specified in the topology file for the cluster deployment.

### -i, --identity_file

- Specifies the key file to connect to the target machine.
- Data type: `STRING`
- The option is enabled by default with `~/.ssh/id_rsa` (the default value) passed in.

> **Note:**
>
> This option is valid only if the `--cluster` option is false. Otherwise, the value of this option is fixed to `${TIUP_HOME}/storage/cluster/clusters/<cluster-name>/ssh/id_rsa`.

### -p, --password

- Logs in with a password when connecting to the target machine.
    - If the `--cluster` option is added for a cluster, the password is the password of the user specified in the topology file when the cluster was deployed.
    - If the `--cluster` option is not added for a cluster, the password is the password of the user specified in the `-u/--user` option.
- Data type: `BOOLEAN`
- This option is disabled by default with the `false` value. To enable this option, add this option to the command, and either pass the `true` value or do not pass any value.

### -h, --help

- Prints the help information of the related commands.
- Data type: `BOOLEAN`
- This option is disabled by default with the `false` value. To enable this option, add this option to the command, and either pass the `true` value or do not pass any value.

## Output

A table containing the following fields:

- `Node`: the target node
- `Check`: the check item
- `Result`: the check result (Pass, Warn, or Fail)
- `Message`: the result description
58 changes: 58 additions & 0 deletions tiup/tiup-component-dm-display.md
@@ -0,0 +1,58 @@
---
title: tiup dm display
---

# tiup dm display

If you want to check the operational status of each component in a DM cluster, it is inefficient to log in to each machine one by one. Therefore, tiup-dm provides the `tiup dm display` command to do this job efficiently.

## Syntax

```shell
tiup dm display <cluster-name> [flags]
```

`<cluster-name>` is the name of the cluster to be operated on. If you forget the cluster name, you can use the [`tiup dm list`](/tiup/tiup-component-dm-list.md) command to check it.

## Options

### -N, --node

- Specifies the IDs of the nodes to query. Separate multiple node IDs with commas. If you are not sure about the ID of a node, you can skip this option in the command to show the IDs and status of all nodes in the output.
- Data type: `STRING`
- This option is enabled by default with `[]` (which means all nodes) passed in.

> **Note:**
>
> If `-R, --role` is also specified, only the services in the intersection of the specified nodes and roles are queried.

### -R, --role

- Specifies the roles to query. Separate multiple roles with commas. If you are not sure about the role deployed on a node, you can skip this option in the command to show the roles and status of all nodes in the output.
- Data type: `STRING`
- This option is enabled by default with `[]` (which means all roles) passed in.

> **Note:**
>
> If `-N, --node` is also specified, only the services in the intersection of the specified nodes and roles are queried.

### -h, --help

- Prints the help information.
- Data type: `BOOLEAN`
- This option is disabled by default with the `false` value. To enable this option, add this option to the command, and either pass the `true` value or do not pass any value.

## Output

- Cluster name
- Cluster version
- SSH client type
- A table containing the following fields:
    - `ID`: the node ID, consisting of IP:PORT.
    - `Role`: the service role deployed on the node (for example, dm-master or dm-worker).
    - `Host`: the IP address of the machine corresponding to the node.
    - `Ports`: the port numbers used by the service.
    - `OS/Arch`: the operating system and machine architecture of the node.
    - `Status`: the current status of the services on the node.
    - `Data Dir`: the data directory of the service. `-` means that there is no data directory.
    - `Deploy Dir`: the deployment directory of the service.
28 changes: 28 additions & 0 deletions tiup/tiup-component-dm-upgrade.md
@@ -0,0 +1,28 @@
---
title: tiup dm upgrade
---

# tiup dm upgrade

The `tiup dm upgrade` command is used to upgrade a specified cluster to a specific version.

## Syntax

```shell
tiup dm upgrade <cluster-name> <version> [flags]
```

- `<cluster-name>` is the name of the cluster to be operated on. If you forget the cluster name, you can use the [`tiup dm list`](/tiup/tiup-component-dm-list.md) command to check it.
- `<version>` is the target version to upgrade to. Currently, you can only upgrade to a later version. Downgrading to an earlier version is not allowed, and neither is upgrading to a nightly version.
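
The version rule above can be pre-checked in a script before calling the command. The sketch below is an approximation that relies on `sort -V` from GNU coreutils; it is not how tiup-dm validates versions, and `sort -V` does not follow semantic versioning for pre-release suffixes. The function name `is_allowed_upgrade` is made up for this example.

```shell
# Returns 0 if the target version ($2) is strictly later than the current
# version ($1) and is not a nightly version. Returns 1 otherwise.
is_allowed_upgrade() {
    case "$2" in
        *nightly*) return 1 ;;
    esac
    [ "$1" != "$2" ] || return 1
    # The later of the two versions sorts last with a version-aware sort.
    iau_latest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)
    [ "$iau_latest" = "$2" ]
}

# Hypothetical usage; the cluster name and versions are placeholders:
# is_allowed_upgrade v2.0.0 v2.0.1 && tiup dm upgrade my-dm-cluster v2.0.1
```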

## Options

### -h, --help

- Prints the help information.
- Data type: `BOOLEAN`
- This option is disabled by default with the `false` value. To enable this option, add this option to the command, and either pass the `true` value or do not pass any value.

## Output

Log of the service upgrade process.
