From 254877d6f70ded98278e6fa40daae402945eaedc Mon Sep 17 00:00:00 2001
From: Aolin
Date: Wed, 15 Nov 2023 16:58:45 +0800
Subject: [PATCH 1/3] upgrade using tiup: add FAQ for concurrent DDL

Signed-off-by: Aolin
---
 upgrade-tidb-using-tiup.md | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/upgrade-tidb-using-tiup.md b/upgrade-tidb-using-tiup.md
index 3f3fda8b386bf..232e3dcb6b55f 100644
--- a/upgrade-tidb-using-tiup.md
+++ b/upgrade-tidb-using-tiup.md
@@ -24,6 +24,7 @@ This document is targeted for the following upgrade paths:
 > **Note:**
 >
 > - If your cluster to be upgraded is v3.1 or an earlier version (v3.0 or v2.1), the direct upgrade to v7.4.0 is not supported. You need to upgrade your cluster first to v4.0 and then to v7.4.0.
+> - If your cluster to be upgraded is earlier than v6.2, you might encounter the issue that the upgrade gets stuck when upgrading to v6.2 or later versions. You can refer to [How to fix the issue](#how-to-fix-the-issue-that-the-upgrade-gets-stuck-when-upgrading-to-v620-or-later-versions).
 > - TiDB nodes use the value of the [`server-version`](/tidb-configuration-file.md#server-version) configuration item to verify the current TiDB version. Therefore, to avoid unexpected behaviors, before upgrading the TiDB cluster, you need to set the value of `server-version` to empty or the real version of the current TiDB cluster.
 
 ## Upgrade caveat
@@ -272,6 +273,30 @@ Re-execute the `tiup cluster upgrade` command to resume the upgrade. The upgrade
 tiup cluster replay <audit-id>
 ```
 
+### How to fix the issue that the upgrade gets stuck when upgrading to v6.2.0 or later versions?
+
+Starting from v6.2.0, TiDB enables the [concurrent DDL framework](/ddl-introduction.md#how-the-online-ddl-asynchronous-change-works-in-tidb) by default to execute concurrent DDLs. This framework changes the DDL job storage from a KV queue to a table queue. This change might cause the upgrade to get stuck in some scenarios. The following are some scenarios that might trigger this issue and the corresponding solutions:
+
+- Stuck caused by plugin loading
+
+    During the upgrade, loading certain plugins that require executing DDL statements might cause the upgrade to get stuck.
+
+    **Solution**: avoid loading plugins during the upgrade. Instead, load plugins only after the upgrade is completed.
+
+- Stuck caused by using the `kill -9` command for offline upgrade
+
+    - Precautions: avoid using the `kill -9` command to perform the offline upgrade. If it is necessary, restart the new version TiDB node after 2 minutes.
+    - If the upgrade is already stuck, restart the affected TiDB node. If the issue has just occurred, it is recommended to restart the node after 2 minutes.
+
+- Stuck caused by DDL Owner change
+
+    In multi-instance scenarios, network or hardware failures might cause DDL Owner change. If there are unfinished DDL statements in the upgrade phase, the upgrade might get stuck.
+
+    **Solution**:
+
+    1. Terminate the stuck TiDB node (avoid using `kill -9`).
+    2. Restart the new version TiDB node.
+
 ### The evict leader has waited too long during the upgrade. How to skip this step for a quick upgrade?
 
 You can specify `--force`. Then the processes of transferring PD leader and evicting TiKV leader are skipped during the upgrade. The cluster is directly restarted to update the version, which has a great impact on the cluster that runs online. In the following command, `<version>` is the version to upgrade to, such as `v7.4.0`.
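The FAQ added in this patch repeatedly advises restarting only the affected TiDB node rather than the whole cluster. A minimal sketch of that workaround, assuming a TiUP-managed deployment where `<cluster-name>` and `<tidb-host>:4000` are placeholders for your own cluster name and the stuck instance:

```shell
# Optional first step: from any SQL client, confirm that the upgrade is
# blocked on an unfinished DDL job rather than on something else:
#   ADMIN SHOW DDL JOBS;

# Restart only the stuck TiDB instance; -N/--node restricts the
# operation to the listed instance(s) instead of the whole cluster.
tiup cluster restart <cluster-name> -N <tidb-host>:4000
```

`tiup cluster restart` stops and starts the instance gracefully, which is consistent with the FAQ's warning against `kill -9`.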
From 79ef63d2641cd19be07471fdb0362f94af140a7c Mon Sep 17 00:00:00 2001
From: Aolin
Date: Thu, 16 Nov 2023 15:08:03 +0800
Subject: [PATCH 2/3] Apply suggestions from code review

Co-authored-by: Ran
---
 upgrade-tidb-using-tiup.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/upgrade-tidb-using-tiup.md b/upgrade-tidb-using-tiup.md
index 232e3dcb6b55f..cbb05593f3c80 100644
--- a/upgrade-tidb-using-tiup.md
+++ b/upgrade-tidb-using-tiup.md
@@ -24,7 +24,7 @@ This document is targeted for the following upgrade paths:
 > **Note:**
 >
 > - If your cluster to be upgraded is v3.1 or an earlier version (v3.0 or v2.1), the direct upgrade to v7.4.0 is not supported. You need to upgrade your cluster first to v4.0 and then to v7.4.0.
-> - If your cluster to be upgraded is earlier than v6.2, you might encounter the issue that the upgrade gets stuck when upgrading to v6.2 or later versions. You can refer to [How to fix the issue](#how-to-fix-the-issue-that-the-upgrade-gets-stuck-when-upgrading-to-v620-or-later-versions).
+> - If your cluster to be upgraded is earlier than v6.2, the upgrade might get stuck when you upgrade the cluster to v6.2 or later versions. You can refer to [How to fix the issue](#how-to-fix-the-issue-that-the-upgrade-gets-stuck-when-upgrading-to-v620-or-later-versions).
 > - TiDB nodes use the value of the [`server-version`](/tidb-configuration-file.md#server-version) configuration item to verify the current TiDB version. Therefore, to avoid unexpected behaviors, before upgrading the TiDB cluster, you need to set the value of `server-version` to empty or the real version of the current TiDB cluster.
 
 ## Upgrade caveat
@@ -277,18 +277,18 @@ Re-execute the `tiup cluster upgrade` command to resume the upgrade. The upgrade
 
 Starting from v6.2.0, TiDB enables the [concurrent DDL framework](/ddl-introduction.md#how-the-online-ddl-asynchronous-change-works-in-tidb) by default to execute concurrent DDLs. This framework changes the DDL job storage from a KV queue to a table queue. This change might cause the upgrade to get stuck in some scenarios. The following are some scenarios that might trigger this issue and the corresponding solutions:
 
-- Stuck caused by plugin loading
+- Upgrade gets stuck due to plugin loading
 
     During the upgrade, loading certain plugins that require executing DDL statements might cause the upgrade to get stuck.
 
     **Solution**: avoid loading plugins during the upgrade. Instead, load plugins only after the upgrade is completed.
 
-- Stuck caused by using the `kill -9` command for offline upgrade
+- Upgrade gets stuck due to using the `kill -9` command for offline upgrade
 
     - Precautions: avoid using the `kill -9` command to perform the offline upgrade. If it is necessary, restart the new version TiDB node after 2 minutes.
     - If the upgrade is already stuck, restart the affected TiDB node. If the issue has just occurred, it is recommended to restart the node after 2 minutes.
 
-- Stuck caused by DDL Owner change
+- Upgrade gets stuck due to DDL Owner change
 
     In multi-instance scenarios, network or hardware failures might cause DDL Owner change. If there are unfinished DDL statements in the upgrade phase, the upgrade might get stuck.
 
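The `kill -9` scenario retitled in this revision is about stopping TiDB without letting it clean up its in-flight DDL state. A hedged sketch of the graceful alternative, again assuming a TiUP-managed node with placeholder names:

```shell
# Prefer a graceful stop (TiUP stops the service through systemd, which
# sends SIGTERM by default) over `kill -9`, which can strand unfinished
# DDL jobs during an offline upgrade:
tiup cluster stop <cluster-name> -N <tidb-host>:4000

# If `kill -9` could not be avoided, follow the FAQ's advice and wait
# about 2 minutes before starting the new-version TiDB node:
tiup cluster start <cluster-name> -N <tidb-host>:4000
```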
From 98bae3208e53c60bb089d2260fd0adb54f2d1f5d Mon Sep 17 00:00:00 2001
From: Aolin
Date: Mon, 20 Nov 2023 17:58:29 +0800
Subject: [PATCH 3/3] Apply suggestions from code review

Co-authored-by: Frank945946 <108602632+Frank945946@users.noreply.github.com>
---
 upgrade-tidb-using-tiup.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/upgrade-tidb-using-tiup.md b/upgrade-tidb-using-tiup.md
index cbb05593f3c80..7f2e47e5bad82 100644
--- a/upgrade-tidb-using-tiup.md
+++ b/upgrade-tidb-using-tiup.md
@@ -24,7 +24,7 @@ This document is targeted for the following upgrade paths:
 > **Note:**
 >
 > - If your cluster to be upgraded is v3.1 or an earlier version (v3.0 or v2.1), the direct upgrade to v7.4.0 is not supported. You need to upgrade your cluster first to v4.0 and then to v7.4.0.
-> - If your cluster to be upgraded is earlier than v6.2, the upgrade might get stuck when you upgrade the cluster to v6.2 or later versions. You can refer to [How to fix the issue](#how-to-fix-the-issue-that-the-upgrade-gets-stuck-when-upgrading-to-v620-or-later-versions).
+> - If your cluster to be upgraded is earlier than v6.2, the upgrade might get stuck when you upgrade the cluster to v6.2 or later versions in some scenarios. You can refer to [How to fix the issue](#how-to-fix-the-issue-that-the-upgrade-gets-stuck-when-upgrading-to-v620-or-later-versions).
 > - TiDB nodes use the value of the [`server-version`](/tidb-configuration-file.md#server-version) configuration item to verify the current TiDB version. Therefore, to avoid unexpected behaviors, before upgrading the TiDB cluster, you need to set the value of `server-version` to empty or the real version of the current TiDB cluster.
 
 ## Upgrade caveat
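All three patches keep the note that `server-version` should be set to empty or to the cluster's real version before the upgrade. A sketch of one way to do that with TiUP, assuming the standard topology layout; the cluster name and version string are illustrative:

```shell
# Open the cluster configuration in an editor:
tiup cluster edit-config <cluster-name>

# In the editor, under server_configs.tidb, set for example:
#   server-version: ""          # empty, so TiDB reports its built-in version
# or pin it to the cluster's actual current version, for example:
#   server-version: "v6.1.0"

# Apply the configuration change to the TiDB instances:
tiup cluster reload <cluster-name> -R tidb
```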