* support restart of service pods * add example to show how to use secret * Update 3.1.customize-installation.md * Update mkdocs.yml * minor changes * Update docs-2.0-zh/k8s-operator/4.cluster-administration/4.9.advanced/4.9.2.restart-cluster.md * Update docs-2.0-zh/k8s-operator/4.cluster-administration/4.9.advanced/4.9.2.restart-cluster.md * Update docs-2.0-en/k8s-operator/4.cluster-administration/4.9.advanced/4.9.2.restart-cluster.md * Update docs-2.0-en/k8s-operator/4.cluster-administration/4.9.advanced/4.9.2.restart-cluster.md * Update docs-2.0-en/k8s-operator/4.cluster-administration/4.9.advanced/4.9.2.restart-cluster.md * Update docs-2.0-en/k8s-operator/4.cluster-administration/4.9.advanced/4.9.2.restart-cluster.md * Update docs-2.0-en/k8s-operator/4.cluster-administration/4.9.advanced/4.9.2.restart-cluster.md * Update docs-2.0-en/k8s-operator/4.cluster-administration/4.9.advanced/4.9.2.restart-cluster.md * Update docs-2.0-en/k8s-operator/4.cluster-administration/4.9.advanced/4.9.2.restart-cluster.md * Update docs-2.0-en/k8s-operator/4.cluster-administration/4.9.advanced/4.9.2.restart-cluster.md * Update docs-2.0-en/k8s-operator/4.cluster-administration/4.9.advanced/4.9.2.restart-cluster.md * Update docs-2.0-en/k8s-operator/4.cluster-administration/4.9.advanced/4.9.2.restart-cluster.md * comment fix * Update mkdocs.yml * comment fix --------- Co-authored-by: Chris Chen <chris.chen@vesoft.com>
1 parent 0c9dfd7 · commit b55880b
Showing 2 changed files with 390 additions and 0 deletions.
docs-2.0-en/k8s-operator/4.cluster-administration/4.9.advanced/4.9.2.restart-cluster.md
@@ -0,0 +1,195 @@
# Restart service Pods in a NebulaGraph cluster on K8s

!!! note

    Restarting NebulaGraph cluster service Pods is a feature in the Alpha version.

During routine maintenance, it might be necessary to restart a specific service Pod in the NebulaGraph cluster, for example, when the Pod's status is abnormal or when a restart must be forced. Restarting a Pod essentially means restarting the service process. To ensure high availability, NebulaGraph Operator supports gracefully restarting all Pods of the Graph, Meta, or Storage service respectively, as well as gracefully restarting an individual Pod of the Storage service.

## Prerequisites

A NebulaGraph cluster is created in a K8s environment. For details, see [Create a NebulaGraph cluster](../4.1.installation/4.1.1.cluster-install.md).
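
Before triggering any restart, it can help to confirm that the cluster Pods are currently healthy. A minimal sketch, assuming the cluster is named `nebula` and deployed in the `default` namespace, and reusing the label selector that appears in the steps below:

```bash
# List every Pod that belongs to the NebulaGraph cluster named "nebula".
kubectl get pods -n default -l app.kubernetes.io/cluster=nebula
```
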
## Restart all Pods of a certain service type

To gracefully roll restart all Pods of a certain service type in the cluster, add an annotation (`nebula-graph.io/restart-timestamp`) with the current time to the configuration of the StatefulSet controller of the corresponding service.

When NebulaGraph Operator detects that the StatefulSet controller of the corresponding service carries the annotation `nebula-graph.io/restart-timestamp` and that its value has changed, it triggers a graceful rolling restart of all Pods of that service type in the cluster.
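
Steps 2 and 3 below run the timestamp and annotation commands separately; the two documented commands can also be combined into a single line with command substitution. A sketch, assuming the Graph service StatefulSet is named `nebula-graphd` as in the example below:

```bash
# Write the current UTC epoch time into the restart annotation in one step.
kubectl annotate statefulset nebula-graphd \
  nebula-graph.io/restart-timestamp="$(date -u +%s)" --overwrite
```
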
In the following example, the annotation is added for the Graph service so that all Graph service Pods are restarted one by one.

Assume that the cluster name is `nebula` and the cluster resources are in the `default` namespace. Run the following commands:

1. Check the name of the StatefulSet controller.

    ```bash
    kubectl get statefulset
    ```

    Example output:

    ```bash
    NAME              READY   AGE
    nebula-graphd     2/2     33s
    nebula-metad      3/3     69s
    nebula-storaged   3/3     69s
    ```

2. Get the current timestamp.

    ```bash
    date -u +%s
    ```

    Example output:

    ```bash
    1700547115
    ```

3. Overwrite the timestamp annotation of the StatefulSet controller to trigger the graceful rolling restart operation.

    ```bash
    kubectl annotate statefulset nebula-graphd nebula-graph.io/restart-timestamp="1700547115" --overwrite
    ```

    Example output:

    ```bash
    statefulset.apps/nebula-graphd annotated
    ```

4. Observe the restart process.

    ```bash
    kubectl get pods -l app.kubernetes.io/cluster=nebula,app.kubernetes.io/component=graphd -w
    ```

    Example output:

    ```bash
    NAME              READY   STATUS              RESTARTS   AGE
    nebula-graphd-0   1/1     Running             0          9m37s
    nebula-graphd-1   0/1     Running             0          17s
    nebula-graphd-1   1/1     Running             0          20s
    nebula-graphd-0   1/1     Terminating         0          9m40s
    nebula-graphd-0   0/1     Terminating         0          9m41s
    nebula-graphd-0   0/1     Terminating         0          9m42s
    nebula-graphd-0   0/1     Terminating         0          9m42s
    nebula-graphd-0   0/1     Terminating         0          9m42s
    nebula-graphd-0   0/1     Pending             0          0s
    nebula-graphd-0   0/1     Pending             0          0s
    nebula-graphd-0   0/1     ContainerCreating   0          0s
    nebula-graphd-0   0/1     Running             0          2s
    ```

    The above output shows the status of the Graph service Pods during the restart process.

5. Verify that the StatefulSet controller annotation has been updated.

    ```bash
    kubectl get statefulset nebula-graphd -o yaml | grep "nebula-graph.io/restart-timestamp"
    ```

    Example output:

    ```yaml
    nebula-graph.io/last-applied-configuration: '{"persistentVolumeClaimRetentionPolicy":{"whenDeleted":"Retain","whenScaled":"Retain"},"podManagementPolicy":"Parallel","replicas":2,"revisionHistoryLimit":10,"selector":{"matchLabels":{"app.kubernetes.io/cluster":"nebula","app.kubernetes.io/component":"graphd","app.kubernetes.io/managed-by":"nebula-operator","app.kubernetes.io/name":"nebula-graph"}},"serviceName":"nebula-graphd-headless","template":{"metadata":{"annotations":{"nebula-graph.io/cm-hash":"7c55c0e5ac74e85f","nebula-graph.io/restart-timestamp":"1700547815"},"creationTimestamp":null,"labels":{"app.kubernetes.io/cluster":"nebula","app.kubernetes.io/component":"graphd","app.kubernetes.io/managed-by":"nebula-operator","app.kubernetes.io/name":"nebula-graph"}},"spec":{"containers":[{"command":["/bin/sh","-ecx","exec
    nebula-graph.io/restart-timestamp: "1700547115"
    nebula-graph.io/restart-timestamp: "1700547815"
    ```

The above output indicates that the annotation of the StatefulSet controller has been updated and all Graph service Pods have been restarted.
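
The Pod watch in step 4 shows the rolling restart in real time; if the restart is applied as a rolling update of the StatefulSet (as the updated Pod template annotation in step 5 suggests), the standard kubectl rollout command can also be used to simply wait for it to finish. A sketch for the same `nebula-graphd` StatefulSet:

```bash
# Block until all replicas of the Graph StatefulSet have been updated and are Ready.
kubectl rollout status statefulset/nebula-graphd
```
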
## Restart a single Storage service Pod

To gracefully restart a single Storage service Pod, add an annotation (`nebula-graph.io/restart-ordinal`) to the configuration of the StatefulSet controller of the Storage service, with the value set to the ordinal number of the Storage service Pod you want to restart. This triggers a graceful restart or state transition for that specific Storage service Pod. The added annotation is removed automatically after the Storage service Pod is restarted.

In the following example, the annotation is added for the Pod with ordinal number `1`, indicating a graceful restart of the `nebula-storaged-1` Storage service Pod.

Assume that the cluster name is `nebula` and the cluster resources are in the `default` namespace. Run the following commands:

1. Check the name of the StatefulSet controller.

    ```bash
    kubectl get statefulset
    ```

    Example output:

    ```bash
    NAME              READY   AGE
    nebula-graphd     2/2     33s
    nebula-metad      3/3     69s
    nebula-storaged   3/3     69s
    ```

2. Get the ordinal number of the Storage service Pod.

    ```bash
    kubectl get pods -l app.kubernetes.io/cluster=nebula,app.kubernetes.io/component=storaged
    ```

    Example output:

    ```bash
    NAME                READY   STATUS    RESTARTS   AGE
    nebula-storaged-0   1/1     Running   0          13h
    nebula-storaged-1   1/1     Running   0          13h
    nebula-storaged-2   1/1     Running   0          13h
    nebula-storaged-3   1/1     Running   0          13h
    nebula-storaged-4   1/1     Running   0          13h
    nebula-storaged-5   1/1     Running   0          13h
    nebula-storaged-6   1/1     Running   0          13h
    nebula-storaged-7   1/1     Running   0          13h
    nebula-storaged-8   1/1     Running   0          13h
    ```

3. Add the annotation for the `nebula-storaged-1` Pod to trigger a graceful restart of that specific Pod.

    ```bash
    kubectl annotate statefulset nebula-storaged nebula-graph.io/restart-ordinal="1"
    ```

    Example output:

    ```bash
    statefulset.apps/nebula-storaged annotated
    ```

4. Observe the restart process.

    ```bash
    kubectl get pods -l app.kubernetes.io/cluster=nebula,app.kubernetes.io/component=storaged -w
    ```

    Example output:

    ```bash
    NAME                READY   STATUS              RESTARTS   AGE
    nebula-storaged-0   1/1     Running             0          13h
    nebula-storaged-1   1/1     Running             0          13h
    nebula-storaged-2   1/1     Running             0          13h
    nebula-storaged-3   1/1     Running             0          13h
    nebula-storaged-4   1/1     Running             0          13h
    nebula-storaged-5   1/1     Running             0          12h
    nebula-storaged-6   1/1     Running             0          12h
    nebula-storaged-7   1/1     Running             0          12h
    nebula-storaged-8   1/1     Running             0          12h
    nebula-storaged-1   1/1     Running             0          13h
    nebula-storaged-1   1/1     Terminating         0          13h
    nebula-storaged-1   0/1     Terminating         0          13h
    nebula-storaged-1   0/1     Terminating         0          13h
    nebula-storaged-1   0/1     Terminating         0          13h
    nebula-storaged-1   0/1     Terminating         0          13h
    nebula-storaged-1   0/1     Pending             0          0s
    nebula-storaged-1   0/1     Pending             0          0s
    nebula-storaged-1   0/1     ContainerCreating   0          0s
    nebula-storaged-1   0/1     Running             0          1s
    nebula-storaged-1   1/1     Running             0          10s
    ```

The above output indicates that the `nebula-storaged-1` Storage service Pod has been successfully restarted.
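
Since the `nebula-graph.io/restart-ordinal` annotation is removed automatically once the Pod has been restarted, its absence is a quick way to confirm that the operation completed. A sketch using a kubectl JSONPath query; the dots inside the annotation key need to be escaped:

```bash
# Print the restart-ordinal annotation; empty output means it has already been removed.
kubectl get statefulset nebula-storaged \
  -o jsonpath='{.metadata.annotations.nebula-graph\.io/restart-ordinal}'
```
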
docs-2.0-zh/k8s-operator/4.cluster-administration/4.9.advanced/4.9.2.restart-cluster.md

@@ -0,0 +1,195 @@
# Restart {{nebula.name}} cluster service Pods on K8s

!!! note

    Restarting cluster service Pods is a feature in the Alpha version.

During routine maintenance, you may need to restart a service Pod of the {{nebula.name}} cluster for various reasons, for example, because the Pod is in an abnormal state or a forced restart is required. Restarting a Pod essentially restarts the service process. To ensure high availability, NebulaGraph Operator supports graceful rolling restarts of all Graph, Meta, or Storage service Pods in the cluster, as well as graceful restarts of a single Storage service Pod.

## Prerequisites

A {{nebula.name}} cluster has been created in a K8s environment. For details, see [Create a {{nebula.name}} cluster](../4.1.installation/4.1.1.cluster-install.md).

## Gracefully roll restart all Pods of a service type in the cluster

To gracefully roll restart all Pods of a service type in the cluster, add the annotation `nebula-graph.io/restart-timestamp` to the configuration of the StatefulSet controller of that service and set its value to the current time. When NebulaGraph Operator detects that the StatefulSet controller of the corresponding service carries the `nebula-graph.io/restart-timestamp` annotation and that its value has changed, it triggers a graceful rolling restart of all Pods of that service type in the cluster.
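
The example below uses the Graph service; under the same mechanism, the Meta or Storage service can be restarted by annotating its own StatefulSet controller. A sketch for the Storage service, assuming the StatefulSet names shown in step 1:

```bash
# Trigger a graceful rolling restart of all Storage service Pods.
kubectl annotate statefulset nebula-storaged \
  nebula-graph.io/restart-timestamp="$(date -u +%s)" --overwrite
```
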
In the following example, the annotation is set for the Graph service, which restarts all Graph service Pods one by one.

Assume that the cluster is named `nebula` and all cluster resources are in the `default` namespace. Run the following commands:

1. Check the name of the StatefulSet controller.

    ```bash
    kubectl get statefulset
    ```

    Example output:

    ```bash
    NAME              READY   AGE
    nebula-graphd     2/2     33s
    nebula-metad      3/3     69s
    nebula-storaged   3/3     69s
    ```

2. Get the current timestamp.

    ```bash
    date -u +%s
    ```

    Example output:

    ```bash
    1700547115
    ```

3. Overwrite the timestamp annotation of the StatefulSet controller to trigger a graceful rolling restart.

    ```bash
    kubectl annotate statefulset nebula-graphd nebula-graph.io/restart-timestamp="1700547115" --overwrite
    ```

    Example output:

    ```bash
    statefulset.apps/nebula-graphd annotated
    ```

4. Observe the restart process.

    ```bash
    kubectl get pods -l app.kubernetes.io/cluster=nebula,app.kubernetes.io/component=graphd -w
    ```

    Example output:

    ```bash
    NAME              READY   STATUS              RESTARTS   AGE
    nebula-graphd-0   1/1     Running             0          9m37s
    nebula-graphd-1   0/1     Running             0          17s
    nebula-graphd-1   1/1     Running             0          20s
    nebula-graphd-0   1/1     Terminating         0          9m40s
    nebula-graphd-0   0/1     Terminating         0          9m41s
    nebula-graphd-0   0/1     Terminating         0          9m42s
    nebula-graphd-0   0/1     Terminating         0          9m42s
    nebula-graphd-0   0/1     Terminating         0          9m42s
    nebula-graphd-0   0/1     Pending             0          0s
    nebula-graphd-0   0/1     Pending             0          0s
    nebula-graphd-0   0/1     ContainerCreating   0          0s
    nebula-graphd-0   0/1     Running             0          2s
    ```

    The above output shows the restart process of the Graph service Pods.

5. Confirm that the StatefulSet controller annotation has been updated.

    ```bash
    kubectl get statefulset nebula-graphd -o yaml | grep "nebula-graph.io/restart-timestamp"
    ```

    Example output:

    ```yaml
    nebula-graph.io/last-applied-configuration: '{"persistentVolumeClaimRetentionPolicy":{"whenDeleted":"Retain","whenScaled":"Retain"},"podManagementPolicy":"Parallel","replicas":2,"revisionHistoryLimit":10,"selector":{"matchLabels":{"app.kubernetes.io/cluster":"nebula","app.kubernetes.io/component":"graphd","app.kubernetes.io/managed-by":"nebula-operator","app.kubernetes.io/name":"nebula-graph"}},"serviceName":"nebula-graphd-headless","template":{"metadata":{"annotations":{"nebula-graph.io/cm-hash":"7c55c0e5ac74e85f","nebula-graph.io/restart-timestamp":"1700547815"},"creationTimestamp":null,"labels":{"app.kubernetes.io/cluster":"nebula","app.kubernetes.io/component":"graphd","app.kubernetes.io/managed-by":"nebula-operator","app.kubernetes.io/name":"nebula-graph"}},"spec":{"containers":[{"command":["/bin/sh","-ecx","exec
    nebula-graph.io/restart-timestamp: "1700547115"
    nebula-graph.io/restart-timestamp: "1700547815"
    ```

The above output indicates that the annotation of the StatefulSet controller has been updated and all Graph service Pods have been restarted.
## Gracefully restart a single Storage service Pod

To gracefully restart a single Storage service Pod, add the annotation `nebula-graph.io/restart-ordinal` to the configuration of the StatefulSet controller of the Storage service and set its value to the ordinal number of the Storage service Pod to restart, which performs a state transition for that Pod. The added annotation is removed automatically after the Storage service Pod is restarted.

In the following example, the annotation is added for the Storage Pod with ordinal number `1`, which gracefully restarts the Storage service Pod named `nebula-storaged-1`.

Assume that the cluster is named `nebula` and all cluster resources are in the `default` namespace. Run the following commands:

1. Check the name of the StatefulSet controller.

    ```bash
    kubectl get statefulset
    ```

    Example output:

    ```bash
    NAME              READY   AGE
    nebula-graphd     2/2     33s
    nebula-metad      3/3     69s
    nebula-storaged   3/3     69s
    ```

2. Get the ordinal number of the Storage service Pod.

    ```bash
    kubectl get pods -l app.kubernetes.io/cluster=nebula,app.kubernetes.io/component=storaged
    ```

    Example output:

    ```bash
    NAME                READY   STATUS    RESTARTS   AGE
    nebula-storaged-0   1/1     Running   0          13h
    nebula-storaged-1   1/1     Running   0          13h
    nebula-storaged-2   1/1     Running   0          13h
    nebula-storaged-3   1/1     Running   0          13h
    nebula-storaged-4   1/1     Running   0          13h
    nebula-storaged-5   1/1     Running   0          13h
    nebula-storaged-6   1/1     Running   0          13h
    nebula-storaged-7   1/1     Running   0          13h
    nebula-storaged-8   1/1     Running   0          13h
    ```

3. Add the annotation for the `nebula-storaged-1` Pod to trigger a graceful restart of that Pod.

    ```bash
    kubectl annotate statefulset nebula-storaged nebula-graph.io/restart-ordinal="1"
    ```

    Example output:

    ```bash
    statefulset.apps/nebula-storaged annotated
    ```

4. Observe the restart process.

    ```bash
    kubectl get pods -l app.kubernetes.io/cluster=nebula,app.kubernetes.io/component=storaged -w
    ```

    Example output:

    ```bash
    NAME                READY   STATUS              RESTARTS   AGE
    nebula-storaged-0   1/1     Running             0          13h
    nebula-storaged-1   1/1     Running             0          13h
    nebula-storaged-2   1/1     Running             0          13h
    nebula-storaged-3   1/1     Running             0          13h
    nebula-storaged-4   1/1     Running             0          13h
    nebula-storaged-5   1/1     Running             0          12h
    nebula-storaged-6   1/1     Running             0          12h
    nebula-storaged-7   1/1     Running             0          12h
    nebula-storaged-8   1/1     Running             0          12h
    nebula-storaged-1   1/1     Running             0          13h
    nebula-storaged-1   1/1     Terminating         0          13h
    nebula-storaged-1   0/1     Terminating         0          13h
    nebula-storaged-1   0/1     Terminating         0          13h
    nebula-storaged-1   0/1     Terminating         0          13h
    nebula-storaged-1   0/1     Terminating         0          13h
    nebula-storaged-1   0/1     Pending             0          0s
    nebula-storaged-1   0/1     Pending             0          0s
    nebula-storaged-1   0/1     ContainerCreating   0          0s
    nebula-storaged-1   0/1     Running             0          1s
    nebula-storaged-1   1/1     Running             0          10s
    ```

The above output indicates that the `nebula-storaged-1` Storage service Pod has been restarted.
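
After the Pod is `Running` again, it can also be worth confirming that the restarted storaged host has rejoined the cluster from NebulaGraph's side. A sketch using nebula-console run as a temporary Pod; the image tag, the Graph service address `nebula-graphd-svc`, the port, and the credentials are assumptions that depend on your deployment:

```bash
# List storaged hosts via the Graph service; the restarted host should be ONLINE.
# Adjust the image tag, address, port, and credentials to match your cluster.
kubectl run nebula-console --rm -ti --restart=Never \
  --image=vesoft/nebula-console:v3.5.0 -- \
  -addr nebula-graphd-svc.default.svc.cluster.local -port 9669 \
  -u root -p nebula -e "SHOW HOSTS"
```
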