From 92a3de9d8c86c844725b037954e5fa80a6f1aee1 Mon Sep 17 00:00:00 2001 From: Anastasia Alexadrova Date: Wed, 5 Jun 2024 20:05:40 +0300 Subject: [PATCH 1/3] CLOUD-845 Added a doc how to troubleshoot Operator installation --- docs/environment-troubleshoot.md | 159 +++++++++++++++++++++++++++++++ mkdocs-base.yml | 1 + 2 files changed, 160 insertions(+) create mode 100644 docs/environment-troubleshoot.md diff --git a/docs/environment-troubleshoot.md b/docs/environment-troubleshoot.md new file mode 100644 index 00000000..a61a97dd --- /dev/null +++ b/docs/environment-troubleshoot.md @@ -0,0 +1,159 @@ +# Kubernetes environment troubleshooting + +This section provides information on how to troubleshoot Kubernetes environment if you are facing issues with installing Percona Operator for PostgreSQL. + +We will use `kubectl` – the command-line tool for interacting with the Kubernetes API. + +Let’s start with these basic troubleshooting steps. + +## Check connection to Kubernetes cluster + +It may happen that `kubectl` you installed locally is not connected to your Kubernetes cluster. + +1. To verify it, run the following command: + + ```{.bash data-prompt="$"} + $ kubectl cluster-info + ``` + + If you see the output similar to the following, it means that `kubectl` is connected to your Kubernetes cluster: + + ??? example "Sample output" + + ```{.text .no-copy} + Kubernetes control plane is running at https://127.0.0.1:49475 + CoreDNS is running at https://127.0.0.1:49475/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy + ``` + +2. If you see the errors, do the following: + + === "Google Kubernetes Engine (GKE)" + + 1. Check if you have Google Cloud SDK installed and that you can interact with it by running `gcloud` in your terminal. If not, refer to the Google Cloud SDK Documentation for [installation instructions :octicons-link-external-16:](https://cloud.google.com/sdk/docs/install). + 2. In the Google Cloud Console, select your cluster and then click **Connect**. 
You will see the connect statement which configures the command-line access. After you have edited the statement, run the following command in your local shell replacing the `` with your project name: + + ```{.bash data-prompt="$"} + $ gcloud container clusters get-credentials cluster-1 --zone us-central1-a --project + ``` + + 3. Use your Cloud Identity and Access Management (Cloud IAM) to control the access to the cluster. The following command gives you the ability to create Roles and RoleBindings: + + ```{.bash data-prompt="$"} + $ kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user $(gcloud config get-value core/account) + ``` + + !!! note "" + + You may have the error like the following: + + ```{.text .no-copy} + error: failed to create clusterrolebinding: Post “https://34.66.76.82/apis/rbac.authorization.k8s.io/v1/clusterrolebindings?fieldManager=kubectl-create&fieldValidation=Strict”: getting credentials: exec: executable gke-gcloud-auth-plugin not found + ``` + + In this case, follow these steps: + + 1. Install the `gke-gcloud-auth-plugin` using the following command: + + ```{.bash data-prompt="$"} + $ gcloud components install gke-gcloud-auth-plugin + ``` + 2. Authenticate to Google Cloud SDK using this command: + + ```{.bash data-prompt="$"} + $ gcloud auth login + ``` + + 3. Run the `kubectl create clusterrolebinding` command again. + + + === "Amazon Elastic Kubernetes Engine (AWS EKS)" + + 1. Check that you have installed the [Amazon EBS CSI driver :octicons-link-external-16:](https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html). Otherwise install it using one of the suggested methods: [as an add-on :octicons-link-external-16:](https://docs.aws.amazon.com/eks/latest/userguide/managing-ebs-csi.html) or [as a self-managed installation :octicons-link-external-16:](https://github.com/kubernetes-sigs/aws-ebs-csi-driver) + 2. 
Enable `kubectl` to communicate with your cluster by adding a new context to the `kubectl` config file. + + ```{.bash data-prompt="$"} + $ aws eks --region region update-kubeconfig --name cluster_name + ``` + + === "Minikube" + + 1. When deploying Kubernetes locally using Minikube, you may need to allocate additional resources. The recommended minimum resources are 4 CPUs, 30 Gb disk size and 5 GB of RAM. To start Minikube with these resources, run the following command: + + ```{.bash data-prompt="$"} + $ minikube start --memory=5120 --cpus=4 --disk-size=30g + ``` + + 2. To check if your `kubectl` is connected to Minikube, use the following command to check the current context. The current context determines which cluster `kubectl` is interacting with. You can get the current context by running the command: + + ```{.bash data-prompt="$"} + $ kubectl config current-context + ``` + + If the output is `minikube`, then `kubectl` is configured to interact with your Minikube cluster. + + If the context is other, you need to switch to Minikube context by running the following command: + + ```{.bash data-prompt="$"} + $ kubectl config use-context minikube + ``` + + +3. Run the `kubectl cluster-info` command again to verify that `kubectl` is connected to your Kubernetes cluster. + + +## Check Kubernetes Nodes + +Check that your Kubernetes cluster nodes are registered correctly. Run the following command: + +```{.bash data-prompt="$"} +$ kubectl get nodes +``` + +All nodes must be listed and must have the `Ready` status that indicates that the node is healthy and can accept pods. + +If you see the nodes in the `NotReady` status, it means that there are issues with the nodes. 
To get detailed information about a node, run the following commands: + +```{.bash data-prompt="$"} +$ kubectl describe node +``` + +Or, you can use the following command: + +```{.bash data-prompt="$"} +$ kubectl get node -o yaml +``` + +Both commands provide detailed information about the node, including the resources allocated to it, the pods running on the node, and the events related to the node. + +## Check user permissions + +It may happen that your user doesn’t have enough permissions for installing the Operator. To check it, use the following script: + +```{.bash data-prompt="$"} +$ bash <(curl -s https://gist.githubusercontent.com/cshiv/6048bdd0174275b48f633549c69d0844/raw/fd547b783a30b827362ee9f9ec03436f9bc79524/check_priviliges.sh) +``` + +??? example "Sample output" + + ```{.text .no-copy} + Checking privileges to install Percona Operators in kubernetes cluster... + Warning: Unable to check the privileges for resource 'issuers', check if the resource 'issuers' is present in the cluster + Warning: Unable to check the privileges for resource 'certificates', check if the resource 'certificates' is present in the cluster + + Warning: Some resources are not found in the kubernetes cluster.Check the Warning messages before you proceed + ------------------------------------------------------------------------------------------ + GOOD TO INSTALL: Percona Operator for PostgreSQL + https://docs.percona.com/percona-operator-for-postgresql/index.html + ------------------------------------------------------------------------------------------ + GOOD TO INSTALL: Percona Operator for MySQL based on Percona XtraDB Cluster + https://docs.percona.com/percona-operator-for-postgresql/index.html + ------------------------------------------------------------------------------------------ + GOOD TO INSTALL: Percona Operator for MongoDB + https://docs.percona.com/percona-operator-for-mongodb/index.html + ``` + +If you have insufficient permissions, the script will show you 
which ones are missing for installing a particular Operator. In this case, contact the Kubernetes cluster administrator. + + + + diff --git a/mkdocs-base.yml b/mkdocs-base.yml index 3e1e32bc..e039eb3f 100644 --- a/mkdocs-base.yml +++ b/mkdocs-base.yml @@ -220,6 +220,7 @@ nav: - "Monitor Kubernetes": monitor-kubernetes.md - "Use PostGIS extension": postgis.md - Troubleshooting: + - "Environment troubleshooting": environment-troubleshoot.md - "Initial troubleshooting": debug.md - "Exec into the container": debug-shell.md - "Check the logs": debug-logs.md From b53c2c88ae49008507049f89071cf2cfc9c276aa Mon Sep 17 00:00:00 2001 From: Anastasia Alexadrova Date: Tue, 11 Nov 2025 11:52:58 +0100 Subject: [PATCH 2/3] Updated the file --- docs/environment-troubleshoot.md | 159 ------------------------------ docs/troubleshoot-operator.md | 161 +++++++++++++++++++++++++++++++ mkdocs-base.yml | 2 +- 3 files changed, 162 insertions(+), 160 deletions(-) delete mode 100644 docs/environment-troubleshoot.md create mode 100644 docs/troubleshoot-operator.md diff --git a/docs/environment-troubleshoot.md b/docs/environment-troubleshoot.md deleted file mode 100644 index a61a97dd..00000000 --- a/docs/environment-troubleshoot.md +++ /dev/null @@ -1,159 +0,0 @@ -# Kubernetes environment troubleshooting - -This section provides information on how to troubleshoot Kubernetes environment if you are facing issues with installing Percona Operator for PostgreSQL. - -We will use `kubectl` – the command-line tool for interacting with the Kubernetes API. - -Let’s start with these basic troubleshooting steps. - -## Check connection to Kubernetes cluster - -It may happen that `kubectl` you installed locally is not connected to your Kubernetes cluster. - -1. To verify it, run the following command: - - ```{.bash data-prompt="$"} - $ kubectl cluster-info - ``` - - If you see the output similar to the following, it means that `kubectl` is connected to your Kubernetes cluster: - - ??? 
example "Sample output" - - ```{.text .no-copy} - Kubernetes control plane is running at https://127.0.0.1:49475 - CoreDNS is running at https://127.0.0.1:49475/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy - ``` - -2. If you see the errors, do the following: - - === "Google Kubernetes Engine (GKE)" - - 1. Check if you have Google Cloud SDK installed and that you can interact with it by running `gcloud` in your terminal. If not, refer to the Google Cloud SDK Documentation for [installation instructions :octicons-link-external-16:](https://cloud.google.com/sdk/docs/install). - 2. In the Google Cloud Console, select your cluster and then click **Connect**. You will see the connect statement which configures the command-line access. After you have edited the statement, run the following command in your local shell replacing the `` with your project name: - - ```{.bash data-prompt="$"} - $ gcloud container clusters get-credentials cluster-1 --zone us-central1-a --project - ``` - - 3. Use your Cloud Identity and Access Management (Cloud IAM) to control the access to the cluster. The following command gives you the ability to create Roles and RoleBindings: - - ```{.bash data-prompt="$"} - $ kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user $(gcloud config get-value core/account) - ``` - - !!! note "" - - You may have the error like the following: - - ```{.text .no-copy} - error: failed to create clusterrolebinding: Post “https://34.66.76.82/apis/rbac.authorization.k8s.io/v1/clusterrolebindings?fieldManager=kubectl-create&fieldValidation=Strict”: getting credentials: exec: executable gke-gcloud-auth-plugin not found - ``` - - In this case, follow these steps: - - 1. Install the `gke-gcloud-auth-plugin` using the following command: - - ```{.bash data-prompt="$"} - $ gcloud components install gke-gcloud-auth-plugin - ``` - 2. 
Authenticate to Google Cloud SDK using this command: - - ```{.bash data-prompt="$"} - $ gcloud auth login - ``` - - 3. Run the `kubectl create clusterrolebinding` command again. - - - === "Amazon Elastic Kubernetes Engine (AWS EKS)" - - 1. Check that you have installed the [Amazon EBS CSI driver :octicons-link-external-16:](https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html). Otherwise install it using one of the suggested methods: [as an add-on :octicons-link-external-16:](https://docs.aws.amazon.com/eks/latest/userguide/managing-ebs-csi.html) or [as a self-managed installation :octicons-link-external-16:](https://github.com/kubernetes-sigs/aws-ebs-csi-driver) - 2. Enable `kubectl` to communicate with your cluster by adding a new context to the `kubectl` config file. - - ```{.bash data-prompt="$"} - $ aws eks --region region update-kubeconfig --name cluster_name - ``` - - === "Minikube" - - 1. When deploying Kubernetes locally using Minikube, you may need to allocate additional resources. The recommended minimum resources are 4 CPUs, 30 Gb disk size and 5 GB of RAM. To start Minikube with these resources, run the following command: - - ```{.bash data-prompt="$"} - $ minikube start --memory=5120 --cpus=4 --disk-size=30g - ``` - - 2. To check if your `kubectl` is connected to Minikube, use the following command to check the current context. The current context determines which cluster `kubectl` is interacting with. You can get the current context by running the command: - - ```{.bash data-prompt="$"} - $ kubectl config current-context - ``` - - If the output is `minikube`, then `kubectl` is configured to interact with your Minikube cluster. - - If the context is other, you need to switch to Minikube context by running the following command: - - ```{.bash data-prompt="$"} - $ kubectl config use-context minikube - ``` - - -3. Run the `kubectl cluster-info` command again to verify that `kubectl` is connected to your Kubernetes cluster. 
- - -## Check Kubernetes Nodes - -Check that your Kubernetes cluster nodes are registered correctly. Run the following command: - -```{.bash data-prompt="$"} -$ kubectl get nodes -``` - -All nodes must be listed and must have the `Ready` status that indicates that the node is healthy and can accept pods. - -If you see the nodes in the `NotReady` status, it means that there are issues with the nodes. To get detailed information about a node, run the following commands: - -```{.bash data-prompt="$"} -$ kubectl describe node -``` - -Or, you can use the following command: - -```{.bash data-prompt="$"} -$ kubectl get node -o yaml -``` - -Both commands provide detailed information about the node, including the resources allocated to it, the pods running on the node, and the events related to the node. - -## Check user permissions - -It may happen that your user doesn’t have enough permissions for installing the Operator. To check it, use the following script: - -```{.bash data-prompt="$"} -$ bash <(curl -s https://gist.githubusercontent.com/cshiv/6048bdd0174275b48f633549c69d0844/raw/fd547b783a30b827362ee9f9ec03436f9bc79524/check_priviliges.sh) -``` - -??? example "Sample output" - - ```{.text .no-copy} - Checking privileges to install Percona Operators in kubernetes cluster... 
- Warning: Unable to check the privileges for resource 'issuers', check if the resource 'issuers' is present in the cluster - Warning: Unable to check the privileges for resource 'certificates', check if the resource 'certificates' is present in the cluster - - Warning: Some resources are not found in the kubernetes cluster.Check the Warning messages before you proceed - ------------------------------------------------------------------------------------------ - GOOD TO INSTALL: Percona Operator for PostgreSQL - https://docs.percona.com/percona-operator-for-postgresql/index.html - ------------------------------------------------------------------------------------------ - GOOD TO INSTALL: Percona Operator for MySQL based on Percona XtraDB Cluster - https://docs.percona.com/percona-operator-for-postgresql/index.html - ------------------------------------------------------------------------------------------ - GOOD TO INSTALL: Percona Operator for MongoDB - https://docs.percona.com/percona-operator-for-mongodb/index.html - ``` - -If you have insufficient permissions, the script will show you which ones are missing for installing a particular Operator. In this case, contact the Kubernetes cluster administrator. - - - - diff --git a/docs/troubleshoot-operator.md b/docs/troubleshoot-operator.md new file mode 100644 index 00000000..70975221 --- /dev/null +++ b/docs/troubleshoot-operator.md @@ -0,0 +1,161 @@ +# Percona Operator troubleshooting + +This section provides information on how to troubleshoot issues when you install Percona Operator for PostgreSQL. + +Make sure you have CLI tool `kubectl` installed to interact with Kubernetes API. + + +## Check connection to Kubernetes cluster + +It may happen that `kubectl` you installed locally is not connected to your Kubernetes cluster. 
+ +To check connectivity to your Kubernetes API, run the following command: + +```bash +kubectl cluster-info +``` + +If you see output similar to the following, it means that `kubectl` is connected to your Kubernetes cluster: + +??? example "Sample output" + + ```{.text .no-copy} + Kubernetes control plane is running at https://:49475 + CoreDNS is running at https://:49475/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy + ``` + +If multiple Kubernetes configurations are present in `kubeconfig`, check if you have set the correct context. If the context is wrong, switch it. Here's how: + +1. Check the current context: + + ```bash + kubectl config current-context # Get the current context + ``` + +2. Switch the context: + + ```bash + kubectl config use-context + ``` + +3. Run the `kubectl cluster-info` command again to verify that `kubectl` is connected to your Kubernetes cluster. + +If you are still running into issues, check with your Kubernetes cluster administrator to resolve the connectivity or configuration issues. + + +## Troubleshoot Operator installation issues + +1. Installing the Operator requires specific privileges, such as the ability to create custom resource definitions and other Kubernetes objects. + + To verify that you have the necessary privileges, run the following script: + + ```bash + bash <(curl -s https://gist.githubusercontent.com/cshiv/6048bdd0174275b48f633549c69d0844/raw/fd547b783a30b827362ee9f9ec03436f9bc79524/check_priviliges.sh) + ``` + + ??? example "Sample output" + + ```{.text .no-copy} + Checking privileges to install Percona Operators in kubernetes cluster... 
+ Warning: Unable to check the privileges for resource 'issuers', check if the resource 'issuers' is present in the cluster + Warning: Unable to check the privileges for resource 'certificates', check if the resource 'certificates' is present in the cluster + + Warning: Some resources are not found in the kubernetes cluster.Check the Warning messages before you proceed + ------------------------------------------------------------------------------------------ + GOOD TO INSTALL: Percona Operator for PostgreSQL + https://docs.percona.com/percona-operator-for-postgresql/index.html + ------------------------------------------------------------------------------------------ + GOOD TO INSTALL: Percona Operator for MySQL based on Percona XtraDB Cluster + https://docs.percona.com/percona-operator-for-postgresql/index.html + ------------------------------------------------------------------------------------------ + GOOD TO INSTALL: Percona Operator for MongoDB + https://docs.percona.com/percona-operator-for-mongodb/index.html + ``` + + If you have insufficient permissions, the script will show you which ones are missing for installing a particular Operator. In this case, contact the Kubernetes cluster administrator. + +2. If you have the necessary privileges but the installation is still failing, review the Kubernetes Events for more details. Keep in mind that Kubernetes Events are retained for only 60 minutes. + + ```bash + kubectl get events --sort-by=".lastTimestamp" + ``` + + Events provide good information about affinity issues, resource issues etc. + +3. Check the Operator logs + + ```bash + kubectl logs deploy/ + ``` + +## Troubleshooting database cluster issues + +1. The Operator deployment must be in the `Running` state for the database cluster to function properly. Check the Operator Pod for restarts to identify potential issues. + + ```bash + kubectl get pod + ``` + +2. 
Check the status of the database cluster + + ```bash + kubectl get pg + ``` + + The cluster should typically be in the `Running` state. It may briefly enter the `initializing` state while reconciling changes. If the cluster remains in the `initializing` state for an extended period, investigate further to identify any underlying issues. + +3. Check the Operator logs + + ```bash + kubectl logs deploy/ + ``` + +4. Check the events + + ```bash + kubectl get events --sort-by=".lastTimestamp" + ``` + + Events can reveal problems such as storage class issues or PVC binding issues. + +5. Check the PVCs and PVs. Both should be in the `Bound` status: + + ```bash + kubectl get pvc + ``` + + ```bash + kubectl get pv + ``` + +6. Check the logs of the database Pods and proxy Pods: + + ```bash + kubectl logs + ``` + + ```bash + kubectl logs + ``` + + To check the logs of `init` containers or sidecar containers, use the `-c` option with the container name: + + ```bash + kubectl logs -c postgres-startup + ``` + +7. To run commands inside a container, use the `kubectl exec` command: + + ```bash + kubectl exec -- + ``` + + If you need an interactive shell to run multiple commands, use the `-it` flag for an interactive terminal: + + ```bash + kubectl exec -it -- sh + ``` + +8. If the pods are not running, it may not be possible to execute commands or open an interactive shell. In such cases, consider using a `sleep-forever` script to prevent the containers from restarting repeatedly. + + See the [Disable health check probes for maintenance](manage-manually.md#disable-health-check-probes-for-maintenance) section for steps. 
diff --git a/mkdocs-base.yml b/mkdocs-base.yml index 12c6a716..9cc359f9 100644 --- a/mkdocs-base.yml +++ b/mkdocs-base.yml @@ -242,7 +242,7 @@ nav: - "Retrieve Percona certified images": image-query.md - Troubleshooting: - - "Environment troubleshooting": environment-troubleshoot.md + - "Troubleshoot Operator installation issues": troubleshoot-operator.md - "Initial troubleshooting": debug.md - "Check storage": debug-storage.md - "Exec into the container": debug-shell.md From 8672699c6cc941b41bd3e5bc1fb44c3ef01c5aef Mon Sep 17 00:00:00 2001 From: Anastasia Alexadrova Date: Thu, 13 Nov 2025 15:17:09 +0100 Subject: [PATCH 3/3] Added describe commands to troubleshooting --- docs/troubleshoot-operator.md | 35 +++++++++++++++++++++++++++-------- 1 file changed, 27 insertions(+), 8 deletions(-) diff --git a/docs/troubleshoot-operator.md b/docs/troubleshoot-operator.md index 70975221..5f5ce495 100644 --- a/docs/troubleshoot-operator.md +++ b/docs/troubleshoot-operator.md @@ -45,7 +45,13 @@ If you are still running into issues, check with your Kubernetes cluster adminis ## Troubleshoot Operator installation issues -1. Installing the Operator requires specific privileges, such as the ability to create custom resource definitions and other Kubernetes objects. +1. Check the Operator logs + + ```bash + kubectl logs deploy/ + ``` + +2. Installing the Operator requires specific privileges, such as the ability to create custom resource definitions and other Kubernetes objects. To verify that you have the necessary privileges, run the following script: @@ -74,7 +80,7 @@ If you are still running into issues, check with your Kubernetes cluster adminis If you have insufficient permissions, the script will show you which ones are missing for installing a particular Operator. In this case, contact the Kubernetes cluster administrator. -2. If you have the necessary privileges but the installation is still failing, review the Kubernetes Events for more details. 
Keep in mind that Kubernetes Events are retained for only 60 minutes. +3. If you have the necessary privileges but the installation is still failing, review the Kubernetes Events for more details. Keep in mind that Kubernetes Events are retained for only 60 minutes. ```bash kubectl get events --sort-by=".lastTimestamp" @@ -82,11 +88,6 @@ If you are still running into issues, check with your Kubernetes cluster adminis Events provide good information about affinity issues, resource issues etc. -3. Check the Operator logs - - ```bash - kubectl logs deploy/ - ``` ## Troubleshooting database cluster issues @@ -104,6 +105,12 @@ If you are still running into issues, check with your Kubernetes cluster adminis The cluster should typically be in the `Running` state. It may briefly enter the `initializing` state while reconciling changes. If the cluster remains in the `initializing` state for an extended period, investigate further to identify any underlying issues. + Additionally, you can describe the database cluster and search for the information in the `State` and `State Description` fields: + + ```bash + kubectl describe pg + ``` + 3. Check the Operator logs ```bash @@ -144,7 +151,19 @@ If you are still running into issues, check with your Kubernetes cluster adminis kubectl logs -c postgres-startup ``` -7. To run commands inside a container, use the `kubectl exec` command: +7. Check for error details. Run the `kubectl describe` command: + + ```bash + kubectl describe + ``` + + ```bash + kubectl describe + ``` + + Check the information in the `Status` section. The `State` and `State Description` fields explain why the Pod reports errors. + +8. To run commands inside a container, use the `kubectl exec` command: ```bash kubectl exec --