update kpanda FAQs #3878

Merged 1 commit on Feb 21, 2024.
Binary file added docs/en/docs/kpanda/images/faq01.png
Binary file added docs/en/docs/kpanda/images/faq02.png
172 changes: 132 additions & 40 deletions docs/en/docs/kpanda/intro/faq.md
@@ -13,7 +13,7 @@ This page lists some frequently asked questions that may arise in container management

As shown in the figure, the container management module automatically creates and launches a Job that installs the specific application. In version v0.6.0, unreasonable Job resource settings caused OOM errors and affected application installation. This bug has been fixed in v0.6.1. After upgrading to v0.6.1, the fix only takes effect in newly created or newly connected clusters; existing clusters must be adjusted manually.

??? note "Adjustment Script"
??? note "Click to check how to adjust script"

- The following scripts are executed in the global service cluster.
- Find the corresponding cluster (this article takes skoala-dev as an example) and obtain its skoala-dev-setting configmap.
@@ -38,82 +38,111 @@ This page lists some frequently asked questions that may arise in container management
name: skoala-dev
uid: 8a25dfa9-ef32-46b4-bc36-b37b775a9632
```

Modify `clusterSetting` -> `helm_operation_job_template_resources` to the appropriate value;
for version v0.6.1 the value is `cpu: 100m,memory: 400Mi`.
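A minimal command-line sketch of this adjustment, assuming the skoala-dev-setting configmap described above lives in the global service cluster (the namespace placeholder is illustrative):

```shell
# Find where the skoala-dev-setting configmap lives (namespace varies by installation)
kubectl get configmap -A | grep skoala-dev-setting

# Edit it and set helm_operation_job_template_resources under clusterSetting,
# e.g. to the v0.6.1 default: cpu: 100m,memory: 400Mi
kubectl edit configmap skoala-dev-setting -n <namespace>
```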

1. Permission issues between the container management module and the global management module

Users often ask why they can see or not see a specific cluster. How do we troubleshoot related permission issues? There are three situations:
Users often ask why they can or cannot see a particular cluster. How can we troubleshoot such permission issues? The permissions in the container management module are divided into cluster permissions and namespace permissions. If a user is bound to either, they can view the corresponding clusters and resources. For specific permission details, refer to the
[Cluster Permission Explanation](../user-guide/permissions/permission-brief.md).

![Container Management Permissions](https://docs.daocloud.io/daocloud-docs-images/docs/zh/docs/kpanda/images/faq201.png)

In the global management module, user authorization works as follows: use the admin account, go to the __Global Management__ -> __Users and Access Control__ -> __Users__ menu, and find the corresponding user. In the __Authorize User Group__ tab, if the user holds roles with container management permissions, such as Admin or Kpanda Owner, they can see all clusters even if no cluster or namespace permissions are bound in the container management module. For more information, refer to the [User Authorization Documentation](../../ghippo/user-guide/access-control/user.md).

![Global Management User Authorization](https://docs.daocloud.io/daocloud-docs-images/docs/zh/docs/kpanda/images/faq202.png)

In the global management module, workspace binding works as follows: use your account to go to __Global Management__ -> __Workspaces and Hierarchies__, where you can see the workspaces you are authorized for, and click the workspace name.

1. If the workspace is authorized for you individually, you can see your account in the authorization tab, then check the resource group or shared resource tab. If the resource group is bound to a namespace or the shared resource is bound to a cluster, then your account can see the corresponding cluster.

2. If you have been granted a global management-related role, you will not see your account in the authorization tab, and you will not be able to see the cluster resources bound to the workspace in the container management module.

![Global Management Workspace Binding](https://docs.daocloud.io/daocloud-docs-images/docs/zh/docs/kpanda/images/faq203.png)

1. When installing applications with Helm, the kpanda-shell image cannot be pulled

After an offline installation, connected clusters often fail to pull the kpanda-shell image when installing Helm applications, as shown in the image:

![Image Pull Failure](https://docs.daocloud.io/daocloud-docs-images/docs/zh/docs/kpanda/images/faq301.png)

In this case, simply go to the cluster management - cluster settings page, advanced configuration tab, and modify the Helm base image to a kpanda-shell image that can be successfully pulled by the cluster.

![Modify Image](https://docs.daocloud.io/daocloud-docs-images/docs/zh/docs/kpanda/images/faq302.png)

1. The Helm Chart interface does not display the latest chart uploaded to the corresponding Helm repo, as shown in the image:

![Template](https://docs.daocloud.io/daocloud-docs-images/docs/zh/docs/kpanda/images/faq401.png)

In this case, simply refresh the corresponding Helm repository.

![Refresh Repository](https://docs.daocloud.io/daocloud-docs-images/docs/zh/docs/kpanda/images/faq402.png)

1. A Helm application installation fails and gets stuck in the installing state, so the application cannot be deleted and reinstalled, as shown in the image:

![Deletion Failure](https://docs.daocloud.io/daocloud-docs-images/docs/zh/docs/kpanda/images/faq501.png)

In this case, simply go to the custom resources page, find the helmreleases.helm.kpanda.io CRD, and then delete the corresponding helmreleases CR (a command-line sketch follows the screenshots below).

![Find CR](https://docs.daocloud.io/daocloud-docs-images/docs/zh/docs/kpanda/images/faq502.png)

![Delete CR](https://docs.daocloud.io/daocloud-docs-images/docs/zh/docs/kpanda/images/faq503.png)
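The same cleanup can be done from the command line; a sketch, where the release name and namespace are placeholders:

```shell
# List the helmreleases CRs in all namespaces and find the one stuck in installation
kubectl get helmreleases.helm.kpanda.io -A

# Delete the stuck CR so the application can be reinstalled
kubectl delete helmreleases.helm.kpanda.io <release-name> -n <namespace>
```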

1. After removing node affinity and other scheduling policies from a workload, scheduling becomes abnormal, as shown in the image:

![Scheduling Abnormality](https://docs.daocloud.io/daocloud-docs-images/docs/zh/docs/kpanda/images/faq601.png)

In this case, the policies may not have been completely removed. Click edit and delete all of them.

![Edit](https://docs.daocloud.io/daocloud-docs-images/docs/zh/docs/kpanda/images/faq602.png)

![Delete](https://docs.daocloud.io/daocloud-docs-images/docs/zh/docs/kpanda/images/faq603.png)

![Normal Scheduling](https://docs.daocloud.io/daocloud-docs-images/docs/zh/docs/kpanda/images/faq604.png)

1. What is the logic behind Kcoral checking the Velero status of a working cluster?

![Detection](https://docs.daocloud.io/daocloud-docs-images/docs/zh/docs/kpanda/images/faq701.png)

- The working cluster has installed standard Velero components in the velero namespace.
- The velero control plane, specifically the velero deployment, is in a running state and meets the expected replica count.
- The velero data plane, specifically the node agent, is in a running state and meets the expected replica count.
- Velero successfully connects to the target MinIO (the BSL status is Available); a quick manual check of these conditions is sketched below.
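A rough manual equivalent of these checks, assuming a standard Velero install in the velero namespace (the node-agent DaemonSet name may differ across Velero versions):

```shell
# Control plane: the velero deployment should be running with the expected replicas
kubectl -n velero get deployment velero

# Data plane: the node agent should be running on every node
kubectl -n velero get daemonset node-agent

# Storage: the backup storage location (BSL) should report Available
kubectl -n velero get backupstoragelocation -o wide
```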

1. When performing cross-cluster backup and restore with Kcoral, how does it determine the available clusters?

When using Kcoral to backup and restore applications across clusters, on the recovery page, Kcoral will help users filter the list of clusters that can perform cross-cluster recovery, with the following logic:
When backing up and restoring applications across clusters with Kcoral, the restore page filters the list of clusters capable of performing a cross-cluster restore based on the following logic:

![Filter](https://docs.daocloud.io/daocloud-docs-images/docs/zh/docs/kpanda/images/faq801.png)

- Filtering out clusters that have not installed Velero.
- Filtering out clusters with abnormal Velero status.
- Obtaining a list of clusters that are connected to the same MinIO and Bucket as the target cluster and returning them.

Therefore, as long as the clusters are connected to the same MinIO and Bucket, and Velero is in a running state, cross-cluster backup (requires write permission) and restore can be performed.

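One way to confirm that two clusters point at the same MinIO and bucket is to compare their BSL specs; a sketch assuming the default BSL name and the AWS/MinIO provider fields:

```shell
# Print the bucket and S3 endpoint configured for Velero in each cluster, then compare them
kubectl -n velero get backupstoragelocation default \
  -o jsonpath='{.spec.objectStorage.bucket}{"  "}{.spec.config.s3Url}{"\n"}'
```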
1. After uninstalling VPA, HPA, CronHPA, why do the corresponding elastic scaling records still exist?

Even though the components were uninstalled through the Helm Addon market, the related records in the application elastic scaling interface remain, as shown in the image:

![Edit](https://docs.daocloud.io/daocloud-docs-images/docs/zh/docs/kpanda/images/faq901.png)

This is a known limitation of helm uninstall: it does not remove the corresponding CRDs, leaving residual data. In this case, we need to manually delete the corresponding CRDs to complete the cleanup.
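A sketch of the manual cleanup; the grep pattern and CRD name are illustrative, so check your cluster for the exact names:

```shell
# Find the autoscaling-related CRDs left behind after helm uninstall
kubectl get crd | grep -iE 'verticalpodautoscaler|cronhpa'

# Deleting a CRD also removes its remaining CR records
kubectl delete crd <leftover-crd-name>
```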

1. Why does the console fail to open on clusters with lower versions?

In Kubernetes clusters running versions below v1.18, opening the console results in a CSR resource request failure. When the console is opened, a certificate is requested in the target cluster through a CSR resource on behalf of the currently logged-in user. If the cluster version is too low, or this feature is not enabled in the controller, the certificate request fails and the target cluster cannot be connected to.

Refer to the [certificate signing request process](https://kubernetes.io/docs/reference/access-authn-authz/certificate-signing-requests/).

Solution:

- If the cluster version is higher than v1.18, check whether kube-controller-manager has the CSR feature enabled and make sure the following controllers are enabled (a check command is sketched below the list):

```shell
ttl-after-finished,bootstrapsigner,csrapproving,csrcleaner,csrsigning
```
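A quick way to inspect the enabled controllers on a kubeadm-style control plane; the manifest path is the kubeadm default and may differ in your environment:

```shell
# Inspect the --controllers flag of kube-controller-manager to confirm none of the
# controllers listed above are disabled
grep -- '--controllers' /etc/kubernetes/manifests/kube-controller-manager.yaml
```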
@@ -123,8 +152,71 @@ This page lists some frequently asked questions that may arise in container management

1. How to reset a created cluster?

Created clusters fall into two categories:

- Clusters that failed to create: if creation failed because of incorrect parameters, select retry on the failed installation, reconfigure the parameters, and create the cluster again.
- Successfully created clusters: uninstall the cluster first, then recreate it. Cluster protection must be disabled before the cluster can be uninstalled.

![Disable Cluster Protection](https://docs.daocloud.io/daocloud-docs-images/docs/zh/docs/kpanda/images/faq1101.png)

![Uninstall Cluster](https://docs.daocloud.io/daocloud-docs-images/docs/zh/docs/kpanda/images/faq1102.png)

1. Failure to install plugins when connecting to a cluster

For clusters connected in an offline environment, before installing plugins you need to configure the container runtime to skip TLS verification for the proxy registry (this must be done on every node).

=== "Docker"

1. Modify the file `/etc/docker/daemon.json`

2. Add "insecure-registries": ["172.30.120.243","temp-registry.daocloud.io"], to the content after modification:

![Modify Configuration](../images/faq01.png)

3. Restart docker

```shell
systemctl daemon-reload
systemctl restart docker
```
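For reference, a sketch of what `/etc/docker/daemon.json` might look like after the change; keep any existing keys in your file, and note the registry addresses are the ones used in this example:

```shell
cat /etc/docker/daemon.json
{
  "insecure-registries": ["172.30.120.243", "temp-registry.daocloud.io"]
}
```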

=== "containerd"

1. Modify `/etc/containerd/config.toml`

2. After modification, the content should be as follows:

```toml
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
endpoint = ["https://registry-1.docker.io"]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."temp-registry.daocloud.io"]
endpoint = ["http://temp-registry.daocloud.io"]
[plugins."io.containerd.grpc.v1.cri".registry.configs."http://temp-registry.daocloud.io".tls]
insecure_skip_verify = true
```

![Modify Configuration](../images/faq02.png)

3. Pay attention to spaces and line breaks and make sure the configuration is correct. After modifying, execute the following (a pull test is sketched after this step):

```shell
systemctl restart containerd
```
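After restarting the runtime, you can verify that the node pulls through the proxy registry; a sketch with a placeholder image path (use `docker pull` instead on Docker nodes):

```shell
# Should succeed without TLS/certificate errors once the registry is trusted
crictl pull temp-registry.daocloud.io/<repo>/kpanda-shell:<tag>
```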

1. When creating a cluster, enabling **Kernel Tuning for New Clusters** in advanced settings causes cluster creation failure.

1. Check if the conntrack kernel module is loaded by running the following command:

```shell
lsmod | grep conntrack
```

2. If the output is empty, the module is not loaded. Load it by running the following command:

```shell
modprobe ip_conntrack
```

!!! note

Upgrading the kernel module can also cause cluster creation failures.