Skip to content

Latest commit

 

History

History
209 lines (181 loc) · 12.7 KB

README_zh.md

File metadata and controls

209 lines (181 loc) · 12.7 KB

KubeEye

All Contributors

kubeeye-logo

English | 中文

KubeEye 旨在发现 Kubernetes 上的各种问题,比如应用配置错误(使用 OPA )、集群组件不健康和节点问题(使用Node-Problem-Detector)。除了预定义的规则,它还支持自定义规则。

架构图

KubeEye 通过调用Kubernetes API,通过匹配资源中的关键字和容器语法的规则匹配来获取集群诊断数据,详见架构图。

kubeeye-architecture

怎么使用

  • 机器上安装 KubeEye

    • Releases 中下载预构建的可执行文件。

    • 或者你也可以从源代码构建

    提示:构建完成后将会在 /usr/local/bin/ 目录下生成 kubeeye 文件。

    git clone https://github.com/kubesphere/kubeeye.git
    cd kubeeye 
    make installke
    
  • [可选] 安装 Node-problem-Detector 注意:这将在你的集群上安装 npd,只有当你想要详细的节点报告时才需要。

kubeeye install npd
  • KubeEye 执行
kubeeye audit
KIND          NAMESPACE        NAME                                                           REASON                                        LEVEL    MESSAGE
Node                           docker-desktop                                                 kubelet has no sufficient memory available   waring    KubeletHasNoSufficientMemory
Node                           docker-desktop                                                 kubelet has no sufficient PID available      waring    KubeletHasNoSufficientPID
Node                           docker-desktop                                                 kubelet has disk pressure                    waring    KubeletHasDiskPressure
Deployment    default          testkubeeye                                                                                                                  NoCPULimits
Deployment    default          testkubeeye                                                                                                                  NoReadinessProbe
Deployment    default          testkubeeye                                                                                                                  NotRunAsNonRoot
Deployment    kube-system      coredns                                                                                                               NoCPULimits
Deployment    kube-system      coredns                                                                                                               ImagePullPolicyNotAlways
Deployment    kube-system      coredns                                                                                                               NotRunAsNonRoot
Deployment    kubeeye-system   kubeeye-controller-manager                                                                                            ImagePullPolicyNotAlways
Deployment    kubeeye-system   kubeeye-controller-manager                                                                                            NotRunAsNonRoot
DaemonSet     kube-system      kube-proxy                                                                                                            NoCPULimits
DaemonSet     k          ube-system      kube-proxy                                                                                                            NotRunAsNonRoot
Event         kube-system      coredns-558bd4d5db-c26j8.16d5fa3ddf56675f                      Unhealthy                                    warning   Readiness probe failed: Get "http://10.1.0.87:8181/ready": dial tcp 10.1.0.87:8181: connect: connection refused
Event         kube-system      coredns-558bd4d5db-c26j8.16d5fa3fbdc834c9                      Unhealthy                                    warning   Readiness probe failed: HTTP probe failed with statuscode: 503
Event         kube-system      vpnkit-controller.16d5ac2b2b4fa1eb                             BackOff                                      warning   Back-off restarting failed container
Event         kube-system      vpnkit-controller.16d5fa44d0502641                             BackOff                                      warning   Back-off restarting failed container
Event         kubeeye-system   kubeeye-controller-manager-7f79c4ccc8-f2njw.16d5fa3f5fc3229c   Failed                                       warning   Failed to pull image "controller:latest": rpc error: code = Unknown desc = Error response from daemon: pull access denied for controller, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
Event         kubeeye-system   kubeeye-controller-manager-7f79c4ccc8-f2njw.16d5fa3f61b28527   Failed                                       warning   Error: ImagePullBackOff
Role          kubeeye-system   kubeeye-leader-election-role                                                                                          CanDeleteResources
ClusterRole                    kubeeye-manager-role                                                                                                  CanDeleteResources
ClusterRole                    kubeeye-manager-role                                                                                                  CanModifyWorkloads
ClusterRole                    vpnkit-controller                                                                                                     CanImpersonateUser
ClusterRole                    vpnkit-controller                                                                                           CanDeleteResources

KubeEye 能做什么

  • KubeEye 根据行业最佳实践审查你的工作负载 yaml 规范,帮助你使你的集群稳定。
  • KubeEye 可以发现你的集群控制平面的问题,包括 kube-apiserver/kube-controller-manager/etcd 等。
  • KubeEye 可以帮助你检测各种节点问题,包括内存/CPU/磁盘压力,意外的内核错误日志等。

检查项

是/否 检查项 描述 级别
PrivilegeEscalationAllowed 允许特权升级 紧急
CanImpersonateUser role/clusterrole 有伪装成其他用户权限 警告
CanDeleteResources role/clusterrole 有删除 kubernetes 资源权限 警告
CanModifyWorkloads role/clusterrole 有修改 kubernetes 资源权限 警告
NoCPULimits 资源没有设置 CPU 使用限制 紧急
NoCPURequests 资源没有设置预留 CPU 紧急
HighRiskCapabilities 开启了高危功能,例如 ALL/SYS_ADMIN/NET_ADMIN 紧急
HostIPCAllowed 开启了主机 IPC 紧急
HostNetworkAllowed 开启了主机网络 紧急
HostPIDAllowed 开启了主机PID 紧急
HostPortAllowed 开启了主机端口 紧急
ImagePullPolicyNotAlways 镜像拉取策略不是 always 警告
ImageTagIsLatest 镜像标签是 latest 警告
ImageTagMiss 镜像没有标签 紧急
InsecureCapabilities 开启了不安全的功能,例如 KILL/SYS_CHROOT/CHOWN 警告
NoLivenessProbe 没有设置存活状态检查 警告
NoMemoryLimits 资源没有设置内存使用限制 紧急
NoMemoryRequests 资源没有设置预留内存 紧急
NoPriorityClassName 没有设置资源调度优先级 通知
PrivilegedAllowed 以特权模式运行资源 紧急
NoReadinessProbe 没有设置就绪状态检查 警告
NotReadOnlyRootFilesystem 没有设置根文件系统为只读 警告
NotRunAsNonRoot 没有设置禁止以 root 用户启动进程 警告
CertificateExpiredPeriod 将检查 ApiServer 证书的到期日期少于30天 紧急
EventAudit 事件检查 警告
NodeStatus 节点状态检查 警告
DockerStatus docker 状态检查 警告
KubeletStatus kubelet 状态检查 警告

添加自定义检查规则

添加自定义 OPA 检查规则

  • 创建 OPA 规则存放目录
mkdir opa
  • 添加自定义 OPA 规则文件

注意:为检查工作负载设置的 OPA 规则, package 名称必须是 kubeeye_workloads_rego 为检查 RBAC 设置的 OPA 规则, package 名称必须是 kubeeye_RBAC_rego 为检查节点设置的 OPA 规则, package 名称必须是 kubeeye_nodes_rego

  • 以下为检查镜像仓库地址规则,保存以下规则到规则文件 imageRegistryRule.rego
package kubeeye_workloads_rego

deny[msg] {
    resource := input
    type := resource.Object.kind
    resourcename := resource.Object.metadata.name
    resourcenamespace := resource.Object.metadata.namespace
    workloadsType := {"Deployment","ReplicaSet","DaemonSet","StatefulSet","Job"}
    workloadsType[type]

    not workloadsImageRegistryRule(resource)

    msg := {
        "Name": sprintf("%v", [resourcename]),
        "Namespace": sprintf("%v", [resourcenamespace]),
        "Type": sprintf("%v", [type]),
        "Message": "ImageRegistryNotmyregistry"
    }
}

workloadsImageRegistryRule(resource) {
    regex.match("^myregistry.public.kubesphere/basic/.+", resource.Object.spec.template.spec.containers[_].image)
}
  • 使用额外的规则运行 kubeeye

提示:kubeeye 将读取指定目录下所有 .rego 结尾的文件

kubeeye audit -p ./opa -f ~/.kube/config
NAMESPACE     NAME              KIND          MESSAGE
default       nginx1            Deployment    [ImageRegistryNotmyregistry NotReadOnlyRootFilesystem NotRunAsNonRoot]
default       nginx11           Deployment    [ImageRegistryNotmyregistry PrivilegeEscalationAllowed HighRiskCapabilities HostIPCAllowed HostPortAllowed ImagePullPolicyNotAlways ImageTagIsLatest InsecureCapabilities NoPriorityClassName PrivilegedAllowed NotReadOnlyRootFilesystem NotRunAsNonRoot]
default       nginx111          Deployment    [ImageRegistryNotmyregistry NoCPULimits NoCPURequests ImageTagMiss NoLivenessProbe NoMemoryLimits NoMemoryRequests NoPriorityClassName NotReadOnlyRootFilesystem NoReadinessProbe NotRunAsNonRoot]

添加自定义 NPD 检查规则

  • 修改 configmap
kubectl edit ConfigMap node-problem-detector-config -n kube-system 
  • 重启 NPD
kubectl rollout restart DaemonSet node-problem-detector -n kube-system

在 Kubernetes 中部署 Kubeeye

部署 Kubeeye

kubectl apply -f https://raw.githubusercontent.com/kubesphere/kubeeye/main/deploy/kubeeye.yaml
kubectl apply -f https://raw.githubusercontent.com/kubesphere/kubeeye/main/deploy/kubeeye_insights.yaml

查看 Kubeeye 巡检结果

kubectl get clusterinsight -o yaml
apiVersion: v1
items:
- apiVersion: kubeeye.kubesphere.io/v1alpha1
  kind: ClusterInsight
  metadata:
    name: clusterinsight-sample
    namespace: default
  spec:
    auditPeriod: 24h
  status:
    auditResults:
      auditResults:
      - resourcesType: Node
        resultInfos:
        - namespace: ""
          resourceInfos:
          - items:
            - level: waring
              message: KubeletHasNoSufficientMemory
              reason: kubelet has no sufficient memory available
            - level: waring
              message: KubeletHasNoSufficientPID
              reason: kubelet has no sufficient PID available
            - level: waring
              message: KubeletHasDiskPressure
              reason: kubelet has disk pressure
            name: kubeeyeNode

文档