KubeJobCompletion Prometheus alert for descheduler jobs #432

Closed
KR411-prog opened this issue Oct 28, 2020 · 13 comments · Fixed by #444
Labels: kind/bug (Categorizes issue or PR as related to a bug.)

@KR411-prog commented Oct 28, 2020

We are receiving the KubeJobCompletion Prometheus alert for descheduler jobs with the below alert message:
kube-system/descheduler-1603424400 is taking more than 12 hours to complete.

The descheduler values file config is as shown below,

fullnameOverride: descheduler
nameOverride: descheduler
deschedulerPolicy:
  nodeSelector: kops.k8s.io/instancegroup=nodes
  evict-local-storage-pods: false
  maxNoOfPodsToEvictPerNode: 4
  strategies:
    RemoveDuplicates:
      enabled: true
    RemovePodsViolatingInterPodAntiAffinity:
      enabled: true
    RemovePodsViolatingNodeAffinity:
      enabled: true
    LowNodeUtilization:
      enabled: true
      params:
        numberOfNodes: 2
        nodeResourceUtilizationThresholds:
          # node is underutilized if all 3 metrics are below the threshold
          thresholds:
            "cpu" : 50
            "memory": 30
            "pods": 30
          # node is overutilized is any of these 3 metrics is above the target threshold
          targetThresholds:
            "cpu" : 80
            "memory": 70
            "pods": 50
    RemovePodsHavingTooManyRestarts:
      enabled: true
      params:
        podsHavingTooManyRestarts:
          podRestartThreshold: 100
          includingInitContainers: true
    PodLifeTime:
      enabled: false
rbac:
  create: true
schedule: "*/2 * * * *"

I am not sure if there is a way to tune the config to delete the job if it takes more than 30 minutes. I don't find that tuning option in the values file of this helm chart.
chart: descheduler/descheduler-helm-chart
version: "0.19.0"

Any help on how to avoid this alert? Is there any tuning that can be done in the descheduler config?

@seanmalloy (Member)

/triage support

@k8s-ci-robot (Contributor)

@seanmalloy: The label(s) triage/support cannot be applied, because the repository doesn't have them

In response to this:

/triage support

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@seanmalloy (Member)

/kind support

k8s-ci-robot added the kind/support label (Categorizes issue or PR as a support question.) on Oct 29, 2020
@seanmalloy (Member)

@KR411-prog thanks for opening this issue. Please provide the below details and we will try to help.

Please provide the full CronJob yaml with any sensitive info redacted.

kubectl get cronjob -n mynamespace mycronjob -o yaml

What k8s version are you using?

Please provide the pod log for the long-running descheduler CronJob pod with any sensitive info redacted.

kubectl logs -n mynamespace mypod

@seanmalloy (Member)

@KR411-prog it would also be helpful to know if the descheduler pod is maxing out its CPU/memory requests or limits. Also, roughly how many nodes and pods are in the cluster?
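For reference, one quick way to check live usage against the requests/limits (assuming metrics-server is installed and the pods carry the chart's usual app.kubernetes.io/name=descheduler label) is:

# requires metrics-server; shows current CPU/memory usage for the descheduler pods
kubectl top pod -n kube-system -l app.kubernetes.io/name=descheduler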

Thanks!

@KR411-prog (Author)

Here is the cronjob config,

apiVersion: v1
items:
- apiVersion: batch/v1beta1
  kind: CronJob
  metadata:
    annotations:
      meta.helm.sh/release-name: descheduler
      meta.helm.sh/release-namespace: kube-system
    creationTimestamp: "2020-09-17T22:12:55Z"
    labels:
      app.kubernetes.io/instance: descheduler
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: descheduler
      app.kubernetes.io/version: 0.19.0
      helm.sh/chart: descheduler-helm-chart-0.19.0
    name: descheduler
    namespace: kube-system
    resourceVersion: "56889398"
    selfLink: /apis/batch/v1beta1/namespaces/kube-system/cronjobs/descheduler
    uid: 8ca58fd8-4988-41fb-910a-d2f0cc7e7e9c
  spec:
    concurrencyPolicy: Forbid
    failedJobsHistoryLimit: 1
    jobTemplate:
      metadata:
        creationTimestamp: null
      spec:
        template:
          metadata:
            annotations:
              checksum/config: ea19993a2d8da1e8b8774c541cfb67debbdb62ae9505a4bc4ca238320c271805
            creationTimestamp: null
            labels:
              app.kubernetes.io/instance: descheduler
              app.kubernetes.io/name: descheduler
            name: descheduler
          spec:
            containers:
            - args:
              - --policy-config-file
              - /policy-dir/policy.yaml
              - --v
              - "3"
              command:
              - /bin/descheduler
              image: k8s.gcr.io/descheduler/descheduler:v0.19.0
              imagePullPolicy: IfNotPresent
              name: descheduler-helm-chart
              resources: {}
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
              volumeMounts:
              - mountPath: /policy-dir
                name: policy-volume
            dnsPolicy: ClusterFirst
            priorityClassName: system-cluster-critical
            restartPolicy: Never
            schedulerName: default-scheduler
            securityContext: {}
            serviceAccount: descheduler
            serviceAccountName: descheduler
            terminationGracePeriodSeconds: 30
            volumes:
            - configMap:
                defaultMode: 420
                name: descheduler
              name: policy-volume
    schedule: '*/10 * * * *'
    successfulJobsHistoryLimit: 3
    suspend: false
  status:
    lastScheduleTime: "2020-10-24T00:40:00Z"
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

I am unable to capture logs because the issue is not happening today. As soon as I hit it again, I can share the logs here.
163 pods are in this cluster
6 worker nodes
It's an EKS cluster - Kubernetes 1.15

@seanmalloy (Member)

@KR411-prog thanks for providing the additional details. One thing I see is that the descheduler container requests/limits are not set. This might be a bug in the helm chart, but I have not dug into the chart to see whether there is an option to set them.
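If the chart does expose a resources value (I have not verified that for chart version 0.19.0, so treat this as a sketch with illustrative numbers), setting it in the values file would look roughly like:

resources:
  # illustrative values only; tune to the cluster's size
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 256Mi

Otherwise the rendered pod keeps resources: {}, as seen in the CronJob posted above.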

Without the logs it will be difficult to determine root cause. Please add the descheduler pod logs if you see this happen again.

Thanks!

@KR411-prog (Author)

Today we got the same issue.

kubectl get jobs -n kube-system
NAME                     COMPLETIONS   DURATION   AGE
descheduler-1604088720   1/1           37s        6m17s
descheduler-1604088840   1/1           37s        4m26s
descheduler-1604088960   1/1           36s        2m25s
descheduler-1604089080   0/1           24s        24s

We received the KubeJobCompletion alert for the descheduler-1604089080 job. The pod logs showed no error, but within a minute or so the pod and job were deleted automatically.

Now I see only new jobs,

kubectl get jobs -n kube-system
NAME                     COMPLETIONS   DURATION   AGE
descheduler-1604092080   1/1           36s        4m59s
descheduler-1604092200   1/1           36s        2m58s
descheduler-1604092320   1/1           37s        58s

So the job that triggered the alert didn't have any failed status:

kubectl describe job descheduler-1604089080 -n kube-system


Name:           descheduler-1604089080
Namespace:      kube-system
Selector:       controller-uid=dd8c06ea-c2cd-42de-9eef-06af31e74d40
Labels:         app.kubernetes.io/instance=descheduler
                app.kubernetes.io/name=descheduler
                controller-uid=dd8c06ea-c2cd-42de-9eef-06af31e74d40
                job-name=descheduler-1604089080
Annotations:    <none>
Controlled By:  CronJob/descheduler
Parallelism:    1
Completions:    1
Start Time:     Fri, 30 Oct 2020 13:18:02 -0700
Completed At:   Fri, 30 Oct 2020 13:18:38 -0700
Duration:       36s
Pods Statuses:  0 Running / 1 Succeeded / 0 Failed
Pod Template:
  Labels:           app.kubernetes.io/instance=descheduler
                    app.kubernetes.io/name=descheduler
                    controller-uid=dd8c06ea-c2cd-42de-9eef-06af31e74d40
                    job-name=descheduler-1604089080
  Annotations:      checksum/config: 93edabe3808159d55ef01771bbe791b880656fd7f010a59731302c452628f9cc
  Service Account:  descheduler
  Containers:
   descheduler-helm-chart:
    Image:      k8s.gcr.io/descheduler/descheduler:v0.19.0
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/descheduler
    Args:
      --policy-config-file
      /policy-dir/policy.yaml
      --v
      3
    Environment:  <none>
    Mounts:
      /policy-dir from policy-volume (rw)
  Volumes:
   policy-volume:
    Type:               ConfigMap (a volume populated by a ConfigMap)
    Name:               descheduler
    Optional:           false
  Priority Class Name:  system-cluster-critical
Events:
  Type    Reason            Age    From            Message
  ----    ------            ----   ----            -------
  Normal  SuccessfulCreate  2m19s  job-controller  Created pod: descheduler-1604089080-j7pl6

Checking the cronjob manifest, I see concurrencyPolicy set to Forbid. Would adding "startingDeadlineSeconds: 10" help improve the cronjob behaviour?
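For reference, startingDeadlineSeconds is a top-level field of the CronJob spec, alongside concurrencyPolicy and schedule; a minimal sketch of where it would go (the chart values do not currently expose it):

spec:
  concurrencyPolicy: Forbid
  # count a run as missed if it cannot start within 10 seconds of its scheduled time
  startingDeadlineSeconds: 10
  schedule: '*/10 * * * *'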

@KR411-prog (Author)

There is another issue today.
The descheduler Job showed an age of 98m.
Pods Statuses: 1 Running / 0 Succeeded / 0 Failed

The pod events showed an error:

  Warning  FailedCreatePodSandBox  20m (x4308 over 100m)  kubelet  (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "48779b946f6a932a7d8f4271ac75f4e06da5bbb76a13f708e5a0c33bed5c3f65" network for pod "descheduler-1604088600-m522f": NetworkPlugin cni failed to set up pod "descheduler-1604088600-m522f_kube-system" network: add cmd: failed to assign an IP address to container
  Normal   SandboxChanged          12s (x5396 over 100m)  kubelet  Pod sandbox changed, it will be killed and re-created.

I think this issue can be resolved by setting activeDeadlineSeconds, but I don't find the "activeDeadlineSeconds" field in the values file of the descheduler chart.
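For reference, activeDeadlineSeconds belongs on the Job spec inside the CronJob's jobTemplate; a minimal sketch of what that would look like if the chart templated it (it is not in the current values file), using the 30-minute limit mentioned earlier:

spec:
  jobTemplate:
    spec:
      # terminate the pods and mark the Job as failed after 30 minutes
      activeDeadlineSeconds: 1800
      template:
        ...  # existing pod template unchanged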

@KR411-prog (Author)

I found the below error in today's occurrence:

Events:
  Type     Reason            Age    From            Message
  ----     ------            ----   ----            -------
  Normal   SuccessfulCreate  7m11s  job-controller  Created pod: time-limited-rbac-1604522700-qqgct
  Normal   SuccessfulDelete  2m31s  job-controller  Deleted pod: time-limited-rbac-1604522700-qqgct
  Warning  DeadlineExceeded  2m31s  job-controller  Job was active longer than specified deadline

The pod was deleted, but the job was still in a failed status with the error "Job was active longer than specified deadline".
Is there anything that can be tuned in the descheduler config to fix this issue?

@damemi (Contributor) commented Nov 13, 2020

I see you are running descheduler v0.19, and you also mentioned your cluster is k8s 1.15. Please note that we currently only support a k8s-to-descheduler version skew of N-3 (see the compatibility matrix: https://github.com/kubernetes-sigs/descheduler/#compatibility-matrix).

I'm not sure if that will relate to your problem, but this seems like more of an issue with the cronjob (though any logs you could get from the descheduler pod would be the best way to tell, possibly at a higher log level like v=4). Do you ever have similar problems running cron jobs for other tools on your cluster?

If you can't resolve that, another option is running the descheduler as a regular deployment with the --descheduling-interval flag set, as sketched below.
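A rough sketch of what that deployment could look like, reusing the image, args, policy ConfigMap, and service account from the CronJob posted above (the 10m interval simply mirrors the current cron schedule and is an assumption):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: descheduler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: descheduler
  template:
    metadata:
      labels:
        app.kubernetes.io/name: descheduler
    spec:
      serviceAccountName: descheduler
      priorityClassName: system-cluster-critical
      containers:
      - name: descheduler
        image: k8s.gcr.io/descheduler/descheduler:v0.19.0
        command:
        - /bin/descheduler
        args:
        - --policy-config-file
        - /policy-dir/policy.yaml
        # run the descheduling loop in-process instead of via CronJob
        - --descheduling-interval
        - 10m
        - --v
        - "3"
        volumeMounts:
        - mountPath: /policy-dir
          name: policy-volume
      volumes:
      - name: policy-volume
        configMap:
          name: descheduler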

@seanmalloy (Member)

In my opinion this is a problem in the descheduler helm chart and also in the k8s manifests found in the top-level kubernetes directory of this repo. There are multiple problems:

  1. CPU and Memory requests/limits are not set
  2. .spec.startingDeadlineSeconds is not configurable for the CronJob

Item 1 from above is a bug in my opinion. I suppose item 2 would be a feature enhancement request.

/kind bug

k8s-ci-robot added the kind/bug label (Categorizes issue or PR as related to a bug.) on Nov 18, 2020
@seanmalloy (Member)

I think I can get the helm chart and k8s yaml manifests updated to hopefully mitigate this issue.

/assign
/remove-kind support
