Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flake: TestDebug/helm time out #5249

Closed
briandealwis opened this issue Jan 20, 2021 · 5 comments · Fixed by #5252
Closed

Flake: TestDebug/helm time out #5249

briandealwis opened this issue Jan 20, 2021 · 5 comments · Fixed by #5252

Comments

@briandealwis
Copy link
Member

Seeing periodic failures of TestDebug/helm. It looks like the actual debug deployment is in progress but 1min is not sufficient. It seems surprising that the last log Deployment message is at 10:34 but the kubectl get deploymens.app is at 11:23 — why did nothing else happen in the time.

/integration/TestDebug/helm
time="2021-01-20T05:08:51Z" level=info msg="[skaffold build --default-repo gcr.io/k8s-skaffold]"
time="2021-01-20T05:08:51Z" level=info msg="Ran in 3 minutes 59.509 seconds"
time="2021-01-20T05:10:23Z" level=info msg="Ran in 1 minute 32.35 seconds"
time="2021-01-20T05:10:23Z" level=info msg="Namespace: skaffold6blwb"
time="2021-01-20T05:10:23Z" level=info msg="[skaffold debug --namespace skaffold6blwb --default-repo gcr.io/k8s-skaffold]"
time="2021-01-20T05:10:23Z" level=info msg="Waiting for deployments [skaffold-helm] to stabilize"
time="2021-01-20T05:10:33Z" level=info msg="Deployment skaffold-helm: Generation 0/1, Replicas 0/1, Available 0/1"
time="2021-01-20T05:10:33Z" level=info msg="Deployment skaffold-helm: Generation 0/1, Replicas 0/1, Available 0/1"
time="2021-01-20T05:10:33Z" level=info msg="Deployment skaffold-helm: Generation 1/1, Replicas 0/1, Available 0/1"
time="2021-01-20T05:10:33Z" level=info msg="Deployment skaffold-helm: Generation 1/1, Replicas 1/1, Available 0/1"
time="2021-01-20T05:10:34Z" level=info msg="Deployment skaffold-helm: Generation 1/1, Replicas 1/1, Available 0/1"
Filesystem      Size  Used Avail Use% Mounted on
overlay          68G   25G   44G  36% /
tmpfs            64M     0   64M   0% /dev
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
shm              64M     0   64M   0% /dev/shm
/dev/sda1        68G   25G   44G  36% /root/.cache
tmpfs           798M  8.8M  789M   2% /run/docker.sock
tmpfs           3.9G     0  3.9G   0% /proc/acpi
tmpfs           3.9G     0  3.9G   0% /proc/scsi
tmpfs           3.9G     0  3.9G   0% /sys/firmware
time="2021-01-20T05:11:23Z" level=warning msg="[kubectl -n skaffold6blwb get deployments.apps -oyaml]"
apiVersion: v1
items:
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    annotations:
      deployment.kubernetes.io/revision: "1"
      meta.helm.sh/release-name: skaffold-helm
      meta.helm.sh/release-namespace: skaffold6blwb
    creationTimestamp: "2021-01-20T05:10:33Z"
    generation: 1
    labels:
      app: skaffold-helm
      app.kubernetes.io/managed-by: skaffold
      skaffold.dev/run-id: 62c94a15-0cd5-4811-b698-d5d1c2724b53
    managedFields:
    - apiVersion: apps/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:meta.helm.sh/release-name: {}
            f:meta.helm.sh/release-namespace: {}
          f:labels:
            .: {}
            f:app: {}
        f:spec:
          f:progressDeadlineSeconds: {}
          f:replicas: {}
          f:revisionHistoryLimit: {}
          f:selector:
            f:matchLabels:
              .: {}
              f:app: {}
          f:strategy:
            f:rollingUpdate:
              .: {}
              f:maxSurge: {}
              f:maxUnavailable: {}
            f:type: {}
          f:template:
            f:metadata:
              f:annotations:
                .: {}
                f:debug.cloud.google.com/config: {}
              f:labels:
                .: {}
                f:app: {}
            f:spec:
              f:containers:
                k:{"name":"skaffold-helm"}:
                  .: {}
                  f:args: {}
                  f:image: {}
                  f:imagePullPolicy: {}
                  f:name: {}
                  f:ports:
                    .: {}
                    k:{"containerPort":56268,"protocol":"TCP"}:
                      .: {}
                      f:containerPort: {}
                      f:name: {}
                      f:protocol: {}
                  f:resources: {}
                  f:terminationMessagePath: {}
                  f:terminationMessagePolicy: {}
                  f:volumeMounts:
                    .: {}
                    k:{"mountPath":"/dbg"}:
                      .: {}
                      f:mountPath: {}
                      f:name: {}
              f:dnsPolicy: {}
              f:initContainers:
                .: {}
                k:{"name":"install-go-debug-support"}:
                  .: {}
                  f:image: {}
                  f:imagePullPolicy: {}
                  f:name: {}
                  f:resources: {}
                  f:terminationMessagePath: {}
                  f:terminationMessagePolicy: {}
                  f:volumeMounts:
                    .: {}
                    k:{"mountPath":"/dbg"}:
                      .: {}
                      f:mountPath: {}
                      f:name: {}
              f:restartPolicy: {}
              f:schedulerName: {}
              f:securityContext: {}
              f:terminationGracePeriodSeconds: {}
              f:volumes:
                .: {}
                k:{"name":"debugging-support-files"}:
                  .: {}
                  f:emptyDir: {}
                  f:name: {}
      manager: Go-http-client
      operation: Update
      time: "2021-01-20T05:10:33Z"
    - apiVersion: apps/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            f:deployment.kubernetes.io/revision: {}
        f:status:
          f:conditions:
            .: {}
            k:{"type":"Available"}:
              .: {}
              f:lastTransitionTime: {}
              f:lastUpdateTime: {}
              f:message: {}
              f:reason: {}
              f:status: {}
              f:type: {}
            k:{"type":"Progressing"}:
              .: {}
              f:lastTransitionTime: {}
              f:lastUpdateTime: {}
              f:message: {}
              f:reason: {}
              f:status: {}
              f:type: {}
          f:observedGeneration: {}
          f:replicas: {}
          f:unavailableReplicas: {}
          f:updatedReplicas: {}
      manager: k3s
      operation: Update
      time: "2021-01-20T05:10:33Z"
    - apiVersion: apps/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:labels:
            f:app.kubernetes.io/managed-by: {}
            f:skaffold.dev/run-id: {}
      manager: skaffold
      operation: Update
      time: "2021-01-20T05:10:33Z"
    name: skaffold-helm
    namespace: skaffold6blwb
    resourceVersion: "2152"
    selfLink: /apis/apps/v1/namespaces/skaffold6blwb/deployments/skaffold-helm
    uid: 6ad6f720-c250-4b60-9ce2-64e5baef66fe
  spec:
    progressDeadlineSeconds: 600
    replicas: 1
    revisionHistoryLimit: 10
    selector:
      matchLabels:
        app: skaffold-helm
    strategy:
      rollingUpdate:
        maxSurge: 25%
        maxUnavailable: 25%
      type: RollingUpdate
    template:
      metadata:
        annotations:
          debug.cloud.google.com/config: '{"skaffold-helm":{"artifact":"skaffold-helm","runtime":"go","ports":{"dlv":56268}}}'
        creationTimestamp: null
        labels:
          app: skaffold-helm
      spec:
        containers:
        - args:
          - /dbg/go/bin/dlv
          - exec
          - --headless
          - --continue
          - --accept-multiclient
          - --listen=:56268
          - --api-version=2
          - ./app
          image: gcr.io/k8s-skaffold/skaffold-helm:ae17b46825bbb479e56dee2c0b6cdf26851059e76bf276953a603fa7f4ac387d
          imagePullPolicy: IfNotPresent
          name: skaffold-helm
          ports:
          - containerPort: 56268
            name: dlv
            protocol: TCP
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /dbg
            name: debugging-support-files
        dnsPolicy: ClusterFirst
        initContainers:
        - image: gcr.io/k8s-skaffold/skaffold-debug-support/go
          imagePullPolicy: Always
          name: install-go-debug-support
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /dbg
            name: debugging-support-files
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        terminationGracePeriodSeconds: 30
        volumes:
        - emptyDir: {}
          name: debugging-support-files
  status:
    conditions:
    - lastTransitionTime: "2021-01-20T05:10:33Z"
      lastUpdateTime: "2021-01-20T05:10:33Z"
      message: Deployment does not have minimum availability.
      reason: MinimumReplicasUnavailable
      status: "False"
      type: Available
    - lastTransitionTime: "2021-01-20T05:10:33Z"
      lastUpdateTime: "2021-01-20T05:10:33Z"
      message: ReplicaSet "skaffold-helm-779c78f5ff" is progressing.
      reason: ReplicaSetUpdated
      status: "True"
      type: Progressing
    observedGeneration: 1
    replicas: 1
    unavailableReplicas: 1
    updatedReplicas: 1
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
time="2021-01-20T05:11:23Z" level=warning msg="[kubectl -n skaffold6blwb get pods -oyaml]"
apiVersion: v1
items:
- apiVersion: v1
  kind: Pod
  metadata:
    annotations:
      debug.cloud.google.com/config: '{"skaffold-helm":{"artifact":"skaffold-helm","runtime":"go","ports":{"dlv":56268}}}'
    creationTimestamp: "2021-01-20T05:10:33Z"
    generateName: skaffold-helm-779c78f5ff-
    labels:
      app: skaffold-helm
      pod-template-hash: 779c78f5ff
    managedFields:
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:debug.cloud.google.com/config: {}
          f:generateName: {}
          f:labels:
            .: {}
            f:app: {}
            f:pod-template-hash: {}
          f:ownerReferences:
            .: {}
            k:{"uid":"eb63642b-214a-4907-9e9a-cd3c2032a32c"}:
              .: {}
              f:apiVersion: {}
              f:blockOwnerDeletion: {}
              f:controller: {}
              f:kind: {}
              f:name: {}
              f:uid: {}
        f:spec:
          f:containers:
            k:{"name":"skaffold-helm"}:
              .: {}
              f:args: {}
              f:image: {}
              f:imagePullPolicy: {}
              f:name: {}
              f:ports:
                .: {}
                k:{"containerPort":56268,"protocol":"TCP"}:
                  .: {}
                  f:containerPort: {}
                  f:name: {}
                  f:protocol: {}
              f:resources: {}
              f:terminationMessagePath: {}
              f:terminationMessagePolicy: {}
              f:volumeMounts:
                .: {}
                k:{"mountPath":"/dbg"}:
                  .: {}
                  f:mountPath: {}
                  f:name: {}
          f:dnsPolicy: {}
          f:enableServiceLinks: {}
          f:initContainers:
            .: {}
            k:{"name":"install-go-debug-support"}:
              .: {}
              f:image: {}
              f:imagePullPolicy: {}
              f:name: {}
              f:resources: {}
              f:terminationMessagePath: {}
              f:terminationMessagePolicy: {}
              f:volumeMounts:
                .: {}
                k:{"mountPath":"/dbg"}:
                  .: {}
                  f:mountPath: {}
                  f:name: {}
          f:restartPolicy: {}
          f:schedulerName: {}
          f:securityContext: {}
          f:terminationGracePeriodSeconds: {}
          f:volumes:
            .: {}
            k:{"name":"debugging-support-files"}:
              .: {}
              f:emptyDir: {}
              f:name: {}
        f:status:
          f:conditions:
            k:{"type":"ContainersReady"}:
              .: {}
              f:lastProbeTime: {}
              f:lastTransitionTime: {}
              f:message: {}
              f:reason: {}
              f:status: {}
              f:type: {}
            k:{"type":"Initialized"}:
              .: {}
              f:lastProbeTime: {}
              f:lastTransitionTime: {}
              f:status: {}
              f:type: {}
            k:{"type":"Ready"}:
              .: {}
              f:lastProbeTime: {}
              f:lastTransitionTime: {}
              f:message: {}
              f:reason: {}
              f:status: {}
              f:type: {}
          f:containerStatuses: {}
          f:hostIP: {}
          f:initContainerStatuses: {}
          f:phase: {}
          f:podIP: {}
          f:podIPs:
            .: {}
            k:{"ip":"10.42.0.26"}:
              .: {}
              f:ip: {}
          f:startTime: {}
      manager: k3s
      operation: Update
      time: "2021-01-20T05:11:21Z"
    name: skaffold-helm-779c78f5ff-8wcqd
    namespace: skaffold6blwb
    ownerReferences:
    - apiVersion: apps/v1
      blockOwnerDeletion: true
      controller: true
      kind: ReplicaSet
      name: skaffold-helm-779c78f5ff
      uid: eb63642b-214a-4907-9e9a-cd3c2032a32c
    resourceVersion: "2212"
    selfLink: /api/v1/namespaces/skaffold6blwb/pods/skaffold-helm-779c78f5ff-8wcqd
    uid: 23791d18-fa75-47cb-bdae-84b2ba81ccc7
  spec:
    containers:
    - args:
      - /dbg/go/bin/dlv
      - exec
      - --headless
      - --continue
      - --accept-multiclient
      - --listen=:56268
      - --api-version=2
      - ./app
      image: gcr.io/k8s-skaffold/skaffold-helm:ae17b46825bbb479e56dee2c0b6cdf26851059e76bf276953a603fa7f4ac387d
      imagePullPolicy: IfNotPresent
      name: skaffold-helm
      ports:
      - containerPort: 56268
        name: dlv
        protocol: TCP
      resources: {}
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
      - mountPath: /dbg
        name: debugging-support-files
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: default-token-fnp4h
        readOnly: true
    dnsPolicy: ClusterFirst
    enableServiceLinks: true
    initContainers:
    - image: gcr.io/k8s-skaffold/skaffold-debug-support/go
      imagePullPolicy: Always
      name: install-go-debug-support
      resources: {}
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
      - mountPath: /dbg
        name: debugging-support-files
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: default-token-fnp4h
        readOnly: true
    nodeName: k3d-k3s-default-server-0
    preemptionPolicy: PreemptLowerPriority
    priority: 0
    restartPolicy: Always
    schedulerName: default-scheduler
    securityContext: {}
    serviceAccount: default
    serviceAccountName: default
    terminationGracePeriodSeconds: 30
    tolerations:
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 300
    volumes:
    - emptyDir: {}
      name: debugging-support-files
    - name: default-token-fnp4h
      secret:
        defaultMode: 420
        secretName: default-token-fnp4h
  status:
    conditions:
    - lastProbeTime: null
      lastTransitionTime: "2021-01-20T05:10:35Z"
      status: "True"
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: "2021-01-20T05:10:33Z"
      message: 'containers with unready status: [skaffold-helm]'
      reason: ContainersNotReady
      status: "False"
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: "2021-01-20T05:10:33Z"
      message: 'containers with unready status: [skaffold-helm]'
      reason: ContainersNotReady
      status: "False"
      type: ContainersReady
    - lastProbeTime: null
      lastTransitionTime: "2021-01-20T05:10:33Z"
      status: "True"
      type: PodScheduled
    containerStatuses:
    - containerID: containerd://160189460e1cde3b32bec6a40303854cc0fba9c4e2c0e5f42a98455631b5ff03
      image: gcr.io/k8s-skaffold/skaffold-helm:ae17b46825bbb479e56dee2c0b6cdf26851059e76bf276953a603fa7f4ac387d
      imageID: sha256:ae17b46825bbb479e56dee2c0b6cdf26851059e76bf276953a603fa7f4ac387d
      lastState:
        terminated:
          containerID: containerd://695b1e6b19c107a81aa549efe50ddef72a2e28dd64641bae52f5ae688ac93595
          exitCode: 1
          finishedAt: "2021-01-20T05:10:52Z"
          reason: Error
          startedAt: "2021-01-20T05:10:52Z"
      name: skaffold-helm
      ready: false
      restartCount: 3
      started: false
      state:
        terminated:
          containerID: containerd://160189460e1cde3b32bec6a40303854cc0fba9c4e2c0e5f42a98455631b5ff03
          exitCode: 1
          finishedAt: "2021-01-20T05:11:20Z"
          reason: Error
          startedAt: "2021-01-20T05:11:20Z"
    hostIP: 172.18.0.2
    initContainerStatuses:
    - containerID: containerd://e1ced7cafad3b2773878a7b0a7cb52cf8db4d8de4cb63045d5bf83bed82f51ca
      image: gcr.io/k8s-skaffold/skaffold-debug-support/go:latest
      imageID: gcr.io/k8s-skaffold/skaffold-debug-support/go@sha256:fed1fa9257bff44f45b66d13883f180550e0b5844258641727e37c600407a63f
      lastState: {}
      name: install-go-debug-support
      ready: true
      restartCount: 0
      state:
        terminated:
          containerID: containerd://e1ced7cafad3b2773878a7b0a7cb52cf8db4d8de4cb63045d5bf83bed82f51ca
          exitCode: 0
          finishedAt: "2021-01-20T05:10:34Z"
          reason: Completed
          startedAt: "2021-01-20T05:10:34Z"
    phase: Running
    podIP: 10.42.0.26
    podIPs:
    - ip: 10.42.0.26
    qosClass: BestEffort
    startTime: "2021-01-20T05:10:33Z"
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
    debug_test.go:105: Timed out waiting for deployments [skaffold-helm] to stabilize in namespace skaffold6blwb
    helper.go:184: Skaffold log:
         Listing files to watch...
        >  - skaffold-helm
        > Generating tags...
        >  - skaffold-helm -> gcr.io/k8s-skaffold/skaffold-helm:387049b
        > Checking cache...
        >  - skaffold-helm: Not found. Building
        > Found [k3d-k3s-default] context, using local docker daemon.
        > Building [skaffold-helm]...
        > Sending build context to Docker daemon  3.072kB
        > Step 1/8 : FROM golang:1.12.9-alpine3.10 as builder
        >  ---> e0d646523991
        > Step 2/8 : COPY main.go .
        >  ---> Using cache
        >  ---> 8f3de3646262
        > Step 3/8 : ARG SKAFFOLD_GO_GCFLAGS
        >  ---> Using cache
        >  ---> ecfdc43eaa54
        > Step 4/8 : RUN go build -gcflags="${SKAFFOLD_GO_GCFLAGS}" -o /app main.go
        >  ---> Running in 86c97c1b2fc5
        >  ---> d9e7b56cad63
        > Step 5/8 : FROM alpine:3.10
        >  ---> be4e4bea2c2e
        > Step 6/8 : ENV GOTRACEBACK=single
        >  ---> Using cache
        >  ---> 3b643d72c9ed
        > Step 7/8 : CMD ["./app"]
        >  ---> Using cache
        >  ---> cd597cf41010
        > Step 8/8 : COPY --from=builder /app .
        >  ---> ae17b46825bb
        > Successfully built ae17b46825bb
        > Successfully tagged gcr.io/k8s-skaffold/skaffold-helm:387049b
        > Tags used in deployment:
        >  - skaffold-helm -> gcr.io/k8s-skaffold/skaffold-helm:ae17b46825bbb479e56dee2c0b6cdf26851059e76bf276953a603fa7f4ac387d
        > Loading images into k3d cluster nodes...
        >  - gcr.io/k8s-skaffold/skaffold-helm:ae17b46825bbb479e56dee2c0b6cdf26851059e76bf276953a603fa7f4ac387d -> Loaded
        > Images loaded in 4.26 seconds
        > Starting deploy...
        > Helm release skaffold-helm not installed. Installing...
        > NAME: skaffold-helm
        > LAST DEPLOYED: Wed Jan 20 05:10:33 2021
        > NAMESPACE: skaffold6blwb
        > STATUS: deployed
        > REVISION: 1
        > TEST SUITE: None
        > Waiting for deployments to stabilize...
        >  - skaffold6blwb:deployment/skaffold-helm: waiting for rollout to finish: 0 of 1 updated replicas are available...
        > 
    --- FAIL: TestDebug/helm (152.57s)
@briandealwis
Copy link
Member Author

Ah TestDebug uses our GetPod() and GetDeployment() helpers to get the corresponding objects. GetClient() eventually called out WaitForPodsInPhase() which waits up to 5 minutes. GetDeployment() on the other hand calls into WaitForDeploymentsToStablize() which waits up to 60 seconds.

The helm test occasionally fails because doesn't have any pods, and apparently 60 seconds isn't sufficient for the deployment to complete.

@briandealwis
Copy link
Member Author

Hurray, the logs finally reveal the problem:

time="2021-01-21T03:25:21Z" level=warning msg="[kubectl -n skaffoldrps5k logs deployment.app/skaffold-helm]"
API server listening at: [::]:56268
Version of Go is too old for this version of Delve (minimum supported version 1.13, suppress this error with --check-go-version=false)

(Why doesn't this happen locally!?)

@briandealwis
Copy link
Member Author

Our integration tests use a helper function WaitForPodsReady() to ensure pods are in a running state:

// WaitForPodsReady waits for a list of pods to become ready.
func (k *NSKubernetesClient) WaitForPodsReady(podNames ...string) {
k.WaitForPodsInPhase(v1.PodRunning, podNames...)
}

This checks the pod phase, but doesn't check the pod readiness. You can see in the skaffold-helm-779c78f5ff-8wcqd pod trace above that phase: Running but the pod Ready condition is False (reorganized for clarity):

  status:
    phase: Running
    conditions:
    - lastProbeTime: null
      lastTransitionTime: "2021-01-20T05:10:35Z"
      status: "True"
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: "2021-01-20T05:10:33Z"
      message: 'containers with unready status: [skaffold-helm]'
      reason: ContainersNotReady
      status: "False"
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: "2021-01-20T05:10:33Z"
      message: 'containers with unready status: [skaffold-helm]'
      reason: ContainersNotReady
      status: "False"
      type: ContainersReady

@briandealwis
Copy link
Member Author

Funnily enough, we have a few calls to WaitForPodsReady() with no arguments, which doesn't actually wait:

func (k *NSKubernetesClient) WaitForPodsInPhase(expectedPhase v1.PodPhase, podNames ...string) {
if len(podNames) == 0 {
return
}

@briandealwis
Copy link
Member Author

Closing as this last fix seems to have finally done the job.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant