
Workloads are not scheduled to one of two worker nodes - race condition at setup? #3531

Closed
DonMartin76 opened this issue Jun 24, 2020 · 11 comments
Labels: bug, stale

Comments


DonMartin76 commented Jun 24, 2020

Describe the bug

I saw the following behaviour today, which I have never seen before: in a pretty simple 2+1+1 node cluster (2 workers, 1 helper for utilities, 1 master), Kubernetes would not schedule any normal pods to the second worker node; this resulted in the first worker node being swamped and soon rejecting further pods due to CPU limitations.

Please note that we have been using this setup in production for a long time, at least 18 months, and I have never seen this before. We quite recently updated to the latest aks-engine version, and we are running Kubernetes 1.18.3 with it.

What I noticed when comparing the output of kubectl describe node for both nodes is that one of the nodes (the one where nothing gets scheduled) is missing the following annotation:

Annotations:        csi.volume.kubernetes.io/nodeid: {"secrets-store.csi.k8s.io":"k8s-workerdev36-11495506-0"}

On a functioning cluster, both worker nodes have this annotation. Other than this, I cannot find anything which looks odd in the node description.
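A quick way to compare this annotation across all nodes, instead of diffing full kubectl describe output, is something like the jsonpath query below (the escaped dots are needed because the annotation key itself contains dots; piping kubectl get nodes -o json through grep works just as well):

$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.csi\.volume\.kubernetes\.io/nodeid}{"\n"}{end}'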

Here's also an excerpt from the logs of the kube-scheduler (directly from the docker container on the master node):

I0624 06:57:54.254388       1 leaderelection.go:242] attempting to acquire leader lease  kube-system/kube-scheduler...
I0624 06:57:54.520609       1 node_tree.go:86] Added node "k8s-master-11495506-0" in group "westeurope:\x00:0" to NodeTree
I0624 06:57:54.520806       1 node_tree.go:86] Added node "k8s-helperdev36-11495506-0" in group "westeurope:\x00:0" to NodeTree
I0624 06:57:54.520914       1 node_tree.go:86] Added node "k8s-workerdev36-11495506-1" in group "westeurope:\x00:0" to NodeTree
E0624 06:57:54.658011       1 leaderelection.go:324] error initially creating leader election record: namespaces "kube-system" not found
I0624 06:57:58.426473       1 leaderelection.go:252] successfully acquired lease kube-system/kube-scheduler
I0624 06:58:13.257430       1 scheduler.go:731] pod kube-system/coredns-autoscaler-5c7db64899-7mhhw is bound successfully on node "k8s-workerdev36-11495506-1", 3 nodes evaluated, 3 nodes were found feasible.
I0624 06:58:13.258122       1 scheduler.go:731] pod kube-system/coredns-84679d997-dbf57 is bound successfully on node "k8s-helperdev36-11495506-0", 3 nodes evaluated, 3 nodes were found feasible.
I0624 06:58:15.267024       1 scheduler.go:731] pod kube-system/blobfuse-flexvol-installer-5jzg9 is bound successfully on node "k8s-helperdev36-11495506-0", 3 nodes evaluated, 1 nodes were found feasible.
I0624 06:58:15.312988       1 scheduler.go:731] pod kube-system/blobfuse-flexvol-installer-xh4q2 is bound successfully on node "k8s-workerdev36-11495506-1", 3 nodes evaluated, 1 nodes were found feasible.
I0624 06:58:15.341666       1 scheduler.go:731] pod kube-system/azure-ip-masq-agent-st97b is bound successfully on node "k8s-master-11495506-0", 3 nodes evaluated, 1 nodes were found feasible.
I0624 06:58:15.389081       1 scheduler.go:731] pod kube-system/azure-ip-masq-agent-fq9wn is bound successfully on node "k8s-helperdev36-11495506-0", 3 nodes evaluated, 1 nodes were found feasible.
I0624 06:58:15.389477       1 scheduler.go:731] pod kube-system/azure-ip-masq-agent-kvx9d is bound successfully on node "k8s-workerdev36-11495506-1", 3 nodes evaluated, 1 nodes were found feasible.
I0624 06:58:15.509714       1 scheduler.go:731] pod kube-system/kube-proxy-n5wq7 is bound successfully on node "k8s-master-11495506-0", 3 nodes evaluated, 1 nodes were found feasible.
I0624 06:58:15.563087       1 scheduler.go:731] pod kube-system/kube-proxy-tf6gt is bound successfully on node "k8s-workerdev36-11495506-1", 3 nodes evaluated, 1 nodes were found feasible.
I0624 06:58:15.568209       1 scheduler.go:731] pod kube-system/kube-proxy-2hq2s is bound successfully on node "k8s-helperdev36-11495506-0", 3 nodes evaluated, 1 nodes were found feasible.
I0624 06:58:16.217523       1 scheduler.go:731] pod kube-system/metrics-server-bb7db87bc-5pvr6 is bound successfully on node "k8s-workerdev36-11495506-1", 3 nodes evaluated, 2 nodes were found feasible.
I0624 06:58:17.537343       1 node_tree.go:86] Added node "k8s-workerdev36-11495506-0" in group "westeurope:\x00:1" to NodeTree
I0624 06:58:17.653975       1 scheduler.go:731] pod kube-system/azure-ip-masq-agent-n7xv4 is bound successfully on node "k8s-workerdev36-11495506-0", 4 nodes evaluated, 2 nodes were found feasible.
I0624 06:58:17.654442       1 scheduler.go:731] pod kube-system/kube-proxy-ftmxh is bound successfully on node "k8s-workerdev36-11495506-0", 4 nodes evaluated, 2 nodes were found feasible.
I0624 06:58:17.896742       1 scheduler.go:731] pod kube-system/tiller-deploy-7fc97d6ffd-c5hvv is bound successfully on node "k8s-helperdev36-11495506-0", 3 nodes evaluated, 1 nodes were found feasible.
I0624 06:58:27.730380       1 factory.go:462] Unable to schedule kube-system/azure-cni-networkmonitor-xf9kz: no fit: 0/4 nodes are available: 3 node(s) didn't match node selector.; waiting
I0624 06:58:27.919520       1 factory.go:462] Unable to schedule kube-system/azure-cni-networkmonitor-xf9kz: no fit: 0/4 nodes are available: 3 node(s) didn't match node selector.; waiting
I0624 06:58:27.951280       1 scheduler.go:731] pod kube-system/azure-cni-networkmonitor-5b755 is bound successfully on node "k8s-master-11495506-0", 3 nodes evaluated, 1 nodes were found feasible.
I0624 06:58:27.951536       1 scheduler.go:731] pod kube-system/blobfuse-flexvol-installer-hv28k is bound successfully on node "k8s-workerdev36-11495506-0", 4 nodes evaluated, 2 nodes were found feasible.
I0624 06:58:27.951600       1 scheduler.go:731] pod kube-system/azure-cni-networkmonitor-4tqbm is bound successfully on node "k8s-helperdev36-11495506-0", 3 nodes evaluated, 1 nodes were found feasible.
I0624 06:58:27.951646       1 scheduler.go:731] pod kube-system/azure-cni-networkmonitor-h5742 is bound successfully on node "k8s-workerdev36-11495506-0", 4 nodes evaluated, 2 nodes were found feasible.
I0624 06:58:28.445052       1 scheduler.go:731] pod kubernetes-dashboard/kubernetes-dashboard-6d6c6db487-cvbks is bound successfully on node "k8s-workerdev36-11495506-0", 4 nodes evaluated, 3 nodes were found feasible.
I0624 06:58:28.709510       1 scheduler.go:731] pod kubernetes-dashboard/dashboard-metrics-scraper-7bdfbb4477-tw5c7 is bound successfully on node "k8s-workerdev36-11495506-0", 4 nodes evaluated, 3 nodes were found feasible.
I0624 06:58:29.270731       1 scheduler.go:731] pod kube-system/csi-secrets-store-52r7w is bound successfully on node "k8s-helperdev36-11495506-0", 3 nodes evaluated, 1 nodes were found feasible.
I0624 06:58:29.271368       1 scheduler.go:731] pod kube-system/csi-secrets-store-provider-azure-5df99 is bound successfully on node "k8s-helperdev36-11495506-0", 3 nodes evaluated, 1 nodes were found feasible.
I0624 06:58:29.385840       1 scheduler.go:731] pod kube-system/csi-secrets-store-fmrmc is bound successfully on node "k8s-workerdev36-11495506-0", 4 nodes evaluated, 2 nodes were found feasible.
I0624 06:58:29.386239       1 factory.go:462] Unable to schedule kube-system/csi-secrets-store-4tmv2: no fit: 0/4 nodes are available: 1 node(s) didn't match node selector, 2 node(s) didn't have free ports for the requested pod ports.; waiting
E0624 06:58:29.428593       1 factory.go:503] pod: kube-system/csi-secrets-store-4tmv2 is already present in the active queue
I0624 06:58:29.444003       1 factory.go:462] Unable to schedule kube-system/csi-secrets-store-provider-azure-zdkwx: no fit: 0/4 nodes are available: 3 node(s) didn't match node selector.; waiting
I0624 06:58:29.444824       1 scheduler.go:731] pod kube-system/csi-secrets-store-provider-azure-scbzg is bound successfully on node "k8s-workerdev36-11495506-0", 4 nodes evaluated, 2 nodes were found feasible.
E0624 06:58:29.468423       1 factory.go:503] pod: kube-system/csi-secrets-store-provider-azure-zdkwx is already present in the active queue
I0624 06:58:29.495240       1 factory.go:462] Unable to schedule kube-system/csi-secrets-store-4tmv2: no fit: 0/4 nodes are available: 1 node(s) didn't match node selector, 2 node(s) didn't have free ports for the requested pod ports.; waiting
I0624 06:58:29.503893       1 factory.go:462] Unable to schedule kube-system/csi-secrets-store-provider-azure-zdkwx: no fit: 0/4 nodes are available: 3 node(s) didn't match node selector.; waiting
I0624 06:58:30.460870       1 factory.go:462] Unable to schedule kube-system/azure-cni-networkmonitor-xf9kz: no fit: 0/4 nodes are available: 3 node(s) didn't match node selector.; waiting
I0624 06:58:31.459366       1 factory.go:462] Unable to schedule kube-system/csi-secrets-store-4tmv2: no fit: 0/4 nodes are available: 1 node(s) didn't match node selector, 2 node(s) didn't have free ports for the requested pod ports.; waiting
I0624 06:58:31.467507       1 factory.go:462] Unable to schedule kube-system/csi-secrets-store-provider-azure-zdkwx: no fit: 0/4 nodes are available: 3 node(s) didn't match node selector.; waiting
I0624 06:58:35.812740       1 factory.go:462] Unable to schedule kube-system/azure-cni-networkmonitor-xf9kz: no fit: 0/4 nodes are available: 3 node(s) didn't match node selector.; waiting
I0624 06:58:35.822072       1 factory.go:462] Unable to schedule kube-system/csi-secrets-store-4tmv2: no fit: 0/4 nodes are available: 1 node(s) didn't match node selector, 2 node(s) didn't have free ports for the requested pod ports.; waiting
I0624 06:58:35.836833       1 factory.go:462] Unable to schedule kube-system/csi-secrets-store-provider-azure-zdkwx: no fit: 0/4 nodes are available: 3 node(s) didn't match node selector.; waiting
I0624 06:58:40.460673       1 factory.go:462] Unable to schedule kube-system/csi-secrets-store-provider-azure-zdkwx: no fit: 0/4 nodes are available: 3 node(s) didn't match node selector.; waiting
I0624 06:58:40.467030       1 factory.go:462] Unable to schedule kube-system/csi-secrets-store-4tmv2: no fit: 0/4 nodes are available: 1 node(s) didn't match node selector, 2 node(s) didn't have free ports for the requested pod ports.; waiting
I0624 06:58:44.461240       1 factory.go:462] Unable to schedule kube-system/azure-cni-networkmonitor-xf9kz: no fit: 0/4 nodes are available: 3 node(s) didn't match node selector.; waiting
I0624 06:59:14.416900       1 factory.go:462] Unable to schedule kube-system/azure-cni-networkmonitor-xf9kz: no fit: 0/4 nodes are available: 3 node(s) didn't match node selector.; waiting
I0624 06:59:14.422558       1 factory.go:462] Unable to schedule kube-system/csi-secrets-store-provider-azure-zdkwx: no fit: 0/4 nodes are available: 3 node(s) didn't match node selector.; waiting
I0624 06:59:14.433009       1 factory.go:462] Unable to schedule kube-system/csi-secrets-store-4tmv2: no fit: 0/4 nodes are available: 1 node(s) didn't match node selector, 2 node(s) didn't have free ports for the requested pod ports.; waiting
I0624 06:59:53.419188       1 factory.go:462] Unable to schedule kube-system/azure-cni-networkmonitor-xf9kz: no fit: 0/4 nodes are available: 3 node(s) didn't match node selector.; waiting
I0624 06:59:53.454104       1 factory.go:462] Unable to schedule kube-system/csi-secrets-store-provider-azure-zdkwx: no fit: 0/4 nodes are available: 3 node(s) didn't match node selector.; waiting
I0624 06:59:53.493036       1 factory.go:462] Unable to schedule kube-system/csi-secrets-store-4tmv2: no fit: 0/4 nodes are available: 1 node(s) didn't match node selector, 2 node(s) didn't have free ports for the requested pod ports.; waiting

It looks as if the missing annotation is rather an effect of the underlying problem than the problem itself.

Steps To Reproduce

The tricky part is that I cannot reproduce this anymore. The next cluster that was created worked nicely again, and it had the correct annotations. I do not know whether the annotations are related, but they are the only thing I could find that differed between the nodes.

This is the API model we used:

{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorVersion": "1.18.3",
      "kubernetesConfig": {
        "kubeletConfig": {
          "--max-pods": "50"
        },
        "addons": [
          {
            "name": "tiller",
            "enabled": true,
            "containers": [
              {
                "name": "tiller",
                "image": "gcrio.azureedge.net/kubernetes-helm/tiller:v2.16.1",
                "cpuRequests": "50m",
                "memoryRequests": "200Mi",
                "cpuLimits": "200m",
                "memoryLimits": "1024Mi"
              }
            ]
          },
          {
            "name": "kubernetes-dashboard",
            "enabled": true,
            "containers": [
              {
                "name": "kubernetes-dashboard",
                "cpuRequests": "50m",
                "memoryRequests": "256Mi",
                "cpuLimits": "50m",
                "memoryLimits": "512Mi"
              }
            ]
          }
        ]
      }
    },
    "masterProfile": {
      "count": 1,
      "dnsPrefix": "dev36-redacted",
      "vmSize": "Standard_D2s_v3"
    },
    "agentPoolProfiles": [
      {
        "name": "workerdev36",
        "count": 2,
        "vmSize": "Standard_D2s_v3",
        "storageProfile" : "ManagedDisks",
        "availabilityProfile": "AvailabilitySet",
        "customNodeLabels": {
          "stack": "dev36",
          "instancetype": "worker"
        }
      },
      {
        "name": "helperdev36",
        "count": 1,
        "vmSize": "Standard_D2s_v3",
        "storageProfile" : "ManagedDisks",
        "availabilityProfile": "AvailabilitySet",
        "customNodeLabels": {
          "stack": "dev36",
          "instancetype": "helper"
        }
      }
    ],
    "linuxProfile": {
      "adminUsername": "azureuser",
      "ssh": {
        "publicKeys": [
          {
            "keyData": "ssh-rsa <redacted>"
          }
        ]
      }
    },
    "servicePrincipalProfile": {
      "clientId": "<redacted>",
      "secret": "<redacted>"
    }
  }
}

Expected behavior

All nodes in the "worker" node pool should get workloads scheduled.

AKS Engine version

0.52.0

Kubernetes version

1.18.3

Additional context

To me, this looks like some kind of race condition at provisioning time; it does not seem to happen often, but this time it did, and it makes the cluster behave strangely.

DonMartin76 added the bug label on Jun 24, 2020

chewong commented Jun 24, 2020

Hi @DonMartin76. It looks like node labels aren't applied correctly, causing the "didn't match node selector" error in the scheduler. Do you still have the cluster around? If so, could you provide the output of kubectl describe nodes?
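For cross-checking, a pending pod's node selector can be compared directly against the labels on the node it should land on; the pod and node names below are just the ones from the scheduler log and the report above:

$ kubectl -n kube-system get pod csi-secrets-store-provider-azure-zdkwx -o jsonpath='{.spec.nodeSelector}'
$ kubectl get node k8s-workerdev36-11495506-1 --show-labels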

DonMartin76 (author) commented:

I do not have the cluster around anymore, but I still have the kubectl describe node output:

node 0:

$ kc describe node k8s-workerdev36-11495506-0
Name:               k8s-workerdev36-11495506-0
Roles:              agent
Labels:             agentpool=workerdev36
                    beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=Standard_D2s_v3
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=westeurope
                    failure-domain.beta.kubernetes.io/zone=1
                    instancetype=worker
                    kubernetes.azure.com/cluster=redacted-dev36-k8s
                    kubernetes.azure.com/role=agent
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=k8s-workerdev36-11495506-0
                    kubernetes.io/os=linux
                    kubernetes.io/role=agent
                    node-role.kubernetes.io/agent=
                    node.kubernetes.io/instance-type=Standard_D2s_v3
                    stack=dev36
                    storageprofile=managed
                    storagetier=Premium_LRS
                    topology.kubernetes.io/region=westeurope
                    topology.kubernetes.io/zone=1
Annotations:        csi.volume.kubernetes.io/nodeid: {"secrets-store.csi.k8s.io":"k8s-workerdev36-11495506-0"}
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 24 Jun 2020 08:58:17 +0200
Taints:             <none>
Unschedulable:      false
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Wed, 24 Jun 2020 10:33:46 +0200   Wed, 24 Jun 2020 08:58:17 +0200   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 24 Jun 2020 10:33:46 +0200   Wed, 24 Jun 2020 08:58:17 +0200   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Wed, 24 Jun 2020 10:33:46 +0200   Wed, 24 Jun 2020 08:58:17 +0200   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Wed, 24 Jun 2020 10:33:46 +0200   Wed, 24 Jun 2020 08:58:27 +0200   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  Hostname:    k8s-workerdev36-11495506-0
  InternalIP:  10.240.0.55
Capacity:
 attachable-volumes-azure-disk:  4
 cpu:                            2
 ephemeral-storage:              30309264Ki
 hugepages-1Gi:                  0
 hugepages-2Mi:                  0
 memory:                         8125684Ki
 pods:                           50
Allocatable:
 attachable-volumes-azure-disk:  4
 cpu:                            2
 ephemeral-storage:              27933017657
 hugepages-1Gi:                  0
 hugepages-2Mi:                  0
 memory:                         7357684Ki
 pods:                           50
System Info:
 Machine ID:                 xxxxxxxx
 System UUID:                xxxx-xxxx
 Boot ID:                    xxxx-xxxx
 Kernel Version:             5.3.0-1022-azure
 OS Image:                   Ubuntu 18.04.4 LTS
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://3.0.12+azure
 Kubelet Version:            v1.18.3
 Kube-Proxy Version:         v1.18.3

node 1:

Name:               k8s-workerdev36-11495506-1
Roles:              agent
Labels:             agentpool=workerdev36
                    beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=Standard_D2s_v3
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=westeurope
                    failure-domain.beta.kubernetes.io/zone=0
                    instancetype=worker
                    kubernetes.azure.com/cluster=redacted-dev36-k8s
                    kubernetes.azure.com/role=agent
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=k8s-workerdev36-11495506-1
                    kubernetes.io/os=linux
                    kubernetes.io/role=agent
                    node-role.kubernetes.io/agent=
                    node.kubernetes.io/instance-type=Standard_D2s_v3
                    stack=dev36
                    storageprofile=managed
                    storagetier=Premium_LRS
                    topology.kubernetes.io/region=westeurope
                    topology.kubernetes.io/zone=0
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 24 Jun 2020 08:57:54 +0200
Taints:             <none>
Unschedulable:      false
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Wed, 24 Jun 2020 10:33:12 +0200   Wed, 24 Jun 2020 10:23:00 +0200   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 24 Jun 2020 10:33:12 +0200   Wed, 24 Jun 2020 10:23:00 +0200   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Wed, 24 Jun 2020 10:33:12 +0200   Wed, 24 Jun 2020 10:23:00 +0200   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Wed, 24 Jun 2020 10:33:12 +0200   Wed, 24 Jun 2020 10:23:10 +0200   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  Hostname:    k8s-workerdev36-11495506-1
  InternalIP:  10.240.0.4
Capacity:
 attachable-volumes-azure-disk:  4
 cpu:                            2
 ephemeral-storage:              30309264Ki
 hugepages-1Gi:                  0
 hugepages-2Mi:                  0
 memory:                         8125684Ki
 pods:                           50
Allocatable:
 attachable-volumes-azure-disk:  4
 cpu:                            2
 ephemeral-storage:              27933017657
 hugepages-1Gi:                  0
 hugepages-2Mi:                  0
 memory:                         7357684Ki
 pods:                           50
System Info:
 Machine ID:                 xxxxxxxx
 System UUID:                xxxx-xxxx
 Boot ID:                    xxxx-xxxx
 Kernel Version:             5.3.0-1022-azure
 OS Image:                   Ubuntu 18.04.4 LTS
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://3.0.12+azure
 Kubelet Version:            v1.18.3
 Kube-Proxy Version:         v1.18.3


chewong commented Jun 24, 2020

I wasn't able to repro it either. @aramase, will the lack of the Secrets Store CSI annotation affect pod scheduling?


aramase commented Jun 24, 2020

No, the annotation doesn't affect pod scheduling. In fact, that annotation csi.volume.kubernetes.io/nodeid: {"secrets-store.csi.k8s.io":"k8s-workerdev36-11495506-0"} is set when the CSI driver starts running on the node. The node-driver-registrar component of the CSI driver registers the driver with the kubelet, and then the annotation is set.
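For reference, if this comes up again, one way to confirm whether the driver actually registered with the kubelet on a given node, independent of the annotation, is to look at the node's CSINode object and at where the DaemonSet pods landed (the node name below is just the one from this report):

$ kubectl get csinode k8s-workerdev36-11495506-1 -o yaml
$ kubectl -n kube-system get pods -o wide | grep csi-secrets-store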


Jabb0 commented Jun 30, 2020

I am a colleague of @DonMartin76. One thing I noticed is that the faulty cluster has the following lines in its logs:

I0624 06:57:54.520806       1 node_tree.go:86] Added node "k8s-helperdev36-11495506-0" in group "westeurope:\x00:0" to NodeTree
I0624 06:57:54.520914       1 node_tree.go:86] Added node "k8s-workerdev36-11495506-1" in group "westeurope:\x00:0" to NodeTree
E0624 06:57:54.658011       1 leaderelection.go:324] error initially creating leader election record: namespaces "kube-system" not found
I0624 06:57:58.426473       1 leaderelection.go:252] successfully acquired lease kube-system/kube-scheduler
...
I0624 06:58:17.537343       1 node_tree.go:86] Added node "k8s-workerdev36-11495506-0" in group "westeurope:\x00:1" to NodeTree

while the working cluster looks like this:

E0624 08:38:28.436863       1 leaderelection.go:324] error initially creating leader election record: namespaces "kube-system" not found
I0624 08:38:28.458355       1 node_tree.go:86] Added node "k8s-helperdev37-<number>-0" in group "westeurope:\x00:0" to NodeTree
I0624 08:38:28.459601       1 node_tree.go:86] Added node "k8s-master-<number>-0" in group "westeurope:\x00:0" to NodeTree
I0624 08:38:28.459744       1 node_tree.go:86] Added node "k8s-workerdev37-<number>-1" in group "westeurope:\x00:1" to NodeTree
I0624 08:38:28.459851       1 node_tree.go:86] Added node "k8s-workerdev37-<number>-0" in group "westeurope:\x00:0" to NodeTree
I0624 08:38:32.373272       1 leaderelection.go:252] successfully acquired lease kube-system/kube-scheduler

Maybe the order is important.

Note that in the logs above workerdev36-...-1 is in group \x00:0, while in the logs below the corresponding -1 worker is in \x00:1, and in the faulty cluster worker 0 is added "much" later. However, pods are not scheduled to workerdev36-...-1.


chewong commented Jun 30, 2020

@Jabb0 I don't think the order matters. The westeurope:0 and westeurope:1 groups refer to the fault domain of the VM availability set, which shouldn't affect pod scheduling.

Based on the kube-scheduler log, the scheduling error is either "didn't match node selector" or "didn't have free ports for the requested pod ports". For the first error, I am not sure why it's happening, since node labels are applied correctly to both nodes and the node selector of the unschedulable pods seems to be correct. For the second error, you might have to SSH into the VM to see what's occupying the port. Did you experience a similar error when you tried to schedule other pods?
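As a rough sketch of the port check (the DaemonSet name is inferred from the pod names in the scheduler log, and the actual port has to be read from its spec):

$ kubectl -n kube-system get ds csi-secrets-store -o yaml | grep -A2 hostPort
# then, on the node via SSH, see which process is listening on that port
$ sudo ss -tlnp | grep <hostPort>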


Jabb0 commented Jul 1, 2020

@chewong
The subsequent kube-scheduler log is full of "Unable to schedule" errors for system pods, such as:

I0624 06:59:14.422558       1 factory.go:462] Unable to schedule kube-system/csi-secrets-store-provider-azure-zdkwx: no fit: 0/4 nodes are available: 3 node(s) didn't match node selector.; waiting

Here the ports are not an issue. However, the message only accounts for 3 out of 4 nodes. Is that intentional?
Once worker 0 is swamped with pods and we try to schedule the next pod, the error is:

I0624 07:46:58.469185       1 factory.go:462] Unable to schedule kube-system/csi-secrets-store-4tmv2: no fit: 0/4 nodes are available: 1 Insufficient cpu, 1 node(s) didn't have free ports for the requested pod ports, 1 node(s) didn't match node selector.; waiting
I0624 07:46:58.483161       1 factory.go:462] Unable to schedule dev/<ourpod>-6bc978c767-hbgqc: no fit: 0/4 nodes are available: 1 Insufficient cpu, 2 node(s) didn't match node selector.; waiting

Is it necessary for any particular pods (e.g. the CSI ones) to be running on a node before other pods can be scheduled there?


chewong commented Jul 1, 2020

@Jabb0 I think it's intentional: the 3 nodes are your helper & worker nodes, and the remaining node should be the master, which some pods can't be scheduled to.

Those pods are just part of an add-on that aks-engine deploys after the cluster is provisioned, which should not affect pod scheduling at all. I suspect that the kubelet or even the underlying VMs are experiencing some problems. If you encounter this problem again, could you share the kubelet log of the problematic node? See https://github.com/Azure/aks-engine/blob/master/docs/howto/troubleshooting.md for how to collect kubelet logs.
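In addition to the linked doc, the kubelet runs as a systemd unit on aks-engine nodes, so on the node itself something like this is usually enough (the time window is just an example around the provisioning time):

$ sudo journalctl -u kubelet --since "2020-06-24 06:55:00" --no-pager > kubelet.log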


jackfrancis commented Aug 6, 2020

Should we leave this issue open? Are we still repro'ing?


stale bot commented Oct 11, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label on Oct 11, 2020
stale bot closed this as completed on Dec 19, 2020
dborman-hpe commented:

We are seeing this issue with Kubernetes 1.18.6 on an intermittent basis during an install (k8s plus services running on the cluster). Our workaround has been to reboot the master node on which the lead kube-scheduler is running, and we had one instance where we had to do that twice to get through our initial install.

We're running with three physical nodes, and each physical node has 1 master VM and 1 worker VM. We are using a stacked master/etcd. We bring up the first physical node and get k8s running, along with OpenEBS to provide persistent storage. While the second physical node is coming up, and the master/worker VMs are joining the cluster, the original kube-scheduler container restarts. The new leader for the kube-scheduler is the one that stops scheduling on one of the worker nodes.

I've added "-v=4" to the kube-scheduler to get more logs. For the node that isn't getting scheduled, I see only these two references to it in the logs:

1.log.20210727-225429:2021-07-27T18:43:45.594060431-07:00 stderr F I0728 01:43:45.593943       1 node_tree.go:86] Added node "k8s-worker1-vinnygambini-mercury01-x5ql" in group ":\x00:vinnygambini-mercury01.vinnygambini.local" to NodeTree
1.log.20210727-225429:2021-07-27T18:43:45.594156487-07:00 stderr F I0728 01:43:45.594035       1 eventhandlers.go:104] add event for node "k8s-worker1-vinnygambini-mercury01-x5ql"

So, the kube-scheduler was aware that this node exists. Another interesting set of log messages is:

2021-07-27T22:54:20.710506939-07:00 stderr F I0728 05:54:20.710405       1 scheduler.go:776] Updating pod condition for services/externaldns-etcd-vmrbkwj6ts to (PodScheduled==False, Reason=Unschedulable)
2021-07-27T22:54:20.710680409-07:00 stderr F I0728 05:54:20.710557       1 scheduling_queue.go:810] About to try and schedule pod sysmgmt-health/sysmgmt-health-prometheus-node-exporter-6h2tv
2021-07-27T22:54:20.710702009-07:00 stderr F I0728 05:54:20.710597       1 scheduler.go:566] Attempting to schedule pod: sysmgmt-health/sysmgmt-health-prometheus-node-exporter-6h2tv
2021-07-27T22:54:20.716626669-07:00 stderr F I0728 05:54:20.716399       1 generic_scheduler.go:1042] Node k8s-worker1-vinnygambini-mercury02-ztsv is a potential node for preemption.
2021-07-27T22:54:20.716653201-07:00 stderr F I0728 05:54:20.716435       1 generic_scheduler.go:1042] Node k8s-master-vinnygambini-mercury03-kbcy is a potential node for preemption.
2021-07-27T22:54:20.716663288-07:00 stderr F I0728 05:54:20.716445       1 generic_scheduler.go:1042] Node k8s-worker1-vinnygambini-mercury03-rsmo is a potential node for preemption.
2021-07-27T22:54:20.716672621-07:00 stderr F I0728 05:54:20.716454       1 generic_scheduler.go:1042] Node k8s-master-vinnygambini-mercury01-x20i is a potential node for preemption.
2021-07-27T22:54:20.716680776-07:00 stderr F I0728 05:54:20.716463       1 generic_scheduler.go:1042] Node k8s-master-vinnygambini-mercury02-vzsx is a potential node for preemption.
2021-07-27T22:54:20.716688963-07:00 stderr F I0728 05:54:20.716472       1 generic_scheduler.go:1042] Node k8s-master-vinnygambini-mercury03-kbcy is a potential node for preemption.
2021-07-27T22:54:20.716845087-07:00 stderr F I0728 05:54:20.716689       1 factory.go:462] Unable to schedule sysmgmt-health/sysmgmt-health-prometheus-node-exporter-6h2tv: no fit: 0/6 nodes are available: 5 node(s) didn't have free ports for the requested pod ports.; waiting

Note that k8s-master-vinnygambini-mercury03-kbcy appears twice in the list, and k8s-worker1-vinnygambini-mercury01-x5ql is not in the list.
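Side note on the reboot workaround above: before restarting anything, it can help to confirm which master currently holds the kube-scheduler leader lock. Depending on the leader-election resource lock in use, the holder is recorded in a Lease object and/or in an Endpoints annotation, so one of these usually shows it:

$ kubectl -n kube-system get lease kube-scheduler -o yaml
$ kubectl -n kube-system get endpoints kube-scheduler -o yaml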
