
Controller is not aware of the number of available IP addresses #18

Closed

arun-gupta opened this issue Dec 20, 2017 · 7 comments

@arun-gupta

The controller is not aware of how many IP addresses are available to be assigned to pods. It tries to assign an IP address to a new pod and then fails reactively; it should have this information proactively.

Created a 2-node t2.medium cluster:

kops create cluster \
--name example.cluster.k8s.local \
--zones us-east-1a,us-east-1b,us-east-1c \
--networking amazon-vpc-routed-eni \
--node-size t2.medium \
--kubernetes-version 1.8.4 \
--yes

Created a Deployment using the configuration file:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3 
  template:
    metadata:
      labels:
        app: nginx 
    spec:
      containers:
      - name: nginx 
        image: nginx:1.12.1
        ports: 
        - containerPort: 80
        - containerPort: 443

Scaled the replicas:

kubectl scale --replicas=30 deployment/nginx-deployment
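
Pod status can then be checked with, for example:

kubectl get pods -l app=nginx -o wide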

30 pods are expected in the cluster (2 nodes * 3 ENIs per t2.medium * 5 secondary IPs per ENI), but only 27 pods become available. Three pods are always stuck in the ContainerCreating state.

A similar cluster was created with m4.2xlarge. 120 pods are expected in the cluster (2 nodes * 4 ENIs per m4.2xlarge * 15 IPs per ENI), but only 109 pods are available.

More details about one of the stuck pods (it is scheduled, but never becomes Ready):

$ kubectl describe pod/nginx-deployment-745df977f7-tndc7
Name:           nginx-deployment-745df977f7-tndc7
Namespace:      default
Node:           ip-172-20-68-116.ec2.internal/172.20.95.9
Start Time:     Wed, 20 Dec 2017 14:32:53 -0800
Labels:         app=nginx
                pod-template-hash=3018953393
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"nginx-deployment-745df977f7","uid":"96335f7f-e5d5-11e7-bc3a-0a41...
                kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container nginx
Status:         Pending
IP:             
Created By:     ReplicaSet/nginx-deployment-745df977f7
Controlled By:  ReplicaSet/nginx-deployment-745df977f7
Containers:
  nginx:
    Container ID:   
    Image:          nginx:1.12.1
    Image ID:       
    Ports:          80/TCP, 443/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:        100m
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-7wp6n (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  default-token-7wp6n:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-7wp6n
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.alpha.kubernetes.io/notReady:NoExecute for 300s
                 node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                   From                                    Message
  ----     ------                  ----                  ----                                    -------
  Normal   Scheduled               55m                   default-scheduler                       Successfully assigned nginx-deployment-745df977f7-tndc7 to ip-172-20-68-116.ec2.internal
  Normal   SuccessfulMountVolume   55m                   kubelet, ip-172-20-68-116.ec2.internal  MountVolume.SetUp succeeded for volume "default-token-7wp6n"
  Warning  FailedCreatePodSandBox  55m (x8 over 55m)     kubelet, ip-172-20-68-116.ec2.internal  Failed create pod sandbox.
  Normal   SandboxChanged          5m (x919 over 55m)    kubelet, ip-172-20-68-116.ec2.internal  Pod sandbox changed, it will be killed and re-created.
  Warning  FailedSync              10s (x1017 over 55m)  kubelet, ip-172-20-68-116.ec2.internal  Error syncing pod

More details about the exact steps are at https://gist.github.com/arun-gupta/87f2c9ff533008f149db6b53afa73bd0

@ofiliz

ofiliz commented Dec 21, 2017

Another option: Kubelet has a flag called --max-pods that lets you configure how many pods can be scheduled on that node. Since the maximum number of IP addresses per node is well-known based on its instance type, we can start Kubelet with --max-pods=[max_IP_addresses - number_of_IP_addresses_used_by_infra].
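
For illustration, a minimal sketch of that arithmetic at node bootstrap, using the published t2.medium limits (3 ENIs with 6 IPv4 addresses each, one of which is the ENI's primary); the infra IP count is an assumed placeholder, not a measured value:

MAX_ENIS=3        # ENI limit for a t2.medium
IPS_PER_ENI=6     # IPv4 addresses per ENI, including the primary
INFRA_IPS=2       # assumption: IPs consumed by infra; adjust per cluster
MAX_PODS=$(( MAX_ENIS * (IPS_PER_ENI - 1) - INFRA_IPS ))
kubelet --max-pods="${MAX_PODS}" ...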

@lstoll

lstoll commented Dec 21, 2017

A problem we had with max pods is that it counts pods in the host network namespace as well, which don't need an IP. So a bunch of system DaemonSets were eating into the already limited pod budget without actually consuming an address. To work around that in our AWS CNI plugin, we ended up adding a taint to the node to essentially mark it as full, and deleting pods that are failing. My takeaway has always been "there are options, and none of them are great". Getting kubernetes/kubernetes#20177 upstream would probably be the best outcome.
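
For illustration, marking a node as full with a taint could look like the following; the taint key here is a hypothetical example, not necessarily the one the plugin uses:

kubectl taint nodes ip-172-20-68-116.ec2.internal example.com/ip-exhausted=true:NoSchedule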

@liwenwu-amazon
Contributor

Will investigate whether we can use a scheduler extension (kubernetes/kubernetes#13580) for this.
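
For context, a scheduler extender is wired in through the kube-scheduler policy config. A minimal sketch, assuming a hypothetical extender service at 127.0.0.1:8888 that filters out nodes with no free IPs (the URL, path, and port are made up for illustration):

{
  "kind": "Policy",
  "apiVersion": "v1",
  "extenders": [
    {
      "urlPrefix": "http://127.0.0.1:8888/ip-filter",
      "filterVerb": "filter",
      "enableHttps": false
    }
  ]
}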

@deiwin

deiwin commented Oct 19, 2018

One problem with the current --max-pods solution is that it doesn't account for IPs in "cooling mode". Even if --max-pods is set correctly, when a lot of pods are deleted and created in quick succession in a cluster with high IP address utilization, the scheduler can schedule pods onto nodes that don't actually have an IP available: not all IPs may be assigned, but all of the unassigned IPs may be in "cooling mode" and therefore still unavailable.

@liwenwu-amazon
Contributor

@deiwin, right now the "cooling period" is 30 seconds, so in the worst case a pod will get an IP from the datastore within 30 seconds.

@deiwin

deiwin commented Oct 22, 2018

Yes, but in our experience even 30 seconds is enough to cause many CronJob failures on a cluster with high IP address utilization. We have many CronJobs that run every minute with a short activeDeadlineSeconds (usually either 30 or 50 seconds), and these mis-schedulings cause them to fail with DeadlineExceeded.
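
For illustration, a minimal CronJob of that shape (the name and image are placeholders, not taken from our actual workloads):

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: every-minute-job
spec:
  schedule: "* * * * *"
  jobTemplate:
    spec:
      activeDeadlineSeconds: 30
      template:
        spec:
          containers:
          - name: job
            image: busybox
            command: ["sh", "-c", "echo done"]
          restartPolicy: Never

If the pod spends the whole 30-second cooling period waiting for an IP, the deadline expires and the Job fails with DeadlineExceeded.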

@jaypipes
Contributor

Note that, unfortunately, the upstream issue that tracks this (kubernetes/kubernetes#5507) is in lifecycle/frozen and priority/backlog. I'm going to close this issue out with a plea for folks to comment on the upstream issue so we can find a path forward that makes IP addresses a concrete resource that is consumed by pods and tracked like other resources (CPU, memory, etc.).
