StatefulSet with AntiAffinity prevents cluster-autoscaler from working #33

Closed
emielvanlankveld opened this issue May 4, 2017 · 2 comments

emielvanlankveld commented May 4, 2017

Is this a BUG REPORT or FEATURE REQUEST? (choose one): Bug report

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"clean", BuildDate:"2017-03-28T16:36:33Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.2", GitCommit:"477efc3cbe6a7effca06bd1452fa356e2201e1ee", GitTreeState:"clean", BuildDate:"2017-04-19T20:22:08Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: gke (machine type n1-highcpu-4)
  • OS (e.g. from /etc/os-release): Alpine Linux v3.5
  • Kernel (e.g. uname -a): Linux cc332daac761 4.9.13-moby SMP Sat Mar 25 02:48:44 UTC 2017 x86_64 Linux

What happened:
We have an issue with the cluster-autoscaler: new pods are stuck in Pending and no new node is being created. We see these events on the pod:

  FirstSeen	LastSeen	Count	From			SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----			-------------	--------	------			-------
  3m		7s		22	cluster-autoscaler			Normal		NotTriggerScaleUp	pod didn't trigger scale-up (it wouldn't fit if a new node is added)
  4m		0s		17	default-scheduler			Warning		FailedScheduling	No nodes are available that match all of the following predicates:: Insufficient cpu (2).
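(For reference, these events were pulled by describing one of the Pending pods; the pod name below is a placeholder, roughly:)

# list the busy-loop pods that are stuck in Pending
kubectl get pods -l app=busy-loop
# dump the scheduling events for one of them
kubectl describe pod <busy-loop-pod-name>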

What you expected to happen:
The pods request only 1 CPU (1000m), so they would certainly fit on a new node of instance type n1-highcpu-4 (4 vCPUs).
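(Roughly how we sanity-check that against an existing node of the same machine type; the node name is a placeholder and the exact allocatable figures depend on GKE's reservations:)

# Capacity, Allocatable and "Allocated resources" show how much CPU an n1-highcpu-4 node offers
kubectl get nodes
kubectl describe node <node-name>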

How to reproduce it (as minimally and precisely as possible):
We can reproduce this by creating a new simple cluster with the following command:

gcloud container clusters create scale-test --cluster-version 1.6.2 --zone us-east1-b --additional-zones us-east1-c --machine-type n1-highcpu-4 --num-nodes 1 --preemptible --enable-autoupgrade --enable-autorepair --enable-autoscaling --min-nodes 1 --max-nodes 10
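(A quick check that the node pool really has autoscaling enabled; the output should list the node pool with autoscaling enabled and minNodeCount 1 / maxNodeCount 10:)

# inspect the cluster created above, including its node pool autoscaling settings
gcloud container clusters describe scale-test --zone us-east1-b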

We then run kubectl apply -f "deploy.yml" with the following configuration:

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: busy-loop
spec:
  replicas: 1
  revisionHistoryLimit: 5
  template:
    metadata:
      labels:
        tier: core
        app: busy-loop
    spec:
      nodeSelector:
        cloud.google.com/gke-preemptible: "true"
      containers:
      - name: busy-loop
        image: <SIMPLE BUSY LOOP IMAGE>
        ports:
        - containerPort: 5950
          name: busy-loop
        resources:
          requests:
            cpu: "1000m"
            memory: "256Mi"
        livenessProbe:
          exec:
            command:
            - cat
            - deploy.yml
          initialDelaySeconds: 5
          periodSeconds: 5
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: busy-loop
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: busy-loop
  minReplicas: 2
  maxReplicas: 100
  targetCPUUtilizationPercentage: 10
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: rethinkdb
spec:
  serviceName: rethinkdb
  replicas: 3
  template:
    metadata:
      labels:
        tier: data
        app: rethinkdb
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: rethinkdb
              topologyKey: kubernetes.io/hostname
      containers:
      - name: rethinkdb
        image: <SIMPLE RETHINKDB IMAGE>
        readinessProbe:
          tcpSocket:
            port: 28015
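(After the apply above, roughly how we watch the result:)

# the busy-loop replicas pile up in Pending while the node count never grows
kubectl get pods -w
kubectl get nodes -w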

With this configuration, new nodes are not created at all; the cluster-autoscaler only emits the NotTriggerScaleUp event shown above. When we perform the exact same steps but remove the affinity section from the configuration, new nodes are created without a problem. It seems the podAntiAffinity somehow makes the cluster-autoscaler incorrectly conclude that there would be no room on a new node.
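(Roughly how we compared the two runs; the only difference is editing deploy.yml by hand to drop the affinity: section from the rethinkdb spec, nothing else assumed:)

# run 1: with the podAntiAffinity block above -> NotTriggerScaleUp, pods stay Pending
kubectl apply -f deploy.yml
# run 2: clean up, remove the affinity: section from the rethinkdb template, re-apply
kubectl delete -f deploy.yml
kubectl apply -f deploy.yml
# without the affinity block, the cluster-autoscaler adds nodes and everything schedules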

mwielgus commented May 5, 2017

Thank you for the bug report. This has been fixed in CA 0.5.3 and will be included in the next Kubernetes 1.6 patch release. We hope the new version will be available on GKE late next week or, if we're unlucky, a couple of days later.
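(Once that release is out, a rough way to check what GKE offers and pick it up; the version string is a placeholder:)

# list the Kubernetes versions currently available in the zone
gcloud container get-server-config --zone us-east1-b
# upgrade the master (where the cluster-autoscaler runs) once the fixed 1.6.x is listed
gcloud container clusters upgrade scale-test --zone us-east1-b --master --cluster-version <1.6.x>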

mwielgus closed this as completed May 5, 2017

mwielgus commented May 5, 2017

Ref: #28
