Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GKE Autopilot: Separate game server workloads #2840

Merged
merged 1 commit into from
Dec 9, 2022

Conversation

zmerlynn
Copy link
Collaborator

@zmerlynn zmerlynn commented Dec 2, 2022

This PR does a couple of things towards #2777:

  • Enforce that on Autopilot the scheduling strategy is Packed

  • If the user does not provide a nodeSelector or tolerations in the PodSpec, we add:

    nodeSelector:
      agones.dev/role: gameserver
    tolerations:
    - effect: NoSchedule
      key: agones.dev/role
      operator: Equal
      value: gameserver

This has the effect of splitting the game server pods off to their own nodes ala https://wdenniss.com/autopilot-workload-separation.

If the user has either defined, we assume they know what they're doing and skip this.

Note: In Packed we already set an affinity:

    affinity:
      podAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - podAffinityTerm:
            labelSelector:
              matchLabels:
                agones.dev/role: gameserver
            topologyKey: kubernetes.io/hostname
          weight: 100

We keep this affinity, but it becomes mostly a no-op after using a nodeSelector.

/kind feature

@zmerlynn zmerlynn requested review from roberthbailey, gongmax and markmandel and removed request for cyriltovena and aLekSer December 2, 2022 00:42
@agones-bot
Copy link
Collaborator

Build Failed 😱

Build Id: de97fdd2-2c72-402b-86b3-0d0751d53360

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@zmerlynn zmerlynn added the feature-freeze-do-not-merge Only eligible to be merged once we are out of feature freeze (next full release) label Dec 2, 2022
@agones-bot
Copy link
Collaborator

Build Failed 😱

Build Id: d5490158-9d0d-4d1b-a29e-f38cb03a847c

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Copy link
Collaborator

Build Failed 😱

Build Id: 318e39f9-9b01-4ade-8d31-01af27c9dd89

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: f534c551-68f6-4949-9f5b-47ccda636e2b

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/2840/head:pr_2840 && git checkout pr_2840
  • helm install agones ./install/helm/agones --namespace agones-system --set agones.image.tag=1.28.0-875cf47-amd64

Copy link
Member

@roberthbailey roberthbailey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern is that in pkg/gameservers/controller.go we create the pod and mutate it in very different places in the code. If we (for some reason) decide to set tolerations or a node selector on all game server pods in the future (e.g. in the call to gs.Pod(sidecar)) that will silently stop the gke autopilot provider from adding the extra fields.

I think this could be caught by adding a unit test in pkg/gameservers/controller_test.go calling createGameServerPod without the gke autopilot provider and verifying that we don't set any tolerations or node selectors. That would still make the code do the right thing if a user set one in the pod spec, but would cause a test to fail if we added code that did it for all pod specs.

@zmerlynn
Copy link
Collaborator Author

zmerlynn commented Dec 5, 2022

I think this could be caught by adding a unit test in pkg/gameservers/controller_test.go calling createGameServerPod without the gke autopilot provider and verifying that we don't set any tolerations or node selectors. That would still make the code do the right thing if a user set one in the pod spec, but would cause a test to fail if we added code that did it for all pod specs.

Good call, done!

@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: 765ee11d-baab-4733-a74a-f36408d09f1d

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/2840/head:pr_2840 && git checkout pr_2840
  • helm install agones ./install/helm/agones --namespace agones-system --set agones.image.tag=1.28.0-088f4b9-amd64

@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: f2714835-0f09-4a61-98bc-42c480cfd05c

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/2840/head:pr_2840 && git checkout pr_2840
  • helm install agones ./install/helm/agones --namespace agones-system --set agones.image.tag=1.28.0-11df4d5-amd64

@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: 3cdcb57f-653d-4584-97ea-2ff4b67bb0dd

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/2840/head:pr_2840 && git checkout pr_2840
  • helm install agones ./install/helm/agones --namespace agones-system --set agones.image.tag=1.28.0-082e12d-amd64

@mangalpalli mangalpalli removed the feature-freeze-do-not-merge Only eligible to be merged once we are out of feature freeze (next full release) label Dec 7, 2022
This PR does a couple of things towards googleforgames#2777:

* Enforce that on Autopilot the scheduling strategy is `Packed`

* If the user does not provide a nodeSelector or tolerations
in the PodSpec, we add:

```
    nodeSelector:
      agones.dev/role: gameserver
    tolerations:
    - effect: NoSchedule
      key: agones.dev/role
      operator: Equal
      value: gameserver
```

This has the effect of splitting the game server pods off to their
own nodes ala https://wdenniss.com/autopilot-workload-separation.

If the user has either defined, we assume they know what they're
doing and skip this.

Note: In `Packed` we already set an affinity:

```
    affinity:
      podAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - podAffinityTerm:
            labelSelector:
              matchLabels:
                agones.dev/role: gameserver
            topologyKey: kubernetes.io/hostname
          weight: 100
```

We keep this affinity, but it becomes mostly a no-op after using
a `nodeSelector`.
@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: 25ab5cc3-ee26-4829-b154-039c739e87d5

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/2840/head:pr_2840 && git checkout pr_2840
  • helm install agones ./install/helm/agones --namespace agones-system --set agones.image.tag=1.29.0-987737b-amd64

@google-oss-prow google-oss-prow bot added the lgtm label Dec 9, 2022
@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: roberthbailey, zmerlynn

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@roberthbailey roberthbailey merged commit fd29174 into googleforgames:main Dec 9, 2022
chiayi pushed a commit to chiayi/agones that referenced this pull request Dec 13, 2022
This PR does a couple of things towards googleforgames#2777:

* Enforce that on Autopilot the scheduling strategy is `Packed`

* If the user does not provide a nodeSelector or tolerations
in the PodSpec, we add:

```
    nodeSelector:
      agones.dev/role: gameserver
    tolerations:
    - effect: NoSchedule
      key: agones.dev/role
      operator: Equal
      value: gameserver
```

This has the effect of splitting the game server pods off to their
own nodes ala https://wdenniss.com/autopilot-workload-separation.

If the user has either defined, we assume they know what they're
doing and skip this.

Note: In `Packed` we already set an affinity:

```
    affinity:
      podAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - podAffinityTerm:
            labelSelector:
              matchLabels:
                agones.dev/role: gameserver
            topologyKey: kubernetes.io/hostname
          weight: 100
```

We keep this affinity, but it becomes mostly a no-op after using
a `nodeSelector`.
roberthbailey pushed a commit that referenced this pull request Jan 5, 2023
…ok (#2857)

* SafeToEvict feature: Implement the `eviction.safe` API logic

Towards #2794: This commit implements the defaulting/validation for
the eviction.safe field, under the SafeToEvict feature gate. In
particular, it enshrines the defaulting logic for the somewhat
complicated backwards compatibility case (safe-to-evict=true is set
but SafeToEvict is not).

Along the way, refactor a very repetitive unit test table.

* Ran gen-api-docs

* Implement SafeToEvict enforcement of safe-to-evict annotations/label

This commit adds the interpretation of GameServer.Status.Eviction:

* Following the pattern in #2840, moves the enforcement of
cloudproduct specific things up to the controller, adding a new cloudproduct
SetEviction hook to handle it.
  * As a result, when the SafeToEvict feature flag is on, we disable annotation
    in gameserver.go and test that we don't set anything (as a change detection test).

* Adds a generic and GKE Autopilot version of how to set policy from the
eviction.safe status. In Autopilot, we skip using safe-to-evict=false.
@Kalaiselvi84 Kalaiselvi84 added this to the 1.29.0 milestone Jan 17, 2023
@zmerlynn zmerlynn deleted the autopilot-packed branch January 25, 2023 22:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants