-
Notifications
You must be signed in to change notification settings - Fork 826
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GKE Autopilot: Separate game server workloads #2840
Conversation
Build Failed 😱 Build Id: de97fdd2-2c72-402b-86b3-0d0751d53360 To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
d64ca6e
to
875cf47
Compare
Build Failed 😱 Build Id: d5490158-9d0d-4d1b-a29e-f38cb03a847c To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Build Failed 😱 Build Id: 318e39f9-9b01-4ade-8d31-01af27c9dd89 To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Build Succeeded 👏 Build Id: f534c551-68f6-4949-9f5b-47ccda636e2b The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only concern is that in pkg/gameservers/controller.go we create the pod and mutate it in very different places in the code. If we (for some reason) decide to set tolerations or a node selector on all game server pods in the future (e.g. in the call to gs.Pod(sidecar)
) that will silently stop the gke autopilot provider from adding the extra fields.
I think this could be caught by adding a unit test in pkg/gameservers/controller_test.go calling createGameServerPod
without the gke autopilot provider and verifying that we don't set any tolerations or node selectors. That would still make the code do the right thing if a user set one in the pod spec, but would cause a test to fail if we added code that did it for all pod specs.
875cf47
to
088f4b9
Compare
Good call, done! |
Build Succeeded 👏 Build Id: 765ee11d-baab-4733-a74a-f36408d09f1d The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
082e12d
to
11df4d5
Compare
Build Succeeded 👏 Build Id: f2714835-0f09-4a61-98bc-42c480cfd05c The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
Build Succeeded 👏 Build Id: 3cdcb57f-653d-4584-97ea-2ff4b67bb0dd The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
This PR does a couple of things towards googleforgames#2777: * Enforce that on Autopilot the scheduling strategy is `Packed` * If the user does not provide a nodeSelector or tolerations in the PodSpec, we add: ``` nodeSelector: agones.dev/role: gameserver tolerations: - effect: NoSchedule key: agones.dev/role operator: Equal value: gameserver ``` This has the effect of splitting the game server pods off to their own nodes ala https://wdenniss.com/autopilot-workload-separation. If the user has either defined, we assume they know what they're doing and skip this. Note: In `Packed` we already set an affinity: ``` affinity: podAffinity: preferredDuringSchedulingIgnoredDuringExecution: - podAffinityTerm: labelSelector: matchLabels: agones.dev/role: gameserver topologyKey: kubernetes.io/hostname weight: 100 ``` We keep this affinity, but it becomes mostly a no-op after using a `nodeSelector`.
11df4d5
to
987737b
Compare
Build Succeeded 👏 Build Id: 25ab5cc3-ee26-4829-b154-039c739e87d5 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: roberthbailey, zmerlynn The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This PR does a couple of things towards googleforgames#2777: * Enforce that on Autopilot the scheduling strategy is `Packed` * If the user does not provide a nodeSelector or tolerations in the PodSpec, we add: ``` nodeSelector: agones.dev/role: gameserver tolerations: - effect: NoSchedule key: agones.dev/role operator: Equal value: gameserver ``` This has the effect of splitting the game server pods off to their own nodes ala https://wdenniss.com/autopilot-workload-separation. If the user has either defined, we assume they know what they're doing and skip this. Note: In `Packed` we already set an affinity: ``` affinity: podAffinity: preferredDuringSchedulingIgnoredDuringExecution: - podAffinityTerm: labelSelector: matchLabels: agones.dev/role: gameserver topologyKey: kubernetes.io/hostname weight: 100 ``` We keep this affinity, but it becomes mostly a no-op after using a `nodeSelector`.
…ok (#2857) * SafeToEvict feature: Implement the `eviction.safe` API logic Towards #2794: This commit implements the defaulting/validation for the eviction.safe field, under the SafeToEvict feature gate. In particular, it enshrines the defaulting logic for the somewhat complicated backwards compatibility case (safe-to-evict=true is set but SafeToEvict is not). Along the way, refactor a very repetitive unit test table. * Ran gen-api-docs * Implement SafeToEvict enforcement of safe-to-evict annotations/label This commit adds the interpretation of GameServer.Status.Eviction: * Following the pattern in #2840, moves the enforcement of cloudproduct specific things up to the controller, adding a new cloudproduct SetEviction hook to handle it. * As a result, when the SafeToEvict feature flag is on, we disable annotation in gameserver.go and test that we don't set anything (as a change detection test). * Adds a generic and GKE Autopilot version of how to set policy from the eviction.safe status. In Autopilot, we skip using safe-to-evict=false.
This PR does a couple of things towards #2777:
Enforce that on Autopilot the scheduling strategy is
Packed
If the user does not provide a nodeSelector or tolerations in the PodSpec, we add:
This has the effect of splitting the game server pods off to their own nodes ala https://wdenniss.com/autopilot-workload-separation.
If the user has either defined, we assume they know what they're doing and skip this.
Note: In
Packed
we already set an affinity:We keep this affinity, but it becomes mostly a no-op after using a
nodeSelector
./kind feature