improved isolation of Agones controllers using taints and priority #500
Build Failed 😱 Build Id: 02b144db-12f0-4924-95fa-c4d7a7e351dc Build Logs
cf7767c to cc3de6c
Build Failed 😱 Build Id: f8a7fbcb-8859-4ead-b1ab-2cb3637acfa4 Build Logs
Build Failed 😱 Build Id: f4984c6c-065e-4761-a3f1-11acbbbedcd2 Build Logs
Same comment as on #501: should we add the node pool by default to https://github.com/GoogleCloudPlatform/agones/blob/master/build/gke-test-cluster/cluster.yml.jinja?
How about we add both node pools and Makefile targets in a separate PR? That way we'll know that Agones works with and without special node pools.
07e04c9 to 1d69e28
Build Failed 😱 Build Id: a23d8cdd-6cdd-419c-9a51-6aa35d1a07f7 Build Logs
Build Failed 😱 Build Id: 62d07467-bbed-43f6-bd9f-b299d5aa6387 Build Logs
Build Failed 😱 Build Id: c087ff42-8ea0-4b4f-a89e-627e51f2014b Build Logs
1d69e28 to 240394d
PTAL, added documentation and DM template.
Build Succeeded 👏 Build Id: 630c5f5a-a263-4cb9-ab2a-ca6daa04b126 The following development artifacts have been built, and will exist for the next 30 days:
To install this version:
2ae404b to 7435c23
Build Failed 😱 Build Id: 801cc8c6-5227-4298-92a4-25aec9610562 Build Logs
Build Succeeded 👏 Build Id: fe734624-1fbe-4021-864b-7b151950dfdc The following development artifacts have been built, and will exist for the next 30 days:
To install this version:
- name: "agones-system"
  initialNodeCount: 1
  config:
    machineType: n1-standard-4
Do we want 4 CPUs, or would 2 do?
At some point, when scaling a cluster into the 10Ks, you want bigger machines. For this one, 2 is fine.
@@ -28,6 +28,19 @@ $ helm install --name my-release --namespace agones-system agones/agones
_We recommend to install Agones in its own namespaces (like `agones-system` as shown above)
you can use the helm `--namespace` parameter to specify a different namespace._

When running in production, Agones should be scheduled on a dedicated pool of nodes, distinct from where Game Servers are scheduled for better isolation and resiliency. By default Agones prefers to be scheduled on nodes labeled with `stable.agones.dev/agones-system=true` and tolerates node taint `stable.agones.dev/agones-system=true:NoExecute`. If no dedicated nodes are available, Agones will
Final thing - wrap this in a `feature` shortcode with a 0.8.0 publish date, so it's hidden until that date.
Details: https://agones.dev/site/docs/contribute/
Otherwise, LGTM!
Done
7435c23 to 8e15a45
Added a high-priority class called "agones-system", which defines the priority at which both the controller and the ping service run. This causes them to be scheduled before any game server pods are scheduled.

Also added default affinity and tolerations to the Helm values: the Agones controller and ping service will prefer (but not require) nodes labeled with "stable.agones.dev/agones-system: true". They will also tolerate the taint "stable.agones.dev/agones-system=true:NoExecute".

With those two mechanisms in place, isolating the Agones controller should be as simple as creating a dedicated node pool with the appropriate taints and labels:

```
gcloud container node-pools create agones-system ... \
  --node-taints stable.agones.dev/agones-system=true:NoExecute \
  --node-labels stable.agones.dev/agones-system=true
```

Observe how pods are scheduled on the 'agones-system' node pool.

```
$ kubectl get pod -n agones-system -o wide
NAME                                READY   STATUS    RESTARTS   AGE   IP         NODE                                           NOMINATED NODE
agones-controller-cd857555b-zgpqw   1/1     Running   0          34s   10.8.5.6   gke-agones-scale-agones-system-b36e72f2-7ks0   <none>
agones-ping-76999c8cc9-9nghq        1/1     Running   0          42s   10.8.4.3   gke-agones-scale-agones-system-823b885e-15nz   <none>
agones-ping-76999c8cc9-jhjq6        1/1     Running   0          39s   10.8.5.5   gke-agones-scale-agones-system-b36e72f2-7ks0   <none>
```
8e15a45 to 1d3d995
🔥🔥🔥
Build Succeeded 👏 Build Id: 28c6c934-f927-4adb-98f3-876409dfac9e The following development artifacts have been built, and will exist for the next 30 days:
To install this version:
Build Succeeded 👏 Build Id: 6db90892-1b8c-471f-860a-11a05276a972 The following development artifacts have been built, and will exist for the next 30 days:
To install this version:
Added a high-priority class called `agones-system`, which defines the priority at which both the controller and the ping service run. This causes them to be scheduled before any game server pods are scheduled. This fixes #489.

Also added default affinity and tolerations to the Helm values: the Agones controller and ping service will prefer (but not require) nodes labeled with `stable.agones.dev/agones-system: true`. They will also tolerate the taint `stable.agones.dev/agones-system=true:NoExecute`.

With those two mechanisms in place, isolating the Agones controller should be as simple as creating a dedicated node pool with the appropriate taints and labels. Pods are then scheduled on the `agones-system` node pool.
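
For readers less familiar with the Kubernetes objects involved, the two mechanisms described above (a priority class, plus a preferred node affinity with a matching toleration) might look roughly like the sketch below. This is an illustration under stated assumptions, not the manifests from this PR: the priority `value`, the `description`, the pod name `example-agones-system-pod`, and the container image are all hypothetical, and the chart in this PR sets the equivalent fields through Helm values rather than a bare Pod.

```yaml
# Sketch only. The numeric value and description are illustrative assumptions,
# not the values shipped with Agones.
# Note: clusters older than Kubernetes 1.14 serve this kind as scheduling.k8s.io/v1beta1.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: agones-system
value: 1000            # hypothetical; higher values are scheduled ahead of lower ones
globalDefault: false   # only pods that name this class receive the priority
description: "Priority for Agones system pods (controller and ping)."
---
# Hypothetical pod showing how the priority class, toleration and
# preferred affinity fit together in a pod spec.
apiVersion: v1
kind: Pod
metadata:
  name: example-agones-system-pod   # hypothetical name
  namespace: agones-system
spec:
  priorityClassName: agones-system
  # Allows the pod onto nodes tainted with
  # stable.agones.dev/agones-system=true:NoExecute
  tolerations:
    - key: "stable.agones.dev/agones-system"
      operator: "Equal"
      value: "true"
      effect: "NoExecute"
  # "preferred" rather than "required": the pod can still be scheduled
  # elsewhere when no labeled node is available.
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          preference:
            matchExpressions:
              - key: "stable.agones.dev/agones-system"
                operator: "In"
                values: ["true"]
  containers:
    - name: example
      image: example.invalid/placeholder:latest   # hypothetical image
```

The taint keeps other workloads, such as game server pods without the toleration, off the dedicated nodes, while the preferred affinity draws the Agones pods toward them without making the dedicated pool mandatory.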