Adding an allocator service that acts as a reverse proxy. #768
Conversation
Build Succeeded 👏 Build Id: 438877c0-5107-4ac4-b75a-2b9b6fe722c8 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
w.Header().Set("Content-Type", "application/json")
result, _ := json.Marshal(allocatedGsa)
_, err = io.WriteString(w, string(result))
if err != nil {
At this point returning http.Error will probably have no effect, since we already sent the headers; I would just log a warning.
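For illustration, a minimal sketch of that suggestion, assuming a plain net/http handler and the standard library logger (the writeJSON helper is hypothetical, not this PR's code):

```go
package allocator

import (
	"encoding/json"
	"log"
	"net/http"
)

// writeJSON marshals v and writes it as the response body. Errors that happen
// after the status line and headers have been sent can no longer be reported
// via http.Error, so they are only logged.
func writeJSON(w http.ResponseWriter, v interface{}) {
	body, err := json.Marshal(v)
	if err != nil {
		// Nothing has been written yet, so an HTTP error response still works here.
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	if _, err := w.Write(body); err != nil {
		// Headers are already on the wire; calling http.Error now would be a no-op.
		log.Printf("warning: failed to write allocation response: %v", err)
	}
}
```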
This also returns the HTTP status and a body with the error message.
func getCaCertPool(path string) (*x509.CertPool, error) {
	// Add all certificates under client-certs path because there could be multiple clusters
	// and all client certs should be added.
Normally this should be one CA certificate regardless of the number of clusters; clients will use leaf certificates signed by this CA, so you technically only need one PEM file with the CA certificate bundle, not all of the client certs.
IOW - we need to make sure we have the certificates example set up correctly, because people will be cargo-culting this a lot.
This acts as a revocation list for certificates as well. If a secret is compromised, its CA should be revoked without impacting other clients calling the allocation service. Having this pattern is valuable for matchmaking, not for cluster-to-cluster calls: if a cluster secret is compromised, then all secrets used to talk to another cluster are compromised anyway. However, the matchmaker is an independent entity, and this solution helps secure the system from bad-actor matchmakers.
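As a rough sketch (not the PR's actual implementation), loading every PEM file mounted under that path into a single pool could look like the following; whether the directory holds one CA bundle or one CA per client is the design question above, but the loading logic is the same either way:

```go
package allocator

import (
	"crypto/x509"
	"fmt"
	"io/ioutil"
	"path/filepath"
	"strings"
)

// getCaCertPool builds a CertPool from every PEM file under path, e.g. a
// Kubernetes secret volume holding one or more client CA certificates.
func getCaCertPool(path string) (*x509.CertPool, error) {
	pool := x509.NewCertPool()
	entries, err := ioutil.ReadDir(path)
	if err != nil {
		return nil, fmt.Errorf("could not read CA directory %s: %v", path, err)
	}
	for _, entry := range entries {
		// Skip sub-directories and the hidden ..data/..timestamp entries that
		// Kubernetes creates inside mounted secret volumes.
		if entry.IsDir() || strings.HasPrefix(entry.Name(), ".") {
			continue
		}
		pem, err := ioutil.ReadFile(filepath.Join(path, entry.Name()))
		if err != nil {
			return nil, fmt.Errorf("could not read %s: %v", entry.Name(), err)
		}
		if !pool.AppendCertsFromPEM(pem) {
			return nil, fmt.Errorf("no valid certificates found in %s", entry.Name())
		}
	}
	return pool, nil
}
```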
---
# Allocation client certs
{{- $ca := genCA "allocation-client-ca" 3650 }}
Pretty sure you want one CA and not separate CAs for client and server:
{{- $ca := genCA "Agones CA" 365 -}}
{{- $serverCert := genSignedCert "Allocator Server" nil (list "<server-common-name-goes-here>") 3650 $ca -}}
{{- $clientCert := genSignedCert "Agones Client" nil nil 365 $ca -}}
A server CA is needed if the server certificate is self-signed. Otherwise, the client can confirm the server's validity by consulting well-known certificate authorities, e.g. DigiCert. The client CA list indicates the set of clients that the server accepts, and acts as authN.
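To make the two roles concrete, here is a hedged Go sketch of how those pieces typically fit together with crypto/tls (names are illustrative, not this PR's code): the server presents its own cert and verifies callers against the client CA pool, while the client presents its cert and verifies the server against the server CA.

```go
package allocator

import (
	"crypto/tls"
	"crypto/x509"
)

// serverTLSConfig: the allocator presents serverCert and only accepts callers
// whose certificates chain back to a CA in clientCAs (client authN).
func serverTLSConfig(serverCert tls.Certificate, clientCAs *x509.CertPool) *tls.Config {
	return &tls.Config{
		Certificates: []tls.Certificate{serverCert},
		ClientAuth:   tls.RequireAndVerifyClientCert,
		ClientCAs:    clientCAs,
	}
}

// clientTLSConfig: the matchmaker presents clientCert and trusts the allocator
// only if its certificate chains back to serverCAs (needed when the server
// certificate is self-signed rather than issued by a public CA).
func clientTLSConfig(clientCert tls.Certificate, serverCAs *x509.CertPool) *tls.Config {
	return &tls.Config{
		Certificates: []tls.Certificate{clientCert},
		RootCAs:      serverCAs,
	}
}
```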
release: "{{ .Release.Name }}"
heritage: "{{ .Release.Service }}"
data:
{{- if .Values.agones.allocator.generateTLS }}
generateTLS?
Yes. It generates the service TLS certificate and adds its CA to the CA list.
Build Succeeded 👏 Build Id: c3a9588a-d603-4562-8976-f0b7f976b4b5 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
h := httpHandler{
	agonesClient: agonesClient,
	namespace:    os.Getenv("NAMESPACE"),
Any particular reason to tie this to a specific namespace? We could grab the namespace from the GameServerAllocation.ObjectMeta on deserialisation -- might be more flexible?
I am introducing a gRPC interface, and that will not support k8s APIs, including ObjectMeta. The idea is that an allocator service deployed to a namespace is responsible for serving allocations in that namespace. Then matchmaking does not need to be aware of the internal structure of k8s and namespaces, and only calls the allocator's endpoint.
But if a user wants to run GameServers in more than one namespace, then they have to run two different services - which brings us back to the same place - the matchmaker being namespace aware (and we have to run twice the infrastructure, and potentially redeploy if we want to be dynamic about the namespaces being used).
Seems simpler to me to allow the endpoint to take a namespace as an argument? (Maybe using the "default" namespace as the default?)
Wdyt?
If there are two allocator services deployed to k8s in different namespaces, the matchmaker only needs to know the endpoints. However, if we want one allocator service deployed to one cluster to handle allocation requests for two namespaces, potentially for two different purposes, then there will not be enough isolation between the traffic for the two namespaces. IOW, QPS for one namespace may impact allocator service performance for another namespace. So there is a trade-off.
I would prefer not to expose namespace unless we know for sure that it is needed, because adding additional fields is easy to do but removing them in the future is hard, as it will be a breaking change. WDYT?
> IOW, QPS for one namespace may impact allocator service performance for another namespace. So there is a trade-off.

Everything comes back to the k8s api anyway, so we'll always have that bottleneck?

> I would prefer not to expose namespace unless we know for sure that it is needed, because adding additional fields is easy to do but removing them in the future is hard, as it will be a breaking change. WDYT?

I can't disagree with that statement 👍
The only other devil's advocate statement I can make is that I think this makes things a tad more complicated for the end user. Up until this point, everything is installed in the agones-system namespace - now we have Agones system components bleeding into other areas of Kubernetes, whereas before they were pretty tightly contained in the agones-system namespace.
The other thing is - we're saying we're a reverse proxy for this CRD, but we are changing the expected behaviour of that CRD with the reverse proxy. So it might be a bit confusing for users.
But given your excellent point above - I think we'll be okay to have the namespace defined in the env var -- and see how users like it. Much easier to add later 👍
> IOW, QPS for one namespace may impact allocator service performance for another namespace. So there is a trade-off.
>
> Everything comes back to the k8s api anyway, so we'll always have that bottleneck?

Good point. I think you are right.

> I would prefer not to expose namespace unless we know for sure that it is needed, because adding additional fields is easy to do but removing them in the future is hard, as it will be a breaking change. WDYT?
>
> I can't disagree with that statement
>
> The only other devil's advocate statement I can make is that I think this makes things a tad more complicated for the end user. Up until this point, everything is installed in the agones-system namespace - now we have Agones system components bleeding into other areas of Kubernetes, whereas before they were pretty tightly contained in the agones-system namespace.
>
> The other thing is - we're saying we're a reverse proxy for this CRD, but we are changing the expected behaviour of that CRD with the reverse proxy. So it might be a bit confusing for users.

Can you please explain more? How are we changing the expected behavior?

> But given your excellent point above - I think we'll be okay to have the namespace defined in the env var -- and see how users like it. Much easier to add later
> Can you please explain more? How are we changing the expected behavior?
Sure - basically, through the k8s api I can pass through what namespace I'm working with as part of the URL path - so I can access whatever namespace I want. That being said, in this instance, we aren't doing that - so it's probably not that huge a deal.
The only potentially confusing thing I see is if a user sets the namespace in ObjectMeta.Namespace and it doesn't translate through to the applied namespace in the service.
But I don't see either of these things as blocking issues. As you say above, we can add this functionality later if we need it.
(Also, if someone wants to work in a new namespace, we have to provision a service account and RBAC rules anyway, so it's not like you can dynamically add/remove namespace support that quickly.)
In the gRPC interface that I will introduce next, the namespace is not exposed. So ObjectMeta.Namespace will not be relevant.
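As a side note, a tiny sketch of the behaviour settled on above, reading the namespace from the env var on the deployment (the fallback to "default" is purely an illustrative assumption; the actual service may treat a missing value as an error):

```go
package allocator

import "os"

// serviceNamespace returns the namespace this allocator instance serves,
// taken from the NAMESPACE env var set on the allocator deployment.
func serviceNamespace() string {
	if ns := os.Getenv("NAMESPACE"); ns != "" {
		return ns
	}
	// Illustrative fallback only.
	return "default"
}
```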
Build Succeeded 👏 Build Id: 0de03ad2-8449-4f3a-bfef-450f3f82e423 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
Build Succeeded 👏 Build Id: 37ef50d2-99b3-4b72-a63f-4f6c42e3667c The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
The allocator service solves the problem of sending allocation requests from outside the Kubernetes cluster, as discussed in issue #597 (comment). The service has an external IP and authenticates callers using client certs.
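For context, a hedged sketch of how an external matchmaker might call such a service over mTLS; the endpoint path, file names, and request body are illustrative assumptions, not the API defined in this PR:

```go
package main

import (
	"bytes"
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"io/ioutil"
	"net/http"
)

func main() {
	// Client certificate and key, signed by a CA the allocator trusts.
	clientCert, err := tls.LoadX509KeyPair("client.crt", "client.key")
	if err != nil {
		panic(err)
	}

	// CA used to verify the (self-signed) allocator server certificate.
	caPEM, err := ioutil.ReadFile("server-ca.crt")
	if err != nil {
		panic(err)
	}
	serverCAs := x509.NewCertPool()
	serverCAs.AppendCertsFromPEM(caPEM)

	client := &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				Certificates: []tls.Certificate{clientCert},
				RootCAs:      serverCAs,
			},
		},
	}

	// Hypothetical allocation request against the service's external IP / hostname.
	endpoint := "https://allocator.example.com/v1alpha1/gameserverallocation"
	resp, err := client.Post(endpoint, "application/json", bytes.NewBufferString("{}"))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("allocation response status:", resp.Status)
}
```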