
[docs-only] ADR0029 - grpc in kubernetes #9488

Open · wants to merge 1 commit into master from adr00029
Conversation

butonic
Member

@butonic butonic commented Jun 27, 2024

I investigated #8589 and tried to sum up my findings in an ADR because it may have architectural consequences.

@butonic butonic self-assigned this Jun 27, 2024

update-docs bot commented Jun 27, 2024

Thanks for opening this pull request! The maintainers of this repository would appreciate it if you would create a changelog item based on your changes.

@butonic butonic changed the title ADR0029 - grpc in kubernetes [docs-only] ADR0029 - grpc in kubernetes Jun 27, 2024
@butonic butonic force-pushed the adr00029 branch 2 times, most recently from 0e9df5b to 4e66dec Compare June 27, 2024 13:02
@butonic
Member Author

butonic commented Jun 28, 2024

When trying to use the dns:/// resolver of grpc-go in cs3org/reva#4744 and thinking about the consequences, I found a blog post that implemented a k8s-resolver to achieve Perfect Round Robin, Inclusion of Newly Created Pods and Smooth Redeployment. But I wondered if there were other resolver implementations for kubernetes, which led me to https://github.com/sercand/kuberesolver. It watches the kubernetes api, has been around since 2018, and the last release is from 2023 ... which seems solid.

I'll add it to cs3org/reva#4744 and make the service names configurable in #9490 ... then we can test the behavior under load.
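For illustration, a minimal sketch of how kuberesolver could be wired into a grpc-go client; the import path/major version, service name, namespace and port below are assumptions, not what reva currently does:

package main

import (
    "log"

    "github.com/sercand/kuberesolver/v5" // assumed import path / major version
    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
)

func main() {
    // Register the "kubernetes" scheme: the resolver watches the
    // kubernetes API for endpoint changes instead of polling DNS.
    kuberesolver.RegisterInCluster()

    // "gateway", "ocis" and 9142 are placeholders for service, namespace and port.
    conn, err := grpc.Dial(
        "kubernetes:///gateway.ocis:9142",
        grpc.WithTransportCredentials(insecure.NewCredentials()),
        grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
    )
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()
}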

@butonic
Member Author

butonic commented Jul 16, 2024

After digesting this, consider that some services expose http ports as well as grpc ... we need to clarify how http requests are retried and load balanced as well. If grpc uses headless services and dns ... that might not mix with go micro http clients ...

Collaborator

@kobergj kobergj left a comment


@butonic after you investigated it, which option would you personally prefer?

docs/ocis/adr/0029-grpc-in-kubernetes.md — 4 outdated review threads (resolved)
@butonic butonic force-pushed the adr00029 branch 3 times, most recently from b755c50 to 3667594 Compare July 17, 2024 09:56
@butonic
Member Author

butonic commented Jul 17, 2024

I no longer see a strict requirement to have a service registry for the two main deployment scenarios. For a bare metal deployment I'd prefer unix sockets for grpc and for kubernetes I'd prefer DNS because the go grpc libs support balancing based on DNS. Even for docker (compose) unix sockets can be replaced with tcp connections to hostnames for setups that need to run some services in a dedicated container.

Now http requests also need to be load balanced and retried ... in kubernetes long running http connections would face the same problems as grpc: the client might try to send requests to a no longer or not yet ready / healthy service. But I haven't found a good resource on how to retry and load balance http connections in kubernetes based on the same dns magic that go-grpc does. Something like esiqveland/balancer, benschw/dns-clb-go, benschw/srv-lb ... but maintained? https://github.com/markdingo/cslb had a release in 2023
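Going back to the gRPC side, a hedged sketch of the two dial targets mentioned above (unix socket for bare metal, dns for kubernetes), assuming plain grpc-go; the socket path, hostname and port are placeholders:

package main

import (
    "log"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
)

func main() {
    // bare metal / single host: gRPC over a unix socket (path is a placeholder)
    local, err := grpc.Dial(
        "unix:///run/ocis/gateway.sock",
        grpc.WithTransportCredentials(insecure.NewCredentials()),
    )
    if err != nil {
        log.Fatal(err)
    }
    defer local.Close()

    // kubernetes: headless service resolved by the built-in dns resolver,
    // balanced client-side with round_robin
    k8s, err := grpc.Dial(
        "dns:///gateway.ocis.svc.cluster.local:9142",
        grpc.WithTransportCredentials(insecure.NewCredentials()),
        grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
    )
    if err != nil {
        log.Fatal(err)
    }
    defer k8s.Close()
}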

Signed-off-by: Jörn Friedrich Dreyer <jfd@butonic.de>

sonarcloud bot commented Jul 17, 2024

@jvillafanez
Member

  • bad, because we would lose the service registry - which might also be good, see above
  • bad, because every client will hammer the dns, maybe causing a similar load problem as with the go micro kubernetes registry implementation - needs performance testing

As far as I see, the DNS would act as our service registry. We'd still have the service registry, just not maintained by us (which could be good).

While I assume this would work for kubernetes (I see some plans on how it could work), we'll also need to take into account other environments.
For docker, we might need a custom dns server added in the compose file so our services can register there and use it (using an external one might not be a good idea, especially considering entry pollution). I'm not sure how it could work on bare metal installations.

Moreover, this should have a fully automated setup and teardown (or provide a simple command to do it). Manually configuring DNS entries the way we want isn't something average users will do.

For the "client hammering the DNS", I think that would be a client behavior we could fix. I mean, once the client have resolved the DNS and we're connected to the target service, it's up to the client to decide to reuse the same connection or request a new connection to a different replica.
I guess the key point is when we want to request a new connection to the DNS server. We could hit the DNS once per request, or maybe once per minute. In any case, different services are expected to land in different replicas, so even in the worst case scenario, the workload should be shared among the replicas, although maybe not evenly.

One big problem I see with this solution is that we'll need to do a migration. This seems a big breaking change, and maybe a drawback big enough to discard the solution.

@butonic
Member Author

butonic commented Jul 17, 2024

I agree that in a kubernetes environment headless services and dns would act as the service registry for go-grpc clients. (I still need to better understand http clients.)

I see four ways to run ocis:

  1. local dev deployment
  2. bare metal deployment with maybe systemd
  3. docker (compose) deployment
  4. kubernetes deployment

For the first three deployment types unix sockets would suffice.

In docker we can use hostnames with a tcp transport if we really need to spread the services across multiple containers. I don't see the necessity for a dedicated dns server. Docker swarm also has a service concept with a virtual ip. Using dns together with keep-alive timeouts in the grpc clients would work less ideally than in kubernetes, but it would still work.
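For the keep-alive timeouts mentioned above, a sketch on the client side, assuming plain grpc-go (target and durations are placeholders; note the server must allow this ping cadence via a keepalive enforcement policy):

package main

import (
    "log"
    "time"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
    "google.golang.org/grpc/keepalive"
)

func main() {
    // dial a docker service by hostname; the embedded docker DNS resolves it
    conn, err := grpc.Dial(
        "dns:///ocis-gateway:9142", // hostname and port are placeholders
        grpc.WithTransportCredentials(insecure.NewCredentials()),
        grpc.WithKeepaliveParams(keepalive.ClientParameters{
            Time:                20 * time.Second, // ping the server when the connection is idle
            Timeout:             5 * time.Second,  // give up and reconnect if the ping is not acked
            PermitWithoutStream: true,
        }),
    )
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()
}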

For kubernetes we can use dns and preconfigure all addresses using the helm charts.

IMNSHO we should aim for unix sockets and fewer processes / pods. We should move some tasks to dedicated containers for security, eg. thumbnailers and content indexing. The current helm chart deploying every service in a dedicated container is just a waste of resources - AND fragile.

For grpc the go client package has evolved to a point where it can handle everything that is necessary: https://pkg.go.dev/google.golang.org/grpc#ClientConn

A ClientConn encapsulates a range of functionality including name resolution, TCP connection establishment (with retries and backoff) and TLS handshakes. It also handles errors on established connections by re-resolving the name and reconnecting.

Retries are a matter of configuration.
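As an illustration, such retry configuration can be pushed to every client through grpc-go's default service config; the values, the wildcard name entry and the dial target below are illustrative only:

package main

import (
    "log"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
)

// an empty name entry ({}) applies the retry policy to all services and methods
const serviceConfig = `{
  "loadBalancingConfig": [{"round_robin":{}}],
  "methodConfig": [{
    "name": [{}],
    "retryPolicy": {
      "maxAttempts": 4,
      "initialBackoff": "0.1s",
      "maxBackoff": "1s",
      "backoffMultiplier": 2,
      "retryableStatusCodes": ["UNAVAILABLE"]
    }
  }]
}`

func main() {
    conn, err := grpc.Dial(
        "dns:///gateway.ocis.svc.cluster.local:9142", // target is a placeholder
        grpc.WithTransportCredentials(insecure.NewCredentials()),
        grpc.WithDefaultServiceConfig(serviceConfig),
    )
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()
}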

But picking up new dns entries is ... a long standing issue grpc/grpc#12295 with the two scenarios (existing pod goes down, new pod comes up) starting to be discussed in grpc/grpc#12295 (comment). Reading the thread it seems the default dns:// resolver will, by design, not pick up new pods unless we configure a MaxConnectionAge on the server-side. The 'optimal' solution is to use a name resolution system that has notifications - aka the kubernetes API.
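A sketch of that server-side MaxConnectionAge knob, assuming plain grpc-go (the durations are made up):

package main

import (
    "time"

    "google.golang.org/grpc"
    "google.golang.org/grpc/keepalive"
)

func main() {
    // Close connections periodically so clients reconnect, re-resolve DNS
    // and eventually spread load onto newly started pods.
    srv := grpc.NewServer(
        grpc.KeepaliveParams(keepalive.ServerParameters{
            MaxConnectionAge:      30 * time.Second, // illustrative value
            MaxConnectionAgeGrace: 10 * time.Second, // let in-flight RPCs finish
        }),
    )
    _ = srv // service registration and srv.Serve(listener) omitted
}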

cs3org/reva#4744 allows us to test and benchmark both: grpc go with dns:// (using headless services) and kubernetes:// (using the kubernetes api) addresses without breaking backwards compatibility (using the go micro service registry).

@jvillafanez
Member

jvillafanez commented Jul 17, 2024

As far as I understand, the DNS would also act as a load balancer by choosing a random (or not so random) replica. I mean, "serviceA" might get the ip "10.10.10.1" for "serviceZ", but "serviceB" might get ip "10.10.10.2" also for "serviceZ". This will be done client-side: the DNS will return one or more ips so the client will need to choose which one it wants to use.

It seems docker has an internal DNS server we could use. Assuming it has all the capabilities we need, we wouldn't need a custom DNS server. (I don't know how we can configure the DNS to provide the SRV records we need - or how we are going to register our services in the DNS otherwise; so we might still need a custom DNS we can configure at will)

IMNSHO we should aim for unix sockets and fewer processes / pods. We should move some tasks to dedicated containers for security, eg. thumbnailers and content indexing. The current helm chart deploying every service in a dedicated container is just a waste of resources - AND fragile.

If we're going the dns route, I think it should work everywhere regardless of the deployment. This includes kubernetes with every service on an independent server, even if that deployment itself could be a bad idea. Then we could have "official" deployments with different sets of services on different servers.
In this regard, using unix sockets should be an optional optimization, which could be easier to set up in one of our "official" setups, but I don't think it should be the main focus because it would limit the solution too much.


For docker, it seems that we aim for something like (only relevant content):

services:
  wopiserver_oo:
    deploy:
      mode: replicated
      replicas: 3
      endpoint_mode: dnsrr

dig response for a different container in the same docker network:

root@942926fa8300:/# dig wopiserver_oo

; <<>> DiG 9.16.48-Ubuntu <<>> wopiserver_oo
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 48674
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;wopiserver_oo.			IN	A

;; ANSWER SECTION:
wopiserver_oo.		600	IN	A	172.19.0.10
wopiserver_oo.		600	IN	A	172.19.0.11
wopiserver_oo.		600	IN	A	172.19.0.9

;; Query time: 11 msec
;; SERVER: 127.0.0.11#53(127.0.0.11)
;; WHEN: Thu Jul 18 08:49:21 CEST 2024
;; MSG SIZE  rcvd: 118

I guess that should match the kubernetes setup and whatever library we use for the connection is able to work with it.
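For what it's worth, the same check can be done from Go inside one of the containers (the service name is taken from the compose sketch above):

package main

import (
    "context"
    "fmt"
    "net"
)

func main() {
    // With endpoint_mode: dnsrr the embedded docker DNS returns one
    // A record per replica, matching the dig output above.
    ips, err := net.DefaultResolver.LookupIPAddr(context.Background(), "wopiserver_oo")
    if err != nil {
        panic(err)
    }
    for _, ip := range ips {
        fmt.Println(ip.String()) // the client has to pick/rotate between these itself
    }
}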

Comment on lines +21 to +26
To leverage the kubernetes pod state, we first used the go micro kubernetes registry implementation. When a pod fails the health or readiness probes, kubernetes will no longer
- send traffic to the pod via the kube-proxy, which handles the ClusterIP for a service,
- list the pod in DNS responses when the ClusterIP is disabled by setting it to `none`.
When using the ClusterIP, HTTP/1.1 requests will be routed to a working pod.

This nice setup starts to fail with long-lived connections. The kube-proxy is connection based, causing requests with Keep-Alive to stick to the same pod for more than one request. Worse, HTTP/2 and in turn gRPC multiplex the connection. They will not pick up any changes to pods, explaining the symptoms:
Contributor

Are we using this at all? Isn't the Go micro registry returning pod IPs?
And from what I know, the nats-js-kv service registry doesn't have any insights into healthiness or readiness of services it tries to contact.


1. new pods will not be used because clients will reuse the existing gRPC connection
2. gRPC clients will still try to send traffic to killed pods because they have not picked up that the pod was killed. Or the pod was killed a millisecond after the lookup was made.

Adding to this problem, the health and readiness implementations of oCIS services do not always reflect the correct state of the service. One example is the storage-users service, which returns ready `true` while running a migration on startup.
Contributor

From what I know, all /healthz and /readyz endpoints are hardcoded to true. Which is funny, because the debug server might be up before the actual service server.
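As a hedged sketch (not how the oCIS debug server is currently wired), a readiness endpoint that only reports ready after startup work has finished could look like this; the handler names and the port are placeholders:

package main

import (
    "net/http"
    "sync/atomic"
)

var ready atomic.Bool // stays false until startup work is finished

func main() {
    go func() {
        runMigrations()   // placeholder for e.g. the storage-users migration
        ready.Store(true) // only now report ready
    }()

    http.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
        if !ready.Load() {
            http.Error(w, "not ready", http.StatusServiceUnavailable)
            return
        }
        w.Write([]byte("ok"))
    })

    http.ListenAndServe(":9229", nil) // debug port is a placeholder
}

func runMigrations() { /* long-running startup work */ }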

Projects
Status: blocked

4 participants