Multi-tenant interceptor and scaler #206

Merged: 43 commits, merged into kedacore:main on Sep 3, 2021

Conversation

@arschles (Collaborator) commented Jul 1, 2021

This is a large PR that makes the interceptor and external scaler multi-tenant.

See below for testing instructions.

Instead of the operator automatically starting a dedicated interceptor/scaler per application, a fleet of interceptors/scalers runs and can operate on any application in the same namespace. Interceptors dynamically proxy requests based on the incoming request, scalers dynamically report metrics for all applications, and the operator provides routing information to the interceptor fleet. See https://hackmd.io/@arschles/mutitenant-keda-http-addon for more detailed design information.

In this pull request:

  • The operator will not install any Pods into the cluster when any HTTPScaledObject is created
  • Any given interceptor pod can route a request to any installed application (in the same namespace)
  • Interceptors are (still) horizontally scalable
  • The operator maintains a "routing table" - a lookup table from hostname to the Service, Port, and Deployment of a backing application (see the sketch below this list)
  • Interceptor pods periodically request the updated routing table from the operator, and update their internal copy
  • When the routing table is updated -- for example, when a new HTTPScaledObject is created -- the operator pings all interceptor pods to refresh their copy
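To make the routing table concrete, here is a minimal sketch of what a table entry and a host lookup might look like. The type and field names are illustrative assumptions, not the actual types in pkg/routing.

```go
package routing

import (
	"fmt"
	"sync"
)

// Target describes the backend a hostname routes to. The field names are
// illustrative assumptions, not the add-on's real types.
type Target struct {
	Service    string // Kubernetes Service fronting the application
	Port       int    // port on that Service
	Deployment string // Deployment whose replicas the scaler watches
}

// Table is a concurrency-safe lookup from hostname to Target.
type Table struct {
	mu     sync.RWMutex
	routes map[string]Target
}

func NewTable() *Table {
	return &Table{routes: map[string]Target{}}
}

// AddTarget registers (or replaces) the route for a host.
func (t *Table) AddTarget(host string, target Target) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.routes[host] = target
}

// Lookup returns the Target for a host, or an error if none is registered.
func (t *Table) Lookup(host string) (Target, error) {
	t.mu.RLock()
	defer t.mu.RUnlock()
	tgt, ok := t.routes[host]
	if !ok {
		return Target{}, fmt.Errorf("no route for host %q", host)
	}
	return tgt, nil
}
```

An interceptor would call Lookup with the incoming request's Host header to decide where to proxy, and the operator rebuilds the table whenever an HTTPScaledObject is created or removed.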

There are several follow-ups to this PR, including but not limited to:

Checklist

  • Update design.md and walkthrough.md in the docs/ directory
  • Commits are signed with Developer Certificate of Origin (DCO)
  • Implement the routing table in the operator
  • Implement the routing table in the interceptor
  • Implement interceptor <--> operator RPC for the routing table
  • Remove code from operator that creates new resources
  • Make the routing table fetch path configurable in the operator
  • Expose metrics on the scaler so that KEDA can scale the interceptors as well
  • Clean up status values on the HTTPScaledObject
  • Make the external scaler respond based on the incoming HTTPScaledObject
    • requires that the queue knows about all hosts
  • Fix & add tests:
    • Add tests for the routing table in pkg/routing
    • Add tests for the routing table pinger in the interceptor
    • Add tests for the operator's routing table construction logic
    • Round-trip tests for the routing table
    • Round-trip tests for the HTTP request queue
    • Fix broken tests
  • Make applicable updates to the helm chart in kedacore/charts (feat: multi-tenant scaler and interceptor in the HTTP add-on charts#169)
    • Add or remove applicable fields from the CRD and update kedacore/charts. Some status fields will need removing, and a host field will need to be added so that the operator can build the routing table
    • The helm chart needs to install the interceptor and scaler along with the operator
    • Add a ScaledObject for the interceptor. See above for details on the work to make interceptor metrics available
  • Any necessary documentation is added
  • e2e tests
  • Any other necessary unit/integration tests
  • Ensure that the scaler's IsActive method always returns true for interceptors (so that they don't scale down to 0)
  • Change routing table communication strategy (a sketch follows this checklist):
    • Operator records routing table to ConfigMap
    • Each interceptor fetches it on startup, records it to memory
    • Interceptors have an open watch stream on the map
    • Interceptors periodically fetch the ConfigMap (to ensure they converge to the correct table, even if they miss events)
  • Ensure that unnecessary configurations are removed from the associated helm chart
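For the ConfigMap-based communication strategy in the checklist above, the interceptor's refresh loop could look roughly like the following sketch. The ConfigMap name, data key, and JSON layout are illustrative assumptions, and the watch stream is omitted for brevity; only the fetch-on-startup and periodic re-fetch parts are shown.

```go
package interceptor

import (
	"context"
	"encoding/json"
	"log"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// routingTableCM and routingTableKey are hypothetical names for illustration,
// not the add-on's real ConfigMap name or key.
const (
	routingTableCM  = "keda-http-routing-table"
	routingTableKey = "routing-table"
)

// fetchTable reads the routing table the operator wrote into a ConfigMap and
// decodes it as a host -> "service:port" map (layout assumed for brevity).
func fetchTable(ctx context.Context, cl kubernetes.Interface, ns string) (map[string]string, error) {
	cm, err := cl.CoreV1().ConfigMaps(ns).Get(ctx, routingTableCM, metav1.GetOptions{})
	if err != nil {
		return nil, err
	}
	table := map[string]string{}
	if err := json.Unmarshal([]byte(cm.Data[routingTableKey]), &table); err != nil {
		return nil, err
	}
	return table, nil
}

// keepTableFresh fetches once on startup, then re-fetches on an interval so
// interceptors converge to the correct table even if they miss watch events.
func keepTableFresh(ctx context.Context, cl kubernetes.Interface, ns string, apply func(map[string]string)) {
	refresh := func() {
		if table, err := fetchTable(ctx, cl, ns); err != nil {
			log.Printf("routing table refresh failed: %v", err)
		} else {
			apply(table)
		}
	}
	refresh()
	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			refresh()
		}
	}
}
```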

Follow-Ups


Fixes #183
Fixes #214
Fixes #101

NOTE: since this is a large pull request, we've split up the work. To do so, we've created branches off of this branch and submitted PRs against it. Following are the PRs that need to be merged into this branch before this one should be reviewed and merged:

cc/ @yaron2 @tomkerkhove

@yaron2 commented Jul 8, 2021

@arschles when do you think this will be ready for review?

@arschles (Collaborator, Author)

@yaron2 this week sometime. I have to finish the routing table storage in the operator, then I can test it out, and from there I'll hit the button to make it not a draft PR. I'll @ you at that point as well.

for context, I just moved to a new house 600 miles away over the weekend and am getting back on my feet 😆

@arschles (Collaborator, Author) commented Jul 14, 2021

@yaron2 an update here - everything is built at this point. I have an M1 Mac and am having problems (possibly qemu-related) building images for the amd64 architecture, so I am figuring that out and then I'll be ready to test in a cluster. Feel free to start reviewing this in the meantime if you like. The code is still rough, but it would be great to have more eyes on it sooner rather than later.

@arschles marked this pull request as ready for review on July 14, 2021 00:48
@yaron2 commented Jul 14, 2021

@yaron2 an update here - everything is built at this point. I have an M1 Mac and am having problems (possibly qemu-related) building images for the amd64 architecture, so I am figuring that out and then I'll be ready to test in a cluster. Feel free to start reviewing this in the meantime if you like. The code is still rough, but it would be great to have more eyes on it sooner rather than later.

Roger Roger.

@khaosdoctor mentioned this pull request on Jul 28, 2021
@arschles mentioned this pull request on Jul 29, 2021
@khaosdoctor (Contributor) commented Aug 2, 2021

@arschles can you please add another to-do list item:

@khaosdoctor (Contributor) commented Aug 2, 2021

I will be responsible for the following:

  • Change routing table communication strategy:
    • Operator records routing table to ConfigMap
    • Each interceptor fetches it on startup, records it to memory
    • Interceptors have an open watch stream on the map
    • Interceptors periodically fetch the ConfigMap (to ensure they converge to the correct table, even if they miss events)

If all goes right, I should start working on it next Wednesday.

@arschles (Collaborator, Author) commented Aug 2, 2021

@arschles can you please add another to-do list item:

@khaosdoctor that TODO list item is in there but already checked off. would you like me to uncheck it?

@arschles (Collaborator, Author) commented Aug 2, 2021

I will be responsible for the following:

@khaosdoctor FYI I'm going to be working on adding e2e tests in this branch as well

@khaosdoctor (Contributor)

@arschles can you please add another to-do list item:

@khaosdoctor that TODO list item is in there but already checked off. would you like me to uncheck it?

Oh, I hadn't seen it! No worries, it's fine then :D

@khaosdoctor (Contributor)

I will be responsible for the following:

@khaosdoctor FYI I'm going to be working on adding e2e tests in this branch as well

Yep! I booked some time weekly starting next week to finish it ASAP

@khaosdoctor (Contributor) commented Aug 11, 2021

@arschles I will start the new routing table strategy and merge it into your global-components branch so we only create a single PR here with all the changes (and also so I have all the changes you've made)

@arschles (Collaborator, Author)

How to test this

  1. Ensure that KEDA is already installed
  2. Build the images in this PR using the following command: mage dockerbuild dockerpush (or if you want to use ACR Tasks: mage dockerbuildacr)
  3. Check out the branch in this PR: feat: multi-tenant scaler and interceptor in the HTTP add-on charts#169
  4. Install the chart from inside the kedacore/charts repo: helm install http-add-on ./http-add-on -n $NAMESPACE --set images.tag=${TAG} --set images.operator=${OPERATOR_IMG} --set images.scaler=${SCALER_IMG} --set images.interceptor=${INTERCEPTOR_IMG}
  5. From inside this repository in this branch: helm install xkcd ./examples/xkcd -n $NAMESPACE
  6. Now that the app is installed, you can issue requests to it. Use the keda-add-ons-http-interceptor-proxy Service on port 8080 for that. From inside the cluster in the same $NAMESPACE, do this: curl -H "Host: myhost.com" keda-add-ons-http-interceptor-proxy:8080
    • Note that you need the Host header so that the interceptor routes the request to the right backend (see the sketch below this list). To change that host, add this flag to the end of the helm install xkcd command: --set host=<your host>
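As a rough illustration of why the Host header matters in step 6, here is a sketch of how an interceptor could resolve the backend from that header and proxy the request. The map contents, names, and the xkcd backend address are hypothetical; the real interceptor uses the routing table supplied by the operator.

```go
package interceptor

import (
	"fmt"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// hostTargets maps Host headers to in-cluster backend addresses. The entries
// here are illustrative only; in the add-on this comes from the routing table.
var hostTargets = map[string]string{
	"myhost.com": "http://xkcd:8080",
}

// proxyHandler resolves the backend from the request's Host header (which is
// why the curl example above sets -H "Host: myhost.com") and proxies to it.
func proxyHandler(w http.ResponseWriter, r *http.Request) {
	target, ok := hostTargets[r.Host]
	if !ok {
		http.Error(w, fmt.Sprintf("no route for host %q", r.Host), http.StatusNotFound)
		return
	}
	backend, err := url.Parse(target)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	httputil.NewSingleHostReverseProxy(backend).ServeHTTP(w, r)
}
```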

@yaron2 left a review:

lgtm

@yaron2 commented Sep 1, 2021

Reviewed offline with @arschles, looks good.

@arschles merged commit c211da9 into kedacore:main on Sep 3, 2021
@tomkerkhove (Member)

Adding @zroubalik in case he wants to review as well

@tomkerkhove (Member)

Oh OK, nvm - Auto-merge was on.

@benjaminhuo

Huge improvement!

@zroubalik (Member)

Oh OK, nvm - Auto-merge was on.

🤣

@zroubalik (Member) left a review:

Looking good, though the PR is so huge that I might have missed something.
But I trust @arschles :) Great work!

@arschles (Collaborator, Author) commented Sep 3, 2021

thank you @tomkerkhove and @benjaminhuo

@zroubalik sorry this got merged before you got a chance to look at it. if you'd like to review, please feel free to! I can make changes/fixes in follow-up PRs

@zroubalik (Member)

@arschles no worries, I went through the code after the merge and it was looking good :)

@pkit commented Sep 16, 2021

Eh, unfortunately 2 pods per namespace still totally defeats the purpose of "multi-tenant scale to zero" as tenants usually live in namespaces and usually that's what is needed: scale each namespace to zero pods.

@arschles (Collaborator, Author)

@pkit there needs to be at least one pod running, though, to handle incoming requests to applications that are scaled to zero. or am I missing something?

@benjaminhuo

@pkit there needs to be at least one pod running, though, to handle incoming requests to applications that are scaled to zero. or am I missing something?

In Knative, the activator pod exists for exactly this purpose and is always on.

@pkit commented Sep 18, 2021

@arschles yup, 1 global deployment of request handlers is ok. But 1 per namespace is not that useful.
@benjaminhuo Knative solution looks good.

@arschles (Collaborator, Author)

@benjaminhuo @pkit we've scoped the interceptor to an individual namespace on purpose, because KEDA is also scoped to a single namespace. We could expand the interceptor to be cluster-global, but doing so to gain better economies of scale would only make sense if the external scaler were made cluster-global as well.

@yaron2 - you raised the issue that prompted this PR in the first place. WDYT?

Also, @tomkerkhove and @zroubalik WDYT as well?

@pkit commented Sep 20, 2021

@arschles my idea was to use something other than a full-fledged Knative installation just to get that "scale-to-zero" feature.
But it seems there is literally no other solution.
Thanks!

@arschles (Collaborator, Author)

@pkit not sure what you mean?

@pkit commented Sep 20, 2021

@arschles there is no solution (other than knative) that provides "namespace scale to zero" functionality.

@tomkerkhove (Member)

@benjaminhuo @pkit we've scoped the interceptor to an individual namespace on purpose, because KEDA is also scoped to a single namespace. We could expand the interceptor to be cluster-global, but doing so to gain better economies of scale would only make sense if the external scaler were made cluster-global as well.

@yaron2 - you raised the issue that prompted this PR in the first place. WDYT?

Also, @tomkerkhove and @zroubalik WDYT as well?

Weren't we going to support both of the scenarios? I thought that was the case where you could have it cluster-wide if you want to centralize or have it namespaced if you want to isolate.

We do the same with KEDA where you can deploy it cluster-wide or scoped if you want to.

I think we should align, what are your thoughts @zroubalik @yaron2 ?

@arschles (Collaborator, Author)

@arschles there is no solution (other than knative) that provides "namespace scale to zero" functionality.

got it, thanks for clarifying. stay tuned, we might be adding that functionality soon (see #206 (comment))

Weren't we going to support both of the scenarios? I thought that was the case where you could have it cluster-wide if you want to centralize or have it namespaced if you want to isolate.

@tomkerkhove somebody asked about that, but we didn't go all the way to making it cluster-global. we did, however, decide to allow interceptors/scalers/operators to run in any arbitrary namespace. making them cluster-global would be different work.

I wasn't aware that you could install KEDA at the cluster-wide level. can you point me to any docs on how to do that? I think if KEDA can be global, that makes it easier for the addon to do so as well.

@tomkerkhove (Member)

It's part of the helm chart configuration - https://github.com/kedacore/charts/blob/master/keda/README.md#configuration

It's called watchNamespace; it's cluster-wide by default, but can be scoped to a namespace if you need it to be.

@arschles (Collaborator, Author) commented Sep 21, 2021

👍 . Not sure how I didn't know this. I don't have the need to use KEDA across multiple namespaces much, I guess 😆

I think, then, that #240 can go on as planned (because it doesn't make much sense for KEDA to be cluster-global but for this project not to be). @pkit your wish from #206 (comment) is going to be granted 😄

Successfully merging this pull request may close these issues: Not scaling to 0; Economy of scale; Provide support for scaling from 0 -> n and vice versa.