
Channels: Implement scale subresource #2864

Closed
zroubalik opened this issue Mar 30, 2020 · 14 comments
Labels
area/channels kind/feature-request lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Milestone

Comments

@zroubalik
Contributor

zroubalik commented Mar 30, 2020

Problem
The scale subresource provides a unified Kubernetes way of scaling resources.

KEDA is currently planning to support scaling of any resource that has the scale subresource. If we define the scale subresource on Channel, we will be able to easily plug in KEDA for autoscaling, or eventually any other tool that supports scaling via the scale subresource. It gives users the flexibility to choose any autoscaler they want, as it doesn't necessarily bring a hard dependency on KEDA.

Persona:
System Integrator

Exit Criteria
Channel and at least one specific implementation have a scale subresource

Time Estimate (optional):
How many developer-days do you think this may take to resolve?
weeks

Additional context (optional)
#2154

@zroubalik
Contributor Author

@aslom @matzew @n3wscott you might be interested in this^

@grantr grantr added area/channels priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Mar 30, 2020
@grantr
Contributor

grantr commented Mar 30, 2020

Let's discuss this at the next eventing WG meeting.

@matzew
Member

matzew commented Mar 31, 2020

On the surface this sounds better than the previously proposed approach with annotations, which are more or less directly tied to an implementation.

So, yeah - I am for a more generic approach here, where KEDA is an implementation detail.

👍

@zroubalik
Contributor Author

zroubalik commented Mar 31, 2020

With this you will have a scale endpoint on the Channel, which could be targeted by KEDA, any other autoscaling tool, or even kubectl scale channel-foo.

To enable autoscaling you will still need the integration part, something that creates the resource (KEDA's ScaledObject), because it tells KEDA what metrics it should consume, what the min/max replica counts are, etc.

To support another autoscaling tool, similar properties will need to be exposed (i.e. create some CR with the metadata).

@aslom
Member

aslom commented Mar 31, 2020

Annotations are used in Knative Serving, so would it be good to use them in Knative Eventing as well?

Last time we discussed it in WG we decided to define the goals - here is public google doc with initial notes from the meeting: https://docs.google.com/document/d/1usNmsuHBWzVaL5GGC873iGVrkKXGbc6t7bLHJL38Cyg/edit?usp=sharing (anybody should be able to edit it)

@aslom
Member

aslom commented Mar 31, 2020

@zroubalik @grantr @matzew added to delivery WG agenda

@aslom
Member

aslom commented Apr 2, 2020

Just as a clarification: when we talk about scaling channels, we want to scale the part doing delivery of events (taking events from the event source and dispatching them over HTTP to subscribers). Does that mean we may expose part of the channel as a scale subresource?

@markfisher
Contributor

I understand the general idea that having a Scale subresource on the Channel would leave it up to each Channel implementation to determine what "internally-managed" resources (e.g. Deployments) need to scale.

However, taking the Kafka channel as an example, the underlying Deployments are multi-tenant from a consumer group perspective, while the metrics that should drive scaling must be per consumer-group (that's what lag means in Kafka: each consumer group's consumer-offset relative to the end-position on each partition).

To effectively manage scaling, perhaps the Kafka Channel should be refactored so that there is a 0-N (where N=num-partitions) pool of dispatcher deployments per consumer group. But even then it's not possible to manage the per consumer group scaling with a single Scale subresource for the Channel.
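The per-consumer-group "lag" metric described above can be sketched as simple offset arithmetic: for one consumer group, the distance between each partition's end offset and that group's committed offset, summed across partitions. Offsets here are plain numbers; a real implementation would read them from Kafka.

```go
package main

import "fmt"

// consumerGroupLag sums, per partition, how far a consumer group's
// committed offset trails the partition's end offset. Partitions with
// no committed offset count their full end offset as lag.
func consumerGroupLag(endOffsets, committedOffsets map[int]int64) int64 {
	var total int64
	for partition, end := range endOffsets {
		committed := committedOffsets[partition] // missing key reads as 0
		if end > committed {
			total += end - committed
		}
	}
	return total
}

func main() {
	end := map[int]int64{0: 100, 1: 250, 2: 80}
	committed := map[int]int64{0: 90, 1: 250, 2: 50}
	fmt.Println(consumerGroupLag(end, committed)) // 10 + 0 + 30 = 40
}
```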

@scothis
Contributor

scothis commented Apr 2, 2020

The scale subresource enables getting/setting the number of replicas for the resource. What does it mean for a channel to have replicas? How does a scale of 1 vs 2 manifest itself? For the scale subresource to work with HPA a label selector must be defined; what is it selecting?

Channels are logical constructs, not physical, and don't lend themselves to scaling the same way a Deployment does. It seems that what you want to scale is the dispatcher, not the channel.

@slinkydeveloper
Contributor

slinkydeveloper commented Apr 3, 2020

I have two concerns about scaling the current KafkaChannel implementation I want to share. I'm not sure if they apply to the general discussion here (they certainly apply to IMC too):

  • With the current KafkaChannel architecture (several channels handled by the same dispatcher), scaling it as a black box can be more harmful than useful. If you scale it as it is now and you have a channel hammered with requests, every dispatcher instance will handle every channel, just doubling the resource consumption, so the hammered channel is merely "distributed" across two instances, without necessarily solving the problem. IMO we first need to figure out how to make scaling work by distributing channels across several dispatchers: every dispatcher instance should agree with the others on which channels it handles. This also implies that this mechanism should be able to split the work based on the channels' metrics, in order to share the various channels fairly (based on load) among the instances. That looks really complicated to me. The other alternative would be to drop the "one dispatcher for all" model and have a dispatcher per channel, in order to scale them granularly. Since the consumer traffic is tied to the amount of received events, it looks ok-ish to scale these dispatchers as black boxes. Even better, every dispatcher could be split into an "inbound" part and an "outbound" part, in order to scale them even more granularly (even if, in the end, the traffic is controlled by the amount of received events).
  • Scaling for KafkaChannel has an upper limit and we need some way to represent it. The upper limit is the sum of the number of partitions for each handled topic. If you scale past that number, you get ghost instances that do nothing.
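The upper-limit point in the second bullet can be sketched as a simple cap: replicas beyond the total partition count of the handled topics buy nothing, since within a consumer group Kafka assigns at most one consumer per partition. Topic names and partition counts below are illustrative.

```go
package main

import "fmt"

// cappedReplicas bounds a desired dispatcher replica count by the sum
// of partitions across all handled topics; anything above that sum
// would be a "ghost" instance with no partition assigned.
func cappedReplicas(desired int, partitionsPerTopic map[string]int) int {
	limit := 0
	for _, partitions := range partitionsPerTopic {
		limit += partitions
	}
	if desired > limit {
		return limit
	}
	return desired
}

func main() {
	topics := map[string]int{"channel-a": 3, "channel-b": 4}
	fmt.Println(cappedReplicas(10, topics)) // capped at 3+4 = 7
	fmt.Println(cappedReplicas(5, topics))  // under the limit, unchanged
}
```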

@markfisher
Contributor

@slinkydeveloper

every dispatcher instance should agree with the others on what channel it wants to handle

do you mean "what subscription it wants to handle"?

and....

dispatcher per channel

"dispatcher per subscription"?

@slinkydeveloper
Contributor

do you mean "what subscription it wants to handle"?

I suppose it makes sense to agree on what channel it wants to handle in our current architecture

"dispatcher per subscription"?

Dispatcher per subscription is the ultimate goal, I think

@zroubalik
Contributor Author

Thank you all for the great input, there are definitely things I was missing before :)

@grantr grantr added this to the Backlog milestone Aug 24, 2020
@github-actions

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 25, 2020
7 participants