
Channels: Implement scale subresource #2864

Closed
zroubalik opened this issue Mar 30, 2020 · 14 comments
Labels
area/channels kind/feature-request lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Milestone

Comments

@zroubalik
Contributor

zroubalik commented Mar 30, 2020

Problem
The scale subresource provides a unified Kubernetes way of scaling resources.

KEDA is currently planning to support scaling of any resource that has the scale subresource. If we define the scale subresource on Channel, we will be able to easily plug in KEDA for autoscaling, or eventually any other tool that supports scaling via the scale subresource. It gives users the flexibility to choose any autoscaler they want, as it doesn't necessarily bring a hard dependency on KEDA.

Persona:
System Integrator

Exit Criteria
Channel and at least one specific implementation have a scale subresource

Time Estimate (optional):
How many developer-days do you think this may take to resolve?
weeks

Additional context (optional)
#2154

@zroubalik
Contributor Author

@aslom @matzew @n3wscott you might be interested in this^

@grantr grantr added area/channels priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Mar 30, 2020
@grantr
Contributor

grantr commented Mar 30, 2020

Let's discuss this at the next eventing WG meeting.

@matzew
Member

matzew commented Mar 31, 2020

On the surface this sounds better than the previously proposed approach with annotations, which are more or less directly tied to an implementation.

So, yeah - I am for a more generic approach here, where KEDA is an implementation detail.

👍

@zroubalik
Contributor Author

zroubalik commented Mar 31, 2020

With this you will have a scale endpoint on the Channel, which could be targeted by KEDA, any other autoscaling tool, or even kubectl scale channel-foo.

To enable autoscaling you will still need the integration part, something that creates the resource (KEDA's ScaledObject), because it tells KEDA what metrics it should consume, what the min/max replica counts are, etc.

To support another autoscaling tool, similar properties will need to be exposed (i.e. create some CR with the metadata).

@aslom
Member

aslom commented Mar 31, 2020

Annotations are used in Knative Serving, so would it be good to use them in Knative Eventing as well?

Last time we discussed it in WG we decided to define the goals - here is public google doc with initial notes from the meeting: https://docs.google.com/document/d/1usNmsuHBWzVaL5GGC873iGVrkKXGbc6t7bLHJL38Cyg/edit?usp=sharing (anybody should be able to edit it)

@aslom
Member

aslom commented Mar 31, 2020

@zroubalik @grantr @matzew added to delivery WG agenda

@aslom
Member

aslom commented Apr 2, 2020

Just as a clarification: when we talk about scaling channels, we want to scale the part doing delivery of events (taking events from the event source and dispatching them over HTTP to subscribers). Does that mean we may expose part of the channel as a scale subresource?

@markfisher
Contributor

I understand the general idea that having a Scale subresource on the Channel would leave it up to each Channel implementation to determine what "internally-managed" resources (e.g. Deployments) need to scale.

However, taking the Kafka channel as an example, the underlying Deployments are multi-tenant from a consumer group perspective, while the metrics that should drive scaling must be per consumer-group (that's what lag means in Kafka: each consumer group's consumer-offset relative to the end-position on each partition).

To effectively manage scaling, perhaps the Kafka Channel should be refactored so that there is a 0-N (where N=num-partitions) pool of dispatcher deployments per consumer group. But even then it's not possible to manage the per consumer group scaling with a single Scale subresource for the Channel.
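The per-consumer-group "lag" metric described above can be sketched as simple offset arithmetic: for one consumer group, the distance between each partition's end offset and that group's committed offset, summed across partitions. Offsets here are plain numbers; a real implementation would read them from Kafka.

```go
package main

import "fmt"

// consumerGroupLag sums, per partition, how far a consumer group's
// committed offset trails the partition's end offset. Partitions with
// no committed offset count their full end offset as lag.
func consumerGroupLag(endOffsets, committedOffsets map[int]int64) int64 {
	var total int64
	for partition, end := range endOffsets {
		committed := committedOffsets[partition] // missing key reads as 0
		if end > committed {
			total += end - committed
		}
	}
	return total
}

func main() {
	end := map[int]int64{0: 100, 1: 250, 2: 80}
	committed := map[int]int64{0: 90, 1: 250, 2: 50}
	fmt.Println(consumerGroupLag(end, committed)) // 10 + 0 + 30 = 40
}
```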

@scothis
Contributor

scothis commented Apr 2, 2020

The scale subresource enables getting/setting the number of replicas for the resource. What does it mean for a channel to have replicas? How does a scale of 1 vs 2 manifest itself? For the scale subresource to work with HPA a label selector must be defined; what is it selecting?

Channels are logical constructs, not physical, and don't lend themselves to scaling the same way a Deployment does. It seems that what you want to scale is the dispatcher, not the channel.

@slinkydeveloper
Contributor

slinkydeveloper commented Apr 3, 2020

I have two concerns about scaling the current KafkaChannel implementation I want to share. I'm not sure if they apply to the general discussion here (they certainly apply to IMC too):

  • With the current KafkaChannel architecture (several channels handled by the same dispatcher), scaling it as a black box can be more harmful than useful. If you scale it as it is now and you have a channel hammered with requests, every dispatcher instance will handle every channel, just doubling the resource consumption, so the hammered channel is merely "distributed" across two instances, without necessarily solving the problem. IMO we first need to figure out how to make scaling work by distributing channels across several dispatchers: every dispatcher instance should agree with the others on which channels it handles. This also implies that this mechanism should be able to split the work based on the channels' metrics, in order to share the various channels fairly (based on load) among the instances. That looks really complicated to me. The other alternative would be to drop the "one dispatcher for all" model and have a dispatcher per channel, in order to scale them granularly. Since the consumer traffic is tied to the amount of received events, it looks ok-ish to scale these dispatchers as black boxes. Even better, every dispatcher could be split into an "inbound" part and an "outbound" part, in order to scale them even more granularly (even if, in the end, the traffic is controlled by the amount of received events).
  • Scaling for KafkaChannel has an upper limit and we need some way to represent it. The upper limit is the sum of the number of partitions for each handled topic. If you scale past that number, you get ghost instances that do nothing.
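The upper-limit point in the second bullet can be sketched as a simple cap: replicas beyond the total partition count of the handled topics buy nothing, since within a consumer group Kafka assigns at most one consumer per partition. Topic names and partition counts below are illustrative.

```go
package main

import "fmt"

// cappedReplicas bounds a desired dispatcher replica count by the sum
// of partitions across all handled topics; anything above that sum
// would be a "ghost" instance with no partition assigned.
func cappedReplicas(desired int, partitionsPerTopic map[string]int) int {
	limit := 0
	for _, partitions := range partitionsPerTopic {
		limit += partitions
	}
	if desired > limit {
		return limit
	}
	return desired
}

func main() {
	topics := map[string]int{"channel-a": 3, "channel-b": 4}
	fmt.Println(cappedReplicas(10, topics)) // capped at 3+4 = 7
	fmt.Println(cappedReplicas(5, topics))  // under the limit, unchanged
}
```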

@markfisher
Contributor

@slinkydeveloper

every dispatcher instance should agree with the others on what channel it wants to handle

do you mean "what subscription it wants to handle"?

and....

dispatcher per channel

"dispatcher per subscription"?

@slinkydeveloper
Contributor

do you mean "what subscription it wants to handle"?

I suppose it makes sense to agree on what channel it wants to handle in our current architecture

"dispatcher per subscription"?

Dispatcher per subscription is the ultimate goal, I think

@zroubalik
Contributor Author

Thank you all for the great input, there are definitely things I was missing before :)

@grantr grantr added this to the Backlog milestone Aug 24, 2020
@github-actions

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 25, 2020
7 participants