
Cap scaling increase/decrease by X% of existing parallelism #1422

Open
billonahill opened this issue Sep 23, 2016 · 8 comments

Comments

@billonahill
Contributor

While testing, I almost increased parallelism by an order of magnitude more than intended because of a typo in my command. We should put some guardrails in place to limit how much we can scale in one shot.

Relates to #1292.
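One possible shape for such a guardrail, as a minimal sketch: cap the per-update change at a configurable percentage of the current parallelism and require an explicit override to go beyond it. The class and method names below are illustrative, not existing Heron APIs.

```java
// Hypothetical guardrail: reject a parallelism change that differs from the
// current value by more than a configured fraction. Illustrative only.
public final class ScalingGuardrail {
  private final double maxChangeFraction; // e.g. 0.5 allows at most a +/-50% change per update

  public ScalingGuardrail(double maxChangeFraction) {
    this.maxChangeFraction = maxChangeFraction;
  }

  /** Throws if the requested parallelism differs from the current one by more than the cap. */
  public void validate(String component, int currentParallelism, int requestedParallelism) {
    int allowedDelta = (int) Math.ceil(currentParallelism * maxChangeFraction);
    int delta = Math.abs(requestedParallelism - currentParallelism);
    if (delta > allowedDelta) {
      throw new IllegalArgumentException(String.format(
          "Refusing to change parallelism of %s from %d to %d: delta %d exceeds cap of %d (%.0f%%); "
              + "pass an explicit override to force the change.",
          component, currentParallelism, requestedParallelism, delta, allowedDelta,
          maxChangeFraction * 100));
    }
  }
}
```

A typo like the one above would then fail fast unless the user explicitly acknowledged the large change.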

@wangli1426
Contributor

Hi @billonahill,
This feature is very interesting. I have done something similar on Apache Storm: adjusting the parallelism of a bolt according to its instantaneous workload. If you are interested, we can discuss the details further.

Thanks.

@billonahill
Contributor Author

@wangli1426 this ticket is related to a user manually issuing a parallelism change using the heron update command. Are you referring to auto-scaling functionality, where component parallelism is dynamically adjusted based on current load? I'd be interested in learning the approach used to algorithmically determine that a component should be scaled up or down. This is something we plan to tackle.

@wangli1426
Contributor

wangli1426 commented Oct 17, 2016

Hi @billonahill,

Yes, I was referring to the auto-scaling functionality, which consists of two parts: (1) the mechanism for scaling operators and (2) the algorithm for determining the optimal parallelism of each operator.

For (1), to support scaling of stateful operators, we model the operator state as key-value pairs, and scaling is done by re-partitioning that state. I will send you our paper on run-time operator scaling if you are interested. I am wondering how you plan to deal with operator state when scaling up or down. Will you store the operator state in a persistent store before scaling? Will you consider live scaling without deactivating the current topology?
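As a rough illustration of what key-based re-partitioning can look like (hash routing assumed; this simplified sketch is not the exact scheme from the paper):

```java
// Simplified sketch: redistribute key-value operator state across a new number
// of instances by re-hashing each key. Illustrative only; real live scaling
// also has to coordinate in-flight tuples and state migration.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class StateRepartitioner {
  /** Moves every (key, value) pair to the instance that owns the key under the new parallelism. */
  public static <K, V> List<Map<K, V>> repartition(List<Map<K, V>> oldPartitions, int newParallelism) {
    List<Map<K, V>> newPartitions = new ArrayList<>(newParallelism);
    for (int i = 0; i < newParallelism; i++) {
      newPartitions.add(new HashMap<>());
    }
    for (Map<K, V> partition : oldPartitions) {
      for (Map.Entry<K, V> entry : partition.entrySet()) {
        int owner = Math.floorMod(entry.getKey().hashCode(), newParallelism);
        newPartitions.get(owner).put(entry.getKey(), entry.getValue());
      }
    }
    return newPartitions;
  }
}
```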

For (2), we have a paper that uses queueing theory to determine the optimal parallelism of each operator given a user-defined tuple-processing latency constraint. The first step of the algorithm is to reason about the minimal parallelism each operator needs to avoid becoming the performance bottleneck. This can be computed as $\lambda/u$, where $\lambda$ is the arrival rate and $u$ is the processing rate of a single operator instance.
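For illustration, a small sketch of that first step, assuming $u$ is measured in the same units as $\lambda$ (e.g. tuples per second):

```java
// Illustrative only: the fewest instances needed so the operator does not
// become the bottleneck, i.e. ceil(lambda / u).
public final class ParallelismEstimator {
  public static int minimalParallelism(double arrivalRate, double perInstanceProcessingRate) {
    if (perInstanceProcessingRate <= 0) {
      throw new IllegalArgumentException("processing rate must be positive");
    }
    return (int) Math.ceil(arrivalRate / perInstanceProcessingRate);
  }

  public static void main(String[] args) {
    // Example: 12,000 tuples/s arriving, 2,500 tuples/s per instance -> 5 instances.
    System.out.println(minimalParallelism(12000, 2500));
  }
}
```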

I have implemented this auto-scaling functionality on top of Storm, and I am willing to contribute it to Heron.

Thanks.
Li

@kramasamy
Contributor

@wangli1426 - this is definitely interesting. @billonahill and @avflor - can you please take a look at this proposal?

@avflor
Contributor

avflor commented Oct 17, 2016

@kramasamy Sure, I can definitely read the paper and provide feedback.
@wangli1426 Thanks for pointing us to the paper.

@wangli1426
Contributor

@avflor You are welcome.

@billonahill
Contributor Author

Thanks @wangli1426. For (1) we currently do not provide guarantees about local state during scaling events. This is something we'd like to tackle, though, as part of a general effort to provide stateful durability; it would be useful during scaling but also during routine failures.

For (2) we'll certainly check out that paper. Let's move further discussion on that topic to #1389, which tracks the auto-scaling algorithmic work.

@wangli1426
Contributor

Hi @billonahill @kramasamy @avflor,

I proposed a new operator for live scaling in #1499; please review.

Thanks.
