
Prevent concurrent topology parallelism changes #1353

Closed
billonahill opened this issue Sep 7, 2016 · 3 comments

@billonahill
Contributor

billonahill commented Sep 7, 2016

We need a way to ensure that multiple scaling events don't collide.

Related to #1292.

@billonahill billonahill changed the title Topology parallelism changes should be atomic Prevent concurrent topology parallelism changes Sep 14, 2016
@billonahill billonahill self-assigned this Sep 14, 2016
@billonahill
Contributor Author

Approach 1 - local lock with current packing plan comparison

  1. Client fetches the current packing plan from the state manager and sends it, along with the proposed changes, to the scheduler when invoking update.
  2. UpdateTopologyManager receives the request and takes out a local lock. If the lock can't be obtained, it returns a concurrent update exception response.
  3. With the local lock held, UpdateTopologyManager compares the packing plan from the request with what's in the state manager. This confirms that the system state hasn't been changed by another agent since the request was initiated. If the packing plans don't match, it returns a concurrent update exception response.
  4. UpdateTopologyManager updates the topology info in the state manager to invoke the scaling change.
  5. UpdateTopologyManager releases the local lock and returns success. (A rough sketch of this flow follows below.)
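
A minimal sketch of that flow, assuming placeholder types (StateManagerClient, PackingPlan, ConcurrentUpdateException) that stand in for the real SPI rather than the actual Heron classes:

```java
import java.util.concurrent.locks.ReentrantLock;

// Placeholder for whatever state manager interface the scheduler holds.
interface StateManagerClient {
  PackingPlan getPackingPlan(String topology);
  void setPackingPlan(String topology, PackingPlan plan);
}

// Placeholder value class; assume the real one implements a deep equals().
final class PackingPlan { }

class ConcurrentUpdateException extends Exception {
  ConcurrentUpdateException(String message) { super(message); }
}

public class UpdateTopologyManagerSketch {
  private final ReentrantLock updateLock = new ReentrantLock();
  private final StateManagerClient stateManager;

  public UpdateTopologyManagerSketch(StateManagerClient stateManager) {
    this.stateManager = stateManager;
  }

  public void updateTopology(String topology,
                             PackingPlan planSeenByClient,
                             PackingPlan proposedPlan) throws ConcurrentUpdateException {
    // Step 2: take the local lock, failing fast if another update is in flight.
    if (!updateLock.tryLock()) {
      throw new ConcurrentUpdateException("another update is already in progress");
    }
    try {
      // Step 3: confirm the plan in the state manager still matches the one the
      // client based its request on; if not, another agent changed it first.
      PackingPlan current = stateManager.getPackingPlan(topology);
      if (!current.equals(planSeenByClient)) {
        throw new ConcurrentUpdateException("packing plan changed since the request was built");
      }
      // Step 4: write the proposed plan to trigger the scaling change.
      stateManager.setPackingPlan(topology, proposedPlan);
    } finally {
      // Step 5: always release the local lock.
      updateLock.unlock();
    }
  }
}
```

The design point is the pairing of the local tryLock (serializing updates within one scheduler process) with the read-compare-write against the state manager (detecting updates initiated from elsewhere).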

Approach 2 - local lock with update request versioning

This approach is similar to Approach 1, except that an atomically incremented request id is used in place of the packing plan comparison.

  1. Client atomically increments an updateRequestId counter in the state manager.
  2. Client submits the update request, including the updateRequestId.
  3. UpdateTopologyManager receives the request and takes out a local lock. If the lock can't be obtained, it returns a concurrent update exception response.
  4. With the local lock held, UpdateTopologyManager compares the request's updateRequestId with the updateRequestId in the state manager. This confirms that the system state hasn't been changed by another agent since the request was initiated. If the updateRequestIds don't match, it returns a concurrent update exception response.
  5. Proceed as described in Approach 1 (the changed check is sketched below).
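
Building on the placeholder types from the Approach 1 sketch, only the check itself changes; getUpdateRequestId is an assumed method here, not an existing state manager call:

```java
public void updateTopology(String topology,
                           long updateRequestId,
                           PackingPlan proposedPlan) throws ConcurrentUpdateException {
  // Step 3: local lock, exactly as in Approach 1.
  if (!updateLock.tryLock()) {
    throw new ConcurrentUpdateException("another update is already in progress");
  }
  try {
    // Step 4: the client atomically incremented the counter before submitting;
    // if the counter has moved again, a newer request has superseded this one.
    long currentId = stateManager.getUpdateRequestId(topology);
    if (currentId != updateRequestId) {
      throw new ConcurrentUpdateException(
          "updateRequestId " + updateRequestId + " is stale, current is " + currentId);
    }
    // Step 5: proceed as in Approach 1 - write the proposed plan, then release the lock.
    stateManager.setPackingPlan(topology, proposedPlan);
  } finally {
    updateLock.unlock();
  }
}
```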

@avflor
Contributor

avflor commented Sep 14, 2016

@billonahill I prefer the second approach since it allows us to do some ordering of the update requests if needed. The UpdateTopologyManager could then detect when a particular request is out of order (e.g. its updateRequestId is much greater than the previous one it processed). I'm not sure if this is useful though. Just thinking.

@billonahill
Contributor Author

If we don't need to keep track of the last successfully handled updateRequestId, that simplifies things. So I was thinking that if the request's updateRequestId isn't the current one in the state manager, we fail. That's more aggressive in that we might fail even when handling the "next in line" request, but it's simple to implement and reason about. It's also easy to recover from with another request.

Since these ids are numerically increasing, it's tempting to use them to infer ordering, but I recommend we use them just as atomic optimistic locks on the request/response cycle.

If that's the case, it really does make Approaches 1 and 2 similar, except that 2 requires additional state storage.
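
To make the "optimistic lock" reading concrete, a rough sketch; compareAndSetUpdateRequestId is hypothetical (e.g. it could be backed by a versioned write in the underlying state manager):

```java
// The counter is only read and bumped via compare-and-set; ordering is never inferred.
long expected = stateManager.getUpdateRequestId(topology);
if (!stateManager.compareAndSetUpdateRequestId(topology, expected, expected + 1)) {
  // Another agent updated in between: fail this request; the client simply retries.
  throw new ConcurrentUpdateException("concurrent update detected, please retry");
}
```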
