Reimplement the Lock middleware with tokio::sync #427

Merged 5 commits into master from ver/lock-rewrite on Feb 17, 2020

Conversation

@olix0r (Member) commented on Feb 17, 2020:

As described in tokio-rs/tokio#2237, the `tokio::sync::Semaphore` can
hold unbounded memory, especially when the semaphore is being contested
and consumers drop interest. Unfortunately, this use case is common in
the proxy, especially when a destination service is unavailable and the
proxy is timing out requests.

This change reimplements the Lock middleware without using
`tokio::sync::Semaphore`. This implementation is in some ways more
naive and inefficient, but it appears to be better suited for the
proxy's needs. Specifically, waiters are stored in a LIFO stack, which
optimizes for minimizing latency. Under certain high-load scenarios,
this Lock could be forced to grow its waiters set without cleaning up
expired waiters. If this becomes a more serious concern, we could
change the implementation to use a FIFO queue of waiters.
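
To make the strategy concrete, here is a minimal sketch of the approach described above. It is illustrative only: the names loosely follow the PR's modules (`Shared`, a waiters stack, `release_and_notify`), but the fields, signatures, and the use of `std::task::Waker` are assumptions, not the PR's actual code (which targeted the futures 0.1 era).

```rust
use std::sync::{Mutex, Weak};
use std::task::Waker;

// Shared state behind the Lock: the guarded value (present when unlocked)
// plus a LIFO stack of weakly-held waiter handles.
struct Shared<T> {
    value: Option<T>,
    waiters: Vec<Weak<Mutex<Option<Waker>>>>,
}

impl<T> Shared<T> {
    // Return the value and wake the most recently registered live waiter.
    fn release_and_notify(&mut self, value: T) {
        self.value = Some(value);
        // Pop until a still-alive waiter is found; dead (dropped) waiters are
        // discarded along the way, so an idle lock sheds its whole stack.
        while let Some(weak) = self.waiters.pop() {
            if let Some(waiter) = weak.upgrade() {
                if let Some(waker) = waiter.lock().unwrap().take() {
                    waker.wake();
                    return;
                }
            }
        }
    }
}
```
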
@olix0r requested a review from a team on Feb 17, 2020 at 18:12
@olix0r (Member, Author) commented on Feb 17, 2020:

Though the lock isn't yet used in the proxy, I've tested another branch that uses the lock and confirmed that we don't see the unbounded memory growth observed with the previous version.

@olix0r requested review from a team and seanmonstar on Feb 17, 2020 at 19:20
@hawkw (Contributor) left a comment:

Overall, this looks great. The code is very clear and well-documented, so it was easy to review — thanks for that!

I had one big question about the potential to accumulate "dead" waiter pointers towards the back of the stack, especially under heavy load. Besides that, most of my comments were nits or typos; I feel pretty good about the approach here.

Since using a LIFO stack lets us bias for latency rather than fairness, I wonder if we might want to continue using this implementation in this specific use-case (service concurrency control) even after tokio_sync's memory leak is fixed, especially since this implementation is also much simpler. Might be worth some benchmarking!


```rust
impl std::fmt::Display for Poisoned {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "poisoned")
    }
}
```

Contributor commented:

Very minor nit: I believe the `write!` macro can have a small amount of overhead vs `Formatter::write_str` when there's no string interpolation to perform. This probably doesn't matter in practice...
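
For reference, a sketch of the suggested alternative: `Formatter::write_str` writes the literal directly, skipping the formatting machinery that `write!` goes through even when there is nothing to interpolate.

```rust
impl std::fmt::Display for Poisoned {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        // No interpolation needed, so write the string directly.
        f.write_str("poisoned")
    }
}
```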

```rust
}

#[test]
fn fuzz() {
```

Contributor commented:

You may want to consider using loom for this test. loom 0.1.x has a mock version of `AtomicTask`, which can simulate spurious wakeups, and a mock version of `std::thread`, which will run every permutation of threads being preempted (with bounding).

Using loom rather than randomness would make this test deterministic, which could make debugging failures significantly easier.
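
A hedged sketch of what such a test could look like. The shape below uses loom's `model` entry point and mock `thread`/`sync` types as found in more recent loom releases; the exact loom 0.1.x API (and the real Lock type under test) would differ, so treat this purely as an illustration of the deterministic-permutation approach rather than the suggested test itself.

```rust
#[cfg(test)]
mod loom_tests {
    #[test]
    fn concurrent_acquire_release() {
        loom::model(|| {
            // Stand-in for the Lock: loom's mock Mutex, so every thread
            // interleaving (including preemption points) is explored.
            let lock = loom::sync::Arc::new(loom::sync::Mutex::new(0u32));

            let l2 = lock.clone();
            let t = loom::thread::spawn(move || {
                *l2.lock().unwrap() += 1;
            });

            *lock.lock().unwrap() += 1;
            t.join().unwrap();

            // Both increments are visible in every explored interleaving.
            assert_eq!(*lock.lock().unwrap(), 2);
        });
    }
}
```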

Member Author replied:

Sounds cool! I'd rather not block the PR on this unless we have specific concerns.

Comment on lines +88 to +93
```rust
while let Some(waiter) = self.waiters.pop() {
    if waiter.notify() {
        trace!("Notified waiter");
        return;
    }
}
```

Contributor commented:

If I'm correctly understanding this strategy, I think we may still be potentially susceptible to some memory leakiness when a very large number of waiters are in the stack. Since this is a LIFO, if there are a lot of waiters on the lock, those towards the end of the stack may never be woken up. If the delay is long enough, those waiters will time out, causing the futures to be dropped and thus dropping the Arc-ed waker, but the Weak pointer will remain in the stack. If a lot of new waiters are always being pushed, those Weaks will remain at the back of the stack forever.

Since we're dropping the actual waker when a future is dropped, the size of the data remaining in the stack per waiter is significantly smaller than in tokio_sync's Semaphore (I believe a Weak is just a single pointer). However, it is still possible for the vec to have unbounded growth.

One thought is that we may want to periodically call Vec::retain or something, to drop any Notify instances in the stack where the waiter has been dropped. However, this is expensive — it requires iterating the whole vec — so for latency reasons, we probably don't want to do this when acquiring or releasing the lock. We could probably consider putting this work in a background task that will run when the lock is not busy?

We may also want to periodically call shrink_to_fit so that the actual allocated capacity of the vec is released when load decreases enough that we don't need as much capacity. However, we definitely don't want to do this too often, as we could essentially thrash allocations. If we're occasionally cleaning out dead waiters, vec capacity is probably much less of an issue...
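
A minimal sketch of this periodic-cleanup idea, under the assumption that the waiters stack holds `Weak` handles; the field and function names here are hypothetical, not the PR's code.

```rust
use std::sync::Weak;

// Drop stack entries whose waiter future has already been dropped, then
// optionally give excess capacity back to the allocator. Both steps walk or
// reallocate the whole Vec, so this would run off the hot acquire/release
// path, e.g. in a background task while the lock is not busy.
fn prune_waiters<W>(waiters: &mut Vec<Weak<W>>) {
    waiters.retain(|w| w.upgrade().is_some());
    waiters.shrink_to_fit();
}
```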

Member Author replied:

Yep. This is all correct. My current assumption is that what we have is Good Enough. We could certainly try to address these issues, but solutions all come with complexity. We can only really hit this situation when the lock is always under contention. As soon as it becomes idle, the entire set of waiters will be cleaned up. It seems like it would be unlikely to encounter this in practice, though I'm sure that we could write a test that triggers this...

For now, I'd prefer to add a note about this and revisit if we see this type of usage in practice.

Contributor replied:

> As soon as it becomes idle, the entire set of waiters will be cleaned up

Where does this happen? I must've missed it.

@hawkw (Contributor) replied on Feb 17, 2020:

Oh, you mean that if the lock is released when there are no "live" waiters in the stack, we continue popping until we hit a "live" waiter, so if there are none, we'll pop everything. Yeah, that means we probably won't see issues around the stack's capacity except in extreme edge cases. 👍

(it might be nice if there was a comment pointing that out...)

Member Author replied:

Added a note onto the Shared docstring.

@hawkw (Contributor) left a comment:

Given https://github.com/linkerd/linkerd2-proxy/pull/427/files#r380353352, I feel pretty good about the design of the lock used here, at least in the proxy's use case!

@kleimkuhler (Contributor) left a comment:

Looks good! I agree that this implementation is clear to follow. From reading the conversations above I don't have any major concerns about the strategy.

@adleong (Member) left a comment:

I have read this and understood it and it makes sense to me.

I believe there is an implicit contract that if you call `Shared::poll_acquire` then you must either call `poll_acquire` again when notified or call `release_waiter` to release your interest. `Lock` does this correctly by calling `release_waiter` in its `Drop` impl, but it would be helpful to have these contracts called out explicitly. Failing to do so could cause deadlock.

Similarly, if you successfully acquire a value, you must eventually call `release_and_notify`. Dropping the acquired value will lead to deadlock as well. That's probably obvious, but it's still something I had to mentally work through.
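
To illustrate the first contract (with purely hypothetical names and signatures, not the PR's actual types): a pending-acquire handle releases its registered interest when it is dropped before completing, so that abandoning a wait cannot swallow a notification and deadlock other waiters.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical shared state; release_waiter forgets a waiter so that a
// wake-up intended for it is not lost.
struct SharedState {
    live_waiters: usize,
}

impl SharedState {
    fn release_waiter(&mut self) {
        self.live_waiters = self.live_waiters.saturating_sub(1);
    }
}

// Handle held while waiting to acquire the lock.
struct PendingAcquire {
    shared: Arc<Mutex<SharedState>>,
}

impl Drop for PendingAcquire {
    fn drop(&mut self) {
        // If the future is abandoned before acquiring, explicitly release
        // our interest so the lock does not wait on us forever.
        self.shared.lock().unwrap().release_waiter();
    }
}
```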

@olix0r merged commit b2d3ed8 into master on Feb 17, 2020
@olix0r deleted the ver/lock-rewrite branch on Feb 17, 2020 at 21:54
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Feb 19, 2020
This release includes the results from continued profiling & performance
analysis. In addition to modifying internals to prevent unwarranted
memory growth, we've introduced new metrics to aid in debugging and
diagnostics: a new `request_errors_total` metric exposes the number of
requests that receive synthesized responses due to proxy errors; and a
suite of `stack_*` metrics expose proxy internals that can help us
identify unexpected behavior.

---

* trace: update `tracing-subscriber` dependency to 0.2.1 (linkerd/linkerd2-proxy#426)
* Reimplement the Lock middleware with tokio::sync (linkerd/linkerd2-proxy#427)
* Add the request_errors_total metric (linkerd/linkerd2-proxy#417)
* Expose the number of service instances in the proxy (linkerd/linkerd2-proxy#428)
* concurrency-limit: Share a limit across Services (linkerd/linkerd2-proxy#429)
* profiling: add benchmark and profiling scripts (linkerd/linkerd2-proxy#406)
* http-box: Box HTTP payloads via middleware (linkerd/linkerd2-proxy#430)
* lock: Generalize to protect a guarded value (linkerd/linkerd2-proxy#431)