Reimplement the Lock middleware with tokio::sync #427
Conversation
As described in tokio-rs/tokio#2237, the `tokio::sync::Semaphore` can hold unbounded memory, especially when the semaphore is contended and consumers drop interest. Unfortunately, this use case is common in the proxy, especially when a destination service is unavailable and the proxy is timing out requests.

This change reimplements the Lock middleware without using `tokio::sync::Semaphore`. This implementation is in some ways more naive and inefficient, but it appears to be better suited to the proxy's needs. Specifically, waiters are stored in a LIFO stack, which biases for low latency. Under certain high-load scenarios, this Lock could be forced to grow its waiters set without cleaning up expired waiters. If this becomes a more serious concern, we could change the implementation to use a FIFO queue of waiters.

Though the lock isn't yet used in the proxy, I've tested another branch that uses it and confirmed that we don't see the unbounded memory growth observed with the previous version.
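For readers skimming the diff, here is a minimal sketch of the shape described above. The names (`Shared`, `held`, `waiters`, `release_and_notify`) are illustrative approximations rather than the PR's exact code, and it assumes the futures 0.1 `AtomicTask` that the proxy used at the time:

```rust
use std::sync::Weak;

use futures::task::AtomicTask; // futures 0.1

// Sketch only: waiters are pushed onto a Vec and popped LIFO, so the most
// recently parked task is woken first (biasing for latency over fairness).
struct Shared {
    held: bool,
    waiters: Vec<Weak<AtomicTask>>,
}

impl Shared {
    fn release_and_notify(&mut self) {
        self.held = false;
        // Pop until we find a waiter whose future is still alive; dead
        // `Weak`s (from dropped or timed-out futures) are discarded as we go.
        while let Some(weak) = self.waiters.pop() {
            if let Some(task) = weak.upgrade() {
                task.notify();
                return;
            }
        }
    }
}
```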
Overall, this looks great. The code is very clear and well-documented, so it was easy to review — thanks for that!
I had one big question about the potential to accumulate "dead" waiter pointers towards the back of the stack, especially under heavy load. Besides that, most of my comments were nits or typos; I feel pretty good about the approach here.
Since using a LIFO stack lets us bias for latency rather than fairness, I wonder if we might want to continue using this implementation in this specific use-case (service concurrency control) even after `tokio_sync`'s memory leak is fixed, especially since this implementation is also much simpler. Might be worth some benchmarking!
```rust
impl std::fmt::Display for Poisoned {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "poisoned")
    }
}
```
Very minor nit: I believe the `write!` macro can have a small amount of overhead vs `Formatter::write_str` when there's no string interpolation to perform. This probably doesn't matter in practice...
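For reference, the suggested alternative would look like this (behavior is identical; only the formatting machinery differs):

```rust
impl std::fmt::Display for Poisoned {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        // No interpolation needed, so write the string literal directly.
        f.write_str("poisoned")
    }
}
```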
```rust
#[test]
fn fuzz() {
```
You may want to consider using `loom` for this test. `loom` 0.1.x has a mock version of `AtomicTask`, which can simulate spurious wakeups, and a mock version of `std::thread`, which will run every permutation of threads being preempted (with bounding). Using `loom` rather than randomness would make this test deterministic, which could make debugging failures significantly easier.
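As a rough illustration of the shape such a test could take, here is a sketch using the current `loom` API (`loom::model`, `loom::thread`, `loom::sync`); the 0.1.x API referenced above differs, and the counter below is only a hypothetical stand-in for state guarded by the Lock under test, so treat this as an assumption-laden sketch rather than a drop-in test:

```rust
#[test]
fn loom_notify_is_not_lost() {
    // loom explores every bounded interleaving of the spawned threads, so a
    // lost wakeup or a mutual-exclusion violation shows up as a
    // deterministic, reproducible failure.
    loom::model(|| {
        use loom::sync::atomic::{AtomicUsize, Ordering};
        use loom::sync::Arc;
        use loom::thread;

        // Hypothetical stand-in for state guarded by the Lock under test.
        let counter = Arc::new(AtomicUsize::new(0));

        let handles: Vec<_> = (0..2)
            .map(|_| {
                let counter = counter.clone();
                thread::spawn(move || {
                    // In a real test this would acquire and release the Lock
                    // around the mutation.
                    counter.fetch_add(1, Ordering::SeqCst);
                })
            })
            .collect();

        for handle in handles {
            handle.join().unwrap();
        }

        assert_eq!(counter.load(Ordering::SeqCst), 2);
    });
}
```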
Sounds cool! I'd rather not block the PR on this unless we have specific concerns.
```rust
while let Some(waiter) = self.waiters.pop() {
    if waiter.notify() {
        trace!("Notified waiter");
        return;
    }
}
```
If I'm correctly understanding this strategy, I think we may still be susceptible to some memory leakiness when a very large number of waiters are in the stack. Since this is a LIFO, if there are a lot of waiters on the lock, those towards the end of the stack may never be woken up. If the delay is long enough, those waiters will time out, causing the futures to be dropped and thus dropping the `Arc`-ed waker, but the `Weak` pointer will remain in the stack. If a lot of new waiters are always being pushed, those `Weak`s will remain at the back of the stack forever.

Since we're dropping the actual waker when a future is dropped, the data remaining in the stack per waiter is significantly smaller than in `tokio_sync`'s `Semaphore` (I believe a `Weak` is just a single pointer). However, it is still possible for the vec to grow without bound.

One thought is that we may want to periodically call `Vec::retain` or something similar, to drop any `Notify` instances in the stack whose waiter has been dropped. However, this is expensive (it requires iterating the whole vec), so for latency reasons, we probably don't want to do this when acquiring or releasing the lock. We could consider putting this work in a background task that runs when the lock is not busy?

We may also want to periodically call `shrink_to_fit` so that the vec's allocated capacity is released when load decreases enough that we no longer need it. However, we definitely don't want to do this too often, as we could essentially thrash allocations. If we're occasionally cleaning out dead waiters, vec capacity is probably much less of an issue...
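A minimal sketch of what such a cleanup pass might look like, assuming the stack is a `Vec` of `Weak` pointers; the function name and generic parameter are hypothetical, not part of the PR:

```rust
use std::sync::Weak;

// Sketch only: drop entries whose futures have been dropped (no strong
// references remain), then release excess capacity accumulated under load.
fn prune_dead_waiters<T>(waiters: &mut Vec<Weak<T>>) {
    waiters.retain(|waiter| waiter.upgrade().is_some());
    waiters.shrink_to_fit();
}
```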
Yep. This is all correct. My current assumption is that what we have is Good Enough. We could certainly try to address these issues, but the solutions all come with complexity. We can only really hit this situation when the lock is always under contention; as soon as it becomes idle, the entire set of waiters will be cleaned up. It seems unlikely that we'd encounter this in practice, though I'm sure we could write a test that triggers it...

For now, I'd prefer to add a note about this and revisit if we see this type of usage in practice.
> As soon as it becomes idle, the entire set of waiters will be cleaned up
Where does this happen? I must've missed it.
Oh, you mean that if the lock is released when there are no "live" waiters in the stack, we continue popping until we hit a "live" waiter, so if there are none, we'll pop everything. Yeah, that means we probably won't see issues around the stack's capacity except in extreme edge cases. 👍
(it might be nice if there was a comment pointing that out...)
Added a note to the `Shared` docstring.
Given https://github.com/linkerd/linkerd2-proxy/pull/427/files#r380353352, I feel pretty good about the design of the lock used here, at least in the proxy's use case!
Looks good! I agree that this implementation is easy to follow. From reading the conversations above, I don't have any major concerns about the strategy.
I have read this and understood it, and it makes sense to me.

I believe there is an implicit contract that if you call `Shared::poll_acquire`, then you must either call `poll_acquire` again when notified or call `release_waiter` to release your interest. `Lock` does this correctly by calling `release_waiter` in its `Drop` impl, but it would be helpful to have these contracts called out explicitly. Failing to uphold them could cause deadlock.

Similarly, if you successfully acquire a value, you must eventually call `release_and_notify`. Dropping the acquired value will lead to deadlock as well. That's probably obvious, but it's still something I had to mentally work through.
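To make the contract concrete, here is a minimal sketch of how a handle could uphold it in its `Drop` impl. The `Shared`, `release_waiter`, and `release_and_notify` names come from the discussion above, but the state machine and method bodies are simplified stand-ins for illustration, not the PR's actual implementation:

```rust
use std::sync::{Arc, Mutex};

// Hypothetical, simplified stand-in for the PR's shared lock state.
struct Shared {
    held: Mutex<bool>,
}

impl Shared {
    fn release_waiter(&self) {
        // Drop this handle's interest so another waiter can be notified.
    }

    fn release_and_notify(&self) {
        // Hand the lock back; in the real code this also wakes the next waiter.
        *self.held.lock().unwrap() = false;
    }
}

enum State {
    Waiting,
    Acquired,
}

struct Lock {
    shared: Arc<Shared>,
    state: Option<State>,
}

impl Drop for Lock {
    fn drop(&mut self) {
        match self.state.take() {
            // Still waiting: release our interest so other waiters aren't
            // left unnotified (which would deadlock them).
            Some(State::Waiting) => self.shared.release_waiter(),
            // Holding the lock: it must be returned, or every other waiter
            // would deadlock.
            Some(State::Acquired) => self.shared.release_and_notify(),
            None => {}
        }
    }
}
```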
This release includes the results from continued profiling & performance analysis. In addition to modifying internals to prevent unwarranted memory growth, we've introduced new metrics to aid in debugging and diagnostics: a new `request_errors_total` metric exposes the number of requests that receive synthesized responses due to proxy errors, and a suite of `stack_*` metrics expose proxy internals that can help us identify unexpected behavior.

---

* trace: update `tracing-subscriber` dependency to 0.2.1 (linkerd/linkerd2-proxy#426)
* Reimplement the Lock middleware with tokio::sync (linkerd/linkerd2-proxy#427)
* Add the request_errors_total metric (linkerd/linkerd2-proxy#417)
* Expose the number of service instances in the proxy (linkerd/linkerd2-proxy#428)
* concurrency-limit: Share a limit across Services (linkerd/linkerd2-proxy#429)
* profiling: add benchmark and profiling scripts (linkerd/linkerd2-proxy#406)
* http-box: Box HTTP payloads via middleware (linkerd/linkerd2-proxy#430)
* lock: Generalize to protect a guarded value (linkerd/linkerd2-proxy#431)