add feature-flagged support for multi-threaded Tokio runtime #611
Conversation
This reverts commit c033a95. Turns out the metrics overhead is not too worrying.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
The PR description seems to indicate that this adds a feature flag but I don't think it does?
if num_cpus > 2 {
    builder
        .threaded_scheduler()
        .core_threads(num_cpus - 1) // Save 1 core for the admin runtime.
I don't think we should dedicate an entire core to the admin thread... That's a lot of resources for something we shouldn't be doing much work on. The real goal of the separate admin runtime is to ensure we can still interact with the proxy if, for instance, the main runtime isn't healthy. I think we're best off keeping it simple and treating it as if the admin thread needs no dedicated resources.
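To illustrate the idea, here is a minimal sketch (assuming Tokio 0.2's builder API; the helper name is hypothetical) of running the admin runtime single-threaded on its own OS thread, so it stays reachable without reserving a core:

```rust
use std::future::Future;

// Hypothetical helper: run the admin server on a dedicated OS thread with a
// single-threaded (basic) scheduler. The separate thread keeps the admin
// endpoints responsive even if the main runtime is unhealthy, without
// dedicating a whole core to them.
fn spawn_admin_runtime<F>(admin: F) -> std::thread::JoinHandle<()>
where
    F: Future<Output = ()> + Send + 'static,
{
    std::thread::Builder::new()
        .name("admin".into())
        .spawn(move || {
            let mut rt = tokio::runtime::Builder::new()
                .basic_scheduler()
                .enable_all()
                .build()
                .expect("failed to build admin runtime");
            rt.block_on(admin);
        })
        .expect("failed to spawn admin thread")
}
```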
Mmm, yeah, that makes sense. My thought process was that if forwarding has more workers, load on the admin runtime from e.g. discovery & DNS lookups might increase. I think the long-term solution might be to just move that work off of the admin runtime, and have it only run the metrics/tap/trace endpoints/etc. Then we can probably feel confident not giving it a whole core.
Edit: whoops, looks like service discovery lookups are already run on the main thread; I must have misremembered.
Ah, never mind, I see that it is feature-flagged. We can merge and roll forward with tweaks.
This release enables a multi-threaded runtime. Previously, the proxy would only ever use a single thread for data plane processing; now, when the proxy is allocated more than 1 CPU share, the proxy allocates a thread per available CPU. This has shown substantial latency improvements in benchmarks, especially when the proxy is serving requests for many concurrent connections.

---

* Add a `multicore` feature flag (linkerd/linkerd2-proxy#611)
* Add `multicore` to default features (linkerd/linkerd2-proxy#612)
* admin: add an endpoint to dump spawned Tokio tasks (linkerd/linkerd2-proxy#595)
* trace: roll `tracing` and `tracing-subscriber` dependencies (linkerd/linkerd2-proxy#615)
* stack: Add NewService::into_make_service (linkerd/linkerd2-proxy#618)
* trace: tweak tracing & test support for the multithreaded runtime (linkerd/linkerd2-proxy#616)
* Make FailFast cloneable (linkerd/linkerd2-proxy#617)
* Move HTTP detection & server into linkerd2_proxy_http (linkerd/linkerd2-proxy#619)
* Mark tap integration tests as flakey (linkerd/linkerd2-proxy#621)
* Introduce a SkipDetect layer to preempt detection (linkerd/linkerd2-proxy#620)
This branch enables feature-flagged support for using Tokio's
multithreaded runtime when the proxy is given more than two CPU cores to
run on. The proxy's understanding of the system's number of CPUs
reflects cgroups limits, so docker CPU limits are taken into
consideration.
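For illustration, here is a hedged sketch of how the feature gate might be wired up (the flag name `multicore` comes from the changelog above; `main_runtime` and `build_multicore_runtime` are hypothetical names, and the CPU-based selection itself is sketched later in this description):

```rust
// Cargo.toml (excerpt):
//
// [features]
// multicore = []

// With the feature enabled, choose a scheduler based on the available CPUs;
// without it, keep the previous behavior: a single-threaded runtime.
#[cfg(feature = "multicore")]
pub fn main_runtime() -> std::io::Result<tokio::runtime::Runtime> {
    build_multicore_runtime() // hypothetical helper, sketched below
}

#[cfg(not(feature = "multicore"))]
pub fn main_runtime() -> std::io::Result<tokio::runtime::Runtime> {
    tokio::runtime::Builder::new()
        .basic_scheduler()
        .enable_all()
        .build()
}
```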
The multithreaded runtime offers significantly better performance under
load when multiple CPU cores are available. Here are benchmark results
from a run on my 24-thread Ryzen 3900X system, where the proxy, the load
generator, and the test server were each given 8 cores.
Note that the locking in multiple metrics layers doesn't really seem
to have a big impact, as shown by the run with metrics disabled. There
might be some improvement in high tails from further optimizations to
the metrics code, though.
If only a single core is available to run a worker on, we will construct
Tokio's `basic_scheduler`, rather than a `threaded_scheduler` with a
single worker thread. This is because there is some overhead introduced
by the threaded scheduler, which results in worse performance when only
1 core is available.
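A minimal sketch of that selection logic, assuming Tokio 0.2's runtime builder and the `num_cpus` crate for the cgroups-aware CPU count (`build_multicore_runtime` is a hypothetical name):

```rust
// Hypothetical helper: pick the scheduler based on how many CPUs the proxy
// may actually use (num_cpus::get() accounts for cgroups CPU quotas on Linux).
pub fn build_multicore_runtime() -> std::io::Result<tokio::runtime::Runtime> {
    let num_cpus = num_cpus::get();
    let mut builder = tokio::runtime::Builder::new();
    if num_cpus > 2 {
        // Enough cores for the work-stealing threaded scheduler.
        builder
            .threaded_scheduler()
            .core_threads(num_cpus - 1); // Save 1 core for the admin runtime.
    } else {
        // With 1-2 cores, the basic scheduler avoids the threaded
        // scheduler's cross-thread coordination overhead.
        builder.basic_scheduler();
    }
    builder.enable_all().build()
}
```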
I'd like to fix this upstream in Tokio, by having `Runtime::new()` and
the macros intelligently select the basic scheduler when only 1 core
is available, but this works for now.
Finally, a bug in the handle time metrics layer results in a deadlock
when used with the multithreaded runtime. Therefore, I've disabled this
metric for now --- my understanding is that nothing consumes it at the
moment. If it's needed, we can fix the bug and put it back.