[prototype] boost fiber instead of asio for async #16699
Conversation
What's blocking us from merging this? I guess
cc @pcmoritz @stephanie-wang for comments
@@ -40,6 +40,14 @@ namespace rpc {
    INVOKE_RPC_CALL(SERVICE, METHOD, request, callback, rpc_client); \
  }

+#define FIBER_RPC_CLIENT_METHOD(SERVICE, METHOD, rpc_client, SPECS) \
Will this have reduced performance compared to the RPCs run directly on gRPC threads?
How does thread pooling work for fibers, do we have a pool of fiber threads?
In the gRPC thread, it'll move the result out, so it won't reduce performance.
Comparison:
- Previously, we post a job to the io_thread pool. Here, a job is created in the asio pool.
- Here, we pass the result to the promise with a move. The extra cost is moving the structure, which is OK. For example, this is the pub/sub message: if the arena is the same, it'll just swap, and we always use the default one.
inline PubMessage& operator=(PubMessage&& from) noexcept {
if (GetArena() == from.GetArena()) {
if (this != &from) InternalSwap(&from);
} else {
CopyFrom(from);
}
return *this;
}
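The move-out pattern described above can be sketched with the standard library. This is a minimal sketch in which std::promise/std::future stand in for the boost::fibers primitives, and Reply/AsyncCall are hypothetical names rather than Ray APIs:

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <string>
#include <thread>
#include <utility>

// Hypothetical reply type, standing in for a protobuf message such as PubMessage.
struct Reply {
  std::string payload;
};

// Simulates the gRPC completion thread: instead of running user logic inline,
// it moves the reply into a promise and returns, so the only extra cost on
// the "gRPC" thread is a single move.
std::future<Reply> AsyncCall(std::string data) {
  auto prom = std::make_shared<std::promise<Reply>>();
  std::future<Reply> fut = prom->get_future();
  std::thread([prom, data = std::move(data)]() mutable {
    Reply reply{std::move(data)};       // reply produced on the "gRPC" thread
    prom->set_value(std::move(reply));  // moved out, not copied
  }).detach();
  return fut;  // the caller (a fiber in the prototype) blocks on this future
}
```

With boost::fibers the future would be a fiber-aware one, so `get()` suspends only the fiber, not the underlying OS thread.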
No, we don't have a pool of fiber threads for this prototype. We just reuse the thread calling CoreWorker, which can be optimized later. We should put everything that is not CPU-intensive on one thread (the fiber thread) and put CPU-intensive work on a CPU thread pool.
A fiber's thread is like an asio thread; the difference is that asio manages tasks, which are closures, while fiber also manages the context and scheduling of tasks. In asio, we wrap everything into lambda captures and lose the call stack. In fiber, state goes into a context, which makes tasks resumable.
In the middle of a migration, we can share the same thread with asio. I mean, if we go this way, we can do it granularly.
My major concern is the team's expertise with fiber.
To summarize my takes on the trade-offs for adopting boost::fiber:
Could we get some of the benefits of fibers by adding helper libraries for common distributed patterns in asio? For example, chained futures and await-all.
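As a rough illustration of what such a helper could look like, here is an await-all sketch using std::future as a stand-in for asio-integrated futures; AwaitAll is a hypothetical name, not an existing Ray or asio API:

```cpp
#include <cassert>
#include <future>
#include <utility>
#include <vector>

// Blocks until every future in the batch is ready, collecting the results
// in order. A fiber- or asio-aware version would suspend instead of blocking.
template <typename T>
std::vector<T> AwaitAll(std::vector<std::future<T>> futures) {
  std::vector<T> results;
  results.reserve(futures.size());
  for (auto &f : futures) {
    results.push_back(f.get());  // wait for each result in turn
  }
  return results;
}
```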
@scv119 I read the doc and I don't think the arguments in it hold up well. Go's goroutines are also stackful; that gives you benefits, though it can suffer from stack overflow. As for context switches, since we'd like to adopt a better threading model, they should be avoided anyway. Besides, C++20 coroutines are not yet well supported even by current compilers, so adoption will take a long time, maybe five years; I don't treat that as a solution. And moving later from a library-based coroutine to a compiler-supported one is not difficult. I remember you mentioned the code is hard to understand because of all the callbacks, right? And the code flow here is much cleaner compared with the previous one, right? The performance gain comes from avoiding copying things into lambda captures compared with asio, but that's not the key benefit of this.
@ericl promise/future/then cannot give you something similar to this. Here, everything can be wrapped into one function directly, but with promise/future/then we basically need to divide the big function into parts, as if chaining functions. Some syntax looks weird: for example, if you have a for loop and you want to decide what to do next based on the result, it's not natural to write with promise/future/then. As for building promise/future/then helpers around asio, I'd prefer something already built and maintained for a long time.
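To illustrate the point about loops, here is a sketch of result-dependent control flow written as a plain loop; std::async stands in for a fiber-aware RPC, and AsyncStep/RunUntil are hypothetical names. With promise/future/then, this loop would have to be restructured into recursive continuations:

```cpp
#include <cassert>
#include <future>

// Stand-in for an async RPC. With fibers, get() would suspend only the
// fiber rather than blocking an OS thread, so the loop below reads as
// ordinary sequential code.
std::future<int> AsyncStep(int value) {
  return std::async(std::launch::async, [value] { return value + 1; });
}

// Decide what to do next based on each result: trivial as a loop, awkward
// to express as a chain of .then() continuations.
int RunUntil(int start, int limit) {
  int v = start;
  while (v < limit) {
    v = AsyncStep(v).get();  // inspect the result, then choose the next call
  }
  return v;
}
```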
There are two follow-up discussions here which are interesting:
I just came across this PR by chance when I searched GitHub for issues around Boost.Fiber. I don't have much knowledge of Ray, but I'm very interested in this discussion. I recently built a Postgres database prototype using Boost.Fiber; the backend is multi-threaded using a thread-pool-like design, but with fibers inside, and the network module is based on Boost.Asio. So far I have not hit the issues stated in http://www.openstd.org/jtc1/sc22/wg21/docs/papers/2018/p1364r0.pdf . For instance, the paper says the fiber runtime cannot integrate with tcmalloc, but at least so far my server hasn't crashed due to tcmalloc.

Debuggability really is an issue. In my experience, application bugs are as easy to detect as in a normal program, but my real concern is that if the library is not widely adopted and stable enough, I may not be able to dig out the library's own bugs. As I step through the gdb call stack of Boost.Fiber, the code is really complicated to follow. I can share my fiber pool implementation if you are interested.

Furthermore, the paper also states that the fiber runtime requires the entire project to be refactored and all blocking APIs to be wrapped (via promise/future) into fiber-aware APIs. This is true, but there is an approach where different runtimes can coexist. For example, assume we have a project based on the actor model: when a task is issued, it runs asynchronously and finally delivers a result to a callback. To refactor such a task with Boost.Fiber, you can set up a set of isolated threads dedicated to fibers. The task is sent to the "fiber-instrumented threads" and executes in a synchronous programming style, benefiting from fibers. When the task ends, it sends the result back to the "actor-instrumented threads" to finally execute the callback:

void RunTask(std::function<void()> task, std::function<void()> task_callback) {
  FiberAsync([task = std::move(task), task_callback = std::move(task_callback)]() {
    task();
    ActorAsync(task_callback);
  });
}
I connected with @neverchanje, and he shared a lot of positive experiences using fiber. Apparently @neverchanje is a few steps ahead of us in evaluating boost::fiber, and we will exchange more details of the fiber evaluation. I learned from this conversation that boost::fiber might not yet support Apple M1 chips, as boost::fiber uses assembly to implement its user contexts/user-level stacks. This could affect Ray's M1 support. cc @wuisawesome @simon-mo
This is interesting. As a data point, we already use boost::fiber for async actor support: https://github.com/ray-project/ray/blob/master/src/ray/core_worker/fiber.h
@scv119, actually boost::fiber should work on M1, because boost::fiber has assembly for ARM and M1 is ARM-based. We already use boost::fiber, and Ray already compiles on ARM chips (on Graviton and on M1).
That's great to know :)
I thought it would require some special customization for Mac ARM. Good to know it works.
Thanks, @neverchanje, for sharing. I feel the only thing that concerns me is the maturity of the library itself, which we need to evaluate more. As for debuggability, right now it's hard to list user fiber stacks since fiber is not integrated with gdb; only the active ones can be listed. But it's not going to make things worse: right now we use asio, and we can't list its tasks either. Btw, we won't have multiple physical threads executing fiber tasks; for simplicity, one physical thread runs all fiber tasks. As for refactoring the entire project for adoption, that's not an issue here either, since our program is already written in an async way. We can just add a new future/promise API and change the async functions gradually into sync ones. The asynchrony arises on the gRPC side, and it's not difficult to update it.
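The gradual migration step mentioned above (adding a future/promise API over an existing callback-based one) might look like this sketch; GetValueAsync and GetValue are hypothetical stand-ins, not actual Ray methods:

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <memory>
#include <thread>
#include <utility>

// Existing async, callback-based API (stand-in for a Ray RPC client method).
void GetValueAsync(int key, std::function<void(int)> callback) {
  std::thread([key, cb = std::move(callback)] { cb(key * 10); }).detach();
}

// New future-returning wrapper: the callback just fulfills a promise, so
// call sites can migrate to blocking (fiber-style) code one function at a
// time while the callback API keeps working underneath.
std::future<int> GetValue(int key) {
  auto prom = std::make_shared<std::promise<int>>();
  std::future<int> fut = prom->get_future();
  GetValueAsync(key, [prom](int v) { prom->set_value(v); });
  return fut;
}
```

In the fiber version, the promise/future pair would be the fiber-aware boost::fibers ones, so `get()` suspends the calling fiber instead of blocking the thread.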
I'll close this PR for now since we've decided to experiment with GCS services.