Evented I/O introduces multithreading into the control plane, not only the data plane #5962

Open
jorangreef opened this issue Jul 31, 2020 · 9 comments
Labels: proposal (This issue suggests modifications. If it also has the "accepted" label then it is planned.), standard library (This issue involves writing Zig code for the standard library.)

Comments

@jorangreef
Sponsor Contributor

I watched "Live Coding a Basic Chat Server in Zig with Evented I/O" again today, and at https://youtu.be/aDd0BexKWps?t=2991 (49:51 into the stream) there is a discussion of how using evented I/O can also make your control plane (and not only your I/O data plane) multithreaded and introduce data races.

I found this behavior surprising, because evented I/O as an abstract concept does not in itself force the control plane to also be multithreaded, and most evented I/O systems use a single-threaded control plane for safety.

There's a subtle difference, but I would prefer to keep the event loop control plane single-threaded, handing off to the thread pool only for I/O that could block.

The reason this matters is that Zig's async/await (which is awesome) is supposed to enable the same code to work in different I/O modes; however, multithreading in the control plane introduces a "color" in the form of H.B.D. data race safety.

Of course, if the user wants to run their own functions in the thread pool then that introduces multithreading, but it should be possible to opt in to evented I/O without introducing data races?
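
For concreteness, a minimal sketch of the hazard (Room, Connection, readMessage, and broadcast are illustrative names, not std APIs):

// With a multi-threaded event loop, the code after a suspension point
// may resume on a different pool thread than the one that suspended.
fn handleClient(room: *Room, conn: *Connection) !void {
    while (true) {
        const msg = try conn.readMessage(); // suspends; may resume elsewhere
        // Control-plane state: two frames resumed on different threads can
        // mutate this map concurrently, i.e. a data race.
        try room.clients.put(conn, {});
        room.broadcast(msg);
    }
}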

@pedromsilvapt

I'm fairly new to Zig and I don't think I have fully grasped its implementation of async/await yet, so what I'm thinking may be complete nonsense. But it did strike me as odd that a plain function call can change what thread the code runs on once it resumes (such as calling try server.accept()). Some languages do this (I think C# does), but only on awaitable functions, meaning there is at least an explicit marker (the await keyword) on the calling side indicating that a different thread might resume the function.

Would it be feasible to choose this behavior for Zig: async functions are always resumed on the thread on which they were called, but if the user explicitly marks them as detached (like try detach server.accept(), or something), they can be resumed on the same thread or on a different one? I know introducing a new keyword might not be desirable, but this is just to get the point across (the implementation could be different).

This is not exactly the same as running the event loop in single-threaded mode. We would still be able to run async functions on different threads, but we would have to be explicit about when that could happen; the sketch below illustrates the idea.
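
To illustrate (this is hypothetical syntax; a detach keyword does not exist in Zig):

// By default the call may suspend, but always resumes on the calling thread.
const conn = try server.accept();
// Opt-in: the call may resume on any thread in the pool.
const next = try detach server.accept();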

I haven't thought much about this yet, so honestly I would be interested to hear more feedback on it.

@andrewrk
Member

andrewrk commented Aug 2, 2020

In Zig, the decision to build single-threaded or multi-threaded belongs to the application. It's even a compiler-recognized flag, --single-threaded, which is exposed as a comptime bool to any code that wants to check the mode. The default start code which sets up an event loop already respects this flag for evented I/O, setting up a single-threaded event loop when single-threaded mode is used.

Libraries which wish to be maximally portable should handle both the single-threaded and the multi-threaded use case. However, it is possible for libraries which do not support multi-threading to emit a compile error for multi-threaded builds.

Some API abstractions have this built in automatically; for example if you use std.Mutex to protect a variable which may be accessed from multiple threads, it turns into an empty struct with no-ops when you use it in a --single-threaded build.

Likewise, threadlocal variables are emitted simply as normal global variables in --single-threaded builds.
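
A minimal sketch of these three points together (the exact std.Mutex API has shifted between Zig versions, so treat the acquire/release calls as approximate):

const std = @import("std");
const builtin = @import("builtin");

// In a --single-threaded build: builtin.single_threaded is true, the mutex
// compiles to no-ops, and the threadlocal variable becomes a plain global.
threadlocal var per_thread_count: usize = 0;

var mutex = std.Mutex{};
var shared_total: usize = 0;

fn record() void {
    per_thread_count += 1;
    const held = mutex.acquire(); // no-op in --single-threaded builds
    defer held.release();
    shared_total += 1;
}

pub fn main() void {
    if (builtin.single_threaded) {
        // The default start code would set up a single-threaded event loop.
    }
    record();
}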

@pedromsilvapt

Hi Andrew, thanks for replying.

Sorry if I'm not fully grokking it, but, like you said, it is an application-wide switch: either the entire event loop is single-threaded or multi-threaded. That's fine. My question was: could we force (pin) a function to execute on the same thread (even if other functions were running on different threads at the same time)? Or, better yet, do the opposite: require the programmer to explicitly state when a function call could resume on a different thread, so that when suspending, by default, the event loop would schedule the function to be resumed on the same thread unless told to do otherwise?

It's not exactly the same thing, and I hope you don't mind the comparison, but the language Crystal implements fibers (which are a different thing from Zig's async functions, I know), and when spawning a new fiber the user can optionally ask for it to run on the same thread or not. That's sort of what I was thinking about for Zig too.

My fear is that implicitly resuming functions on a different thread could introduce new types of bugs. Maybe this worry is unwarranted. The more I think about it, the more I suspect that it may well be.

@andrewrk
Member

andrewrk commented Aug 2, 2020

> could we force (pin) a function to execute on the same thread (even if other functions were running on different threads at the same time)? Or better yet, do the opposite: require the programmer to explicitly state when a function call could resume on a different thread? So that when suspending, by default, the event loop would schedule the function to be resumed on the same thread unless told to do otherwise?

The language makes all of these features possible; it is a matter of an event loop's API and implementation to provide them. The standard library event loop does not yet support any of these things.
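
For example, an event loop could expose pinning as ordinary API, along these lines (purely hypothetical names; nothing like this exists in std.event.Loop, and conn.readMessage is also an assumption):

// Pin the current frame so every resume after a suspension point lands
// back on this thread, until the scope ends.
loop.pinCurrentFrame();
defer loop.unpinCurrentFrame();
const msg = try conn.readMessage(); // suspends; still resumes on this thread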

@jorangreef
Sponsor Contributor Author

jorangreef commented Aug 3, 2020

> The default start code which sets up an event loop already respects this flag for evented I/O, setting up a single-threaded event loop when single-threaded mode is used.

Thanks @andrewrk. Just to clarify: can you run the standard library event loop with a single-threaded control plane and a multi-threaded data plane?

@jorangreef
Sponsor Contributor Author

jorangreef commented Aug 4, 2020

It's 100% fine if the standard library event loop can't yet run a single-threaded control plane with a multi-threaded data plane, but hopefully it can be fixed down the line, because I think that's the common case for event loop deployments.

When I saw that comment in the video stream linked above, I was absolutely sure a follow-on comment was coming about how this was only a bug in the event loop implementation that was still going to be sorted out. It took me completely by surprise when I realized that multi-threading was actually leaking through the event loop abstraction.

@noisesmith

noisesmith commented Sep 16, 2020

I had similar thoughts after watching that presentation, and I hope this is a reasonable place to follow up.

The concurrency issues around the shared but not synchronized hash map in particular give me pause. A powerful concurrent design pattern is to move all usage of a given data structure into one flow of control (whether that's a "thread" at the OS/process level or just a continuity of control flow that the compiler can guarantee).

My (half-baked) proposal: a single async function frame owns all writes to the data structure (here, the clients in the room, stored in a hash map of client/null key-value pairs). The data structure is designed so that modifications are not visible through a handle that is being read (insertions return new handles and don't invalidate old ones). Add to this a concurrency-safe FIFO queue (locked, but guaranteed to return quickly from locked operations); let's call it a channel. The channel allows "putting" a new client, and the owning thread "takes" that client and puts it into a concurrently readable hash map. The same channel allows "putting" a second channel, and responds by putting a const client list onto it (something that won't be modified even if more clients are added to the base hash).

The elements underlying this design are reusable, fast at runtime, and simple enough to warrant language library support, especially given the classes of async bugs that are ruled out (or turned into compile errors). It could be implemented as two features: a thread-safe FIFO queue (prior art should be easy to find) and a persistent hash-map data structure. Existing open-source HAMT implementations are worth checking out. Additionally, we would probably want some tight compile-time restrictions on the const-ness of the data put onto the queue and the data used to key the HAMT.

Perhaps, to blend with the async APIs that make it useful, the queue could be exposed as an async function frame that allows awaiting the next readable message and, optionally, awaiting message delivery confirmation.
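
A minimal sketch of the locked-FIFO half of this design, built on present-day std.Thread primitives (the names, fixed capacity, and assertion-based backpressure are illustrative; std.event.Channel covers similar ground for async frames):

const std = @import("std");

fn Channel(comptime T: type, comptime capacity: usize) type {
    return struct {
        const Self = @This();

        mutex: std.Thread.Mutex = .{},
        not_empty: std.Thread.Condition = .{},
        buf: [capacity]T = undefined,
        head: usize = 0,
        len: usize = 0,

        // Producers lock briefly and return quickly; they never block on I/O.
        pub fn put(self: *Self, item: T) void {
            self.mutex.lock();
            defer self.mutex.unlock();
            std.debug.assert(self.len < capacity); // sketch: no backpressure
            self.buf[(self.head + self.len) % capacity] = item;
            self.len += 1;
            self.not_empty.signal();
        }

        // Only the single owning flow of control takes items and applies
        // them to the (otherwise unshared) hash map.
        pub fn take(self: *Self) T {
            self.mutex.lock();
            defer self.mutex.unlock();
            while (self.len == 0) self.not_empty.wait(&self.mutex);
            const item = self.buf[self.head];
            self.head = (self.head + 1) % capacity;
            self.len -= 1;
            return item;
        }
    };
}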

The downside of HAMT in particular is that it's memory-hungry (in exchange for speed), but it's less memory-hungry (not to mention significantly faster) than the naive approach of shallow-copying the data.

I could be nostalgic for some favorite Clojure features here, but I think this is narrow enough in scope, and the complexity/benefit ratio is good.

If these two items (the queue and the concurrency-safe hash) are not worthy of Zig itself, I could still implement a library, of course. But I think these are proven concepts in concurrent application design, with good follow-on effects if they are present and easy to use in core.

@Rocknest
Contributor

Maybe a solution would be something like this:

// Sketch using hypothetical App/Connection types. mem.eql takes the
// element type as its first argument.
fn onDataIncoming(ctx: *App, conn: Connection) void {
    if (mem.eql(u8, conn.peek(3), "MSG")) {
        const msg = conn.skip(3).readAll();
        // Pin the frame to the control thread for this critical section.
        startControlOperation();
        defer endControlOperation();
        for (ctx.users) |user| {
            _ = async user.sendMsg(msg);
        }
    } else if (mem.eql(u8, conn.peek(5), "CLOSE")) {
        conn.skip(5);
        startControlOperation();
        defer endControlOperation();
        ctx.removeUser(conn);
    }
}

/// Resumes the frame on the control thread (UI thread etc.)
fn startControlOperation() callconv(.Async) void;

/// Allows the frame to be resumed on any other thread(s)
fn endControlOperation() callconv(.Async) void;
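
One hypothetical way to implement such a marker: suspend and park the frame on a queue that only the control thread drains, so that when the control thread resumes the frame, execution continues there. Here control_queue is an assumed global, e.g. the locked FIFO sketched in an earlier comment with T = anyframe.

fn startControlOperation() callconv(.Async) void {
    suspend {
        // Hand this frame to the control thread; resumed over there.
        control_queue.put(@frame());
    }
}

// endControlOperation would do the inverse: suspend and re-enqueue the
// frame on the general thread pool.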

@andrewrk andrewrk added this to the 0.8.0 milestone Oct 13, 2020
@andrewrk andrewrk modified the milestones: 0.8.0, 0.9.0 Nov 6, 2020
@kprotty
Member

kprotty commented Jan 28, 2021

@noisesmith @jorangreef It has been a few months, but I wanted to chip in as well.

The current event loop simply schedules async frames to be resumed on a pool of threads. This is pretty standard for parallel async implementations (Go, Rust's tokio/async-std, Erlang, Akka, etc.). Many of those also provide configuration options, such as tuning how many threads are in the pool. One particular option is the ability to bind/unbind a given asynchronous scope to one thread when necessary, to avoid it being distributed among the threads in the pool. This is entirely feasible, but thread binding, along with pool size configuration, is just not yet implemented in std.event.Loop. Theoretically, nothing stops it from being added. There are, however, arguments against doing this.

For pool size configuration, one could argue that the implementation knows best how many threads to spawn, which justifies the current API. This is generally how some thread pool implementations, like Apple's Grand Central Dispatch (libdispatch), already operate. It's disadvantageous for those who know more about the system and their needs, but still technically advantageous, because the pool is very familiar with the OS and can use priorities and internal APIs to offer efficient execution.

For thread binding, this is effectively the same as setting CPU core affinity for OS threads, as an analogy for async tasks. It's undoubtedly useful (especially in the presence of threadlocal state), but there's always a downside. When implementing async thread pools, having the ability for a task to be scheduled on a "thread" instead of the "pool" can complicate the internals and force runtime overheads that otherwise wouldn't be necessary, given the granularity at which scheduling is performed. Basically, it can make the entire event loop slower on average.

It's useful to remember that Zig's async is only there to make writing non-blocking code easy. Whether it's executed serially or in parallel, it's still concurrent either way. This is also true in environments like Go and Rust, where most async code is written in a way that is agnostic to parallelism, since its concurrency already assumes as much. If you want an event loop with serial semantics for async tasks, or one that distributes them to threads efficiently, either is possible to implement. I think the question then comes down to which one the stdlib should offer for those who, for various valid reasons, don't want to implement their own.
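
Choosing between the two at startup already looks roughly like this (a sketch assuming that era's std.event.Loop entry points; exact names and signatures have varied across versions):

// The application picks its control plane when initializing the loop.
var loop: std.event.Loop = undefined;
try loop.initSingleThreaded(); // serial semantics: one thread resumes frames
// or: try loop.initMultiThreaded(); // frames distributed across a pool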

@SpexGuy SpexGuy added proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. standard library This issue involves writing Zig code for the standard library. labels Mar 21, 2021
@andrewrk andrewrk modified the milestones: 0.9.0, 0.10.0 May 19, 2021