Commands and Reactors #13

Closed
sunfishcode opened this issue Apr 11, 2019 · 20 comments
Labels
discussion A discussion that doesn't yet have a specific conclusion or actionable proposal.

Comments

@sunfishcode
Member

There seem to be two distinct modes of program execution that applications broadly fit into: Commands and Reactors.

  • A Command has a "main" function, and when this function returns the program terminates.
  • A Reactor has a "setup" function, and when this function returns the program remains live, allowing functions registered as callbacks to be called. There is an event loop in the wasm runtime, rather than in the application.

Reactors could run in the main thread of a browser, but they may also have uses in a variety of settings where applications will primarily be responding to external events. Putting the event loop in the runtime gives the runtime the flexibility to use a single event loop for multiple purposes.

(I briefly mentioned these ideas before, but I want to give them more visibility here.)

Emscripten provides a kind of hybrid approach, where instances can stick around after "main" exits to allow them to be called by callbacks. For WASI, it may be useful to make an explicit static distinction between Commands and Reactors, because:

  • We could limit the APIs available to Reactors. For example, if we want to run Reactors in the main thread of a browser, we could prohibit them from using synchronous I/O APIs.
  • We could limit the APIs available to Commands too, for example not allowing them to register for external event-loop APIs since there is no external event loop in a Command.
  • Allowing runtimes to know that a program is a Reactor up front can enable some useful optimizations, like switching JIT tiers between turns of the event loop, in a very simple way.

Currently, WASI programs compiled with clang fit the Command model, with _start being the "main" function. It might make sense to rename it to __wasi_command_main or something similar. Programs could declare themselves to be Reactors by exporting a function named something like __wasi_reactor_setup.
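To make that concrete, here is a rough C sketch of the two shapes. A given module would be one or the other; the export names are just the tentative suggestions above (not a settled ABI), and on_event is a made-up callback.

```c
/* Sketch only; a given module would be one shape or the other, and the
   export names are the tentative suggestions above, not a settled ABI. */

/* Command shape: the runtime calls the "main" entry point once, and the
   instance terminates when it returns. */
int main(void) {
    /* do the work, then exit */
    return 0;
}

/* Reactor shape: the runtime calls the setup export once, then keeps the
   instance alive so other exports can be invoked later. export_name is
   clang's attribute for naming wasm exports; on_event is a made-up
   callback for illustration. */
__attribute__((export_name("__wasi_reactor_setup")))
void reactor_setup(void) {
    /* initialize state; return without blocking */
}

__attribute__((export_name("on_event")))
void on_event(int event_id) {
    /* invoked by the host's event loop after setup has returned */
    (void)event_id;
}
```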

Some interesting questions:

  • Is a static distinction between Command and Reactor too limiting?
  • Are there programs which don't fit into either Command or Reactor models?
  • Are the restrictions imposed by browser main thread execution too limiting for non-Web use cases?
@npmccallum
Contributor

Why isn't Reactor just a WASM library used on top of a Command?

@dschuff
Member

dschuff commented Apr 12, 2019

You could create such a library, and it could be useful. But I think there's still value in having a reactor-type model explicitly modeled by WASI; in particular because, as mentioned, the main thread on the web inherently has this model, and its event loop is baked into the platform. That means that if WASI mandates a command-style model, it will become much more awkward and less useful on the web.

@npmccallum
Contributor

As someone who has written such a main loop abstraction, it is very hard to coerce all the platforms and implementations into the same behavior. I think there is negative value in trying to build a main loop into WASI.

@PoignardAzur

PoignardAzur commented Apr 13, 2019

A relevant thought experiment would be the following use case:

How do you handle a grep program?

On the one hand, it definitely has the semantics of a command: it has a main function, reads input files sequentially, and prints lines sequentially to its output file descriptor. It's very deterministic and doesn't "react" to anything happening around it.

On the other hand, a wasm grep implementation could definitely benefit from async/await and an event loop. Such programs tend to be I/O-bound, and async/await allows them to process text while the next block is being loaded into memory, without relying on multithreading or additional logic.

Using the guidelines @sunfishcode suggested above:

  • You don't want a grep-like program to run in the main thread of a browser.

  • You want grep to have access to some asynchronous features. You might still limit its access to the event-loop API, because grep is sequential and usually doesn't need to, say, race between different files and output a line as soon as one of them finds a match.

  • There are probably optimizations to be gained from having semantic information about grep, but I'm not sure which ones. I guess that knowing it doesn't spawn callbacks makes it easier to dispose of?

I'm not really sure what the right semantics are, but I think that grep should still be classified as a command, provided commands have a way to keep executing past blocking syscalls. They could still be more limited than reactors (e.g. be allowed only one call stack), but I'm not sure what benefits these limitations would actually bring.

@PoignardAzur

Also, another question: what effect do you expect the Commands/Reactors distinction to have on the type system and library interoperability?

In the rationale, you propose:

  • In a Reactor, WASI APIs are available, but all functions have an additional argument, which specifies a function to call as a continuation once the I/O completes. This way, we can use the same conceptual APIs, but adapt them to run in a callback-based async environment.
  • In a Command, WASI APIs don't have callback parameters. Whether or not they're non-blocking is an open question (see the previous question).

Which means that every syscall would have two signatures: __wasi_fd_read(fd, buffers) and __wasi_fd_read(fd, buffers, callback).

Wouldn't that split the library ecosystem in two? With every I/O library developer either picking a side, or making a mynetworklib_command and a mynetworklib_reactor version?

I think a better solution might be to have a common function signature for Reactor-mode and Command-mode read: async __wasi_fd_read(...). When compiled in Reactor mode, that function would then get a callback parameter added. In Command mode, it would block, or maybe return a promise-like object that indicates when the read is done.

This would, of course, require async to be added to the type system, with only async functions being able to call other async functions. Additionally, in Command mode, async functions would only be able to use features that don't require an event loop. So, Promise.all and Promise.race can work, but Promise.then can't.

That way, developers could write libraries that can be used interchangeably in Commands and Reactors. Ideally, you'd also want a __wasi_async_addevent syscall that Commands could shim to provide their own event loop for libraries that do require one.
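To make the split concrete, here's roughly what the two hypothetical signatures from above could look like from C. These declarations are illustrative only (neither is a real WASI import), and since C has no overloading, the callback-taking variant gets a distinct name here:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative declarations only; neither is a real WASI import. */
typedef uint16_t wasi_errno_t;
typedef void (*read_callback_t)(wasi_errno_t error, size_t nread);

/* Command flavor: blocks until the read completes. */
wasi_errno_t __wasi_fd_read(uint32_t fd, void *buf, size_t len,
                            size_t *nread);

/* Reactor flavor: returns immediately; cb runs on the host's event loop
   once the read completes. */
wasi_errno_t __wasi_fd_read_async(uint32_t fd, void *buf, size_t len,
                                  read_callback_t cb);
```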

@dumblob

dumblob commented Apr 13, 2019

Somehow I still can't grasp the reason for such a (weird) differentiation.

  • We could limit the APIs available to Reactors. For example, if we want to run Reactors in the main thread of a browser, we could prohibit them from using synchronous I/O APIs.

Sounds to me like a disguised attempt to get at least some finer-grained capability support than the current coarse-grained one - see #1.

  • We could limit the APIs available to Commands too, for example not allowing them to register for external event-loop APIs since there is no external event loop in a Command.

Again the same - it sounds to me like a disguised attempt to get at least some finer-grained capability support than the current coarse-grained one - see #1. @kentonv what do you think?

  • Allowing runtimes to know that a program is a Reactor up front can enable some useful optimizations, like switching JIT tiers between turns of the event loop, in a very simple way.

Compared, e.g., to the approach @npmccallum proposed above, there could be some minor gain, but only for one thread out of N (nowadays from hundreds up to many thousands), and only if the Reactor runs in the very same thread as the main loop (which is not necessarily the case, and becomes less and less probable over time as even the hardware in our pockets can run tens or hundreds of threads in parallel and software tries to accommodate that).

Also, I'm pretty confident that the JIT-tier-switching optimization is possible (and not much more difficult) even without knowing the distinction between Reactors and Commands ahead of time at compile time.

But maybe there are other reasons for such distinction and I'm missing them...

@kentonv

kentonv commented Apr 15, 2019

@dumblob Hmm, not sure what you mean -- I'm not seeing the connection between this and fine-grained capabilities.

My opinion, FWIW (disclaimer: I don't know a whole lot about WASI and I'm probably missing important context):

In the context of both capability systems and the Web platform, event loops and Promises have "won". async/await has provided a finishing blow by making async code almost as easy to write as sync code, while keeping all the advantages of async. It seems like WASI should focus on async as the main way it expects to be used.

That said, supporting legacy C/C++/etc. code written in synchronous style seems like a desirable goal.

I'd argue for creating only one set of I/O APIs that operates in an async way. All such calls should return a "promise descriptor". A special call, "then", would be used to register a callback to invoke when a promise completes.

In order to support legacy synchronous code, you could then offer a special await call which takes a promise descriptor and actually blocks until it completes. This would actually run the event loop in the meantime, which might mean calling callbacks registered with then. The await call would only be permitted when control entered WASM in an "await-friendly" way. Invoking the main function of a "command"-style program would be defined as await-friendly. Note that this means that "command"-style programs would still have an event loop -- they just control when it runs by using await. I suspect this will be important as command-style programs will likely commonly want to use async-style libraries, which means they need to support event loops in the same way.
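A minimal sketch of what that could look like from C, with every name hypothetical (this is not an existing WASI interface):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical API sketching the idea above; not an existing WASI
   interface. */
typedef uint32_t promise_t;   /* a "promise descriptor" */
typedef void (*completion_cb)(int32_t result, void *userdata);

/* Every I/O call returns a promise descriptor immediately. */
promise_t fd_read_async(uint32_t fd, void *buf, size_t len);

/* Reactor style: register a callback for when the promise completes. */
void promise_then(promise_t p, completion_cb cb, void *userdata);

/* "await": block, running the event loop internally, until the promise
   completes; only legal in an await-friendly context. */
int32_t promise_await(promise_t p);

/* Legacy blocking read, layered on top of the async primitive. */
size_t blocking_read(uint32_t fd, void *buf, size_t len) {
    return (size_t)promise_await(fd_read_async(fd, buf, len));
}
```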

You might then consider offering a create_fiber() call which allows an await-friendly context to be created even in "reactor"-style programs. This would work just like Windows fibers or node-fibers. This would permit reactor-style programs to use libraries that were written in a synchronous style. Again, this enables the use of legacy code -- any new code should probably use async/await which makes fibers obsolete.

You might also consider supporting Promise Pipelining: consider an I/O call which eventually produces a file descriptor as its result, such as open(). It needs to return a promise initially, but then awaiting the promise produces a file descriptor. In these cases, the promise descriptor itself should also be usable as a proxy for the eventual file descriptor, allowing you to pass it to any other async syscall that would work on the eventual file descriptor. So, you can call read() and pass in the promise descriptor returned by open(), without first waiting for that promise to complete. The semantics would be: if the promise completes successfully, then the read() operates on the resulting file; if open() ends up failing, then read() fails with the same error. This trick can reduce the number of round trips needed, e.g. when the file resides on a remote filesystem. Promise Pipelining is a core feature of Cap'n Proto RPC (derived from E/CapTP); you can read more here: https://capnproto.org/rpc.html
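Continuing the same hypothetical names as in the sketch above, pipelining would let the promise returned by an asynchronous open stand in for the eventual file descriptor, so the read can be issued without waiting for the open to complete:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t promise_t;   /* same hypothetical descriptor as above */

promise_t open_async(const char *path);                       /* hypothetical */
promise_t fd_read_async(uint32_t fd, void *buf, size_t len);  /* as above */
int32_t promise_await(promise_t p);                           /* as above */

size_t read_remote_file(void *buf, size_t len) {
    promise_t p_fd = open_async("/remote/data.txt");
    /* Pass the promise where an fd is expected; no round trip for the open. */
    promise_t p_read = fd_read_async((uint32_t)p_fd, buf, len);
    /* If the open ultimately fails, the read fails with the same error. */
    return (size_t)promise_await(p_read);
}
```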

@dumblob

dumblob commented Apr 15, 2019

@dumblob Hmm, not sure what you mean -- I'm not seeing the connection between this and fine-grained capabilities.

I quote: "prohibit them from using synchronous I/O APIs" and "for example not allowing them to register for external event-loop APIs" - this is not possible with the current fd-oriented capability system, so it sounds to me like "an extension" to the current capability system.

@sunfishcode
Member Author

The Reactor concept here is largely motivated by thinking about whether WASI could be used within Web browsers on the main thread, and then, if we design a way to make that work, would it be usable in other contexts as well?

The Web browser main thread environment imposes some constraints, such as the constraint that synchronous I/O is not available (ignoring sync xhr, proxying to Workers, etc.). This is what motivates the suggestion of limiting APIs available to Reactors. And it probably limits the options concerning blocking until an event arrives, or pipelining, or other things.

Maybe having a Reactor concept doesn't mean that all event-based I/O needs to use it, and we still need an epoll-like way for applications to build their own custom event loops. Or perhaps we should have multiple variants of the Reactor concept, to address different use cases.

@kentonv

kentonv commented Apr 15, 2019

@sunfishcode FWIW I had the browser main thread use case in mind when writing my comment. I don't think there's any conflict with pipelining...

FWIW, as someone who has built several custom event loops (using poll, epoll, kqueue, Windows IOCP)... I think I would prefer a built-in event loop. All these interfaces are a PITA and all I really want to do is have the OS call some callback when the event happens (but only one callback at a time).

@PoignardAzur

@kentonv As much as I like Cap'n Proto (and I think it has potential as a cross-language layer above wasm, similar to webidl bindings), I don't think Promise pipelining is something the processor should worry about.

E.g. an implementation of Cap'n Proto could just have its RPC methods return plain old structs, with both the promise descriptor and the RPC "token" as members. Then wasm would use the promise descriptor, while promise pipelining would use the token.

@titzer

titzer commented Apr 16, 2019

@kentonv

Designing APIs is hard, and the sync/async debate has a long history. There are lots of perspectives, so I don't enjoy making overly definitive statements... nevertheless, for brevity, I'll state that synchronous I/O is a fantastically straightforward and successful programming model that has been employed in many, many contexts. Straight-line control flow is as simple as it gets, and blocking I/O follows very naturally from straight-line control flow. Of course, it works best when execution stacks are plentiful and cheap, and task switching is similarly cheap, as with many threads in C or many goroutines.

Callbacks and promises and async functions are all far more complicated than simple blocking I/O APIs. They haven't "won" so much as they are the only choice for the web platform, which historically has lacked plentiful execution stacks and cheap task switching (and is also tied to an inherently single-threaded primary programming language). The web's mostly accidental evolution into a programming platform where UI reactivity, I/O, and computation are multiplexed in userland onto a single underlying execution thread, complete with jank galore, should not be emulated here, IMHO.

@kentonv

kentonv commented Apr 16, 2019

@PoignardAzur Yeah honestly I was only half-seriously suggesting it. Promise pipelining could be a win when operating on a remote filesystem, but probably doesn't really benefit anything operating on local resources, which is probably the vast majority of what WASI does. So probably not worth the effort here, but fun to think about.

@titzer Fair enough. But are you saying that the specific design of having async API calls plus a separate await call to block on them is not a good compromise? Or are you only objecting to the philosophical position?

sunfishcode added the "discussion" label Apr 17, 2019
@titzer

titzer commented Apr 17, 2019

I don't think that having a separate await call that blocks on promises is a good fit for the bottom execution layer, since engines generally implement this with the moral equivalent of a CPS transform, which is probably complexity that belongs in producers rather than in engines.

@kentonv

kentonv commented Apr 18, 2019

@titzer Sorry, I probably created some confusion by calling it "await". I wasn't suggesting that this call would be used to implement language-level async/await; instead, it would be used to implement traditional blocking I/O at the language level. The alternative seems to be to have two parallel calls for every I/O operation, one blocking and one non-blocking -- or two modes for every call, as in POSIX -- which seems comparatively ugly.

@PoignardAzur

@kentonv Yeah, this is what I was describing as well.

Although I wonder how much a language can do with an await instruction but without an event loop. I'm guessing "not much", because in most situations, there isn't much your code can do locally while awaiting I/O data; the operations that can go on in the background are going to be on different stack frames entirely.

@sbc100
Member

sbc100 commented May 30, 2019

I like this distinction. I think it's useful regardless of the web and its main-thread concepts. This approach would solve #24, and I was already about to propose the same thing in #48.

The fact that Emscripten tries to merge these two concepts with its "EXIT_RUNTIME" configuration is the source of much confusion. I don't see a compelling case for allowing the same app to be used as a Command (a single main function) and then as a library. These things seem fundamentally different. Am I missing something?

@dumblob

dumblob commented Jun 1, 2019

Just two more points to keep in mind:

a) It seems to me that, aside from standalone use, "reactors" are basic building blocks for "commands". Can "reactors" be used inside of "commands" (I think this will inevitably be needed)? What about the case where the different "limitations" (as mentioned in #13 (comment)) contradict such a use case?

b) We shouldn't forget that the future lies in massively parallel computation, and thus languages like ParaSail, which execute "each line of code" in parallel (not just concurrently!), might collide with the concept of having one main event loop (which seems to be the sole reason for the differentiation between "commands" and "reactors"). It seems to me the current proposal assumes the future lies in concurrent, but not parallel, computing.

sunfishcode added a commit to WebAssembly/wasi-libc that referenced this issue Jun 5, 2019
This adds support for a new experimental "Reactor" executable model.

The "Commands" and "Reactors" concepts are introduced here:
WebAssembly/WASI#13

A companion Clang patch, which just consists of using the new
reactor-crt1.o and Reactor-specific entry point name, is here:
https://reviews.llvm.org/D62922

Instead of an entrypoint named "_start", which calls "main", which
then scopes the lifetime of the program, Reactors have a
"__wasi_unstable_reactor_start" function, which calls "reactor_setup".
When "reactor_setup" exits, the intention is that the program should
persist and be available for calling.

At present, the main anticipated use for this is in environments like
Node, where WASI-using modules can be imported and don't necessarily
want the semantics of a "main" function.

The "unstable" in "__wasi_unstable_reactor_start" reflects that this
Reactor concept is not yet stable, and likely to evolve.
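For illustration, a user-side Reactor under this experimental scheme might look roughly like the C below. The setup name follows the commit message above (and, as noted, is unstable and likely to evolve); the extra export is a made-up example.

```c
/* Sketch of a Reactor module under the experimental scheme described
   above: the crt1 startup code exports __wasi_unstable_reactor_start,
   which calls this user-provided setup function; the instance then
   stays alive so other exports can be called. */
static int counter;

void reactor_setup(void) {
    /* one-time initialization; returning does not end the program */
    counter = 0;
}

/* A made-up export the host can call any number of times after setup. */
__attribute__((export_name("bump")))
int bump(void) {
    return ++counter;
}
```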
@dumblob mentioned this issue Jun 14, 2019
@sunfishcode
Member Author

As an update here, I and others have been continuing to discuss Commands and Reactors. I expect these concepts will end up being defined by interface types, as the main distinction between a Command and a Reactor is how you interface with a module from the outside.

Thinking about @PoignardAzur's observation about grep above, I think this and other use cases suggest that Reactors shouldn't be the exclusive mechanism for async I/O in wasm -- it will be useful to do async I/O in Commands too. This is a further data point suggesting that Commands and Reactors are primarily about how you interface with a module, rather than how the module does its I/O internally.

JohnTitor added a commit to JohnTitor/rust that referenced this issue Jan 11, 2021
…ichton

Emit a reactor for cdylib target on wasi

Fixes rust-lang#79199, and relevant to rust-lang#73432

Implements wasi reactors, as described in WebAssembly/WASI#13 and [`design/application-abi.md`](https://github.com/WebAssembly/WASI/blob/master/design/application-abi.md)

Empty `lib.rs`, `lib.crate-type = ["cdylib"]`:

```shell
$ cargo +reactor build --release --target wasm32-wasi
   Compiling wasm-reactor v0.1.0 (/home/coolreader18/wasm-reactor)
    Finished release [optimized] target(s) in 0.08s
$ wasm-dis target/wasm32-wasi/release/wasm_reactor.wasm >reactor.wat
```
`reactor.wat`:
```wat
(module
 (type $none_=>_none (func))
 (type $i32_=>_none (func (param i32)))
 (type $i32_i32_=>_i32 (func (param i32 i32) (result i32)))
 (type $i32_=>_i32 (func (param i32) (result i32)))
 (type $i32_i32_i32_=>_i32 (func (param i32 i32 i32) (result i32)))
 (import "wasi_snapshot_preview1" "fd_prestat_get" (func $__wasi_fd_prestat_get (param i32 i32) (result i32)))
 (import "wasi_snapshot_preview1" "fd_prestat_dir_name" (func $__wasi_fd_prestat_dir_name (param i32 i32 i32) (result i32)))
 (import "wasi_snapshot_preview1" "proc_exit" (func $__wasi_proc_exit (param i32)))
 (import "wasi_snapshot_preview1" "environ_sizes_get" (func $__wasi_environ_sizes_get (param i32 i32) (result i32)))
 (import "wasi_snapshot_preview1" "environ_get" (func $__wasi_environ_get (param i32 i32) (result i32)))
 (memory $0 17)
 (table $0 1 1 funcref)
 (global $global$0 (mut i32) (i32.const 1048576))
 (global $global$1 i32 (i32.const 1049096))
 (global $global$2 i32 (i32.const 1049096))
 (export "memory" (memory $0))
 (export "_initialize" (func $_initialize))
 (export "__data_end" (global $global$1))
 (export "__heap_base" (global $global$2))
 (func $__wasm_call_ctors
  (call $__wasilibc_initialize_environ_eagerly)
  (call $__wasilibc_populate_preopens)
 )
 (func $_initialize
  (call $__wasm_call_ctors)
 )
 (func $malloc (param $0 i32) (result i32)
  (call $dlmalloc
   (local.get $0)
  )
 )
 ;; lots of dlmalloc, memset/memcpy, & libpreopen code
)
```

I went with repurposing cdylib because I figured that it doesn't make much sense to have a wasi shared library that can't be initialized, and even if someone was using it that way, adding an `_initialize` export is a very small change.
@sunfishcode
Member Author

As another update here, the basic concepts of commands and libraries (reactors) are now being defined by the component model. As presented in the WASI Preview2 presentation, commands are conceptually components that have value imports for their inputs and value exports for their outputs, and which run their code from the wasm start function.
