Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

polling on mix of libc-posix socket/file descriptors with non-libc WASI pollable handles #542

Open
pavelsavara opened this issue Sep 25, 2024 · 11 comments

Comments

@pavelsavara
Copy link

pavelsavara commented Sep 25, 2024

In context of dotnet sockets I would like to be able to wasi:poll_poll on mix of handles

  • which were created by consuming other WASI APIs, like wasi:http
  • and file handles obtained by using libc sockets APIs

The motivation to do it on single "system" call is to

  • be able to make progress on events from HTTP while socket is not ready and vice versa
    • if we called each select/poll or poll_poll separately, we would get blocked only on part of pollables
  • to be able to block (stop spinning) if there are no I/O events (and no CPU bound jobs)

At the moment, we can hack it via using libc internal descriptor_table_get_ref and it's data structures to obtain underlying pollables.
@dicej made similar hack for mio
But it's fragile and bad practice.

I can see few possible solutions

  • expose libc function accepting both lists of resources
    • void wasi_poll_plus_fd(poll_list_borrow_pollable_t *in, list_pollfd_t *in, wasip2_list_u32_t *ret)
    • this is with wit-bindgen generated C structures on APIs and with list of pollfd struct
  • we can try simpler C types, something like
    • void wasi_poll_plus_fd(int *pollables, int pollables_count, struct pollfd *fds, int fds_count, int **ret, int *ret_count)
  • or we can expose just the mapping from fds to list of pollables
    • void socket_fds_to_wasi_pollables(struct pollfd *fds, int fds_count, int **pollables, int pollables_count)
    • this would have to be called before each call to wasi:poll_poll
  • create libc file descriptor by registering external pollable
    • this would have to be un-registered later
    • calling socket oriented poll() API on any pollable (clock) feels confusing to me

Relevant code, possibly to be refactored and reused for above.

switch (entry->tag) {
case DESCRIPTOR_TABLE_ENTRY_TCP_SOCKET: {
tcp_socket_t *socket = &(entry->tcp_socket);
switch (socket->state.tag) {
case TCP_SOCKET_STATE_CONNECTING:
case TCP_SOCKET_STATE_LISTENING: {
if ((pollfd->events &
(POLLRDNORM | POLLWRNORM)) != 0) {
states[state_index++] = (state_t){
.pollable =
socket->socket_pollable,
.pollfd = pollfd,
.entry = entry,
.events = pollfd->events
};
}
break;
}
case TCP_SOCKET_STATE_CONNECTED: {
if ((pollfd->events & POLLRDNORM) !=
0) {
states[state_index++] = (state_t){
.pollable =
socket->state
.connected
.input_pollable,
.pollfd = pollfd,
.entry = entry,
.events = POLLRDNORM
};
}
if ((pollfd->events & POLLWRNORM) !=
0) {
states[state_index++] = (state_t){
.pollable =
socket->state
.connected
.output_pollable,
.pollfd = pollfd,
.entry = entry,
.events = POLLWRNORM
};
}
break;
}
case TCP_SOCKET_STATE_CONNECT_FAILED: {
if (pollfd->revents == 0) {
++event_count;
}
pollfd->revents |= pollfd->events;
break;
}
default:
errno = ENOTSUP;
return -1;
}
break;
}
case DESCRIPTOR_TABLE_ENTRY_UDP_SOCKET: {
udp_socket_t *socket = &(entry->udp_socket);
switch (socket->state.tag) {
case UDP_SOCKET_STATE_UNBOUND:
case UDP_SOCKET_STATE_BOUND_NOSTREAMS: {
if (pollfd->revents == 0) {
++event_count;
}
pollfd->revents |= pollfd->events;
break;
}
case UDP_SOCKET_STATE_BOUND_STREAMING:
case UDP_SOCKET_STATE_CONNECTED: {
udp_socket_streams_t *streams;
if (socket->state.tag ==
UDP_SOCKET_STATE_BOUND_STREAMING) {
streams = &(
socket->state
.bound_streaming
.streams);
} else {
streams = &(
socket->state.connected
.streams);
}
if ((pollfd->events & POLLRDNORM) !=
0) {
states[state_index++] = (state_t){
.pollable =
streams->incoming_pollable,
.pollfd = pollfd,
.entry = entry,
.events = POLLRDNORM
};
}
if ((pollfd->events & POLLWRNORM) !=
0) {
states[state_index++] = (state_t){
.pollable =
streams->outgoing_pollable,
.pollfd = pollfd,
.entry = entry,
.events = POLLWRNORM
};
}
break;
}
default:
errno = ENOTSUP;
return -1;
}
break;
}
default:
errno = ENOTSUP;
return -1;
}
} else {
abort();
}

cc @badeend

@badeend
Copy link
Contributor

badeend commented Sep 25, 2024

Out of the four options, 1 and 2 seem the best to me.

or we can expose just the mapping from fds to list of pollables

Note that one libc socket can map to anywhere from 0 to 3 pollable resources depending on its state. With the proposed function signature the user can't know which pollable relates to which aspect of the socket.

calling socket oriented poll() API on any pollable (clock) feels confusing to me

POSIX poll is for any async I/O and not specific to sockets, so there wouldn't be any confusion for me. Regardless, I'd still go for option 1 or 2 as those accomplish the desired goal in a single call.

@badeend
Copy link
Contributor

badeend commented Oct 2, 2024

After discussing with @sunfishcode we came up with another alternative solution:
Rather than attempting to integrate external resources into the libc/POSIX polling mechanism, we could expose the libc-managed WASI resources for external use:

int32_t get_wasip2_input_stream(int fd); // wasi:io/streams@0.2.0.input-stream
int32_t get_wasip2_output_stream(int fd); // wasi:io/streams@0.2.0.output-stream
int32_t get_wasip2_filesystem_descriptor(int fd); // wasi:filesystem/types@0.2.0.descriptor
int32_t get_wasip2_tcp_socket(int fd); // wasi:sockets/tcp@0.2.0.tcp-socket
int32_t get_wasip2_udp_socket(int fd); // wasi:sockets/udp@0.2.0.udp-socket
// etc..

The consumer is in charge of getting the pollables from them.

These come with some implicit assumptions that should be documented:

  • These should be considered specific to Preview2. They will not work on preview1 and may not work on preview3, depending on how the future pans out.
  • Any one of them may fail (with a negative value?). E.g.
    • Calling get_wasip2_filesystem_descriptor on stdout will always fail.
    • Calling get_wasip2_output_stream on a socket may fail if it isn't connected yet.
  • The resources remain "owned" by wasi-libc:
    • The consumer should not drop these resources.
    • Calling POSIX close invalidates any of these resources.
    • Child resources (e.g. Pollables) derived from these resources must be dropped before calling POSIX close. Failing to do so may trap.

@pavelsavara
Copy link
Author

pavelsavara commented Oct 2, 2024

If WASPp3 will call back on promises instead of central polling. And if that lands in few months, I'm happy to wait for that and drop this issue.

I'm not 100% clear about WASIp3 callbacks from streams. I'm worried about performance.
(I admit I didn't study WIT draft yet)

  • Maybe WASI callback would be cheaper than unix syscall ? (I understand that epoll is optimization of that)
  • Are we going to marshal and allocate new Promise for each event ?
  • Would there be any throttling (Nagle's algorithm) ? Or it could happen that stream sender (component) would trigger callback in receiver (component) for each byte if they push byte by byte ?

@pavelsavara
Copy link
Author

Also the way how we hide those promises/streams and callback behind wasi-libc matters.

@dicej
Copy link
Contributor

dicej commented Oct 2, 2024

Maybe WASI callback would be cheaper than unix syscall ? (I understand that epoll is optimization of that)

If the read or write call can complete immediately (because there are bytes available or the host is ready to receive bytes, respectively), then there's no need for the host to allocate a task or call a callback. So we only pay the cost of allocating a task and calling the callback when we're I/O limited anyway. Also, e.g. Wasmtime is optimized to make these guest<->host calls as cheap as possible.

As usual, we'll want to benchmark and optimize as necessary if we do discover bottlenecks.

Are we going to marshal and allocate new Promise for each event ?

We will create a new task each time an async host function is unable to complete immediately, yes.

Would there be any throttling (Nagle's algorithm) ? Or it could happen that stream sender (component) would trigger callback in receiver (component) for each byte if they push byte by byte ?

I guess that depends on whether the host implementation of TCP streams buffers output (either in user space or at the kernel level).

@badeend
Copy link
Contributor

badeend commented Oct 2, 2024

We will create a new task each time an async host function is unable to complete immediately, yes.

With the exception of streams, right? I assume they will reuse the same Task for all reads/writes?

@dicej
Copy link
Contributor

dicej commented Oct 2, 2024

We will create a new task each time an async host function is unable to complete immediately, yes.

With the exception of streams, right? I assume they will reuse the same Task for all reads/writes?

Luke has not yet written down a spec for streams AFAIK, so I made one up for my prototype implementation, and that creates a new Task as necessary for each read or write. I'm trying to remember what Luke was planning for the final spec; perhaps it just uses the stream ID for callback notifications.

@pavelsavara
Copy link
Author

With the exception of streams, right? I assume they will reuse the same Task for all reads/writes?

dotnet Task is single resolve only.

@dicej
Copy link
Contributor

dicej commented Oct 2, 2024

dotnet Task is single resolve only.

Sure, but p3 tasks might not map one-to-one with .NET Tasks

@badeend
Copy link
Contributor

badeend commented Oct 2, 2024

@dicej Okay! It doesn't really matter for this issue. Was just curious.

@lukewagner
Copy link
Member

With the exception of streams, right? I assume they will reuse the same Task for all reads/writes?

Right, the current proposal (which I'm trying to finish up and post) has:

stream.read $t: [readable-stream-index:i32 ptr:i32 n:i32] -> [result:i32]

where the result bits may indicate that stream.read blocked and that the caller needs to task.wait for the read to complete, where task.wait will return a "stream read" event-code along with the readable-stream-index and how much (<= n) was written into ptr. And after that, another stream.read can be started with the same readable-stream-index, and so on. However, bindgen should be quite able to bind each stream.read invocation to a single-shot task/future/promise that is resolved when the corresponding "stream read" event is returned by task.wait.

dicej added a commit to dicej/mio that referenced this issue Oct 9, 2024
This implementation currently uses a mix of POSIX-style APIs (provided by
`wasi-libc` via the `libc` crate) and WASIp2-native APIs (provided by the `wasi`
crate).

Alternatively, we could implement `Selector` using only POSIX APIs,
e.g. `poll(2)`.  However, that would add an extra layer of abstraction to
support and debug, as well as make it impossible to support polling
`wasi:io/poll/pollable` objects which cannot be represented as POSIX file
descriptors (e.g. timer events, DNS queries, HTTP requests, etc.).

Another approach would be to use _only_ the WASIp2 APIs and bypass `wasi-libc`
entirely.  However, that would break interoperability with both Rust `std` and
e.g. C libraries which expect to work with file descriptors.

Since `wasi-libc` does not yet provide a public API for converting between file
descriptors and WASIp2 resource handles, we currently use a non-public API (see
the `netc` module below) to do so.  Once
WebAssembly/wasi-libc#542 is addressed, we'll be able
to switch to a public API.

I've tested this end-to-end using https://github.com/dicej/wasi-sockets-tests,
which includes smoke tests for `mio`, `tokio`, `tokio-postgres`, etc.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants