-
Notifications
You must be signed in to change notification settings - Fork 232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
eventloop/networking gets stuck on musl #1431
Comments
Hello. I am using musl build on Alpine, and I have yet to notice the behavior you are describing. I will try your reproduction code and report back. Thanks for reporting. |
I have tried it with the latest Janet from the master branch on Alpine Linux with musl at the time of this writing and see no issues like the one you are describing. We will need more information about your setup. |
hm, thanks for testing. I tested on void-linux with host musl-libc (1.1.24), alpine 3.17 vm (musl 1.2.3) and debian bullseye with musl-gcc. Now I fetched alpine 3.19.2 and build janet from master and could reproduce it, too. You need to keep the "server snippet" running, start the "client snippet", kill the "client snippet" and the second invocation will get stuck. |
Oh. I did not get the process right. I am trying it again. |
Yep. I can confirm that the problem manifests in this case on Alpine. |
I'm experiencing similar behavior on Arch with glibc. This is Janet version 1.34.0-f92f3eb6, built from the repo. I don't see the issue on FreeBSD with an equivalent build (from the same commit). Here's my repro setup: (var n-req 0)
(defn handler
[stream]
(defer (net/close stream)
(ev/sleep (math/random))
(set n-req (inc n-req))
(printf "%d: %N" n-req (net/peername stream))
(net/write stream (string/format "%3d: Accepted\n" n-req))))
(defn main
[&]
(let [server (net/listen "127.0.0.1" 9000)]
(net/accept-loop server handler))) In another shell: for i in $(seq 100)
do nc -d -w 2 127.0.0.1 9000 &
done Once the accept-loop gets "stuck", the symptom I see is that some (but not all) connections are accepted and sent to the handler and some of those (but not all) result in data actually being written back to the client. The issue seems to accumulate, such that successive batches of 100 requests result in fewer and fewer successful responses. I tried Updated: I see the same behavior with the official build as well. |
So took another look at this now that I have repro that works on my system, and the root cause is in how we setup epoll/kqueue in edge trigger mode. This works well for most cases, but for The fix is conceptual fairly simple, but I would like to make server sockets use edge trigger mode unless we call net/accept-loop, in which they switch to level trigger mode and make sure we fix kqueue too, which seems to have the same wrong logic here. |
POC fix that just make all server sockets run in level trigger mode (so not a "real" solution as it would cause a busy loop in the case with diff --git a/src/core/ev.c b/src/core/ev.c
index 335b9738..e373c3f1 100644
--- a/src/core/ev.c
+++ b/src/core/ev.c
@@ -1543,8 +1543,12 @@ static JanetTimestamp ts_now(void) {
/* Wait for the next event */
static void janet_register_stream(JanetStream *stream) {
struct epoll_event ev;
- ev.events = EPOLLET;
- if (stream->flags & (JANET_STREAM_READABLE | JANET_STREAM_ACCEPTABLE)) ev.events |= EPOLLIN;
+ if (stream->flags & JANET_STREAM_ACCEPTABLE) {
+ ev.events = EPOLLIN; /* Do NOT edge trigger server sockets */
+ } else {
+ ev.events = EPOLLET;
+ }
+ if (stream->flags & JANET_STREAM_READABLE) ev.events |= EPOLLIN;
if (stream->flags & JANET_STREAM_WRITABLE) ev.events |= EPOLLOUT;
ev.data.ptr = stream;
int status;
@@ -1695,7 +1699,9 @@ static void timestamp2timespec(struct timespec *t, JanetTimestamp ts) {
void janet_register_stream(JanetStream *stream) {
struct kevent kevs[2];
int length = 0;
- if (stream->flags & (JANET_STREAM_READABLE | JANET_STREAM_ACCEPTABLE)) {
+ if (stream->flags & JANET_STREAM_ACCEPTABLE) {
+ EV_SETx(&kevs[length++], stream->handle, EVFILT_READ, EV_ADD | EV_ENABLE, 0, 0, stream);
+ } else if (stream->flags & JANET_STREAM_READABLE) {
EV_SETx(&kevs[length++], stream->handle, EVFILT_READ, EV_ADD | EV_ENABLE | EV_CLEAR, 0, 0, stream);
}
if (stream->flags & JANET_STREAM_WRITABLE) { |
In the edge-trigger mode before this change, if a socket receives 2 connections before one can be handled, then only a single connection is handle and 1 connection will never be handled in some cases. Reverting to level-trigger mode makes this impossible.
This appears to be working for me. Thanks! |
musl builds are also fixed. Thanks. |
Following happens only when building janet with musl libc since 5f2e287 .
When hitting a simple spork/http/server with a load-testing tool like
wrk
and about 20 parallel connections, the server stops responding on the
second invocation of
wrk
.And this only happens when using ev/go (which http/server does), but
not with ev/call:
And a client like:
To get a musl build of janet for example on debian:
The text was updated successfully, but these errors were encountered: