http/3 seems to stop working after any kind of reload #4348
Comments
I think this code is supposed to handle that: caddy/modules/caddyhttp/app.go Lines 427 to 452 in a779e1b
We'll have to wait until @mholt has time to take a look at this one.
Hm, I'm able to reproduce it, but this used to work and I know we haven't changed our listener code recently. Might be worth checking if something upstream changed (inadvertently)? @marten-seemann might know.
Hi, I'm also having a similar issue: Caddy simply stops serving HTTP/3 requests after a while. I have not seen any logged errors using
I'd be happy to help debug this, as my goal with Caddy is specifically to test HTTP/3. I would need some pointers on where to look, for example tcpdump, Wireshark, or a debugger?
We didn't change anything about the shutdown logic recently, as far as I'm aware. Is there any way to easily reproduce this?
At least the curl issue is reproducible 100% of the time. The other part is that Chrome and Firefox suddenly switch over to HTTP/2 with no regard for the
I think this is odd behaviour: While https://www.http3check.net/?host=http3.is gives the expected results I've added a QLog output.
@Forza-tng Please open another issue for this, as it is very likely unrelated, and it is very important to keep this issue on topic to be able to solve it effectively. :)
You're right. I opened a new issue. Just now I had the same issue, where Caddy simply stopped responding to anything over HTTP/3. Restarting Caddy solved it. Unfortunately, I couldn't reproduce the issue. @zierhut-it Out of curiosity, have you checked the issue I linked above? Do you get the same weird response from https://www.http3check.net/ as well?
Reproducing reliably on 2.4.5: it keeps working after the first systemctl reload, but stops working after a second one. I believe the confusion on http3check.net is that they refer to "h3" / "h3-29" as HTTP/3, and "h3-27" (etc.?) as QUIC.
I have the same issue with HTTP/3 stopping working. I don't rely on checkers; I'm checking in the Firefox network console. I made a local fix; if it works fine in production for a couple of days, I will post it here.
In brief: after I disabled it in app.go, HTTP/3 still works for me, even with a reload every 5 minutes in production with real user traffic.
@vladbondarenko feel free to make a pull request with the change you propose!
PR #4413
Not sure, but I might've figured out what's going on here. Previously (before quic-go/quic-go#2111), the issue occurred because multiple quic-go listeners (https://github.com/lucas-clemente/quic-go/blob/fa070e585ecb4ef300795f4bf99cb7815218cc1e/packet_handler_map.go) were started by the multiplexer (https://github.com/lucas-clemente/quic-go/blob/fa2e7972156393e505a16f10ac833369d9614c08/multiplexer.go), which resulted, if I understood correctly, in a race condition where the yet-to-be-killed server ate up packets meant for the newly launched server; that is why some connections failed after the reload. Looking at what we have now, it seems like the issue occurs because the aforementioned quic-go listeners are closed after the new Caddy server starts:
Because the packetHandlerMap is closed during step 2.2, it stops reading from Caddy's fakeClosePacketConn and thus stops receiving packets for the actual connection. However, the new Caddy server set up during step 2.1 still uses this packetHandlerMap without knowing that it has stopped listening for new requests, which is why it simply doesn't receive anything. Could someone please double-check this and, if possible, explain how the previous fix (not closing HTTP/3 servers) was valid and how it helped with this problem? I haven't had time to actually test this with logging of some kind, so that should probably also be done.
I tested this out: what I've written about is indeed part of the problem. However, there is another issue. During step 2.1, when the new quic-go server is created using the previous packetHandlerMap, it calls SetServer, which makes it the packetHandlerMap's underlying server. During step 2.2, when CloseServer is called, packetHandlerMap.server is set to nil, which is why no new packets are routed to the server.
Yup, after reload I'm getting "received a packet with an unexpected connection ID" (https://github.com/lucas-clemente/quic-go/blob/fa070e585ecb4ef300795f4bf99cb7815218cc1e/packet_handler_map.go#L381). I've fixed the first issue by making fakeClosePacketConn public and using go h3srv.Serve(h3ln.(*caddy.FakeClosePacketConn).PacketConn) at caddy/modules/caddyhttp/app.go Line 367 in a779e1b
However, I do not know of a decent way to fix the second issue I've described, since it requires either changing Caddy's reload order (which seems like a no-no) or changing the way quic-go starts servers (maybe by making packetHandlerMap support multiple servers?). Waiting for your comments now.
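The second issue is an ordering problem, which a small model makes concrete. This is not quic-go's code: `handlerMap`, `SetServer`, `CloseServer`, `route` and `reload` are hypothetical stand-ins that only mimic the single server slot described above, assuming Caddy's reload order of starting the new server (step 2.1) before closing the old one (step 2.2):

```go
package main

import "sync"

// server is a hypothetical stand-in for a quic-go server.
type server struct{ name string }

// handlerMap mimics one shared map per UDP socket with a single slot
// for "the" server that handles packets for unknown connection IDs.
type handlerMap struct {
	mu     sync.Mutex
	server *server
}

// SetServer overwrites the slot; any previous server is silently replaced.
func (m *handlerMap) SetServer(s *server) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.server = s
}

// CloseServer clears the slot unconditionally, without checking which
// server is asking to be removed.
func (m *handlerMap) CloseServer() {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.server = nil
}

// route returns the server that would receive a packet with an unknown
// connection ID, or nil if such a packet would be dropped.
func (m *handlerMap) route() *server {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.server
}

// reload simulates the reload order: register the new server first,
// then shut down the old one.
func reload(m *handlerMap, newSrv *server) {
	m.SetServer(newSrv) // step 2.1
	m.CloseServer()     // step 2.2: also wipes out newSrv
}
```

After `reload`, `route` returns nil, so packets for new connections have nowhere to go; this is consistent with the "received a packet with an unexpected connection ID" message reported after reloads.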
👋, quic-go author here. Sorry I haven't followed the discussion here in detail so far. In general, I'm not opposed to changing the quic-go API if that enables a use case we had not thought of before and that's not possible with what we have right now, but I'd like to understand better what Caddy is actually trying to achieve. Can we take a step back and answer the following questions about what is supposed to happen during a reload?
This isn't an authoritative answer, but judging by how Caddy works with HTTP/1.1 and h2, as well as quic-go/quic-go#2103:
I think the main difference between HTTP/1-2 and HTTP/3 encountered here is that HTTP/1 and HTTP/2 are based on TCP connections, which can be reused by concurrently running servers (as Caddy does, using a "fake" connection wrapper that doesn't actually close the underlying connection). For HTTP/3 on top of quic-go, the equivalent would be quic-go servers, since the actual listeners aren't exported by the multiplexer (which makes sense, since they are not at the session level). quic-go servers can be reused the way TCP connections can, in the sense that multiple servers can concurrently Accept new sessions from them, which is why it makes sense to reuse them for multiple HTTP/3 servers on the same address (as Caddy does). This seems to be exactly the problem we're looking at: Caddy launches multiple HTTP/3 servers, each of which tries to launch its own quic-go server, even though only one is needed. So allowing an HTTP/3 server to be created with a given quic-go Listener might be a good change. (By the way, quic-go's Listener documentation currently doesn't state that it is thread-safe, even though it seems to be.)
Not sure if that's a good idea. If you call This could relatively easily be achieved by adding a I'm pretty busy at the moment, so I probably won't get around to implementing this any time soon. Would you be interested in adding this feature to quic-go?
Yes, calling Accept from multiple servers is not a good idea in that sense. What I meant is that, logically, two HTTP servers can operate on the same TCP connection in the same way they can operate on the same QUIC server (even if that means connections are handed out non-deterministically). However, in Caddy's case what we need is exactly what you described, and it matches what I meant: allowing creation of I'm up for implementing this in quic-go and updating Caddy accordingly, though I'd like to wait for someone from the Caddy team to approve this as well.
To my understanding, #4348 (comment) is correct, except that GracePeriod can be configured to set a timeout after which the old connections are force-closed, so that the old config can finish shutting down. Right now we have the problem that certain kinds of connections, like WebSockets or long-polling SSE, never close gracefully, so they need to be force-closed. So yes, Caddy having control over the listeners sounds like a good plan, so they can be kept around while a config reload happens. Basically, a reference counter is used to determine whether the listener should be closed. When Caddy first starts, the counter goes from 0 to 1. When a reload happens, the new config is loaded first, so the count goes from 1 to 2; then the old server shuts down and the count goes back from 2 to 1. When Caddy is shut down completely, the count goes from 1 to 0 and the listeners are finally cleaned up. If a reload happens but the listener is different (imagine a different port number is used, with no overlap), the old listener is also cleaned up even though Caddy is not shutting down, because its ref count never goes up through reuse.
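The reference-counting scheme described above might look roughly like this. It is a minimal sketch, assuming a hypothetical `listenerPool` keyed by UDP address with `Acquire` and `Release` methods; it illustrates the counting rule and is not Caddy's actual listener code:

```go
package main

import (
	"net"
	"sync"
)

// pooledConn tracks how many configs currently use one UDP socket.
type pooledConn struct {
	pc   net.PacketConn
	refs int
}

// listenerPool (hypothetical) hands out shared sockets keyed by address.
type listenerPool struct {
	mu    sync.Mutex
	conns map[string]*pooledConn
}

func newListenerPool() *listenerPool {
	return &listenerPool{conns: make(map[string]*pooledConn)}
}

// Acquire opens the socket on first use (count 0 -> 1); otherwise it only
// increments the count (e.g. 1 -> 2 when a reloaded config reuses addr).
func (p *listenerPool) Acquire(addr string) (net.PacketConn, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if c, ok := p.conns[addr]; ok {
		c.refs++
		return c.pc, nil
	}
	pc, err := net.ListenPacket("udp", addr)
	if err != nil {
		return nil, err
	}
	p.conns[addr] = &pooledConn{pc: pc, refs: 1}
	return pc, nil
}

// Release decrements the count and really closes the socket only when
// no config (old or new) is using it any more.
func (p *listenerPool) Release(addr string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	c, ok := p.conns[addr]
	if !ok {
		return nil
	}
	c.refs--
	if c.refs > 0 {
		return nil
	}
	delete(p.conns, addr)
	return c.pc.Close()
}
```

Because the count is per address, a reload that moves to a different port never increments the old entry, so the old socket is closed as soon as the old config releases it.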
Just catching up on this... lots of progress made in the middle of the night here! Thank you for the in-depth exploration, @renbou -- this is super helpful. Based on my reading of Artem's description of the HTTP/1 and HTTP/2 servers over TCP, I believe that understanding is correct. Upon a reload, existing servers immediately go into "shutdown" mode which stops accepting new connections, waits for existing connections to become idle, then closes the socket; while the new server immediately starts accepting new connections. A GracePeriod can be configured to forcefully close old servers if the connections do not become idle soon enough (as @francislavoie explained). @marten-seemann Nice to hear from you too! Thanks for chiming in while you're so busy.
Yep, if that's how to do it correctly, let's try it.
Thank you 🙏 I'm a bit swamped preparing the v2.5 release, so a contribution here would be very helpful. If all goes well, maybe it could make it into v2.5.0. If not, then maybe v2.5.1; no biggie either way. Let me know how I can help going forward! I'm glad there's some traction here; it'll be nice to see HTTP/3 work through reloads.
I've been figuring out what exactly needs to be implemented in quic-go and Caddy and it seems like quic-go only needs a As for Caddy, the changes seem even easier, however I'd like to clarify them first. Since Caddy currently uses caddy/modules/caddyhttp/app.go Line 356 in 4e9fbee
fakeClosePacketConn.SetReadBuffer and fakeClosePacketConn.SyscallConn which exist for the sole purpose of exporting those methods from the underlying UDP connection to quic-go.
@mholt how does this sound?
Sounds good to me. Maybe
It needs to return
@renbou That makes a lot of sense to me. I opened quic-go/quic-go#3347 in quic-go. Feel free to submit a PR to resolve this issue. :)
Still need some time to catch up on this (y'all like to work while I'm sleeping, ha). 😁 Thank you for working on it! Anyway: @renbou
@renbou Otherwise the changes sound like a good start -- want to draft up a PR for review? 😃
I've started a PR with the required changes to quic-go at quic-go/quic-go#3349 and tested them with Caddy locally. Everything works, but since it was all done locally, go.mod points to my local version. Should I create a PR with the WIP changes anyway? Presumably only go.mod will need to be updated once we've settled on the new quic-go API.
quic-go/quic-go#3349 has been merged 🎉
v2.6.2 h1:wKoFIxpmOJLGl3QXoo6PNbYvGW4xLEgo32GPBEjWL8o=
http/3 seems to stop working after any kind of reload.
http/3 works as expected before reloading
I went as far back as v2.0.0 (hence the two Caddyfiles, see further down) and encountered this issue every time.
Caddy seems to still listen on udp/443 when it's happening:
Verbose curl -vvv logs:
Verbose Caddy logs from the console, with annotations (// 📌):
Caddyfile (v2.3.0+):
Caddyfile (v2.3.0+) as JSON:
Caddyfile (pre v2.3.0):
Below are 3 different ways to reproduce this issue:
Case 1:
$ curl --http3 https://localhost # localhost
$ curl --http3 https://localhost # curl: (28) Failed to connect to localhost port 443: Connection timed out
Case 2:
$ curl --http3 https://localhost # localhost
$ curl --http3 https://localhost # curl: (28) Failed to connect to localhost port 443: Connection timed out
Case 3:
$ curl --http3 https://localhost # localhost
respond
) and save
$ curl --http3 https://localhost # curl: (28) Failed to connect to localhost port 443: Connection timed out
I hope this has not been reported yet. Otherwise, feel free to close this issue.
And despite this bug, Caddy is awesome! :)
~ @IndeedNotJames