-
Notifications
You must be signed in to change notification settings - Fork 297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
failed to close writer: context canceled #474
Comments
Hey @lukaspj, thanks for the user experience report. You are definitely not the first! This is a very common mistake to make. There is one valid use-case for I see this issue is further exacerbated by the readme and examples using it incorrectly (https://github.com/coder/websocket?tab=readme-ov-file#server). We should at least update the readmes and examples, but I'll try to think about some alternative solutions as well. Perhaps the most obvious way would be to put a context on Anyway, if you have anything else to add, how you wish this had worked instead, what lead you down the wrong path, etc, please feel free to share. 1 A valid use-case is when you're running the websocket logic in a separate goroutine, and want it to stop when the HTTP handler exits. Usually the logic runs in the HTTP handler, though, where using |
I think you pin-pointed it, if the readme and examples explained it, it would probably help. Additionally, it would be nice if the function documentation would state this danger of closing the context. Something like "if the context is cancelled, it can cause a deadlock" or just "the context should be the context of the websocket not the context of the message being written". Idk if there's a nice way to write it tbh, but would be nice if it showed up on the documentation somewhere on "how to use this library". |
@mafredri I don't think a docs update alone will resolve this issue. @lukaspj seems to be saying that if the context you pass to |
@spikecurtis I was thinking about the original report, but that's a good callout. We should split the second issue into a new ticket. There may be multiple issues in how connection closure is propagated and context cancellation is handled. There seems to be multiple code-paths depending on scenario. I also noticed while looking at #476 that a read/write context that is cancelled actually ends up closing the connection abruptly, whereas I would expect we just give up on the write. |
The HiJacker docs referenced note:
This sounds to me like the context is still useful after a hijack, I'm not following why "in general it's almost never what you want"? Wanting to stop when the HTTP handler exits is probably a very common use case? And not too different to having a context created in the handler function. In the PR that has been merged (https://github.com/coder/websocket/pull/477/files), it says:
It doesn't go on to explain why r.Context() is not recommended, it doesn't explain what "surprising behaviour" is, and the http.Hijacker seems to suggest that r.Context() is valid after a Hijack. Could you clarify what is meant here? If a request.Context or any other context is used in Write, it shouldn't lead to a broken websocket connection. If Write only accepts context.Background, is there any value in passing a context at all? |
@maggie44 certainly. In my experience, a common use-case is that the websocket is managed within the In the aforementioned scenario, the That's not to say there aren't use-cases, one could imagine the following method on a service: func (s *Server) handleWS(w http.ResponseWriter, r *http.Request) {
conn, _ := websocket.Accept(s.ctx, r, nil) // <- Note, r.Context() would be wrong here because it will **never** cancel.
s.connHandler <- conn
<-s.ctx.Done()
} I'd be interested to know if there's a relatively common scenario in which you would find the behavior useful? |
I think the behaviour described in the first post is what I would have expected:
A Where it is problematic is in this scenario it doesn't release the lock, and it seems the Write hangs, but up until that stage it seems that r.Context() is doing what I would expect. It sounds like r.Context() in the Hijack scenario is equivalent to:
As you mentioned, this isn't what is expected in terms of the user disconnecting doesn't call a cancel, but is a useful way to attach multiple functions, process, particularly goroutines to the broader user connection. One of those things is communicating to writers and readers that nobody is here anymore to hear the requests? An example might be using the Ping() functionality in this package. I might start a goroutine that loops every 10 seconds, to call a ping for keep alive. I would pass r.Context() to this so that it will keep pinging until the connection is closed. I think long story short, the |
There's nothing inherently wrong with passing the
I will reluctantly say maybe, but all the request context says is that the HTTP handler returned and nothing else.
This is incorrect and precisely the misconception the documentation changes is trying to address. The request context has no association to the connection after Hijack. Consider the following modified (from example) echo server handler: func (s echoServer) ServeHTTP(w http.ResponseWriter, r *http.Request) {
c, err := websocket.Accept(w, r, &websocket.AcceptOptions{})
if err != nil {
s.logf("%v", err)
return
}
go func() { // Observe, we return immediately after Accept from ServeHTTP, r.Context() is cancelled.
defer c.CloseNow()
l := rate.NewLimiter(rate.Every(time.Millisecond*100), 10)
for {
err = echo(c, l) // Using r.Context() here would break this implementation.
if websocket.CloseStatus(err) == websocket.StatusNormalClosure {
return
}
if err != nil {
s.logf("failed to echo with %v: %v", r.RemoteAddr, err)
return
}
}
}()
} A completely valid use-case where we don't and shouldn't care about |
Interesting conversation, and I think this thread is turning in to exactly what is needed, a place for people to read through that flushes out the different scenarios.
The request context ( Perhaps I have not been using ServeHTTP as others have, but I have something in ServeHTTP blocking for the life of the websocket connection to mimic HTTP behaviour. I presume others do too, as |
Glad to hear it 👍🏻
Depends on the author, I could make up use-cases but I don't think that'll add anything to the conversation. Rather I'd like to see it proved useful enough to warrant changing the recommendation from against using it. (Am I right to understand that's what's being discussed here?) My take is that it's useful maybe 1/3 times (but rare), in 1/3 times it's harmful and in 1/3 times it'll prevent you from progressing until you figure it out (or cause unexpected behaviors depending on when the handler exits). So if someone wants to use it, they should really know what they're doing and preferably have a deep understanding of the behavior. Ultimately the recommendation is just that, a recommendation. Users are free to ignore it. |
1 part thinking about what the recommendation should be, 2 parts checking over my code to make sure it has been doing what I thought it had been doing all along 😆 .
So
If I think I'm happy with my implementation now at least, and learnt a few things about the Hijack behaviour. I guess we will see how people decide to implement. 👍 |
Haha, alright. 😄
Trying to read or write to the connection will tell you if the connection has closed irrespective of the request context. The bigger question is, where would you even find use for the request context as a signal the connection may have been closed? That question is rhetorical though, I don't expect an answer 😄
I wouldn't say those two are necessarily related, i.e. defer doesn't prove the validity of the request context. For instance, the code you shared is a traditional use-case, perhaps the most common one. All logic is in the handler and executes synchronously. From this point of view, the context will never cancel. Because by the time it does it's irrelevant and we've already stopped all work and the connection is closed.
Great to hear, and good luck with evolving your implementation! 👍🏻 |
I'm using this library to create a zero-trust tunnel into a Kubernetes cluster, when I encountered this error message:
So what happened was that when client closed the connection, I would send a "connection closed" message over the WebSocket connection, but I (mistakenly) used the request context for the write command.
This context would get cancelled somewhere around this Write call, sometimes after message was written but before it released the lock.
As a result, the lock would never get released and the websocket was no longer usable.
The solution was to create a new context for the write:
And I don't think I should have used the request context to begin with, but I do think it's not optimal behaviour that the websocket can be left in a broken state by mistimed context cancel.
In any case, just thought I'd share it. Thanks for maintaining this awesome library!
The text was updated successfully, but these errors were encountered: