-
Notifications
You must be signed in to change notification settings - Fork 312
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix net.Conn deadlines tearing down the connection when no reads or writes are active #350
Conversation
…rites are active We were unconditionally canceling the read/write contexts when the deadline timer fired, even if there were no active reads or writes. We instead only cancel the context if there is an active operation, otherwise we set a flag so that future calls (without the deadline being reset) will fail. Updates tailscale/tailscale#5921
@nhooyr we've been using your library at Tailscale with great success, but we recently ran into this problem. The way to think about it is that the following sequence of calls should work: c.SetWriteDeadline(5 * time.Second)
c.Write(x) // immediate
time.Sleep(10 * time.Second)
c.SetWriteDeadline(5 * time.Second)
c.Write(x) // immediate But the timer would fire in the middle of the Happy to make tweaks if you have other ideas for how to fix this. |
… a net.Conn We removed it in #4806 in favor of the built-in functionality from the nhooyr.io/websocket package. However, it has an issue with deadlines that has not been fixed yet (see coder/websocket#350). Temporarily go back to using a custom wrapper (using the fix from our fork) so that derpers will stop closing connections too aggressively. Updates #5921 Signed-off-by: Mihai Parparita <mihai@tailscale.com>
… a net.Conn We removed it in #4806 in favor of the built-in functionality from the nhooyr.io/websocket package. However, it has an issue with deadlines that has not been fixed yet (see coder/websocket#350). Temporarily go back to using a custom wrapper (using the fix from our fork) so that derpers will stop closing connections too aggressively. Updates #5921 Signed-off-by: Mihai Parparita <mihai@tailscale.com>
… a net.Conn We removed it in #4806 in favor of the built-in functionality from the nhooyr.io/websocket package. However, it has an issue with deadlines that has not been fixed yet (see coder/websocket#350). Temporarily go back to using a custom wrapper (using the fix from our fork) so that derpers will stop closing connections too aggressively. Updates #5921 Signed-off-by: Mihai Parparita <mihai@tailscale.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @mihaip!
Glad to hear and thank you for sponsoring me on GitHub. Apologies for the delayed response.
I've actually already fixed this in the masterdev branch. See #228
Just haven't tagged a new release. See #256
Your patch doesn't seem correct to me as concurrent writes/reads should be allowed. I would copy the relevant parts from my linked commit instead.
There's quite a bit of debugging required to get a new version out as the tests occassionally fail on masterdev. I don't want to push a new release without being absolutely certain I didn't introduce a bug somewhere.
I hope to get to it soon. Life's been a little wild and unpredictable in both good and bad ways.
Sorry,
Anmol
@@ -151,16 +186,18 @@ func (c *netConn) SetWriteDeadline(t time.Time) error { | |||
if t.IsZero() { | |||
c.writeTimer.Stop() | |||
} else { | |||
c.writeTimer.Reset(t.Sub(time.Now())) | |||
c.writeTimer.Reset(time.Until(t)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, didn't realize we had time.Until
.
} | ||
c.afterReadDeadline.Store(false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while highly unlikely this can race with the timer routine as it's possible the timer has already set off and the timer goroutine has already set afterReadDeadline
to true before this goroutine hits this line.
c.readMu.Lock() | ||
defer c.readMu.Unlock() | ||
if swapped := c.reading.CompareAndSwap(false, true); !swapped { | ||
panic("Concurrent reads not allowed") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
concurrent reads should be allowed
} | ||
|
||
if swapped := c.writing.CompareAndSwap(false, true); !swapped { | ||
panic("Concurrent writes not allowed") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
concurrent writes should be allowed
Sorry, my commit is in dev, not master. See 0a61ffe I'm not certain if it can be applied directly to master without conflict but it should. And if not, should be trivial to adapt for your needs for now. |
… a net.Conn We removed it in #4806 in favor of the built-in functionality from the nhooyr.io/websocket package. However, it has an issue with deadlines that has not been fixed yet (see coder/websocket#350). Temporarily go back to using a custom wrapper (using the fix from our fork) so that derpers will stop closing connections too aggressively. Updates #5921 Change-Id: I1597644e8ba47b413e33f2201eab935145566c0e Signed-off-by: Mihai Parparita <mihai@tailscale.com> (cherry picked from commit 9d04ffc)
… a net.Conn We removed it in #4806 in favor of the built-in functionality from the nhooyr.io/websocket package. However, it has an issue with deadlines that has not been fixed yet (see coder/websocket#350). Temporarily go back to using a custom wrapper (using the fix from our fork) so that derpers will stop closing connections too aggressively. Updates #5921 Change-Id: I1597644e8ba47b413e33f2201eab935145566c0e Signed-off-by: Mihai Parparita <mihai@tailscale.com> (cherry picked from commit 9d04ffc)
Since I've already fixed this in dev, closing. |
This commit only modifies the jsonrpc package. It doesn't provide a way to serve websocket connections when Juno is started, which can be implemented in a future PR. Two design decisions: We use the nhooyr.io/websocket library. The widely used gorilla/websocket library has been archived and geth is already running into issues [1]. The most notable consumer of nhooyr.io/websocket that I could find is Tailscale [2,3]. Unlike geth [4] and cometbft [5], we don't send ping messages to the client. Pro: saves bandwidth. Con: if the client app hangs, the Websocket connection will persist. As of 2017, Chromium is doing the same [6]. Note that the TCP keep-alives sent every 15 seconds [7] by the HTTP server will detect when the TCP connection is down, but will not detect when the application using the TCP connection is stuck. We can change this in the future, but it will slightly complicate the logic and a ping loop is difficult to test since the pong replies are handled by the websocket library. [1]: ethereum/go-ethereum#27261 [2]: https://github.com/tailscale/tailscale/blob/0f5090c526c2019fae94695b2991cb561e131788/cmd/derper/websocket.go#L13 [3]: coder/websocket#350 (comment) [4]: https://github.com/ethereum/go-ethereum/blob/6d2aeb43d516fbe2cafd9e65df7ccbd885f861d3/rpc/websocket.go#L336-L359 [5]: https://github.com/cometbft/cometbft/blob/c138e785c992b3a76cb25fdf2e0820e29c56c1af/rpc/jsonrpc/server/ws_handler.go#L427-L432 [6]: https://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJhttps://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJ [7]: https://github.com/golang/go/blob/f3bf18117b284b63f4350a5aa61773a30d91a6d5/src/net/dial.go#L15-L17
This commit only modifies the jsonrpc package. It doesn't provide a way to serve websocket connections when Juno is started, which can be implemented in a future PR. Two design decisions: We use the nhooyr.io/websocket library. The widely used gorilla/websocket library has been archived and geth is already running into issues [1]. The most notable consumer of nhooyr.io/websocket that I could find is Tailscale [2,3]. Unlike geth [4] and cometbft [5], we don't send ping messages to the client. Pro: saves bandwidth. Con: if the client app hangs, the Websocket connection will persist. As of 2017, Chromium is doing the same [6]. Note that the TCP keep-alives sent every 15 seconds [7] by the HTTP server will detect when the TCP connection is down, but will not detect when the application using the TCP connection is stuck. We can change this in the future, but it will slightly complicate the logic and a ping loop is difficult to test since the pong replies are handled by the websocket library. [1]: ethereum/go-ethereum#27261 [2]: https://github.com/tailscale/tailscale/blob/0f5090c526c2019fae94695b2991cb561e131788/cmd/derper/websocket.go#L13 [3]: coder/websocket#350 (comment) [4]: https://github.com/ethereum/go-ethereum/blob/6d2aeb43d516fbe2cafd9e65df7ccbd885f861d3/rpc/websocket.go#L336-L359 [5]: https://github.com/cometbft/cometbft/blob/c138e785c992b3a76cb25fdf2e0820e29c56c1af/rpc/jsonrpc/server/ws_handler.go#L427-L432 [6]: https://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJhttps://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJ [7]: https://github.com/golang/go/blob/f3bf18117b284b63f4350a5aa61773a30d91a6d5/src/net/dial.go#L15-L17
This commit only modifies the jsonrpc package. It doesn't provide a way to serve websocket connections when Juno is started, which can be implemented in a future PR. Two design decisions: We use the nhooyr.io/websocket library. The widely used gorilla/websocket library has been archived and geth is already running into issues [1]. The most notable consumer of nhooyr.io/websocket that I could find is Tailscale [2,3]. Unlike geth [4] and cometbft [5], we don't send ping messages to the client. Pro: saves bandwidth. Con: if the client app hangs, the Websocket connection will persist. As of 2017, Chromium is doing the same [6]. Note that the TCP keep-alives sent every 15 seconds [7] by the HTTP server will detect when the TCP connection is down, but will not detect when the application using the TCP connection is stuck. We can change this in the future, but it will slightly complicate the logic and a ping loop is difficult to test since the pong replies are handled by the websocket library. [1]: ethereum/go-ethereum#27261 [2]: https://github.com/tailscale/tailscale/blob/0f5090c526c2019fae94695b2991cb561e131788/cmd/derper/websocket.go#L13 [3]: coder/websocket#350 (comment) [4]: https://github.com/ethereum/go-ethereum/blob/6d2aeb43d516fbe2cafd9e65df7ccbd885f861d3/rpc/websocket.go#L336-L359 [5]: https://github.com/cometbft/cometbft/blob/c138e785c992b3a76cb25fdf2e0820e29c56c1af/rpc/jsonrpc/server/ws_handler.go#L427-L432 [6]: https://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJhttps://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJ [7]: https://github.com/golang/go/blob/f3bf18117b284b63f4350a5aa61773a30d91a6d5/src/net/dial.go#L15-L17
This commit only modifies the jsonrpc package. It doesn't provide a way to serve websocket connections when Juno is started, which can be implemented in a future PR. Two design decisions: We use the nhooyr.io/websocket library. The widely used gorilla/websocket library has been archived and geth is already running into issues [1]. The most notable consumer of nhooyr.io/websocket that I could find is Tailscale [2,3]. Unlike geth [4] and cometbft [5], we don't send ping messages to the client. Pro: saves bandwidth. Con: if the client app hangs, the Websocket connection will persist. As of 2017, Chromium is doing the same [6]. Note that the TCP keep-alives sent every 15 seconds [7] by the HTTP server will detect when the TCP connection is down, but will not detect when the application using the TCP connection is stuck. We can change this in the future, but it will slightly complicate the logic and a ping loop is difficult to test since the pong replies are handled by the websocket library. [1]: ethereum/go-ethereum#27261 [2]: https://github.com/tailscale/tailscale/blob/0f5090c526c2019fae94695b2991cb561e131788/cmd/derper/websocket.go#L13 [3]: coder/websocket#350 (comment) [4]: https://github.com/ethereum/go-ethereum/blob/6d2aeb43d516fbe2cafd9e65df7ccbd885f861d3/rpc/websocket.go#L336-L359 [5]: https://github.com/cometbft/cometbft/blob/c138e785c992b3a76cb25fdf2e0820e29c56c1af/rpc/jsonrpc/server/ws_handler.go#L427-L432 [6]: https://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJhttps://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJ [7]: https://github.com/golang/go/blob/f3bf18117b284b63f4350a5aa61773a30d91a6d5/src/net/dial.go#L15-L17
This commit only modifies the jsonrpc package. It doesn't provide a way to serve websocket connections when Juno is started, which can be implemented in a future PR. Two design decisions: We use the nhooyr.io/websocket library. The widely used gorilla/websocket library has been archived and geth is already running into issues [1]. The most notable consumer of nhooyr.io/websocket that I could find is Tailscale [2] [3]. Unlike geth [4] and cometbft [5], we don't send ping messages to the client. Pro: saves bandwidth. Con: if the client app hangs, the Websocket connection will persist. As of 2017, Chromium is doing the same [6]. Note that the TCP keep-alives sent every 15 seconds [7] by the HTTP server will detect when the TCP connection is down, but will not detect when the application using the TCP connection is stuck. We can change this in the future, but it will slightly complicate the logic and a ping loop is difficult to test since the pong replies are handled by the websocket library. [1]: ethereum/go-ethereum#27261 [2]: https://github.com/tailscale/tailscale/blob/0f5090c526c2019fae94695b2991cb561e131788/cmd/derper/websocket.go#L13 [3]: coder/websocket#350 (comment) [4]: https://github.com/ethereum/go-ethereum/blob/6d2aeb43d516fbe2cafd9e65df7ccbd885f861d3/rpc/websocket.go#L336-L359 [5]: https://github.com/cometbft/cometbft/blob/c138e785c992b3a76cb25fdf2e0820e29c56c1af/rpc/jsonrpc/server/ws_handler.go#L427-L432 [6]: https://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJhttps://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJ [7]: https://github.com/golang/go/blob/f3bf18117b284b63f4350a5aa61773a30d91a6d5/src/net/dial.go#L15-L17
This commit only modifies the jsonrpc package. It doesn't provide a way to serve websocket connections when Juno is started, which can be implemented in a future PR. Two design decisions: We use the nhooyr.io/websocket library. The widely used gorilla/websocket library has been archived and geth is already running into issues with it [1]. The most notable consumer of nhooyr.io/websocket that I could find is Tailscale [2] [3]. Unlike geth [4] and cometbft [5], we don't send ping messages to the client. Pro: saves bandwidth. Con: if the client app hangs, the Websocket connection will persist. As of 2017, Chromium is doing the same [6]. Note that the TCP keep-alives sent every 15 seconds [7] by the HTTP server will detect when the TCP connection is down, but will not detect when the application using the TCP connection is stuck. We can change this in the future, but it will slightly complicate the logic and a ping loop is difficult to test since the pong replies are handled by the websocket library. [1]: ethereum/go-ethereum#27261 [2]: https://github.com/tailscale/tailscale/blob/0f5090c526c2019fae94695b2991cb561e131788/cmd/derper/websocket.go#L13 [3]: coder/websocket#350 (comment) [4]: https://github.com/ethereum/go-ethereum/blob/6d2aeb43d516fbe2cafd9e65df7ccbd885f861d3/rpc/websocket.go#L336-L359 [5]: https://github.com/cometbft/cometbft/blob/c138e785c992b3a76cb25fdf2e0820e29c56c1af/rpc/jsonrpc/server/ws_handler.go#L427-L432 [6]: https://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJhttps://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJ [7]: https://github.com/golang/go/blob/f3bf18117b284b63f4350a5aa61773a30d91a6d5/src/net/dial.go#L15-L17
This commit only modifies the jsonrpc package. It doesn't provide a way to serve websocket connections when Juno is started, which can be implemented in a future PR. Two design decisions: We use the nhooyr.io/websocket library. The widely used gorilla/websocket library has been archived and geth is already running into issues with it [1]. The most notable consumer of nhooyr.io/websocket that I could find is Tailscale [2] [3]. Unlike geth [4] and cometbft [5], we don't send ping messages to the client. Pro: saves bandwidth. Con: if the client app hangs, the Websocket connection will persist. As of 2017, Chromium is doing the same [6]. Note that the TCP keep-alives sent every 15 seconds [7] by the HTTP server will detect when the TCP connection is down, but will not detect when the application using the TCP connection is stuck. We can change this in the future, but it will slightly complicate the logic and a ping loop is difficult to test since the pong replies are handled by the websocket library. [1]: ethereum/go-ethereum#27261 [2]: https://github.com/tailscale/tailscale/blob/0f5090c526c2019fae94695b2991cb561e131788/cmd/derper/websocket.go#L13 [3]: coder/websocket#350 (comment) [4]: https://github.com/ethereum/go-ethereum/blob/6d2aeb43d516fbe2cafd9e65df7ccbd885f861d3/rpc/websocket.go#L336-L359 [5]: https://github.com/cometbft/cometbft/blob/c138e785c992b3a76cb25fdf2e0820e29c56c1af/rpc/jsonrpc/server/ws_handler.go#L427-L432 [6]: https://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJhttps://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJ [7]: https://github.com/golang/go/blob/f3bf18117b284b63f4350a5aa61773a30d91a6d5/src/net/dial.go#L15-L17
This commit only modifies the jsonrpc package. It doesn't provide a way to serve websocket connections when Juno is started, which can be implemented in a future PR. Two design decisions: We use the nhooyr.io/websocket library. The widely used gorilla/websocket library has been archived and geth is already running into issues with it [1]. The most notable consumer of nhooyr.io/websocket that I could find is Tailscale [2] [3]. Unlike geth [4] and cometbft [5], we don't send ping messages to the client. Pro: saves bandwidth. Con: if the client app hangs, the Websocket connection will persist. As of 2017, Chromium is doing the same [6]. Note that the TCP keep-alives sent every 15 seconds [7] by the HTTP server will detect when the TCP connection is down, but will not detect when the application using the TCP connection is stuck. We can change this in the future, but it will slightly complicate the logic and a ping loop is difficult to test since the pong replies are handled by the websocket library. [1]: ethereum/go-ethereum#27261 [2]: https://github.com/tailscale/tailscale/blob/0f5090c526c2019fae94695b2991cb561e131788/cmd/derper/websocket.go#L13 [3]: coder/websocket#350 (comment) [4]: https://github.com/ethereum/go-ethereum/blob/6d2aeb43d516fbe2cafd9e65df7ccbd885f861d3/rpc/websocket.go#L336-L359 [5]: https://github.com/cometbft/cometbft/blob/c138e785c992b3a76cb25fdf2e0820e29c56c1af/rpc/jsonrpc/server/ws_handler.go#L427-L432 [6]: https://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJhttps://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJ [7]: https://github.com/golang/go/blob/f3bf18117b284b63f4350a5aa61773a30d91a6d5/src/net/dial.go#L15-L17
We were unconditionally canceling the read/write contexts when the deadline timer fired, even if there were no active reads or writes. We instead only cancel the context if there is an active operation, otherwise we set a flag so that future calls (without the deadline being reset) will fail.
Updates tailscale/tailscale#5921