Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix net.Conn deadlines tearing down the connection when no reads or writes are active #350

Closed
wants to merge 1 commit into from

Conversation

mihaip
Copy link

@mihaip mihaip commented Oct 14, 2022

We were unconditionally canceling the read/write contexts when the deadline timer fired, even if there were no active reads or writes. We instead only cancel the context if there is an active operation, otherwise we set a flag so that future calls (without the deadline being reset) will fail.

Updates tailscale/tailscale#5921

…rites are active

We were unconditionally canceling the read/write contexts when the
deadline timer fired, even if there were no active reads or writes.
We instead only cancel the context if there is an active operation,
otherwise we set a flag so that future calls (without the deadline being
reset) will fail.

Updates tailscale/tailscale#5921
@mihaip mihaip requested a review from nhooyr as a code owner October 14, 2022 21:59
@mihaip
Copy link
Author

mihaip commented Oct 14, 2022

@nhooyr we've been using your library at Tailscale with great success, but we recently ran into this problem. The way to think about it is that the following sequence of calls should work:

c.SetWriteDeadline(5 * time.Second)
c.Write(x) // immediate
time.Sleep(10 * time.Second)
c.SetWriteDeadline(5 * time.Second)
c.Write(x) // immediate

But the timer would fire in the middle of the Sleep and unconditionally cancel the context, rendering future writes impossible.

Happy to make tweaks if you have other ideas for how to fix this.

mihaip added a commit to tailscale/tailscale that referenced this pull request Oct 18, 2022
… a net.Conn

We removed it in #4806 in favor of the built-in functionality from the
nhooyr.io/websocket package. However, it has an issue with deadlines
that has not been fixed yet (see coder/websocket#350). Temporarily
go back to using a custom wrapper (using the fix from our fork) so that
derpers will stop closing connections too aggressively.

Updates #5921

Signed-off-by: Mihai Parparita <mihai@tailscale.com>
mihaip added a commit to tailscale/tailscale that referenced this pull request Oct 18, 2022
… a net.Conn

We removed it in #4806 in favor of the built-in functionality from the
nhooyr.io/websocket package. However, it has an issue with deadlines
that has not been fixed yet (see coder/websocket#350). Temporarily
go back to using a custom wrapper (using the fix from our fork) so that
derpers will stop closing connections too aggressively.

Updates #5921

Signed-off-by: Mihai Parparita <mihai@tailscale.com>
mihaip added a commit to tailscale/tailscale that referenced this pull request Oct 18, 2022
… a net.Conn

We removed it in #4806 in favor of the built-in functionality from the
nhooyr.io/websocket package. However, it has an issue with deadlines
that has not been fixed yet (see coder/websocket#350). Temporarily
go back to using a custom wrapper (using the fix from our fork) so that
derpers will stop closing connections too aggressively.

Updates #5921

Signed-off-by: Mihai Parparita <mihai@tailscale.com>
Copy link
Contributor

@nhooyr nhooyr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mihaip!

Glad to hear and thank you for sponsoring me on GitHub. Apologies for the delayed response.

I've actually already fixed this in the masterdev branch. See #228
Just haven't tagged a new release. See #256

Your patch doesn't seem correct to me as concurrent writes/reads should be allowed. I would copy the relevant parts from my linked commit instead.

There's quite a bit of debugging required to get a new version out as the tests occassionally fail on masterdev. I don't want to push a new release without being absolutely certain I didn't introduce a bug somewhere.

I hope to get to it soon. Life's been a little wild and unpredictable in both good and bad ways.

Sorry,
Anmol

@@ -151,16 +186,18 @@ func (c *netConn) SetWriteDeadline(t time.Time) error {
if t.IsZero() {
c.writeTimer.Stop()
} else {
c.writeTimer.Reset(t.Sub(time.Now()))
c.writeTimer.Reset(time.Until(t))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, didn't realize we had time.Until.

}
c.afterReadDeadline.Store(false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while highly unlikely this can race with the timer routine as it's possible the timer has already set off and the timer goroutine has already set afterReadDeadline to true before this goroutine hits this line.

c.readMu.Lock()
defer c.readMu.Unlock()
if swapped := c.reading.CompareAndSwap(false, true); !swapped {
panic("Concurrent reads not allowed")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

concurrent reads should be allowed

}

if swapped := c.writing.CompareAndSwap(false, true); !swapped {
panic("Concurrent writes not allowed")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

concurrent writes should be allowed

@nhooyr
Copy link
Contributor

nhooyr commented Oct 20, 2022

Sorry, my commit is in dev, not master.

See 0a61ffe

I'm not certain if it can be applied directly to master without conflict but it should. And if not, should be trivial to adapt for your needs for now.

bradfitz pushed a commit to tailscale/tailscale that referenced this pull request Oct 21, 2022
… a net.Conn

We removed it in #4806 in favor of the built-in functionality from the
nhooyr.io/websocket package. However, it has an issue with deadlines
that has not been fixed yet (see coder/websocket#350). Temporarily
go back to using a custom wrapper (using the fix from our fork) so that
derpers will stop closing connections too aggressively.

Updates #5921

Change-Id: I1597644e8ba47b413e33f2201eab935145566c0e
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
(cherry picked from commit 9d04ffc)
bradfitz pushed a commit to tailscale/tailscale that referenced this pull request Oct 21, 2022
… a net.Conn

We removed it in #4806 in favor of the built-in functionality from the
nhooyr.io/websocket package. However, it has an issue with deadlines
that has not been fixed yet (see coder/websocket#350). Temporarily
go back to using a custom wrapper (using the fix from our fork) so that
derpers will stop closing connections too aggressively.

Updates #5921

Change-Id: I1597644e8ba47b413e33f2201eab935145566c0e
Signed-off-by: Mihai Parparita <mihai@tailscale.com>
(cherry picked from commit 9d04ffc)
@nhooyr
Copy link
Contributor

nhooyr commented Mar 7, 2023

Since I've already fixed this in dev, closing.

@nhooyr nhooyr closed this Mar 7, 2023
joshklop added a commit to NethermindEth/juno that referenced this pull request Jun 21, 2023
This commit only modifies the jsonrpc package. It doesn't provide a way
to serve websocket connections when Juno is started, which can be
implemented in a future PR.

Two design decisions:

We use the nhooyr.io/websocket library. The widely used
gorilla/websocket library has been archived and geth is already
running into issues [1]. The most notable consumer of nhooyr.io/websocket
that I could find is Tailscale [2,3].

Unlike geth [4] and cometbft [5], we don't send ping messages to the
client. Pro: saves bandwidth. Con: if the client app hangs, the Websocket
connection will persist. As of 2017, Chromium is doing the same [6].
Note that the TCP keep-alives sent every 15 seconds [7] by the HTTP
server will detect when the TCP connection is down, but will not detect
when the application using the TCP connection is stuck.
We can change this in the future, but it will slightly complicate the
logic and a ping loop is difficult to test since the pong replies are
handled by the websocket library.

[1]: ethereum/go-ethereum#27261
[2]: https://github.com/tailscale/tailscale/blob/0f5090c526c2019fae94695b2991cb561e131788/cmd/derper/websocket.go#L13
[3]: coder/websocket#350 (comment)
[4]: https://github.com/ethereum/go-ethereum/blob/6d2aeb43d516fbe2cafd9e65df7ccbd885f861d3/rpc/websocket.go#L336-L359
[5]: https://github.com/cometbft/cometbft/blob/c138e785c992b3a76cb25fdf2e0820e29c56c1af/rpc/jsonrpc/server/ws_handler.go#L427-L432
[6]: https://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJhttps://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJ
[7]: https://github.com/golang/go/blob/f3bf18117b284b63f4350a5aa61773a30d91a6d5/src/net/dial.go#L15-L17
joshklop added a commit to NethermindEth/juno that referenced this pull request Jun 21, 2023
This commit only modifies the jsonrpc package. It doesn't provide a way
to serve websocket connections when Juno is started, which can be
implemented in a future PR.

Two design decisions:

We use the nhooyr.io/websocket library. The widely used
gorilla/websocket library has been archived and geth is already
running into issues [1]. The most notable consumer of nhooyr.io/websocket
that I could find is Tailscale [2,3].

Unlike geth [4] and cometbft [5], we don't send ping messages to the
client. Pro: saves bandwidth. Con: if the client app hangs, the Websocket
connection will persist. As of 2017, Chromium is doing the same [6].
Note that the TCP keep-alives sent every 15 seconds [7] by the HTTP
server will detect when the TCP connection is down, but will not detect
when the application using the TCP connection is stuck.
We can change this in the future, but it will slightly complicate the
logic and a ping loop is difficult to test since the pong replies are
handled by the websocket library.

[1]: ethereum/go-ethereum#27261
[2]: https://github.com/tailscale/tailscale/blob/0f5090c526c2019fae94695b2991cb561e131788/cmd/derper/websocket.go#L13
[3]: coder/websocket#350 (comment)
[4]: https://github.com/ethereum/go-ethereum/blob/6d2aeb43d516fbe2cafd9e65df7ccbd885f861d3/rpc/websocket.go#L336-L359
[5]: https://github.com/cometbft/cometbft/blob/c138e785c992b3a76cb25fdf2e0820e29c56c1af/rpc/jsonrpc/server/ws_handler.go#L427-L432
[6]: https://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJhttps://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJ
[7]: https://github.com/golang/go/blob/f3bf18117b284b63f4350a5aa61773a30d91a6d5/src/net/dial.go#L15-L17
joshklop added a commit to NethermindEth/juno that referenced this pull request Jun 21, 2023
This commit only modifies the jsonrpc package. It doesn't provide a way
to serve websocket connections when Juno is started, which can be
implemented in a future PR.

Two design decisions:

We use the nhooyr.io/websocket library. The widely used
gorilla/websocket library has been archived and geth is already
running into issues [1]. The most notable consumer of nhooyr.io/websocket
that I could find is Tailscale [2,3].

Unlike geth [4] and cometbft [5], we don't send ping messages to the
client. Pro: saves bandwidth. Con: if the client app hangs, the Websocket
connection will persist. As of 2017, Chromium is doing the same [6].
Note that the TCP keep-alives sent every 15 seconds [7] by the HTTP
server will detect when the TCP connection is down, but will not detect
when the application using the TCP connection is stuck.
We can change this in the future, but it will slightly complicate the
logic and a ping loop is difficult to test since the pong replies are
handled by the websocket library.

[1]: ethereum/go-ethereum#27261
[2]: https://github.com/tailscale/tailscale/blob/0f5090c526c2019fae94695b2991cb561e131788/cmd/derper/websocket.go#L13
[3]: coder/websocket#350 (comment)
[4]: https://github.com/ethereum/go-ethereum/blob/6d2aeb43d516fbe2cafd9e65df7ccbd885f861d3/rpc/websocket.go#L336-L359
[5]: https://github.com/cometbft/cometbft/blob/c138e785c992b3a76cb25fdf2e0820e29c56c1af/rpc/jsonrpc/server/ws_handler.go#L427-L432
[6]: https://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJhttps://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJ
[7]: https://github.com/golang/go/blob/f3bf18117b284b63f4350a5aa61773a30d91a6d5/src/net/dial.go#L15-L17
joshklop added a commit to NethermindEth/juno that referenced this pull request Jun 21, 2023
This commit only modifies the jsonrpc package. It doesn't provide a way
to serve websocket connections when Juno is started, which can be
implemented in a future PR.

Two design decisions:

We use the nhooyr.io/websocket library. The widely used
gorilla/websocket library has been archived and geth is already
running into issues [1]. The most notable consumer of nhooyr.io/websocket
that I could find is Tailscale [2,3].

Unlike geth [4] and cometbft [5], we don't send ping messages to the
client. Pro: saves bandwidth. Con: if the client app hangs, the Websocket
connection will persist. As of 2017, Chromium is doing the same [6].
Note that the TCP keep-alives sent every 15 seconds [7] by the HTTP
server will detect when the TCP connection is down, but will not detect
when the application using the TCP connection is stuck.
We can change this in the future, but it will slightly complicate the
logic and a ping loop is difficult to test since the pong replies are
handled by the websocket library.

[1]: ethereum/go-ethereum#27261
[2]: https://github.com/tailscale/tailscale/blob/0f5090c526c2019fae94695b2991cb561e131788/cmd/derper/websocket.go#L13
[3]: coder/websocket#350 (comment)
[4]: https://github.com/ethereum/go-ethereum/blob/6d2aeb43d516fbe2cafd9e65df7ccbd885f861d3/rpc/websocket.go#L336-L359
[5]: https://github.com/cometbft/cometbft/blob/c138e785c992b3a76cb25fdf2e0820e29c56c1af/rpc/jsonrpc/server/ws_handler.go#L427-L432
[6]: https://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJhttps://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJ
[7]: https://github.com/golang/go/blob/f3bf18117b284b63f4350a5aa61773a30d91a6d5/src/net/dial.go#L15-L17
joshklop added a commit to NethermindEth/juno that referenced this pull request Jun 21, 2023
This commit only modifies the jsonrpc package. It doesn't provide a way
to serve websocket connections when Juno is started, which can be
implemented in a future PR.

Two design decisions:

We use the nhooyr.io/websocket library. The widely used
gorilla/websocket library has been archived and geth is already
running into issues [1]. The most notable consumer of nhooyr.io/websocket
that I could find is Tailscale [2] [3].

Unlike geth [4] and cometbft [5], we don't send ping messages to the
client. Pro: saves bandwidth. Con: if the client app hangs, the Websocket
connection will persist. As of 2017, Chromium is doing the same [6].
Note that the TCP keep-alives sent every 15 seconds [7] by the HTTP
server will detect when the TCP connection is down, but will not detect
when the application using the TCP connection is stuck.
We can change this in the future, but it will slightly complicate the
logic and a ping loop is difficult to test since the pong replies are
handled by the websocket library.

[1]: ethereum/go-ethereum#27261
[2]: https://github.com/tailscale/tailscale/blob/0f5090c526c2019fae94695b2991cb561e131788/cmd/derper/websocket.go#L13
[3]: coder/websocket#350 (comment)
[4]: https://github.com/ethereum/go-ethereum/blob/6d2aeb43d516fbe2cafd9e65df7ccbd885f861d3/rpc/websocket.go#L336-L359
[5]: https://github.com/cometbft/cometbft/blob/c138e785c992b3a76cb25fdf2e0820e29c56c1af/rpc/jsonrpc/server/ws_handler.go#L427-L432
[6]: https://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJhttps://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJ
[7]: https://github.com/golang/go/blob/f3bf18117b284b63f4350a5aa61773a30d91a6d5/src/net/dial.go#L15-L17
joshklop added a commit to NethermindEth/juno that referenced this pull request Jun 21, 2023
This commit only modifies the jsonrpc package. It doesn't provide a way
to serve websocket connections when Juno is started, which can be
implemented in a future PR.

Two design decisions:

We use the nhooyr.io/websocket library. The widely used
gorilla/websocket library has been archived and geth is already
running into issues with it [1]. The most notable consumer of
nhooyr.io/websocket that I could find is Tailscale [2] [3].

Unlike geth [4] and cometbft [5], we don't send ping messages to the
client. Pro: saves bandwidth. Con: if the client app hangs, the Websocket
connection will persist. As of 2017, Chromium is doing the same [6].
Note that the TCP keep-alives sent every 15 seconds [7] by the HTTP
server will detect when the TCP connection is down, but will not detect
when the application using the TCP connection is stuck.
We can change this in the future, but it will slightly complicate the
logic and a ping loop is difficult to test since the pong replies are
handled by the websocket library.

[1]: ethereum/go-ethereum#27261
[2]: https://github.com/tailscale/tailscale/blob/0f5090c526c2019fae94695b2991cb561e131788/cmd/derper/websocket.go#L13
[3]: coder/websocket#350 (comment)
[4]: https://github.com/ethereum/go-ethereum/blob/6d2aeb43d516fbe2cafd9e65df7ccbd885f861d3/rpc/websocket.go#L336-L359
[5]: https://github.com/cometbft/cometbft/blob/c138e785c992b3a76cb25fdf2e0820e29c56c1af/rpc/jsonrpc/server/ws_handler.go#L427-L432
[6]: https://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJhttps://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJ
[7]: https://github.com/golang/go/blob/f3bf18117b284b63f4350a5aa61773a30d91a6d5/src/net/dial.go#L15-L17
joshklop added a commit to NethermindEth/juno that referenced this pull request Jun 21, 2023
This commit only modifies the jsonrpc package. It doesn't provide a way
to serve websocket connections when Juno is started, which can be
implemented in a future PR.

Two design decisions:

We use the nhooyr.io/websocket library. The widely used
gorilla/websocket library has been archived and geth is already
running into issues with it [1]. The most notable consumer of
nhooyr.io/websocket that I could find is Tailscale [2] [3].

Unlike geth [4] and cometbft [5], we don't send ping messages to the
client. Pro: saves bandwidth. Con: if the client app hangs, the Websocket
connection will persist. As of 2017, Chromium is doing the same [6].
Note that the TCP keep-alives sent every 15 seconds [7] by the HTTP
server will detect when the TCP connection is down, but will not detect
when the application using the TCP connection is stuck.
We can change this in the future, but it will slightly complicate the
logic and a ping loop is difficult to test since the pong replies are
handled by the websocket library.

[1]: ethereum/go-ethereum#27261
[2]: https://github.com/tailscale/tailscale/blob/0f5090c526c2019fae94695b2991cb561e131788/cmd/derper/websocket.go#L13
[3]: coder/websocket#350 (comment)
[4]: https://github.com/ethereum/go-ethereum/blob/6d2aeb43d516fbe2cafd9e65df7ccbd885f861d3/rpc/websocket.go#L336-L359
[5]: https://github.com/cometbft/cometbft/blob/c138e785c992b3a76cb25fdf2e0820e29c56c1af/rpc/jsonrpc/server/ws_handler.go#L427-L432
[6]: https://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJhttps://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJ
[7]: https://github.com/golang/go/blob/f3bf18117b284b63f4350a5aa61773a30d91a6d5/src/net/dial.go#L15-L17
joshklop added a commit to NethermindEth/juno that referenced this pull request Jun 21, 2023
This commit only modifies the jsonrpc package. It doesn't provide a way
to serve websocket connections when Juno is started, which can be
implemented in a future PR.

Two design decisions:

We use the nhooyr.io/websocket library. The widely used
gorilla/websocket library has been archived and geth is already
running into issues with it [1]. The most notable consumer of
nhooyr.io/websocket that I could find is Tailscale [2] [3].

Unlike geth [4] and cometbft [5], we don't send ping messages to the
client. Pro: saves bandwidth. Con: if the client app hangs, the Websocket
connection will persist. As of 2017, Chromium is doing the same [6].
Note that the TCP keep-alives sent every 15 seconds [7] by the HTTP
server will detect when the TCP connection is down, but will not detect
when the application using the TCP connection is stuck.
We can change this in the future, but it will slightly complicate the
logic and a ping loop is difficult to test since the pong replies are
handled by the websocket library.

[1]: ethereum/go-ethereum#27261
[2]: https://github.com/tailscale/tailscale/blob/0f5090c526c2019fae94695b2991cb561e131788/cmd/derper/websocket.go#L13
[3]: coder/websocket#350 (comment)
[4]: https://github.com/ethereum/go-ethereum/blob/6d2aeb43d516fbe2cafd9e65df7ccbd885f861d3/rpc/websocket.go#L336-L359
[5]: https://github.com/cometbft/cometbft/blob/c138e785c992b3a76cb25fdf2e0820e29c56c1af/rpc/jsonrpc/server/ws_handler.go#L427-L432
[6]: https://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJhttps://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJ
[7]: https://github.com/golang/go/blob/f3bf18117b284b63f4350a5aa61773a30d91a6d5/src/net/dial.go#L15-L17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants