-
Notifications
You must be signed in to change notification settings - Fork 672
Tear down connections on prolonged loss of UDP heartbeat #413
Tear down connections on prolonged loss of UDP heartbeat #413
Conversation
@@ -329,6 +332,9 @@ func (conn *LocalConnection) handleReceivedHeartbeat(remoteUDPAddr *net.UDPAddr) | |||
conn.remoteUDPAddr = remoteUDPAddr | |||
conn.receivedHeartbeat = true | |||
conn.Unlock() | |||
if conn.established { |
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
What is the rationale for starting the heartbeat timer when becoming 'established'? The latter tells us that our heartbeats did get through, which says nothing about the other direction, which is what the timer is watching for. Wouldn't it be better to instead start the timer after sending 'established' to the other side? |
Or, why not start the timer straight away? That would simplify the code since then a) the timer would always exist, and b) we won't have to check for a condition to start it. I think that would also allow us to get rid of the 'established' timeout, i.e. we could rely on the heartbeat timeout to tear down the connection instead. |
I was thinking of the case where we are originating from behind NAT; there's no way we can receive a heartbeat from the remote peer until our outbound connection is established, so I deferred starting the timer until that had happened. I like the idea of simplifying down to a single timer very much, although it would have the effect of slightly reducing the rigour of the test (e.g. timely delivery of the established message over the TCP channel would no longer be checked) - are you happy with that? If not, I'd suggest treating the two cases as completely orthogonal and start both timers in |
"almost established", i.e. our UDP packet must have made it across, but the Established TCP message may not have turned up yet. So technically the initiating side should have a slightly longer timeout, but I can't imagine it's worth the hassle.
The ReadTimeout puts an upper bound on that. |
Conflicts: router/connection.go
Above commits address comments to date. |
Great. Does it work? How have you tested this? |
Confirmed with manual testing on two digital ocean VMs by using
to prevent weave related UDP traffic from being forwarded to the
I used a combination of Would feel a lot more comfortable with #229 in place! It could use soak/torture testing... |
Tear down connections on prolonged loss of UDP heartbeat. Closes #373.
Addresses #373 - detect udp connectivity breakage.
Once a connection moves to the established state indicating that the remote peer has received one of our heartbeats a timer is started. If we do not receive a UDP heartbeat from the remote peer within this time (default is three times the slow heartbeat interval) the connection is terminated:
At this point the existing connection resumption mechanism takes over. In normal operation, the timer is reset each time we receive a heartbeat.
Implementation notes:
establishedTimeout
andheartbeatTimeout
into a single timer; we would simply adjust duration and error messages depending onconn.established
state.