You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Jun 20, 2024. It is now read-only.
Connection.run() runs LocalPeer.AddConnection() in a separate goroutine. There's a comment explaining why that is both necessary and safe. The latter turns out to be wrong. If LocalConnection.actorLoop() terminates early on, the LocalConnection.RemoveConnection() call in the deferred LocalConection.shutdown() may overtake the AddConnection().
Note that this actually cannot happen that easily. Ordinarily LocalConnection.actorLoop() will only terminate when processing a LocalConnection.Shutdown(). And that simply cannot get invoked until after LocalPeer.AddConnection() has been called since only then it is possible to grab hold of a reference to this LocalConnection.
But there is one other termination case: failure to receive a UDP heartbeat for HeartbeatTimeout(30) seconds. This does require the LocalPeer.AddConnection() call to be delayed by at least the same amount of time, which should only happen in unusual circumstances, e.g. massive load, paging, weird VM pausing/stalling behaviour, etc. "Massive load" is how I ran into this, namely running 50 weaves on my machine.
I subsequently managed to reproduce the issue with just two weaves using
diff --git a/router/connection.go b/router/connection.go
index 9c47834..d9fc2cb 100644
--- a/router/connection.go+++ b/router/connection.go@@ -304,6 +304,7 @@ func (conn *LocalConnection) run(actionChan <-chan Connectio
// main connection loop, and c) it guards against potential
// deadlocks.
go func() {
+ time.Sleep(HeartbeatTimeout + 1*time.Second)
conn.Router.Ourself.AddConnection(conn)
conn.receiveTCP(dec)
}()
The logs on the two weaves show
weave 2015/04/08 22:22:36.562526 Our name is 7a:fb:3b:8e:25:78(weave2)
weave 2015/04/08 22:22:37.507724 ->[172.17.4.147:6783] attempting connection
weave 2015/04/08 22:22:37.508453 ->[172.17.4.147:6783] completed handshake with 7a:a5:14:62:2e:d5(weave1)
weave 2015/04/08 22:23:07.508836 ->[7a:a5:14:62:2e:d5(weave1)]: connection shutting down due to error: timed out waiting for UDP heartbeat
weave 2015/04/08 22:23:08.508874 ->[7a:a5:14:62:2e:d5(weave1)]: connection added (new peer)
weave 2015/04/08 22:22:35.205254 Our name is 7a:a5:14:62:2e:d5(weave1)
weave 2015/04/08 22:22:37.507948 ->[172.17.4.148:48097] connection accepted
weave 2015/04/08 22:22:37.508387 ->[172.17.4.148:48097] completed handshake with 7a:fb:3b:8e:25:78(weave2)
weave 2015/04/08 22:23:07.508680 ->[7a:fb:3b:8e:25:78(weave2)]: connection shutting down due to error: timed out waiting for UDP heartbeat
weave 2015/04/08 22:23:08.508758 ->[7a:fb:3b:8e:25:78(weave2)]: connection added (new peer)
and the status shows
Our name is 7a:fb:3b:8e:25:78(weave2)
...
Peers:
Peer 7a:fb:3b:8e:25:78(weave2) (v1) (UID 7828146128608141902)
-> 7a:a5:14:62:2e:d5(weave1) [172.17.4.147:6783 (unestablished)]
Peer 7a:a5:14:62:2e:d5(weave1) (v0) (UID 8931467520931168315)
Our name is 7a:a5:14:62:2e:d5(weave1)
...
Peers:
Peer 7a:a5:14:62:2e:d5(weave1) (v1) (UID 8931467520931168315)
-> 7a:fb:3b:8e:25:78(weave2) [172.17.4.148:48097 (unestablished)]
Peer 7a:fb:3b:8e:25:78(weave2) (v0) (UID 7828146128608141902)
The text was updated successfully, but these errors were encountered:
Connection.run()
runsLocalPeer.AddConnection()
in a separate goroutine. There's a comment explaining why that is both necessary and safe. The latter turns out to be wrong. IfLocalConnection.actorLoop()
terminates early on, theLocalConnection.RemoveConnection()
call in the deferredLocalConection.shutdown()
may overtake theAddConnection()
.Note that this actually cannot happen that easily. Ordinarily
LocalConnection.actorLoop()
will only terminate when processing aLocalConnection.Shutdown()
. And that simply cannot get invoked until afterLocalPeer.AddConnection()
has been called since only then it is possible to grab hold of a reference to thisLocalConnection
.But there is one other termination case: failure to receive a UDP heartbeat for
HeartbeatTimeout
(30) seconds. This does require theLocalPeer.AddConnection()
call to be delayed by at least the same amount of time, which should only happen in unusual circumstances, e.g. massive load, paging, weird VM pausing/stalling behaviour, etc. "Massive load" is how I ran into this, namely running 50 weaves on my machine.I subsequently managed to reproduce the issue with just two weaves using
The logs on the two weaves show
and the status shows
The text was updated successfully, but these errors were encountered: