-
Notifications
You must be signed in to change notification settings - Fork 20.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
swarm/network: hive bug: needed shallow peers are not sent to nodes beyond connection's proximity order #19326
Conversation
c5c5c0d
to
435c5f6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the PR lacks in-progress
or ready for review
labels, but a review has been requested from me I decided to give one. Please, forgive me if I came too early.
// the first time this is received we send peer info on all | ||
// our connected peers that fall within peers saturation depth | ||
// otherwise this depth is just recorded on the peer, so that | ||
// subsequent new connections are sent iff they fall within the radius |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: 'iff' => 'if'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iff means if and only if
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: then I would prefer 'if and only if' because that requires no knowledge of the abbreviation; that has to be written just once and read many times
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR should have in its description why it makes #19274 obsolete
} | ||
|
||
// pivotDepth is the advertised depth of the pivot node we expect in the outgoing subPeersMsg | ||
pivotDepth := hive.saturation() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find a compelling explanation why this is the correct function to use for this test and not hive.depth
. And why is hive.depth
accessible at all? Why was NeighbourhoodDepth()
wrong and/or producing flaky results?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the logic. Polemic on the ambiguous naming aside.
saturation is the depth that the peer is advertising, basically, the shallowest bin, that has less than MinBinSize
connections
swarm/network/discovery_test.go
Outdated
|
||
// now we need to wait until the tester's control peer appears in the hive | ||
// so the protocol started | ||
ticker := time.NewTicker(10 * time.Millisecond) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I deem the solution in the other PR with a channel to be more elegant than this polling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think your channel is closed before the protocol run function is called so it is too early
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't follow your reasoning here:
protocol := func(p *p2p.Peer, rw p2p.MsgReadWriter) error {
...
bzzAddr := preConns[n]
// pop the address from the "stack"
preConns = preConns[:n]
// if there are no more addresses left, inform the caller
if len(preConns) == 0 {
waitC <- struct{}{}
}
The channel is "closed" inside the protocol
function?
defer mu.Unlock() | ||
nodeToAddr[p.ID()] = addrs[0] | ||
bzzAddr := &BzzAddr{addrs[0], []byte(p.Node().String())} | ||
addrs = addrs[1:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarifying my suggestion, If you pass a chan
to newBzzBaseTesterWithAddrs
, then you could just close that channel when all addresses have been assigned instead of polling on the caller.
Also. len(addrs)
will always be one, so maybe we don't need to pass a [][]byte
, but I guess you want to keep it generic?
- hive bug: needed shallow peers were not sent to nodes beyond connection's proximity order - add extensive protocol exchange tests for initial subPeersMsg-peersMsg exchange - modify bzzProtocolTester to allow pregenerated overlay addresses
422f3ca
to
a89902e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just one major concern: the potential resource leak. However, if I'm right on that I think all the calls for NewProtocolTester()
share the problem, we might defer that fix and create an issue.
All other comments are minor with the addition that, please, drop the unused *testing.T
param.
Don't block on me.
|
||
// for values MaxPeerPO < peerPO < MaxPO the pivot has no peers to offer to the control peer | ||
// in this case, no peersMsg will be sent out, and we would run into a time out | ||
if len(expBzzAddrs) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: I still hold the intuition that maybe this test tries to test too many things at once. (clean code: test one concept per test, ideally with one assert) But I let that go. On the other hand, please consider my version for the same logic. What do you think? Is it simpler?
- I put the simplest case first, so I can "forget" about that quickly. (smaller working memory)
- I did not like the return statement was 3 indents deep.
if len(expBzzAddrs) > 0 {
if err != nil {
t.Fatal(err)
}
} else {
if err == nil { // <= stating we are expecting an error when `len(expBzzAddrs)` is not nill (I usually try to aoid `err == nil` but I think in this case it fits.
t.Fatalf("expected timeout, got no error")
}
timeoutMessage := "exchange #1 \"trigger subPeersMsg and expect peersMsg\": timed out"
if err.Error() != timeoutMessage {
t.Fatalf("expected timeout, got %v", err)
}
}
Note:
- Since the function scope is big 100+ lines, I use
if...else
to press that I'm checking related stuff, i.e., I'm still making decisions based onlen(expBzzAddrs)
not something else coming from the previous lines.
* eliminate *testing.T argument from bzz/hive protocoltesters * add sorting (only runs in test code) on peersMsg payload * add random (0 to MaxPeersPerPO) peers for each po * add extra peers closer to pivot than control
…eyond connection's proximity order (#19326) * swarm/network: fix hive bug not sending shallow peers - hive bug: needed shallow peers were not sent to nodes beyond connection's proximity order - add extensive protocol exchange tests for initial subPeersMsg-peersMsg exchange - modify bzzProtocolTester to allow pregenerated overlay addresses * swarm/network: attempt to fix hive persistance test * swarm/network: fix TestHiveStatePersistance (#1320) * swarm/network: remove trace lines from the hive persistance test * address PR review comments * swarm/network: address PR comments on TestInitialPeersMsg * eliminate *testing.T argument from bzz/hive protocoltesters * add sorting (only runs in test code) on peersMsg payload * add random (0 to MaxPeersPerPO) peers for each po * add extra peers closer to pivot than control
The current swarm hive would in some occasions not connect to peers in shallow bins causing unhealthy connectivity
The reason is that needed shallow peers are not sent to nodes beyond connection's proximity order.
The initial response to subPeersMsg needs to be modified to allow enumeration of peers up to the advertised depth not the connection PO.
This PR introduces a one-liner fix to this bug.
It also adds extensive protocol exchange tests for initial subPeersMsg-peersMsg exchange
It modifies bzzProtocolTester to allow pregenerated overlay addresses