Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

swarm/network: hive bug: needed shallow peers are not sent to nodes beyond connection's proximity order #19326

Merged
merged 6 commits into from
Apr 2, 2019

Conversation

zelig
Copy link
Contributor

@zelig zelig commented Mar 25, 2019

The current swarm hive would in some occasions not connect to peers in shallow bins causing unhealthy connectivity
The reason is that needed shallow peers are not sent to nodes beyond connection's proximity order.
The initial response to subPeersMsg needs to be modified to allow enumeration of peers up to the advertised depth not the connection PO.

This PR introduces a one-liner fix to this bug.
It also adds extensive protocol exchange tests for initial subPeersMsg-peersMsg exchange
It modifies bzzProtocolTester to allow pregenerated overlay addresses

@zelig zelig self-assigned this Mar 25, 2019
@zelig zelig requested review from holisticode, frncmx and nolash March 25, 2019 07:59
@zelig zelig force-pushed the handle-subpeersmsg branch 2 times, most recently from c5c5c0d to 435c5f6 Compare March 26, 2019 06:05
Copy link
Contributor

@frncmx frncmx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the PR lacks in-progress or ready for review labels, but a review has been requested from me I decided to give one. Please, forgive me if I came too early.

// the first time this is received we send peer info on all
// our connected peers that fall within peers saturation depth
// otherwise this depth is just recorded on the peer, so that
// subsequent new connections are sent iff they fall within the radius
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: 'iff' => 'if'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iff means if and only if

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: then I would prefer 'if and only if' because that requires no knowledge of the abbreviation; that has to be written just once and read many times

swarm/network/discovery.go Outdated Show resolved Hide resolved
swarm/network/discovery.go Show resolved Hide resolved
swarm/network/protocol_test.go Outdated Show resolved Hide resolved
swarm/network/protocol_test.go Outdated Show resolved Hide resolved
swarm/network/discovery_test.go Outdated Show resolved Hide resolved
swarm/network/discovery_test.go Outdated Show resolved Hide resolved
swarm/network/discovery_test.go Outdated Show resolved Hide resolved
Copy link
Contributor

@holisticode holisticode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR should have in its description why it makes #19274 obsolete

}

// pivotDepth is the advertised depth of the pivot node we expect in the outgoing subPeersMsg
pivotDepth := hive.saturation()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find a compelling explanation why this is the correct function to use for this test and not hive.depth. And why is hive.depth accessible at all? Why was NeighbourhoodDepth() wrong and/or producing flaky results?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the logic. Polemic on the ambiguous naming aside.
saturation is the depth that the peer is advertising, basically, the shallowest bin, that has less than MinBinSize connections


// now we need to wait until the tester's control peer appears in the hive
// so the protocol started
ticker := time.NewTicker(10 * time.Millisecond)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deem the solution in the other PR with a channel to be more elegant than this polling.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think your channel is closed before the protocol run function is called so it is too early

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't follow your reasoning here:

	protocol := func(p *p2p.Peer, rw p2p.MsgReadWriter) error {
		...		
		bzzAddr := preConns[n]
		// pop the address from the "stack"
		preConns = preConns[:n]
		// if there are no more addresses left, inform the caller
		if len(preConns) == 0 {
			waitC <- struct{}{}
		}		

The channel is "closed" inside the protocol function?

swarm/network/discovery_test.go Outdated Show resolved Hide resolved
swarm/network/protocol_test.go Show resolved Hide resolved
defer mu.Unlock()
nodeToAddr[p.ID()] = addrs[0]
bzzAddr := &BzzAddr{addrs[0], []byte(p.Node().String())}
addrs = addrs[1:]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarifying my suggestion, If you pass a chan to newBzzBaseTesterWithAddrs, then you could just close that channel when all addresses have been assigned instead of polling on the caller.

Also. len(addrs) will always be one, so maybe we don't need to pass a [][]byte, but I guess you want to keep it generic?

swarm/network/discovery_test.go Outdated Show resolved Hide resolved
zelig and others added 5 commits March 28, 2019 03:57
-  hive bug: needed shallow peers were not sent to nodes beyond connection's proximity order
- add extensive protocol exchange tests for initial subPeersMsg-peersMsg exchange
- modify bzzProtocolTester to allow pregenerated overlay addresses
@zelig zelig force-pushed the handle-subpeersmsg branch from 422f3ca to a89902e Compare March 28, 2019 03:01
Copy link
Contributor

@frncmx frncmx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just one major concern: the potential resource leak. However, if I'm right on that I think all the calls for NewProtocolTester() share the problem, we might defer that fix and create an issue.

All other comments are minor with the addition that, please, drop the unused *testing.T param.

Don't block on me.

swarm/network/discovery_test.go Show resolved Hide resolved

// for values MaxPeerPO < peerPO < MaxPO the pivot has no peers to offer to the control peer
// in this case, no peersMsg will be sent out, and we would run into a time out
if len(expBzzAddrs) == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: I still hold the intuition that maybe this test tries to test too many things at once. (clean code: test one concept per test, ideally with one assert) But I let that go. On the other hand, please consider my version for the same logic. What do you think? Is it simpler?

  • I put the simplest case first, so I can "forget" about that quickly. (smaller working memory)
  • I did not like the return statement was 3 indents deep.
if len(expBzzAddrs) > 0 {
  if err != nil {
    t.Fatal(err)
   }
} else {
  if err == nil {   // <= stating we are expecting an error when `len(expBzzAddrs)` is not nill (I usually try to aoid `err == nil` but I think in this case it fits.
    t.Fatalf("expected timeout, got no error")
  }
  
   timeoutMessage := "exchange #1 \"trigger subPeersMsg and expect peersMsg\": timed out" 
   if err.Error() != timeoutMessage {
      t.Fatalf("expected timeout, got %v", err)
   } 
}

Note:

  • Since the function scope is big 100+ lines, I use if...else to press that I'm checking related stuff, i.e., I'm still making decisions based on len(expBzzAddrs) not something else coming from the previous lines.

swarm/network/discovery_test.go Outdated Show resolved Hide resolved
swarm/network/hive_test.go Show resolved Hide resolved
 * eliminate *testing.T argument from bzz/hive protocoltesters
 * add sorting (only runs in test code) on peersMsg payload
 * add random (0 to MaxPeersPerPO) peers for each po
 * add extra peers closer to pivot than control
@zelig zelig merged commit 0529015 into ethereum:master Apr 2, 2019
@frncmx frncmx deleted the handle-subpeersmsg branch April 11, 2019 09:26
nonsense referenced this pull request in ethersphere/swarm May 8, 2019
…eyond connection's proximity order (#19326)

* swarm/network: fix hive bug not sending shallow peers

-  hive bug: needed shallow peers were not sent to nodes beyond connection's proximity order
- add extensive protocol exchange tests for initial subPeersMsg-peersMsg exchange
- modify bzzProtocolTester to allow pregenerated overlay addresses

* swarm/network: attempt to fix hive persistance test

* swarm/network: fix TestHiveStatePersistance (#1320)

* swarm/network: remove trace lines from the hive persistance test

* address PR review comments

* swarm/network: address PR comments on TestInitialPeersMsg

 * eliminate *testing.T argument from bzz/hive protocoltesters
 * add sorting (only runs in test code) on peersMsg payload
 * add random (0 to MaxPeersPerPO) peers for each po
 * add extra peers closer to pivot than control
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants