autorelay #454

Merged
merged 36 commits into from
Nov 6, 2018

Conversation

vyzo
Contributor

@vyzo vyzo commented Oct 17, 2018

Implements autorelay; Closes #444.

Depends on:

TBD:

  • gx import discovery
  • gx import autonat
  • gx update multiaddr-net
  • Update New constructor to automatically create relay/autorelay/routed hosts
  • tests!

@ghost ghost assigned vyzo Oct 17, 2018
@ghost ghost added the status/in-progress In progress label Oct 17, 2018
@vyzo vyzo requested a review from Stebalien October 17, 2018 14:10
@vyzo
Contributor Author

vyzo commented Oct 17, 2018

cc @whyrusleeping @mgoelzer

@vyzo
Contributor Author

vyzo commented Oct 18, 2018

rebased on master

@ghost

ghost commented Oct 18, 2018

cc @dignifiedquire
Fyi this is in progress now, following the recent NAT-ing problems.

@vyzo vyzo requested review from bigs, raulk and magik6k October 19, 2018 09:05
Contributor

@magik6k magik6k left a comment

Looks mostly good, but I'm not exactly into this code

options.go Outdated
func Routing(rt config.RoutingC) Option {
return func(cfg *Config) error {
if cfg.Routing != nil {
return fmt.Errorf("cannot specified multiple routing options")
Contributor

s/specified/specify

h.mx.Unlock()

limit := 20
for ; need > limit; limit *= 2 {
Contributor

This feels like there should be a better way to do this

Contributor Author

I guess we could just take some factor of need (say 2x) and go from there.

Contributor Author

simplified in f495b8a
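
For reference, the two ways of choosing the discovery limit discussed in this thread can be compared side by side. This is a minimal sketch: `doublingLimit` mirrors the loop quoted from the diff, and `factorLimit` is the "some factor of need (say 2x)" idea. Both function names are hypothetical, and the sketch makes no claim about what f495b8a actually changed.

```go
package main

import "fmt"

// doublingLimit mirrors the loop quoted in the review: start at 20 and
// keep doubling until the limit is no longer smaller than need.
func doublingLimit(need int) int {
	limit := 20
	for ; need > limit; limit *= 2 {
	}
	return limit
}

// factorLimit is the alternative floated in the thread: a fixed multiple
// of need. The factor of 2 is the example value mentioned there.
func factorLimit(need int) int {
	return 2 * need
}

func main() {
	for _, need := range []int{1, 20, 21, 50} {
		fmt.Printf("need=%d doubling=%d factor=%d\n",
			need, doublingLimit(need), factorLimit(need))
	}
}
```

The doubling version never asks for fewer than 20 peers, while the factor version scales directly with `need`, so the two differ most when `need` is small.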

Contributor

@magik6k magik6k left a comment

I'm not quite familiar with this part of libp2p, so this should probably get at least some more review from someone who is.

@ghost

ghost commented Oct 19, 2018

@vyzo, this is coming along really nicely. Before final merge, can you add some docs about how it works with usage examples for downstreams? Also, I’m thinking we should get a reviewer from Filecoin to verify a build off this branch fixes the problem they were having that raised the priority of this? (or if it doesn’t, we should open new issues to cover what else they need for their use case) Maybe @phritz?

@vyzo
Contributor Author

vyzo commented Oct 20, 2018

We still need to bubble up some recent changes in multiaddr-net before we can merge, but that's just procedural at this point.

Re: documentation
It all works through the libp2p constructor.
If you construct a host with relay enabled (now the default) and supply the new Routing option, then it will instantiate an autorelay host that will discover relays through routing and advertise relay addrs when the presence of impenetrable NAT is detected.
If you enable relay with OptHop, which is the option to construct an actual relay, then it will construct a relay host that advertises relay capabilities through the routing object.
The Routing option takes a constructor function that builds a routing instance given a host; this can simply be an instance of IpfsDHT, but anything that implements the routing interface is acceptable.

Note that an autonat service instance is needed to detect the presence of NAT -- see libp2p/go-libp2p-autonat-svc#1 (which is also waiting for the bubble up before merge).

So, in a nutshell:

  • you need to instantiate an AutoNAT service somewhere in the network, and connect to it so that it is discovered by the system (this will probably run on the bootstrappers)
  • the actual relays need to be constructed with libp2p.New(libp2p.EnableRelay(circuit.OptHop), libp2p.Routing(...))
  • the autorelayed hosts need to be constructed with libp2p.New(libp2p.Routing(...))
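
The construction rules summarized above can be modelled as a small decision function. This is a stand-alone sketch using only the standard library, not go-libp2p code: `Options`, `HostKind`, and `hostKindFor` are hypothetical names, and the fallback case is an assumption, since the comment only describes the two configurations that supply a Routing option.

```go
package main

import "fmt"

// HostKind is a hypothetical label for the kind of host that gets built.
type HostKind string

const (
	KindRelay     HostKind = "relay"     // actual relay, advertised through routing
	KindAutoRelay HostKind = "autorelay" // discovers relays, advertises relay addrs behind NAT
	KindPlain     HostKind = "plain"     // assumption: any case the comment does not describe
)

// Options is a hypothetical, simplified view of the constructor inputs.
type Options struct {
	RelayEnabled bool // relay enabled (described as the default)
	Hop          bool // corresponds to passing circuit.OptHop
	HasRouting   bool // a Routing constructor was supplied
}

// hostKindFor encodes the two rules stated in the comment above.
func hostKindFor(o Options) HostKind {
	switch {
	case o.RelayEnabled && o.HasRouting && o.Hop:
		return KindRelay
	case o.RelayEnabled && o.HasRouting:
		return KindAutoRelay
	default:
		return KindPlain
	}
}

func main() {
	fmt.Println(hostKindFor(Options{RelayEnabled: true, Hop: true, HasRouting: true}))
	fmt.Println(hostKindFor(Options{RelayEnabled: true, HasRouting: true}))
	fmt.Println(hostKindFor(Options{RelayEnabled: true}))
}
```

In the real constructor these inputs correspond to `libp2p.EnableRelay(circuit.OptHop)` and `libp2p.Routing(...)` as quoted in the bullets; the AutoNAT service is a separate deployment concern and is not modelled here.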

@vyzo vyzo added the status/blocked Unable to be worked further until needs are met label Oct 20, 2018
@vyzo
Contributor Author

vyzo commented Oct 20, 2018

So just a status update: the code is pretty much ready, but needs some further review (poking @Stebalien @raulk @bigs).
We are blocked on merging however, because we are waiting for the changes in go-multiaddr-net to bubble up.

@vyzo vyzo changed the title [WIP] autorelay autorelay Oct 20, 2018
@vyzo vyzo added the P1 High: Likely tackled by core team if no one steps up label Oct 20, 2018
@vyzo
Contributor Author

vyzo commented Oct 22, 2018

I think we want to rename the config.Routing interface to BasicRouting and type alias it in the libp2p top module (for better usability).
Maybe we should define it in go-libp2p-routing though, it's the combination of PeerRouting and ContentRouting.

@phritz

phritz commented Oct 23, 2018

@vyzo I think @mgoelzer was proxying the request from go-filecoin around docs. Basically when we went to try to implement relaying ourselves we spent a lot of time trying to figure out how to do it. We had confusion around:

  • semantics of constructor flags (hop vs relay what to configure where -- do we need both hop and relay on the natted peer or just relay? what does "active" mean and do we need to set it?)
  • whether we needed to advertise relay addresses manually and how to do that (yes, use sprintf to do it manually and use addrsfactory -- which precipitated "how do you use addrsfactory?")
  • what format the relay address should have
  • whether we should be using /p2p or /ipfs, the docs are inconsistent
  • whether we needed to manually keep connections to relay nodes open (i think yes)
  • how relaying interacted with dht
  • if we have multiple relay nodes how/if peers could find the right one to connect through (i think you encode the relay node you are connected to in the address that you advertise)
  • whether we should suppress the non-relayed addresses
  • whether the ten zillion lines of errors that libp2p logs were actual errors (thing X that should not happen did happen) or whether they were really more like warnings that are to be expected in the course of events and we could ignore (eg, couldn't connect to a host, connection reset, etc)

We ended up spending many hours trying to get this working, even with the benefit of early input from why and stebalien, and relied mostly on trying to find places in the code that used this stuff and trial and error.

It would be great to have a little more explanation of how this works (#454 (comment) is a great start -- promote it to documentation!) along with some sample code. The sample code would be most helpful! This is the case we were after:

  • I want natted peers to be able to talk to each other via a fixed set of relay nodes known ahead of time. I also want not-natted peers to be able to talk to all peers; ideally we don't have to relay connections for peers who can speak directly to each other.

The case of talking to each other via a set of discoverable relay nodes is a nice to have, as is auto-detection of whether my peer is reachable or needs to use a relay (I think you have this WIP).

Docs we used:

@vyzo
Contributor Author

vyzo commented Oct 23, 2018

I want natted peers to be able to talk to each other via a fixed set of relay nodes known ahead of time. I also want not-natted peers to be able to talk to all peers; ideally we don't have to relay connections for peers who can speak directly to each other.

The way this will work with autorelay:

  • You will need one or more autonat-svc instances in the network, preferably in the bootstrappers.
    These are used by the autorelay'ed hosts to detect the presence of NAT before they start advertising relay addrs.
  • You need to construct the relays with libp2p.New(libp2p.EnableRelay(circuit.OptHop), libp2p.Routing(makeDHT))
  • You need to construct the hosts with libp2p.New(libp2p.Routing(makeDHT))

There is no need to have the fixed relays known to the end hosts ahead of time, they will be discovered through the DHT.
NATed nodes will automatically start advertising relay addrs, and keep their connections to relays open automatically. This will take a little while, about a couple of minutes, before the relay advertisements start.
Note that you will have to update all nodes (esp the ones running as dht nodes) to accept identify push. This is a new variant of the identify protocol introduced in this PR, which allows us to dynamically update our peers about advertised addrs.

@vyzo
Contributor Author

vyzo commented Oct 23, 2018

rebased on master

@vyzo
Contributor Author

vyzo commented Oct 23, 2018

I will add a doc.go in p2p/host/relay documenting the usage of autorelay.

edit: done in 326dc68

@ghost

ghost commented Oct 23, 2018

@vyzo Thanks for adding the docs PR, and @phritz thanks for engaging in this discussion to clarify what the questions are.

Long term, I'd like us to find a way that each major PR has documentation independent of its source code files or Github PR comments page. We could either house this in a single location like https://docs.libp2p.io (which we could drive off GH pages or something) or we need to put it in each module's repo (e.g., https://github.com/libp2p/go-libp2p-circuit/blob/master/README.md). We can hire a professional docs writer to clean up the writing and index it somewhere, as long as we capture the facts somewhere.

Since docs.libp2p.io is blocked on infra, my suggestion for now would be to take the discussion here and use it to fill in https://github.com/libp2p/go-libp2p-circuit/blob/master/README.md. Would that be workable?

Contributor

@bigs bigs left a comment

this is really great. some small feedback, some of which i suspect can be ignored. generally thrilled with how readable/grokable this all is.

}

dctx, cancel := context.WithTimeout(ctx, 60*time.Second)
pis, err := discovery.FindPeers(dctx, h.discover, "/libp2p/relay", limit)
Contributor

let's factor this magic string into a constant (should it be versioned?)

Contributor Author

ok, will do. I don't think we need a version for this, relay is pretty fixed functionality.

Contributor Author

called it RelayRendezvous and it's a constant now.


// TODO better relay selection strategy; this just selects random relays
// but we should probably use ping latency as the selection metric
shuffleRelays(pis)
Contributor

what if we factor this out into a RelaySelection interface or something similar?

Contributor

maybe it could be a function passed in at configuration

Contributor Author

It's more like ordering of relays. I don't think I want to make this configurable, just make it smarter and use ping latency.

Contributor Author

moved the relay selection out of line, into a selectRelays function.

Contributor Author

Ultimately, I think we also need to take into account relay load in the selection strategy.
The selection metric could be latency * load, and select to minimize.
But this will need some changes in the relay protocol (ie ability to measure load and report that), so I am going to leave it as a big fat TODO and stay with randomized selection for initial deployment.
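
The `latency * load` idea from this thread can be illustrated with a short stand-alone sketch. Everything here is hypothetical: the `relayInfo` type, the `Load` field (the thread notes the relay protocol cannot report load yet), and this version of `selectRelays`, which is not the function of the same name in the PR. The merged code stays with randomized selection.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// relayInfo is a hypothetical record for a candidate relay.
type relayInfo struct {
	ID      string
	Latency time.Duration // e.g. measured by ping
	Load    float64       // assumed to be >= 1; larger means busier
}

// score is the metric floated in the thread: latency * load, lower is better.
func score(r relayInfo) float64 {
	return float64(r.Latency) * r.Load
}

// selectRelays returns up to n relays with the lowest score.
// It sorts a copy, so the caller's slice is left untouched.
func selectRelays(relays []relayInfo, n int) []relayInfo {
	out := append([]relayInfo(nil), relays...)
	sort.SliceStable(out, func(i, j int) bool { return score(out[i]) < score(out[j]) })
	if n < 0 {
		n = 0
	}
	if n < len(out) {
		out = out[:n]
	}
	return out
}

// sampleRelays provides made-up data for the demonstration.
func sampleRelays() []relayInfo {
	return []relayInfo{
		{ID: "a", Latency: 50 * time.Millisecond, Load: 4},  // score 200ms
		{ID: "b", Latency: 120 * time.Millisecond, Load: 1}, // score 120ms
		{ID: "c", Latency: 80 * time.Millisecond, Load: 2},  // score 160ms
	}
}

func main() {
	for _, r := range selectRelays(sampleRelays(), 2) {
		fmt.Println(r.ID, time.Duration(score(r)))
	}
}
```

With the sample data, relay "a" has the lowest latency but is excluded because its load pushes its score above the other two, which is the effect the combined metric is meant to have.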

@vyzo
Contributor Author

vyzo commented Oct 24, 2018

Since docs.libp2p.io is blocked on infra, my suggestion for now would be to take the discussion here and use it to fill in https://github.com/libp2p/go-libp2p-circuit/blob/master/README.md. Would that be workable?

@mgoelzer Sure, but let's merge first. Then we can add pretty comprehensive documentation there.

Member

@raulk raulk left a comment

A bunch of comments I'd like addressed.

)

const (
RelayRenezvous = "/libp2p/relay"
Member

typo

Contributor Author

fixed.


func (h *AutoRelayHost) background(ctx context.Context) {
select {
case <-time.After(autonat.AutoNATBootDelay + BootDelay):
Member

We really need an event bus. These cascading time bombs make me nervous. AutoRelay depends on AutoNAT, which in turn depends on Identify.

Contributor Author

..shrug :)

Contributor

Yeah, we do eventually need a better solution. Having to know that if you try doing X before Y finishes means it will fail is kinda rough.
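
The fixed boot delay under discussion follows a common Go pattern: wait for a timer unless the context is cancelled first. The sketch below shows that pattern in isolation with hypothetical names (`waitBoot`, `demo`). It uses `time.NewTimer` rather than `time.After` so the timer can be stopped on early exit. It does not address the ordering concern raised here, which would need the event bus the reviewers mention.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitBoot blocks until delay has elapsed or ctx is done.
// It returns true if the delay elapsed, false if the context ended first.
func waitBoot(ctx context.Context, delay time.Duration) bool {
	t := time.NewTimer(delay)
	defer t.Stop()
	select {
	case <-t.C:
		return true
	case <-ctx.Done():
		return false
	}
}

// demo exercises both outcomes: a short delay that elapses, and a long
// delay that is cut short by an already-cancelled context.
func demo() (elapsed bool, interrupted bool) {
	elapsed = waitBoot(context.Background(), 10*time.Millisecond)

	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	interrupted = !waitBoot(ctx, time.Hour)
	return elapsed, interrupted
}

func main() {
	elapsed, interrupted := demo()
	fmt.Println("delay elapsed:", elapsed)
	fmt.Println("cancelled wait returned early:", interrupted)
}
```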


// add relay specific addrs to the list
for _, pi := range h.relays {
circuit, err := ma.NewMultiaddr(fmt.Sprintf("/ipfs/%s/p2p-circuit", pi.ID.Pretty()))
Member

Can we use /p2p, which is equivalent to /ipfs in multiaddr?

Contributor Author

sure.

Contributor Author

done.

}

for _, addr := range pi.Addrs {
if !manet.IsPrivateAddr(addr) {
Member

I'm not sure I follow this logic. The resulting self addresses via relays would end up taking the form: /ipfs/QmRelay/p2p-circuit/{maddr of *relay* if not private}

Are we sure we want to encapsulate the relay's addrs instead of our own?

Contributor Author

@vyzo vyzo Oct 24, 2018

There is nothing after the p2p-circuit part, it's just a transport signifier.
The relayed multiaddr will be of the form /public-maddr-of-relay/ipfs/QmRelay/p2p-circuit.

Contributor Author

So just to be clear, it's the non-private address of the relay that is encapsulating the /p2p/QmRelay/p2p-circuit part, not the other way around.

Member

Got it, thanks.
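
To make the address layout from this thread concrete, here is a string-level sketch of how the advertised relay addresses are shaped: each non-private address of the relay is followed by `/p2p/<relay ID>/p2p-circuit`. It deliberately avoids go-multiaddr and works on textual addresses only. `isPublic` is a rough stand-in for the `manet.IsPrivateAddr` check in the diff, not a reimplementation of it, and both function names are hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// isPublic reports whether a textual multiaddr begins with an /ip4 or /ip6
// component that is not private, loopback, link-local, or unspecified.
func isPublic(addr string) bool {
	// "/ip4/203.0.113.7/tcp/4001" splits into ["", "ip4", "203.0.113.7", "tcp", "4001"].
	parts := strings.Split(addr, "/")
	if len(parts) < 3 || (parts[1] != "ip4" && parts[1] != "ip6") {
		return false
	}
	ip := net.ParseIP(parts[2])
	if ip == nil {
		return false
	}
	return !(ip.IsPrivate() || ip.IsLoopback() || ip.IsLinkLocalUnicast() || ip.IsUnspecified())
}

// relayAddrs appends "/p2p/<relayID>/p2p-circuit" to every public address
// of the relay and skips the rest.
func relayAddrs(relayID string, addrs []string) []string {
	var out []string
	for _, a := range addrs {
		if isPublic(a) {
			out = append(out, a+"/p2p/"+relayID+"/p2p-circuit")
		}
	}
	return out
}

func main() {
	addrs := []string{
		"/ip4/192.168.1.5/tcp/4001", // private, skipped
		"/ip4/203.0.113.7/tcp/4001", // documentation range, treated as public here
	}
	for _, a := range relayAddrs("QmRelay", addrs) {
		fmt.Println(a)
	}
}
```

Nothing follows the `p2p-circuit` component, which matches the clarification above that it acts only as a transport signifier.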

}

// and the actual test!
func TestAutoRelay(t *testing.T) {
Member

Sorry to be a stickler for testing, but I think a component like this deserves more exhaustive testing at the time of its submission of the PR, or at least a commitment from the author to add more tests shortly thereafter.

Particularly I'd like to see coverage regarding:

  • our actual own advertised addresses as a result of autorelay (we are only testing that there's a P_CIRCUIT component, but we should be testing its entirety, especially because we're manipulating maddrs)
  • behaviour with unspecific relays.
  • multiple rounds of autorelay.
  • timeouts.
  • different AutoNAT statuses.
  • connection terminations.
  • etc.

Contributor Author

I think this is overtesting stuff.

Contributor

@bigs bigs Oct 24, 2018

this is a good use case for a testbed/daemon environment

Contributor Author

yeah, I would very much rather see these integration tested in a testbed than have to write reams of borderline useless testing code.

Member

can't tell you how many times what appeared to be overtesting at the time caught regressions later on ;-)

// verify that we advertise relay addrs
haveRelay := false
for _, addr := range h3.Addrs() {
_, err := addr.ValueForProtocol(circuit.P_CIRCUIT)
Member

Let's assert the entire multiaddr, not only that a component exists.

Contributor Author

that's a lot of work that replicates the logic in the code, not sure I see the point (it is tested for dialability later on).

Member

Ok, I didn't realise it was tested for dialability 👍

@vyzo
Contributor Author

vyzo commented Nov 4, 2018

rebased on master and updated deps.

@raulk
Member

raulk commented Nov 5, 2018

I’m assuming autorelay is critically urgent for downstream users. As long as we explicitly state in godoc that these interfaces are UNSTABLE, I can live with merging now and revisiting once the service abstraction is ready.

@vyzo vyzo merged commit 3e2dc09 into master Nov 6, 2018
@ghost ghost removed the status/in-progress In progress label Nov 6, 2018
@vyzo vyzo deleted the feat/autorelay branch November 6, 2018 08:03
Labels
P1 High: Likely tackled by core team if no one steps up topic/filecoin Topic filecoin
7 participants