Dialing a multiaddr should dial it first #451
Labels
exp/expert: Having worked on the specific codebase is important
kind/bug: A bug in existing code (including security flaws)
P1 (High): Likely tackled by core team if no one steps up
status/ready: Ready to be worked
The Problem
Currently in libp2p, if you dial a Multiaddr, or its string version, the PeerInfo record for that peer will be retrieved and the Multiaddr will then be added to its multiaddr list before dialing. If the peer already has known addresses, the Multiaddr originally provided might not be dialed first.
For example, let's say I am trying to manually connect to an IPFS preload node, /dns4/node0.preload.ipfs.io/tcp/443/wss/ipfs/QmZMxNdpMkewiVZLMRxaNxUeZpDUb34pWjZ1kZvsd16Zic. If I have no information about this node, this address will be dialed, as it's the only address.
Now, if I have been on the network for a while, or have previously connected to this node, I will likely have some additional addresses, such as:
The problem here is that the dialing logic causes us to iterate over our available transports and the target peer's Multiaddrs. The order of this dial is currently determined by the order in which our transports were added and the order of the peer's Multiaddrs. Most likely, this will be the first TCP address in the list, assuming we support TCP, as it is often added to the configuration first. Instead of dialing /dns4/node0.preload.ipfs.io/tcp/443/wss, we would first dial /ip4/127.0.0.1/tcp/4001. This may fail, and we will continue trying other addresses. However, if we happen to have a node running locally on this address, such as go-ipfs, we would connect and then secio would terminate the connection with the error "dialed to the wrong peer, Ids do not match", as the Peer dialed is not the target Peer. Since the connection has already been "successfully" dialed, no other addresses are currently being tried. Ideally, we should try all addresses until we have a full, encrypted connection.
Solutions
Here are some ways to improve dialing to avoid this:
Async/Await migration
With the async/await migration we will be getting proper support for aborting dials. This will allow us to dial multiple addresses in parallel, use the first/fastest connection, and then abort the rest. By dialing all the addresses in parallel (up to a reasonable limit), we could ignore the failures we might get by accidentally dialing the wrong, local peer.
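As a rough illustration of the parallel approach, here is a minimal sketch in plain JavaScript. It does not use any js-libp2p API: `dialFirstVerified`, `dialAddr`, `remotePeerId`, `close()`, `fakeDial` and the peer IDs `QmTarget` / `QmLocalNode` are all hypothetical names made up for this example. It assumes `dialAddr` resolves only once the connection is fully encrypted and honours the abort signal it is given.

```javascript
// Sketch only: dialAddr, remotePeerId and close() are hypothetical stand-ins,
// not js-libp2p APIs.
async function dialFirstVerified (addrs, expectedPeerId, dialAddr, limit = 4) {
  const controller = new AbortController()
  const queue = [...addrs]
  const errors = []
  let winner = null

  // Each worker pulls the next address until some connection is verified.
  async function worker () {
    while (queue.length > 0 && winner === null) {
      const addr = queue.shift()
      try {
        const conn = await dialAddr(addr, { signal: controller.signal })
        if (conn.remotePeerId !== expectedPeerId) {
          // Wrong peer (e.g. a local node on 127.0.0.1): drop it and keep going.
          await conn.close()
          throw new Error(`wrong peer at ${addr}`)
        }
        if (winner === null) {
          winner = conn
          controller.abort() // cancel the dials still in flight
        } else {
          await conn.close() // a second verified connection lost the race
        }
      } catch (err) {
        errors.push(err) // a failed dial is recorded, not fatal
      }
    }
  }

  // Run at most `limit` dials at a time.
  const workers = Array.from({ length: Math.min(limit, addrs.length) }, worker)
  await Promise.all(workers)
  if (winner === null) {
    throw new Error(`all dials failed: ${errors.map(e => e.message).join('; ')}`)
  }
  return winner
}

// Fake dialer for illustration: the local address answers first, but as the wrong peer.
function fakeDial (addr, { signal }) {
  const peers = {
    '/ip4/127.0.0.1/tcp/4001': { delay: 5, id: 'QmLocalNode' },
    '/dns4/node0.preload.ipfs.io/tcp/443/wss': { delay: 20, id: 'QmTarget' }
  }
  return new Promise((resolve, reject) => {
    const { delay, id } = peers[addr]
    const timer = setTimeout(
      () => resolve({ remotePeerId: id, addr, close: async () => {} }),
      delay
    )
    signal.addEventListener('abort', () => {
      clearTimeout(timer)
      reject(new Error('aborted'))
    })
  })
}

dialFirstVerified(
  ['/ip4/127.0.0.1/tcp/4001', '/dns4/node0.preload.ipfs.io/tcp/443/wss'],
  'QmTarget',
  fakeDial
).then(conn => console.log('connected via', conn.addr))
```

In this sketch the wrong local peer is treated like any other failed dial, so it no longer stops the remaining addresses from being tried. A real implementation would also need to decide how to order the queue, for example by placing the explicitly requested Multiaddr first.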