NAT Connectivity and Discoverability Issues #2509

Open
whyrusleeping opened this issue Mar 28, 2016 · 25 comments
Labels
kind/bug A bug in existing code (including security flaws)

Comments

@whyrusleeping
Member

People have been noticing issues with connectivity through NATs lately. Let's use this issue to track those issues and collect debugging information, tips, and tricks.

@whyrusleeping
Member Author

Some tips I've posted in a different issue before:

First, note down the peer IDs of all the nodes involved (run ipfs id).

To check which peers a given node is connected to, run ipfs swarm peers and look for the peer IDs at the end of the addresses for the ones you're interested in.

To check connectivity to a given node, I normally start from an ipfs node that I know has good connectivity (normally my VPS) and run ipfs dht findpeer <PEERID> for the peer you're investigating. This should list all the addresses that the peer is advertising. If the public address is in that list (and you aren't already connected to them), you can run ipfs swarm connect <ADDR>, where ADDR is the entire /ip4/...../ipfs/QmPeerID multiaddr.

If you can successfully connect a node to the node with the data, you should be able to run an ipfs get to grab the data you're interested in.

If you connect and aren't able to get the data, I would check ipfs dht findprovs <CONTENT HASH> and see if the network returns any records indicating who has that content. If the peer that has the data doesn't show up there, then something interesting is wrong (the data was likely added while the node was not connected to the DHT). In that case, I would try re-adding the data on the node that already has it (this triggers a rebroadcast of the provider records). After that completes, wait a little bit (for the records to propagate) and try running the ipfs get again from the other (non-data-holding) node.

If you can't make a connection from an outside node to your node with the data, the next thing I would try is making a connection from the data node out to other peers, then try fetching the data on those other peers. If that works, then the issue lies entirely with NAT traversal not working. ipfs does require some amount of port forwarding to work on NAT'ed networks (whether manual forwarding, NAT-PMP, or UPnP).
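
Putting those steps together, a debugging session might look something like this (a sketch only; the peer ID, multiaddr, and content hash below are placeholders):

```sh
# On each node involved, note down its peer ID.
ipfs id

# From a well-connected node, look up the addresses the problem peer is advertising.
ipfs dht findpeer QmYourPeerID

# If a public address shows up, try dialing it directly.
ipfs swarm connect /ip4/203.0.113.7/tcp/4001/ipfs/QmYourPeerID

# Confirm the connection, then try to fetch the content.
ipfs swarm peers | grep QmYourPeerID
ipfs get QmYourContentHash

# If the fetch still hangs, ask the DHT who it thinks has the content.
ipfs dht findprovs QmYourContentHash
```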

@whyrusleeping
Member Author

I have thought about designing a NAT test lab; notes here: https://gist.github.com/whyrusleeping/a0ab8df68d1020df32c6

@slothbag

I keep getting "too many open files" errors from my ipfs daemon... not sure if this is related.

@whyrusleeping
Member Author

@slothbag hrm... getting the 'too many open files' error will definitely cause issues with DHT connectivity.

@whyrusleeping
Member Author

An IRC user noted issues after seeing the mDNS 'failed to bind unicast' error. There's likely some correlation here.

@whyrusleeping
Member Author

whyrusleeping commented Mar 29, 2016

The issue appears to be a file descriptor leak in the utp codebase (thanks for the tip, @slothbag, it really helped!)

A temporary workaround (while I'm working on an official fix) is to add a utp swarm address to your swarm address config.

In ~/.ipfs/config (or $IPFS_PATH/config) locate the Addresses.Swarm list and add something like "/ip4/0.0.0.0/udp/4002/utp" to it.
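
If it's easier than editing the file by hand, the same change can be made with the ipfs config command - a sketch only, assuming your Swarm list otherwise still contains the defaults:

```sh
# Sketch: set the Swarm list to the default TCP listeners plus a utp listener.
# Adjust the entries to match whatever is already in your config.
ipfs config --json Addresses.Swarm \
  '["/ip4/0.0.0.0/tcp/4001", "/ip6/::/tcp/4001", "/ip4/0.0.0.0/udp/4002/utp"]'
```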

After that value is set, restart your daemon and things should be better. If you continue to experience the same problems, please let me know ASAP.

UPDATE: the utp code is disabled by default in recent versions of go-ipfs. This suggestion is no longer valid.

@guruvan

guruvan commented Mar 29, 2016

So, picking up from the other thread where this started, a couple of quick notes:

  • one thing that is definitely happening: go-ipfs is having difficulty determining the correct interface addresses on my AWS machines - it is very inconsistent about how it picks up the AWS "public" IPv4 addresses on this interface.

DHT:

  • so far it looks like a problem with finding the correct route to the data from the "client" side; the ipfs add works fine AFAICT, but ipfs get from another node lags out.
    • it seems to be an issue when I do ipfs get or ipfs pin, but NOT when I simply curl the same data from the gateway interface

I was able to retrieve a 2.04GB data set (1.4GB file, 700MB file, a few others) with the following procedure:

  1. remove all bootstrap swarm hosts
  2. enter in only the hosts I KNOW possess the data (i.e. the host that ADDED it)
  3. start ipfs
  4. run ipfs get
  5. wait for it to hang
  6. kill ipfs
  7. restart
  8. ipfs get
  9. we get to the spot where it hung previously
  10. PAGES of tcp errors (from the WRONG IP addresses!)
  11. the get resumes, and gets some more data
  12. ipfs get hangs
  13. kill ipfs and repeat the restart procedure

On the host running ipfs get, I see dial attempts to bad addresses like:
- 127.0.0.1
- docker bridge interfaces
- other known "bad" IP addresses

Some of these addresses are known to be from my own hosts; some clearly are not.
dht findprovs shows bad dial attempts trying to reach peer IDs that are NOT mine, but that have a bad address - usually localhost.
Without changing the swarm bootstrap, each restart would hang at the same point and never continue.

I'll have a little more time tomorrow to investigate further & maybe get some packet traces.

@mitar

mitar commented Mar 29, 2016

Is this a regression? Maybe worth going back through versions to see when it started?

@slothbag

I did the utp config change and it appears to have fixed the issue... nice find!

@whyrusleeping
Member Author

Fixes to the utp lib have been merged into master, so pull the latest down and run make install (there are new gx deps).
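
For anyone unsure of the exact steps, something along these lines should work, assuming a standard go-ipfs source checkout under your GOPATH:

```sh
# Sketch: update an existing go-ipfs checkout and reinstall.
cd "$GOPATH/src/github.com/ipfs/go-ipfs"   # assumes the usual GOPATH layout
git pull origin master
make install                               # pulls the updated gx deps and installs the binary
```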

Please let me know how things go.

@guruvan

guruvan commented Mar 31, 2016

Added the utp change, and added the port to my security groups.
This node was set up with a single known peer of my own as its only bootstrap peer (I didn't notice this at first, or I'd have reset it to the default).

  • running in docker with --net host (on AWS)
  • hosts are rancheros - 2 docker daemons present
  • these examples are hosts in an AWS VPC, on a publicly exposed network, not subject to my NAT
  • curling a file the node doesn't have yet (my own production push to ipfs) seems "better" - still stalling out, but I've not reconfigured all my nodes with the utp fix yet
    • 4 stops & restarts to get a 700MB file

Restarting this same node and running ipfs get, it stalled out immediately :)
Restarting this same node with the bootstrap peers reset to the default + 1 known node of my own, ipfs get stalls repeatedly (on what appear to be problem blocks).

With repeated restarts, and slightly different configurations for API, Gateway, and Swarm (but always including the appropriate utp line), I got many different results regarding whether the AWS PUBLIC_IPV4 address was included or not.

  • at one point I saw this come up in the swarm addresses output from the daemon as it came up:
  • /ip4/PUBLIC_IPV4/tcp/14527
    ?? no idea where it could have gotten this port number
    updated as I test
  • I noted this while running findpeer from the "fresher" node (below) to look up the more problematic node above - the one that ran out of space and was reset, as described below:

[rancher@rancher ~]$ docker exec r-ipfs_gw_1 ipfs dht findpeer QmYB8t7H2Z1xrwZ6fxAhfUg3UTFPGyq5Sg2WUkJZdkChZe
00:23:03.545: <peer.ID QmYB8t> /ip4/private_ipv4/tcp/4001 **/ip4/public_ipv4/tcp/14528** /ip4/public_ipv4/tcp/4001 /ip4/127.0.0.1/udp/4002/utp /ip4/127.0.0.1/tcp/4001 /ip4/system-docker/udp/4002/utp /ip4/system-docker/tcp/4001 /ip4/user-docker/udp/4002/utp /ip4/user-docker/tcp/4001 /ip4/private_ipv4/udp/4002/utp

  • wherever it is getting the port number noted above looks very troublesome to me - if that's expected to be an inbound-capable port....
  • really it looks to me as if I should be able to blacklist local IP addresses - i.e. the docker addresses, or any other addresses I'd prefer it not to listen on or publish - no other node should ever talk to me on my docker addresses, localhost, etc. If I could simply restrict this to, at most, my public_ipv4 and private_ipv4, it seems like that would work better (without knowing too much about the internals of ipfs) - see the sketch just below.
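
For what it's worth, here is a rough sketch of how that kind of blacklist could be expressed, assuming a go-ipfs version that supports the Addresses.NoAnnounce and Swarm.AddrFilters config options; the CIDR ranges are just examples for loopback and a default docker bridge:

```sh
# Sketch only: stop announcing loopback/docker-bridge style addresses,
# and skip dialing out to them as well. Adjust the ranges to your network.
ipfs config --json Addresses.NoAnnounce \
  '["/ip4/127.0.0.0/ipcidr/8", "/ip4/172.17.0.0/ipcidr/16"]'
ipfs config --json Swarm.AddrFilters \
  '["/ip4/127.0.0.0/ipcidr/8", "/ip4/172.17.0.0/ipcidr/16"]'
```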

I finally ran this machine out of space (I'm presuming this is related to docker's handling of volumes, rather than ipfs's handling of data) - and blew up the ipfs dir. :)

Starting with a fresh config on the same host, and updating my other nodes to include a utp swarm address, I ran ipfs get on this data and let it run while I moved on to a "fresher" node (fresh IP addresses, fresh config file, no data downloaded):

  • one quick test without the utp change seems to have stalled out
  • a restart with the utp change was instantaneously better
  • this node was able to download almost the entire 2GB dataset in one try

There's no perceptible pattern to the log messages when the get operation appears to "stall out", other than what I've noted above.

  • by stall out, I really mean wait several minutes (up to 15-20) for the get operation to show any progress
  • as I've progressed with the testing, I've added more specific timeouts (up to 30min)

I'll be adding more nodes shortly, all with fresh IP addresses. I'll whip up an updated docker image in the morning from master.
@whyrusleeping does my new image need to add the utp line to the config, or is this also updated in master?

@slothbag

slothbag commented Apr 1, 2016

Updated local and remote IPFS node with latest UTP fixes. Local node is behind NAT but has port forwarding for IPFS.

I don't seem to get the "too many open files" error anymore; however, discoverability is still not working. I have been trying to pin an object for an hour and it can't find it.

Problem still exists.

@whyrusleeping
Member Author

@guruvan the 'stalling out' is 'no data at all received for a long time', right? Not 'received some data and then hung'?

If that's the case, then it's an issue with discoverability/connectivity (which I think is the problem).

@slothbag in this case, can you discover valid addresses for the NAT'ed node from a node outside the NAT? (ipfs dht findpeer <peer id of NATed node>)

@slothbag

slothbag commented Apr 1, 2016

ipfs dht findpeer returns a list of IP addresses... a mixture of my LAN IP and my external IP, but the correct incoming port is on the LAN IP, and all the external IPs have incorrect ports.

@whyrusleeping
Member Author

@slothbag that's awesome information for me to have, thank you!

@em-ly added the kind/bug label on Aug 25, 2016
@mikhail-manuilov

Why can't ipfs use the same method for determining the external IP as, for example, parity's --nat extip: option?
https://github.com/paritytech/parity/wiki/Configuring-Parity
The current version simply ignores the IP in the "Addresses" array.

@ghost

ghost commented Oct 26, 2017

> Why can't ipfs use the same method for determining the external IP as, for example, parity's --nat extip: option?
> https://github.com/paritytech/parity/wiki/Configuring-Parity
> The current version simply ignores the IP in the "Addresses" array.

@mikhail-manuilov Do you mean you specified your external IP in Addresses.Swarm? That'd currently only work if there's a network interface on your local machine that has that IP address.

The Addresses.Swarm setting is only for the addresses to listen on - instead you can explicitly set addresses to be announced to the network in Addresses.Announce.
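
For example, a minimal sketch (the address below is a placeholder; substitute your actual external IP and swarm port):

```sh
# Sketch: announce an explicit external address to the network while still
# listening on the usual local addresses from Addresses.Swarm.
ipfs config --json Addresses.Announce '["/ip4/203.0.113.7/tcp/4001"]'
```

Restart the daemon after changing the config for it to take effect.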

@Kubuxu
Member

Kubuxu commented Oct 26, 2017

Here is another connectivity issue I have observed at my house.
I am behind a carrier-grade NAT and then my own local NAT.
After starting go-ipfs it connects to one bootstrap node and that is it. Somewhat by accident I found that disabling reuseport (IPFS_REUSEPORT=false) "fixes" it - fixes as in: now I can dial out; people still can't dial in to me (the NAT is too strong).

So if someone has problems with dialing out, disabling reuseport might help.
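
Concretely, that just means setting the environment variable before starting the daemon (a sketch, assuming the variable is spelled IPFS_REUSEPORT as above):

```sh
# Disable the reuseport behaviour for this daemon run.
IPFS_REUSEPORT=false ipfs daemon
```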

@whyrusleeping
Member Author

whyrusleeping commented Oct 26, 2017 via email

@Kubuxu
Member

Kubuxu commented Oct 26, 2017

I tried to use this tool just now; it seems quite broken.

@whyrusleeping
Member Author

@Kubuxu I fixed the issue you reported, thanks! Mind trying again?

@Kubuxu
Member

Kubuxu commented Nov 1, 2017

{
  "OutboundHTTP": {
    "OddPortConnection": "",
    "Port443Connection": ""
  },
  "Nat": {
    "Error": null,
    "MappedAddr": "/ip4/0.0.0.0/tcp/38044"
  },
  "HavePublicIP": false,
  "Response": {
    "SeenAddr": "/ip4/87.239.222.9/tcp/6812",
    "ConnectBackSuccess": false,
    "ConnectBackMsg": "dial attempt failed: \u003cpeer.ID Pah1CN\u003e --\u003e \u003cpeer.ID TwQRCH\u003e dial attempt failed: connection refused",
    "ConnectBackAddr": "",
    "TriedAddrs": [
      "/ip4/127.0.0.1/tcp/40941",
      "/ip4/0.0.0.0/tcp/38044",
      "/ip4/87.239.222.9/tcp/40941"
    ]
  },
  "Request": {
    "PeerID": "QmTwQRCHoF34HamrcfAQx9rti3AM127hKr6MGrzvBnxBoM",
    "SeenGateway": "",
    "PortMapped": "/ip4/0.0.0.0/tcp/38044",
    "ListenAddr": "/ip4/127.0.0.1/tcp/40941"
  },
  "TcpReuseportWorking": false
}

@whyrusleeping
Member Author

whyrusleeping commented Nov 1, 2017 via email

@Kubuxu
Member

Kubuxu commented Nov 1, 2017

Yup, the interesting thing is that with REUSEPORT I think I might be able to dial out only once from that port.

@Kubuxu
Member

Kubuxu commented Nov 1, 2017

I have the option to buy an external IP from my ISP, but I am deliberately not doing it until we can successfully recreate a setup like this elsewhere.
