Nobody can ping anybody, including netmaker #528
Hi Ethan, I'm noticing a couple of things. First, are your security groups open? You need to make sure none of the ports are blocked by AWS. Second, try using the docker-compose.contained.yml. That's what we're using for quick starts now. |
In the meantime, out of desperation, I created a brand new EC2 instance, and ran through the "quick install" instructions again. This led to LetsEncrypt rate limits being met. To get around this, I created new subdomains, and modified my docker-compose.yml URLs to point at the new subdomain. It is reachable, I can log in, create networks, but when I try to join a network using the docker method:
I'm at my wits' end here. This must be my 10th attempt to get netmaker up and running stably. I used to be a systems engineer designing IP cryptos and am now a software engineer, so I (broadly) know what I'm doing. |
In case it sheds any light, 1 out of 3 clients can join the network, the other two give that TLS error above. |
It still looks like you are not using the "contained" docker-compose (docker-compose.contained.yml). The TLS issue likely occurs from re-using the same domain with Caddy. If you run the install multiple times with the same subdomain, Caddy will start failing to automatically generate certs. It also looks like there are artifacts of your installation in the docker client. ALREADY_INSTALLED indicates there are already entries in /etc/netclient and it will re-use those entries. So worth removing that folder (and related interfaces, if they exist) before moving on. One last note, your security groups look correct, but for an install, I would just use the "quick-install" command from the readme, I think that will do a better job of setting up correctly. |
I am using docker-compose.contained.yml from master; I replaced the placeholder variables and added Google OAuth. As I said above, I have moved to a new domain (nm.domain.com instead of netmaker.domain.com) and all of the subdomains are resolving correctly, dashboard working etc. Your point about docker artefacts has helped; I had checked that there were no docker volumes persisting on the client, but failed to notice the mapping to the host machine /etc/netclient. Having run |
Ok, so, back to the original issue: fresh install on a fresh EC2 instance, as per the quick install instructions, taking docker-compose.contained.yml from master and replacing variables as instructed, with the AWS security group set up as instructed. Nodes can connect to the netmaker server, join networks, and ping netmaker on its network IP. Weirdly, netmaker cannot ping other nodes. Nodes cannot ping each other. DNS is also not working. |
Ok, I can ping other nodes from inside the
hangs. |
Yup, that's what the 'contained' version does. It confines networking to the container, which keeps things a lot cleaner on the host. If you need to use the host as a bastion, the best option is to deploy an additional client. If nodes are unable to reach each other, the most likely scenario is that the UDP hole punching feature is unable to get the correct addresses, though in that case nodes are typically unable to reach the server as well. I would check "wg show" on the nodes and see if there is a handshake with the other nodes. If there is a handshake but you cannot ping, you may need to reduce MTU. If there is no handshake, try "netclient pull -n ". If that doesn't work, you may need to try a network with UDP hole punching turned off. |
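The handshake check suggested above can be scripted. The following is a minimal sketch, not part of netmaker or netclient: it parses the text printed by `wg show <interface> latest-handshakes` (one line per peer, public key followed by a Unix timestamp, where 0 means no handshake has ever completed). The function name, the sample keys, and the 180-second staleness threshold are assumptions chosen for illustration.

```python
import time


def parse_latest_handshakes(output, now=None, stale_after=180):
    """Classify peers from `wg show <interface> latest-handshakes` output.

    Each line is "<peer public key> <unix timestamp>". A timestamp of 0
    means no handshake has ever completed with that peer.
    `stale_after` (seconds) is an illustrative threshold, not a WireGuard constant.
    """
    now = time.time() if now is None else now
    status = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank lines or output in another format
        peer, stamp = parts[0], int(parts[1])
        if stamp == 0:
            status[peer] = "never"
        elif now - stamp > stale_after:
            status[peer] = "stale"
        else:
            status[peer] = "recent"
    return status


# Hypothetical sample output; the keys are placeholders, not real peers.
sample = "PEER_A_KEY=\t1700000000\nPEER_B_KEY=\t0\n"
print(parse_latest_handshakes(sample, now=1700000060))
```

A peer reported as "never" points at the no-handshake branch of the advice above (endpoint or hole punching problems), while "recent" with failing pings points at routing or MTU.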
Thanks for the response. For context, pre 0.9.0, all of these machines talked fine, but there was a memory leak that kept crashing my EC2 instance, hence the upgrade. My point is: MTUs and hole punching were fine. I would also point out that a single-ping packet is 84 bytes all-in - how are we expecting the addition of a wireguard encapsulation to take a packet over the MTU? So I think I'm seeing two different failure modes.
|
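The size argument in the comment above can be checked with a little arithmetic. This sketch assumes the standard WireGuard transport-message layout (16-byte header, 16-byte authentication tag, payload padded to a multiple of 16 bytes) carried over UDP; the function name is hypothetical and the calculation ignores the cap that WireGuard applies to padding at the interface MTU, which does not matter for small packets.

```python
def wireguard_packet_size(inner_len, outer_ipv6=False):
    """Approximate on-the-wire size of one WireGuard-encapsulated packet.

    Assumes the standard data-message layout; padding is not capped at the
    interface MTU here, so use this only for small inner packets.
    """
    padded = -(-inner_len // 16) * 16    # payload rounded up to a multiple of 16
    outer_ip = 40 if outer_ipv6 else 20  # outer IPv6 or IPv4 header
    udp = 8                              # outer UDP header
    wg_header = 16                       # type (4) + receiver index (4) + counter (8)
    auth_tag = 16                        # Poly1305 authentication tag
    return padded + outer_ip + udp + wg_header + auth_tag


ping = 20 + 8 + 56  # IPv4 header + ICMP header + default ping payload = 84 bytes
print(wireguard_packet_size(ping))                   # 156
print(wireguard_packet_size(ping, outer_ipv6=True))  # 176
```

Either result is far below any common interface MTU, which is consistent with the point that a single default ping should not fail for MTU reasons alone.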
I have just created (another) new EC2 instance and run the nm-quick.sh install script with the domain and email arguments, and ended up with exactly the same symptoms as above. |
So, doing some more testing this morning, it seems the problem is the docker client, i.e. setting up a client with:
Results in the symptoms above, namely:
If I use the non-docker Linux standard:
I can ping even remote clients, and even DNS works (although only on 20.04 clients, not 18.04, but that's for a separate issue). So I think the response to this issue should be an investigation into the dockerised client. I am still in the situation wherein my EC2 instance goes to full CPU and stops responding after a couple of days; possibly a memory leak, but EC2 doesn't log memory usage, so I need to collect more data before raising an issue. |
Seeing this same behavior while trying to run Installing with the script results in a working setup of |
Some more digging: it seems like missing route table entries prevent the ping from going through. I have 2 peers with
Running
|
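A missing route like the one described above can be detected by scanning the output of `ip route show` for an entry that covers the peer's address on the WireGuard interface. The sketch below works on captured text only; the function name, the interface name `nm-example`, and all addresses are hypothetical and not taken from this thread.

```python
import ipaddress


def routes_covering(route_output, peer_ip, interface):
    """Return the `ip route show` lines on `interface` whose destination covers `peer_ip`."""
    peer = ipaddress.ip_address(peer_ip)
    matches = []
    for line in route_output.splitlines():
        fields = line.split()
        if "dev" not in fields:
            continue
        dev_idx = fields.index("dev")
        if dev_idx + 1 >= len(fields) or fields[dev_idx + 1] != interface:
            continue  # route belongs to another interface
        dest = "0.0.0.0/0" if fields[0] == "default" else fields[0]
        try:
            network = ipaddress.ip_network(dest, strict=False)
        except ValueError:
            continue  # destination is not a plain address or prefix
        if peer in network:
            matches.append(line.strip())
    return matches


# Hypothetical routing table; names and addresses are placeholders.
sample_routes = """\
default via 172.31.0.1 dev eth0 proto dhcp src 172.31.5.10 metric 100
10.101.0.0/24 dev nm-example proto kernel scope link src 10.101.0.1
172.31.0.0/20 dev eth0 proto kernel scope link src 172.31.5.10
"""
print(routes_covering(sample_routes, "10.101.0.2", "nm-example"))
print(routes_covering(sample_routes, "10.102.0.2", "nm-example"))
```

An empty result for a peer that `wg show` lists under allowed IPs would match the symptom reported here: the tunnel is up, but the kernel has no route that sends traffic for that peer into the interface.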
@hagaibarel That is a good find. What OS are you running? I have noticed this issue, and it seems to depend on the Linux OS. I was thinking we should just add a few lines of code after bringing up the interface to confirm that the interface gets created. |
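The kind of post-bring-up check proposed above might look like the following. This is only an illustration in Python, not the netclient code (which is written in Go) and not the fix that was eventually shipped; the function names, the polling interval, and the interface name `nm-example` are assumptions.

```python
import socket
import time


def list_interfaces():
    """Names of the network interfaces currently known to the OS."""
    return [name for _, name in socket.if_nameindex()]


def wait_for_interface(name, timeout=5.0, interval=0.5, probe=list_interfaces):
    """Poll until `name` appears among the interfaces or `timeout` seconds elapse.

    `probe` is injectable so the check can be exercised without a real interface.
    """
    deadline = time.monotonic() + timeout
    while True:
        if name in probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Deterministic demonstration with a fake probe; "nm-example" is a placeholder name.
print(wait_for_interface("nm-example", timeout=0.1, interval=0.01,
                         probe=lambda: ["lo", "nm-example"]))
```

With the default probe, a caller could run the check right after bringing the interface up and retry or log an error when it returns False, rather than continuing to add routes to an interface that does not exist.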
I'm running
|
Btw, I still don't have DNS set up properly. I guess it might be a different issue, but the two seem related. Running
And while using the installation script the settings were inserted correctly |
Some more info, I've deleted the daemonset and removed the wireguard settings and netclient config ( So it seems that this Still no DNS... |
I've opened #540 for further discussion on DNS issues |
Fixed the route issue in 0.9.2. |
Hi, standard AWS setup as per the docs, on an EC2 Micro 20.04.2 instance. DNS, dashboard etc. are working. Tunnels are up, but no-one can ping anyone. Even on the netmaker server:
My docker-compose.yml:
Caddyfile