New peers are not propagated to already existing peers #2114
This is very odd indeed. Assuming your connection to the management service is fine, this should never happen unless the network map was updated manually in the database or an async update caused the serial number to be lower. If you are running 0.27.5+, can you enable debug logs on one of the existing clients with: then add a new peer and check if there is a log similar to:
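To make the serial-number reasoning above concrete, here is a minimal sketch of a client that only applies a network map whose serial is newer than the one it already holds. `NetworkMap`, `Client` and `Apply` are hypothetical names, not NetBird's actual types, and rejecting an *equal* serial is an assumption based on the "same or lower" wording later in this thread.

```go
package main

import "fmt"

// NetworkMap is a hypothetical, trimmed-down stand-in for the map the
// management service pushes to a client; only Serial matters here.
type NetworkMap struct {
	Serial uint64
	Peers  []string
}

// Client holds the most recently applied map (nil until the first one).
type Client struct {
	current *NetworkMap
}

// Apply accepts a map only if its serial is strictly newer than the stored
// one (assumption: equal serials are treated as outdated too).
func (c *Client) Apply(nm *NetworkMap) bool {
	if c.current != nil && nm.Serial <= c.current.Serial {
		// Illustrative message, not NetBird's real log text.
		fmt.Printf("skipping map: serial %d <= current %d\n", nm.Serial, c.current.Serial)
		return false
	}
	c.current = nm
	return true
}

func main() {
	c := &Client{}
	fmt.Println(c.Apply(&NetworkMap{Serial: 5, Peers: []string{"a", "b"}}))      // true
	fmt.Println(c.Apply(&NetworkMap{Serial: 7, Peers: []string{"a", "b", "c"}})) // true
	fmt.Println(c.Apply(&NetworkMap{Serial: 6, Peers: []string{"a", "b"}}))      // false: older snapshot
	fmt.Println(len(c.current.Peers))                                            // 3
}
```

If the stored serial were ever wrong (too high), every later map, including the one announcing a new peer, would be skipped in exactly this way.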
It occurred twice earlier this week, but I haven't been able to reproduce it since 🙄 When it happened, I also tested updating an ACL (I added an affected peer to a new, arbitrary group): the NetBird client on that peer picked up the ACL update and, as a side effect, also updated its peer list. If it happens again, I'll try to gather more detailed logs from all parties involved.
Sounds great, thanks @glaeqen
@mlsmaycon Yes, the logs contain an outdated NetworkMap; here is an excerpt from the logs:
I also have debug logs enabled on the server; let me know if there are any specific logs I should look for.
I deployed a custom version of the client on the affected server with added logs in
I suspect a malformed message carries a very high serial number (e.g. INT MAX), so every valid message that arrives after it has a lower serial than the stored one. I will keep this client running and come back to you with any results.
@mohamed-essam We recently switched the sync method to use read locks. When a peer reconnects frequently within a short interval, this could cause the network map serial number to be the same or lower if updates happen at the same time. The check and log are fine, as we expect this to happen in some cases, though not often. To debug, it may be better to start printing the received serial number to confirm the max-int case. Furthermore, we are improving the network map sync to prevent extra maps from being sent in these cases. See #2236
@mlsmaycon I can see the issue sometimes occurring where a client would print:

I made a custom version for the time being that restarts the management connection every time it receives an outdated NetworkMap or a wrongly addressed message while a network map already exists locally (so it doesn't restart the management connection when it isn't initialized yet). It has been working perfectly for the past couple of days.

My custom patch is pushed here; I'm not sure it belongs in mainline yet, but it works. A more elegant approach would be a manual Sync request that forcibly refreshes the network map whenever it might be out of sync; restarting the management connection does that, but also does more than needed.
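The workaround described above can be sketched as a small policy: skip resync while no map exists yet, and once one does, treat an outdated map as a sign that local state is suspect, drop the stored serial and ask for a fresh full map. `Resync` is a hypothetical hook standing in for either "restart the management connection" or a dedicated Sync request; it is not a real NetBird API. Only the outdated-map trigger is modelled, not the wrongly addressed message.

```go
package main

import "fmt"

// NetworkMap is a hypothetical stand-in carrying only the serial.
type NetworkMap struct {
	Serial uint64
}

// Client tracks the applied map. Resync is a hypothetical hook for
// "reconnect to management" or a manual Sync request.
type Client struct {
	current *NetworkMap
	Resync  func()
}

// Apply takes the first map unconditionally; afterwards an outdated
// map clears local state and requests a fresh full map.
func (c *Client) Apply(nm *NetworkMap) {
	if c.current == nil {
		// Not initialized yet (or just reset): accept without resync.
		c.current = nm
		return
	}
	if nm.Serial <= c.current.Serial {
		// Forget the possibly bogus serial so the next full map is
		// accepted, then ask for one.
		c.current = nil
		c.Resync()
		return
	}
	c.current = nm
}

func main() {
	resyncs := 0
	c := &Client{Resync: func() { resyncs++ }}
	c.Apply(&NetworkMap{Serial: 3})
	c.Apply(&NetworkMap{Serial: 4})
	c.Apply(&NetworkMap{Serial: 2}) // stale: local state dropped, resync requested
	c.Apply(&NetworkMap{Serial: 9}) // full map returned by the resync
	fmt.Println("serial:", c.current.Serial, "resyncs:", resyncs) // serial: 9 resyncs: 1
}
```

The trade-off is the one noted in the thread: benign same-or-lower serials caused by concurrent syncs also trigger a resync, so a targeted Sync request is cheaper than a full reconnect.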
Describe the problem
After adding a new peer to the network, the new peer is not propagated to the already existing peers. At best, only peers that were added very recently (within the last 24h?) seem to get their peer list updated. The only thing that helps is to run netbird down and netbird up on all the affected existing peers' machines, which forces a peer refresh. Existing peers seem to show the following error after a new peer is added:

This feels so fundamentally wrong that it's hard to believe it's a bug and not just my fault. I doubt it's a networking issue: the affected peers are located both in the cloud and on self-hosted networks, so that does not seem probable.
Expected behavior
Automatic peer propagation
Are you using NetBird Cloud?
self-hosted NetBird's control plane
NetBird version
0.27.10