[BUG] 3005.1 minion does not close connections to port 4506 (most likely affects 3006 too) #64627
Comments
My expectation would be that it reuses the open connection instead of closing it and opening another one.
The point is that it still reopens the connection; it just closes the previous one right before doing so, which does not seem like the smartest thing to do. Looking at all the PRs and diffs, I have the impression that the developers submitting them thought it would use one persistent connection but never actually checked.
While modifying and testing the code I noticed tons of hanging connections to port 4506 on my MoMs, even after modifying the minion to close these connections as Salt 3003 did. It turned out they came from syndics behind some restrictive firewalls. The zmq socket for RequestClient doesn't set any TCP keepalive, so firewalls drop these connections when there's not enough activity, and Linux with typical settings never notices and keeps them in a connected state forever. For PublishClient connecting to 4505 there are at least tcp_keepalive settings available, but they only apply to the port 4505 connection.
I strongly suggest that whoever had the idea of keeping the connections to 4506 persistent (which didn't quite work out) really thinks through all the implications, including the nonexistent TCP keepalive settings for that connection and the resource usage on the masters. On masters with thousands of minions connected (already using up that many connections to publish port 4505), that's not something you can just ignore. Believe it or not, there are MoM-syndic installations reaching 100k minions in total out there (with a few code tweaks in my case, but still...).
To be honest, I wouldn't be surprised if salt-master is not even capable of receiving multiple messages on the same 4506 socket (and I'm not talking about the return_pub_multi aggregation that syndics do, which was a massive improvement years ago, by the way).
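For readers unfamiliar with what "setting TCP keepalive" involves, here is a minimal sketch at the plain socket level, using only the Python standard library. It is not Salt's code and does not touch zmq; libzmq exposes corresponding socket options (ZMQ_TCP_KEEPALIVE and related), which is presumably where a real fix would set them. The function name `enable_keepalive` and the timing values are assumptions chosen for illustration.

```python
import socket


def enable_keepalive(sock, idle=300, interval=30, count=5):
    """Turn on TCP keepalive probes for an existing socket.

    Hypothetical helper for illustration; the default timings are
    arbitrary, not values taken from Salt.
    """
    # Ask the kernel to send keepalive probes on an otherwise idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # The tuning options below are platform-specific (present on Linux),
    # so each one is applied only if this platform defines it.
    for name, value in (
        ("TCP_KEEPIDLE", idle),       # seconds of idleness before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between unanswered probes
        ("TCP_KEEPCNT", count),       # unanswered probes before the kernel drops it
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("keepalive enabled:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

With probes enabled, a connection silently dropped by a firewall is eventually detected and torn down by the kernel instead of lingering as connected.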
Fixed by #65061
Description
The 3005.1 minion does not close connections to port 4506 (judging by the current code, 3006 is most likely affected too). As mentioned at the bottom of #64552, because of the dubious changes made to the minion code recently, the minion keeps its connection to master port 4506 open after it sends a message. That connection is closed only right before the minion opens a new one to send the next message, which is a total misuse (and the newly opened connection is then left open until the cycle repeats). I elaborated on this in the last few comments of #64552.
To put it as a metaphor, it is as if you always kept your front door wide open, closed it only at the moment you went in or out, and then left it open again.
Setup
Any 3005.1 or 3006 setup should do to reproduce this behaviour.
Steps to Reproduce the behavior
Run "netstat -an | grep :4506" on a minion, run a job on it from the master, and run netstat again. Observe a new connection kept in the ESTABLISHED state and the previous one in TIME_WAIT. Use tcpdump and/or iptables (with -j LOG) to watch the traffic; I explained at length in #64552 why and how this happens.
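To make the before/after comparison less error-prone, the netstat check can be scripted. The following is a rough sketch that assumes the usual Linux "netstat -an" column layout (protocol, Recv-Q, Send-Q, local address, foreign address, state). The function name `count_states` and the sample lines (using documentation-range addresses) are made up for illustration and are not captured output.

```python
from collections import Counter


def count_states(netstat_output, port=4506):
    """Count TCP connection states for connections whose remote port is `port`.

    Assumes Linux-style `netstat -an` lines:
    proto, recv-q, send-q, local address, foreign address, state.
    """
    suffix = f":{port}"
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, UDP/unix sockets, and anything too short to parse.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign_address, state = fields[4], fields[5]
        if foreign_address.endswith(suffix):
            states[state] += 1
    return states


# Illustrative input only, not real output from an affected minion.
SAMPLE = """\
tcp        0      0 192.0.2.10:40812        192.0.2.1:4506          ESTABLISHED
tcp        0      0 192.0.2.10:40790        192.0.2.1:4506          TIME_WAIT
tcp        0      0 192.0.2.10:51522        192.0.2.1:4505          ESTABLISHED
"""

if __name__ == "__main__":
    for state, n in sorted(count_states(SAMPLE).items()):
        print(state, n)
```

On a real minion, the input could come from `subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout`, taken once before and once after running a job.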
Expected behavior
Connection to master port 4506 is closed after a message is sent.