salt exiting prematurely #6881
Comments
@TempSpace We'll need more information to get to the bottom of this. Premature exits with no return data can be caused by tracebacks on the minion, so the minion log (default location: /var/log/salt/minion) would be a good place to look for tracebacks. If you could run …
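A rough sketch of how to check for that (exact paths and service names may vary by distro):

```sh
# a minimal sketch, assuming the default log location mentioned above
grep -n -i -A 20 'traceback' /var/log/salt/minion

# or stop the service and run the minion in the foreground at debug level
# to watch the highstate in real time (service name may vary by distro)
service salt-minion stop
salt-minion -l debug
```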
I'll keep an eye out for them. If a traceback happens, is there logic for it to re-execute itself? Because when I went back and looked up the jid of the highstate, it always finished with all True states.
OK, I just replicated the issue and there is nothing about any tracebacks in the log. Every line just says INFO, no warnings, errors, tracebacks or anything. What should I do next?
I think the answer here is that there's something keeping the minion from responding to the find_job query in time for some reason. I think we need to make the timeout on that command configurable, and try a longer timeout value.
Is there any more information I can provide on this one? It has become a huge issue for us.
Have you tried running these commands with a higher timeout? (-t 300, for example, would not check back with the minions for 5 minutes.) I wonder if repeated calls to the minion to see if it's still running its job are the culprit. If the minion gets busy enough during the highstate that it misses one of those messages, or it doesn't reply quickly enough, I could see this happening.
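For example, something along these lines (the minion name is just a placeholder from the examples in this thread):

```sh
# wait up to 5 minutes (-t 300) before the CLI gives up waiting on the minion
salt -t 300 'machinename' state.highstate
```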
I've also created a new issue for making the secondary timeout (how long salt waits after checking in with the minion) configurable. If we bump that number up it should make this much more robust: #7354
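Once that lands, bumping it might look roughly like this; the option name gather_job_timeout is an assumption based on #7354, so verify it against your Salt version before using:

```sh
# assumed option name (gather_job_timeout) from #7354 -- verify before using.
# Give minions 30 seconds to answer the saltutil.find_job check instead of the default.
echo 'gather_job_timeout: 30' >> /etc/salt/master
service salt-master restart
```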
I have. The higher timeout unfortunately makes no difference at all. I had a timeout of 5 minutes in the examples above. It failed at 1 minute and 52 seconds in my 2nd test regardless of the timeout.
Wait, so your …
Correct, it didn't wait for the whole timeout. It returned absolutely nothing; it just exited with status code 0. Running the minion in debug shows nothing but INFO lines, no errors, no tracebacks. If I run the same command again, it will almost always give me the expected output of a highstate.
OK, here's a question: how are you targeting your minions? Are you using the compound matcher or similar? Except the changes there didn't make it into 0.16.3 or 0.16.4; they're in 0.17... hrm.
I use different kinds. The examples above were specifically targeted: salt 'machinename' state.highstate. When I run states manually, I usually use glob targeting: salt 'wplatnick*' state.highstate. It happens with both.
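For reference, the two targeting styles look roughly like this; the grain match in the compound example is only illustrative:

```sh
# glob targeting, as used above
salt 'wplatnick*' state.highstate

# compound matching (-C) for comparison, in case the matcher is a factor
salt -C 'wplatnick* and G@os:Ubuntu' test.ping
```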
@basepi Any further thoughts on this one?
@TempSpace does this only happen when using the salt-cloud execution module within salt?
No, it happens via salt on non-salt-cloud machines as well.
The weird thing is that, as far as I know, this is only happening to you. I wonder if there's some sort of firewall issue or similar that is causing these issues.
After one of these exits prematurely, are the minions still reachable by the master? Can you continue to ping them?
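A quick check along these lines right after a premature exit would confirm that (minion name is a placeholder):

```sh
# quick reachability check right after a premature exit
salt 'machinename' test.ping
```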
@WillPlatnick There have been a number of changes to timeouts in recent weeks. Have you by chance been able to test any of the release candidates of 2014.1? This could very well be resolved.
Hi again @WillPlatnick. Given the release of 2014.1, the fact that we can't reproduce this, that we don't have other users reporting it, and that we haven't heard back from you, I'm going to close this issue. If it's still affecting you on recent code, we'll certainly re-open. Please don't hesitate to comment here if we're closing an issue that shouldn't be. Thanks!
Starting in 0.16.3, I've been having issues running highstates via the command salt 'hostname' state.highstate when there's a lot of work to be done. The highstate runs, but salt terminates and returns to the command line after only 1-3 minutes with an exit code of 0. If I look up all the jobs, I can see the highstate running and salt doing a saltutil.find_job a couple of times before salt quits prematurely, and if I look up the highstate jid afterwards, it ran fine. This is not happening every single time, but in my tests today it happened the majority of the time.
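For reference, the job lookups described above can be done from the master roughly like this; jobs.list_jobs and jobs.lookup_jid are the standard runners, and <jid> is a placeholder:

```sh
# list recent jobs, then pull the return data for the highstate jid
salt-run jobs.list_jobs
salt-run jobs.lookup_jid <jid>
```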
Example Tests:
Test #1 - provision fresh VM with salt-cloud, highstate returned as it should after 3m1s
Test #2 - provision fresh VM with salt-cloud, salt exits with no data returned after 1m52s
This is the versions output for the salt master and minion:
Salt: 0.16.3
Python: 2.6.6 (r266:84292, Dec 26 2010, 22:31:48)
Jinja2: 2.5.5
M2Crypto: 0.20.1
msgpack-python: 0.1.10
msgpack-pure: Not Installed
pycrypto: 2.1.0
PyYAML: 3.10
PyZMQ: 13.1.0
ZMQ: 3.2.3