Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using batch-mode with salt.state in orchestration runner considers all minions to have failed #39169

Closed
blueyed opened this issue Feb 3, 2017 · 12 comments
Labels
Bug broken, incorrect, or confusing behavior Core relates to code central or existential to Salt fixed-pls-verify fix is linked, bug author to confirm fix P4 Priority 4 severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around State-Module
Milestone

Comments

@blueyed
Copy link
Contributor

blueyed commented Feb 3, 2017

Description of Issue/Question

When using batch with salt.state successful/all runs will be handled like failures.

This is caused because of 'retcode': 0 in m_ret being passed to salt.utils.check_state_result (in

salt/salt/states/saltmod.py

Lines 302 to 306 in 0a8c7e4

m_ret = mdata['ret']
except KeyError:
m_state = False
if m_state:
m_state = salt.utils.check_state_result(m_ret)
).
Since it's not a dict, it will trigger ret to be False (in

salt/salt/utils/__init__.py

Lines 1841 to 1842 in 0a8c7e4

if not recurse and not isinstance(state_result, dict):
ret = False
).

Using m_ret.pop('retcode') before fixes it.

I could imagine also for check_state_result to look at retcode (if any), and use it to return based on that (i.e. 0 means success).

salt.function is not affected by this. It looks at retcode (

salt/salt/states/saltmod.py

Lines 474 to 476 in 0a8c7e4

if mdata.get('retcode'):
func_ret['result'] = False
fail.add(minion)
), and does not use check_state_result.

Setup

stack.sls:

test:
  salt.state:
    - tgt: 'G@environment:test and …'
    - tgt_type: compound
    - sls:
      - docker
    - batch: 10%
salt-run state.orchestrate orchestration.stack saltenv=test

Versions Report

Salt Version:
           Salt: 2016.11.2
 
Dependency Versions:
           cffi: 1.7.0
       cherrypy: 3.5.0
       dateutil: 2.5.3
          gitdb: 0.6.4
      gitpython: 1.0.1
          ioflo: Not Installed
         Jinja2: 2.8
        libgit2: 0.22.0
        libnacl: Not Installed
       M2Crypto: 0.21.1
           Mako: 1.0.3
   msgpack-pure: Not Installed
 msgpack-python: 0.4.6
   mysql-python: Not Installed
      pycparser: 2.10
       pycrypto: 2.6.1
         pygit2: 0.22.0
         Python: 2.7.12 (default, Nov 19 2016, 06:48:10)
   python-gnupg: 0.3.8
         PyYAML: 3.11
          PyZMQ: 15.2.0
           RAET: Not Installed
          smmap: 0.9.0
        timelib: Not Installed
        Tornado: 4.3
            ZMQ: 4.1.4
 
System Versions:
           dist: Ubuntu 16.04 xenial
        machine: x86_64
        release: 4.4.0-59-generic
         system: Linux
        version: Ubuntu 16.04 xenial
@Ch3LL
Copy link
Contributor

Ch3LL commented Feb 6, 2017

@blueyed are you running into the same error as this issue #34922 ?

@Ch3LL Ch3LL added the info-needed waiting for more info label Feb 6, 2017
@Ch3LL Ch3LL added this to the Blocked milestone Feb 6, 2017
@blueyed
Copy link
Contributor Author

blueyed commented Feb 6, 2017

@Ch3LL
Not really, since in my case minions are being matched.
But I think that #34922 (comment) sounds relevant.

@Ch3LL
Copy link
Contributor

Ch3LL commented Feb 8, 2017

@blueyed thanks for clarifying. Seems you have a patch in mind, want to submit that as a PR?

@Ch3LL Ch3LL added Bug broken, incorrect, or confusing behavior severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around P4 Priority 4 State-Module Core relates to code central or existential to Salt and removed info-needed waiting for more info labels Feb 8, 2017
@blueyed
Copy link
Contributor Author

blueyed commented Feb 8, 2017

@Ch3LL
No, I have no fix in mind.
Based on my initial comment it seems like salt.state and salt.function have diverged, but I've not investigated further.

@s-schweer
Copy link

I run into this problem as well

@smarsching
Copy link
Contributor

I am also affected by this problem. The question regarding this issue is how the API for states is defined:

  1. If the ['return']['data'] section of the state's return value is only supposed to contain dictionaries as values, the check_state_result function is correct and the code adding the retcode has to be removed.

  2. If the ['return']['data'] section of the state's return value is only supposed to contain dictionaries as values, except for the retcode key, the check_state_result function needs to be changed.

  3. If the ['return']['data'] section of the state's return value may contain non-dict values, the corresponding check has to be removed from check_state_result.

I suspect that answer two is the correct one, but I do not have a sufficient understanding of Salt's internals to be sure. Maybe one of the core developers can comment. I would be happy to provide a pull request with the necessary changes once this question is answered.

@smarsching
Copy link
Contributor

smarsching commented Feb 23, 2017

I found the relevant change in check_state_result that is causing this issue. It was introduced with f96c8f9 and causes a return value of False when the dictionary passed to check_state_result contains non-dict values on the first level.

@isbm made this change, so maybe he can shed light on why this change was introduced and how we could fix the issues resulting from this change.

@isbm
Copy link
Contributor

isbm commented Feb 23, 2017

@smarsching You are asking me to un-swap a magnetic tar tape from my brain's archives in order to remember what the heck I have been done a year ago... 😆 OK, the PR was here: #31745 and it self-explanatory there. I guess, this needs to be improved a bit.

@smarsching
Copy link
Contributor

@isbm Yes, I know this was a mean question. I also hate it when I am confronted with some code I wrote a long time ago. 😉

Thank you very much for pointing me to #31745. Now I see that the commit that I found was actually part of a larger changeset. I will try to figure out how the problem with state.show_highstate can be fixed without causing problems with other functions (for example states.saltmod.runner in my case).

@smarsching
Copy link
Contributor

After digging deeper into the code, I came to the following conclusion:

The retcode should not be inside the dictionary stored under the data key, but should actually be on the same level as the data (this is the same level that is also used for the outputter). Most parts of the code (modules, states, etc.) seem to adhere to this structure.

The only code that violates this structure is the state (orchestrate) runner. It stores the retcode inside data. Actually, this is the place where salt-run expects it, but as none of the other runners seems to include a retcode. Therefore it should be safe to simply change it in both places. With this change, the return value of the state runner will have the same structure as all other return data and check_state_result should work correctly.

I will try to put a patch together that implements these fixes.

smarsching added a commit to smarsching/salt that referenced this issue Feb 24, 2017
Previously, the return code of the runner (if any) was supplied in
ret['data']['retcode']. This was problematic if ret['data'] was later processed
by check_state_result. With this change, runners return the optional return
code in ret['retcode'], like the other code (modules, etc.) already did before.
@gtmanfred
Copy link
Contributor

Can yall test if this PR fixes this issue as well?

#40905

Thanks,
Daniel

@gtmanfred gtmanfred added the fixed-pls-verify fix is linked, bug author to confirm fix label Apr 28, 2017
@Ch3LL
Copy link
Contributor

Ch3LL commented May 2, 2017

Closing since its fixed and haven't heard back from anyone. Please comment and we can re-open if this changes

grahamhayes added a commit to grahamhayes/salt that referenced this issue May 23, 2017
This ensures we do not introduce a regression which breaks
the usage of the `batch` option when using states in orchestration
runners.

Related to saltstack#40635 and saltstack#39169
fjudith added a commit to fjudith/saltstack-kubernetes that referenced this issue Apr 12, 2020
- Use `batch` parameter to process targets sequentially [reference](saltstack/salt#39169)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior Core relates to code central or existential to Salt fixed-pls-verify fix is linked, bug author to confirm fix P4 Priority 4 severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around State-Module
Projects
None yet
Development

No branches or pull requests

6 participants