
Added the saltmod.parallel_runners state. #39670

Merged
merged 9 commits on Sep 28, 2017

Conversation

smarsching
Contributor

What does this PR do?

This new state function is intended for use with the orchestrate runner. It is used in a way very similar to saltmod.runner, except that it executes multiple runners in parallel.

What issues does this PR fix or reference?

This PR adds a general feature and is not particularly focused on fixing a single issue. However, I believe that at least some of the use cases discussed in #32488 and #32956 are covered by this new feature.

The use case that I wrote it for is orchestrating actions where some can run in parallel and some have to run after each other. In my case, I want to run a system upgrade on all VMs hosted on a number of VM hosts. As running a system upgrade generates a significant I/O load, I only want to upgrade one VM per host at a time. However, upgrades on different VM hosts shall run in parallel.

With the existing features of the orchestrate runner, this was not possible (except for launching several instances of the orchestrate runner from the shell, of course). With the new feature added by this PR, I can now have an SLS file that triggers the execution of a separate runner for each VM host and waits until all of these runners have finished.

The state blocks until all runners triggered by it have finished. It is therefore possible to run certain actions in parallel and then continue with another action that depends on all of these actions having finished. I believe that with this new feature, orchestration scenarios with very complex dependencies between actions can be covered correctly.

Previous Behavior

The saltmod.parallel_runners state function did not exist.

New Behavior

The saltmod.parallel_runners state function has been added.

Tests written?

No

This new state is intended for use with the orchestrate runner. It is
used in a way very similar to saltmod.runner, except that it executes
multiple runners in parallel.
@ghost

ghost commented Feb 26, 2017

@smarsching, thanks for your PR! By analyzing the history of the files in this pull request, we identified @whiteinge, @terminalmage and @thatch45 to be potential reviewers.

Contributor

@thatch45 thatch45 left a comment


This is the right kind of crazy, I like it. It looks like you debated using threads or processes but landed on threads. Should the documentation be updated to reflect the use of threads?

@smarsching
Contributor Author

I actually gave the question of processes vs threads some thought.

Initially, I was using processes, but I hit a wall because I did not manage to make the function that does the actual per-process work serializable. I am pretty sure this was only due to my lack of expertise in Python (I am mainly working with C++ and Java). I think that, due to some clever aliasing going on in Salt, the serialization code did not accept the function as a top-level function, even though it was actually defined at the top level of the code file.

In any case, I continued with threads and figured out that they are probably the better choice anyway:

  • Threads are more lightweight than processes (which could be an issue when running many actions in parallel).
  • The GIL should not be an issue because the threads are going to spend most of their time waiting on I/O anyway.
  • There is only a single process, so pressing Ctrl-C will actually kill the whole thing (this is why I am using daemon threads).

Of course I am open to a solution using processes if you see any benefits in it. Adapting the code should not be very difficult (only the _parallel_map function would need some changes). However, I would need some help with figuring out how to write a function that is serializable and thus actually accepted by the multiprocessing API.
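For illustration, the thread-based helper boils down to something like this (a simplified sketch with illustrative names, not the exact code in this PR):

    import threading


    def _parallel_map(func, inputs):
        '''
        Run func over each element of inputs in a separate daemon thread and
        return the list of outputs. Exceptions raised in a worker thread are
        collected and re-raised in the calling thread.
        '''
        outputs = [None] * len(inputs)
        errors = [None] * len(inputs)

        def run_thread(index):
            try:
                outputs[index] = func(inputs[index])
            except Exception as exc:  # pylint: disable=broad-except
                errors[index] = exc

        threads = []
        for index in range(len(inputs)):
            # Daemon threads so that Ctrl-C on the single process still works.
            thread = threading.Thread(target=run_thread, args=(index,))
            thread.daemon = True
            thread.start()
            threads.append(thread)
        for thread in threads:
            thread.join()
        for error in errors:
            if error is not None:
                raise error
        return outputs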

The documentation erroneously used the word process in one place where thread
would actually have been correct. This commit fixes this issue.
@smarsching
Contributor Author

Sorry, I totally missed your point about the documentation mentioning processes (I only saw the places where the term "thread" was already used). I now added a commit that ensures that the term "thread" is used consistently everywhere in the documentation.

Contributor

@whiteinge whiteinge left a comment


Nice addition. This is a generic way to fulfil a few similar asks. Thanks for cross-linking them in the description. One other is probably #33390.

A few notes inline below.

Also, the following Orchestrate run reports errors despite successful sub-orchestrate runs. I'm not sure why at a quick glance. You can see the full states I'm using in the debug output, but let me know if you have any other questions.

% salt-run -c $VIRTUAL_ENV/etc/salt -l debug state.orch prun
[...snip...]
[DEBUG   ] Rendered data from file: /Users/shouse/tmp/venvs/salt/var/cache/salt/master/files/base/prun.sls:
parallel-state:
   salt.parallel_runners:
     - runners:
       - name: state.orch
         kwarg:
           mods: gndn
       - name: state.orch
         kwarg:
           mods: gndn

[...snip...]
[INFO    ] Executing state salt.parallel_runners for [parallel-state]
[DEBUG   ] Unable to fire args event due to missing __orchestration_jid__
[...snip...]
[DEBUG   ] Rendered data from file: /Users/shouse/tmp/venvs/salt/var/cache/salt/master/files/base/gndn.sls:
gndn:
  test.succeed_with_changes

[...snip...]
[INFO    ] Completed state [parallel-state] at time 11:24:07.739789 duration_in_ms=3465.108
[DEBUG   ] Sending event: tag = salt/run/20170302112401522902/ret; data = {'fun_args': ['prun'], 'jid': '20170302112401522902', 'return': {'outputter': 'highstate', 'data': {'burke.lan_master': {'salt_|-parallel-state_|-parallel-state_|-parallel_runners': {'comment': "Runner 0 was not successful and returned {'outputter': 'highstate', 'data': {'burke.lan_master': {'test_|-gndn_|-gndn_|-succeed_with_changes': {'comment': 'Success!', 'name': 'gndn', 'start_time': '11:24:07.729247', 'result': True, 'duration': 1.607, '__run_num__': 0, '__sls__': u'gndn', 'changes': {'testing': {'new': 'Something pretended to change', 'old': 'Unchanged'}}, '__id__': 'gndn'}}, 'retcode': 0}}.\nRunner 1 was not successful and returned {'outputter': 'highstate', 'data': {'burke.lan_master': {'test_|-gndn_|-gndn_|-succeed_with_changes': {'comment': 'Success!', 'name': 'gndn', 'start_time': '11:24:07.722510', 'result': True, 'duration': 0.735, '__run_num__': 0, '__sls__': u'gndn', 'changes': {'testing': {'new': 'Something pretended to change', 'old': 'Unchanged'}}, '__id__': 'gndn'}}, 'retcode': 0}}.", 'name': 'parallel-state', '__orchestration__': True, 'start_time': '11:24:04.274681', 'result': False, 'duration': 3465.108, '__run_num__': 0, '__sls__': u'prun', 'changes': {'[1][burke.lan_master][test_|-gndn_|-gndn_|-succeed_with_changes][testing]': {'new': 'Something pretended to change', 'old': 'Unchanged'}, '[0][burke.lan_master][test_|-gndn_|-gndn_|-succeed_with_changes][testing]': {'new': 'Something pretended to change', 'old': 'Unchanged'}}, '__id__': 'parallel-state'}}, 'retcode': 1}}, 'success': False, '_stamp': '2017-03-02T18:24:07.740730', 'user': 'shouse', 'fun': 'runner.state.orch'}
burke.lan_master:
----------
          ID: parallel-state
    Function: salt.parallel_runners
      Result: False
     Comment: Runner 0 was not successful and returned {'outputter': 'highstate', 'data': {'burke.lan_master': {'test_|-gndn_|-gndn_|-succeed_with_changes': {'comment': 'Success!', 'name': 'gndn', 'start_time': '11:24:07.729247', 'result': True, 'duration': 1.607, '__run_num__': 0, '__sls__': u'gndn', 'changes': {'testing': {'new': 'Something pretended to change', 'old': 'Unchanged'}}, '__id__': 'gndn'}}, 'retcode': 0}}.
              Runner 1 was not successful and returned {'outputter': 'highstate', 'data': {'burke.lan_master': {'test_|-gndn_|-gndn_|-succeed_with_changes': {'comment': 'Success!', 'name': 'gndn', 'start_time': '11:24:07.722510', 'result': True, 'duration': 0.735, '__run_num__': 0, '__sls__': u'gndn', 'changes': {'testing': {'new': 'Something pretended to change', 'old': 'Unchanged'}}, '__id__': 'gndn'}}, 'retcode': 0}}.
     Started: 11:24:04.274681
    Duration: 3465.108 ms
     Changes:
              ----------
              [0][burke.lan_master][test_|-gndn_|-gndn_|-succeed_with_changes][testing]:
                  ----------
                  new:
                      Something pretended to change
                  old:
                      Unchanged
              [1][burke.lan_master][test_|-gndn_|-gndn_|-succeed_with_changes][testing]:
                  ----------
                  new:
                      Something pretended to change
                  old:
                      Unchanged

Summary for burke.lan_master
------------
Succeeded: 0 (changed=1)
Failed:    1
------------
Total states run:     1
Total run time:   3.465 s
retcode:
    1
[DEBUG   ] LazyLoaded local_cache.prep_jid
[INFO    ] Runner completed: 20170302112401522902
[DEBUG   ] Runner return: {'outputter': 'highstate', 'data': {'burke.lan_master': {'salt_|-parallel-state_|-parallel-state_|-parallel_runners': {'comment': u"Runner 0 was not successful and returned {'outputter': 'highstate', 'data': {'burke.lan_master': {'test_|-gndn_|-gndn_|-succeed_with_changes': {'comment': 'Success!', 'name': 'gndn', 'start_time': '11:24:07.729247', 'result': True, 'duration': 1.607, '__run_num__': 0, '__sls__': u'gndn', 'changes': {'testing': {'new': 'Something pretended to change', 'old': 'Unchanged'}}, '__id__': 'gndn'}}, 'retcode': 0}}.\nRunner 1was not successful and returned {'outputter': 'highstate', 'data': {'burke.lan_master': {'test_|-gndn_|-gndn_|-succeed_with_changes': {'comment': 'Success!', 'name': 'gndn', 'start_time': '11:24:07.722510', 'result': True, 'duration': 0.735, '__run_num__': 0, '__sls__': u'gndn', 'changes': {'testing': {'new': 'Something pretended to change', 'old': 'Unchanged'}}, '__id__': 'gndn'}},'retcode': 0}}.", 'name': 'parallel-state', '__orchestration__': True, 'start_time': '11:24:04.274681', 'result': False, 'duration': u'3465.108 ms', '__run_num__': 0, '__sls__': u'prun', 'changes': {'[1][burke.lan_master][test_|-gndn_|-gndn_|-succeed_with_changes][testing]': {'new': 'Something pretended to change', 'old': 'Unchanged'}, '[0][burke.lan_master][test_|-gndn_|-gndn_|-succeed_with_changes][testing]': {'new': 'Something pretended to change', 'old': 'Unchanged'}}, '__id__': 'parallel-state'}}, 'retcode': 1}}

.. code-block:: yaml

    parallel-state:
      saltext.parallel-runner:
Contributor


Should be salt.parallel_runners

        - runners:
          - name: state.orchestrate
            kwarg:
              mods: orchestrate_state_1
Contributor


The [{"name": "...", "kwarg": {...}}] makes perfect sense as an easy-to-write/easy-to-use data structure...but it's not canonical Salt. I can't think of another place in Salt that has something similar, so I think it's worth avoiding solely for that reason. Other suggestions welcome, but the following suggestion matches syntaxes found elsewhere in Salt. You can use repack_dictlist to convert the list of dicts to a dict.

             - runners:
               - name: state.orchestrate
               - kwarg:
                    mods: orchestrate_state_1
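For reference, repack_dictlist essentially collapses that list of single-key dicts into one dict, roughly like this (a simplified sketch, not the actual Salt helper):

    def repack_dictlist(data):
        '''
        Simplified sketch: collapse a list of single-key dicts (the usual SLS
        argument style) into a single dict.
        '''
        result = {}
        for item in data:
            if isinstance(item, dict):
                result.update(item)
        return result


    # The suggested SLS syntax renders to a structure like this ...
    runner_config = [{'name': 'state.orchestrate'},
                     {'kwarg': {'mods': 'orchestrate_state_1'}}]
    # ... and repacking gives a dict that is easy to use inside the state:
    # {'name': 'state.orchestrate', 'kwarg': {'mods': 'orchestrate_state_1'}}
    print(repack_dictlist(runner_config))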

          - name: state.orchestrate
            kwarg:
              mods: orchestrate_state_1
          - name: state.orcestrate
Contributor


Slight typo: missing the "h".

Related: the error message for fat-fingered function names is a large traceback. I think the lazy-loader has an easy way for us to check those in advance. @cachedout is that correct?

[ERROR   ] An exception occurred in this state: Traceback (most recent call last):
  File "/Users/shouse/src/salt/salt/salt/state.py", line 1812, in call
    **cdata['kwargs'])
  File "/Users/shouse/src/salt/salt/salt/loader.py", line 1724, in wrapper
    return f(*args, **kwargs)
  File "/Users/shouse/src/salt/salt/salt/states/saltmod.py", line 784, in parallel_runners
    outputs = _parallel_map(call_runner, runners)
  File "/Users/shouse/src/salt/salt/salt/states/saltmod.py", line 102, in _parallel_map
    six.reraise(exc_type, exc_value, exc_traceback)
  File "/Users/shouse/src/salt/salt/salt/states/saltmod.py", line 90, in run_thread
    outputs[index] = func(inputs[index])
  File "/Users/shouse/src/salt/salt/salt/states/saltmod.py", line 782, in call_runner
    **(runner_config['kwarg']))
  File "/Users/shouse/src/salt/salt/salt/modules/saltutil.py", line 1312, in runner
    full_return=full_return)
  File "/Users/shouse/src/salt/salt/salt/runner.py", line 144, in cmd
    full_return)
  File "/Users/shouse/src/salt/salt/salt/client/mixins.py", line 226, in cmd
    self.functions[fun], arglist, pub_data
  File "/Users/shouse/src/salt/salt/salt/loader.py", line 1095, in __getitem__
    func = super(LazyLoader, self).__getitem__(item)
  File "/Users/shouse/src/salt/salt/salt/utils/lazy.py", line 101, in __getitem__
    raise KeyError(key)
KeyError: 'state.orcestrate'

__orchestration_jid__=jid,
__env__=__env__,
full_return=True,
**(runner_config['kwarg']))
Contributor


Should account for runners that don't require args. Perhaps runner_config.get('kwarg', {}).

lambda x, y: x and y,
[not ('success' in out and not out['success']) for out in outputs],
True)

Contributor


It looks like this is intended to kick off any Runners in parallel and not just Orchestrate. A good way to detect if it's an orchestrate run or not is to test if each output contains an out key that has the value highstate. If so, it's an orchestrate or state run and will have the usual state return dictionary. If not, then you can short-circuit looking for those fields and just return the outputs verbatim.
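In rough code terms, that check could look like this (a sketch only; the field names follow the debug output above, where the runner return carries an outputter/out value of highstate):

    def _is_state_return(output):
        '''
        Sketch of the suggested check: orchestrate/state runs return a dict
        whose out/outputter field is 'highstate'; anything else should be
        treated as an opaque runner return and passed through verbatim.
        '''
        return isinstance(output, dict) and (
            output.get('out') == 'highstate'
            or output.get('outputter') == 'highstate'
        )


    def _collect(outputs):
        if all(_is_state_return(out) for out in outputs):
            # All returns are state runs: the usual state fields can be used.
            return [out.get('data', {}) for out in outputs]
        # Not state runs: return the outputs verbatim.
        return list(outputs)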


ret = {
'name': name,
'result': success,
Contributor


The result field in the highstate output and the success field in the job output are separate things and should be handled separately. success indicates whether there were any internal Salt failures, whereas the result field indicates whether any individual state functions failed.

@cachedout
Contributor

@smarsching Are you able to review the latest feedback from @whiteinge?

The code in saltmod.parallel_runners would fail if the (optional) kwarg
argument was missing. This is fixed by using an empty dictionary for
kwarg by default.
@smarsching
Contributor Author

@whiteinge Thank you for your very valuable comments. Sorry it took me so long to reply. I've simply been quite busy with other stuff.

The documentation issues that you found are, of course, simple typos, and I fixed them. The problem with the runner reporting an error even though the sub-runners were successful is actually due to a bug that I fixed with PR #39641. However, that PR was for the 2016.3 branch and, to my knowledge, has not been forward-ported to the develop branch yet. Once that patch is applied, the parallel_runners state function should no longer report failure.

Regarding the format of the arguments, I fully agree that we should aim for a style that is as close to the style used in other places as reasonably possible. In fact, the consistency in style was one of the reasons that made me choose SaltStack over one of its competitors. 😉

However, I think that the format that you suggest will not work because of the possible duplicate keys. Consider your example:

             - runners:
               - name: state.orchestrate
               - kwarg:
                    mods: orchestrate_state_1

This works fine as long as there is only a single runner (which kind of defeats the purpose of the new parallel_runners function). When we extend it to two runners running in parallel, it does not work any longer:

             - runners:
               - name: state.orchestrate
               - kwarg:
                    mods: orchestrate_state_1
               - name: state.orchestrate
               - kwarg:
                    mods: orchestrate_state_2

If we wanted to, we could use the order of the key-value pairs and start a new dict every time we see the name key, but this looks very messy to me. A cleaner format could look like this:

             - runners:
                 orch1:
                   - name: state.orchestrate
                   - kwarg:
                      mods: orchestrate_state_1
                 orch2:
                   - name: state.orchestrate
                   - kwarg:
                      mods: orchestrate_state_2

The downside of this format is that it is kind of verbose because each of the runner instances needs a different name. On the other hand, this name could serve as an identifier in an improved return value format (in particular regarding changes). One could use the key in place of the name parameter when it is not specified explicitly. This would then be consistent with other places in Salt, but would hardly be useful here because the most common case is calling the same runner (state.orchestrate) with different parameters.

I also thought about limiting the use to the orchestrate runner, which would simplify the format to something like this:

             - mods:
                 - orchestrate_state_1
                 - orchestrate_state_2

Optionally, the states could take additional key-value parameters for passing arguments (like pillar data):

             - mods:
                 - orchestrate_state_1
                     - pillar:
                         mykey: myvalue
                 - orchestrate_state_2

However, for some scenarios it might be interesting to parallelize other runners and not only the orchestrate runner. For this reason, I would rather have a generic solution and then maybe build a simplified version for the orchestrate runner on top of it.

In summary, I would prefer the slightly more verbose format, but I would be happy to hear your opinion.

Regarding the other issues that you mention, I am not entirely sure how to proceed. I am quite new to SaltStack and do not fully understand the API for runners yet, in particular when it comes to the format of their return values.

I started off by basically copying the logic from saltmod.runner. The output of saltmod.runner seems to differ from the output produced by most states, so I made some adjustments (e.g. regarding the changes key). However, after having used the parallel_runners state function for a few days, I am not happy with the way it handles output: looking at the changes dictionary is confusing at best, and sometimes changes are even missing because the underlying modules did not return them in the expected old/new format.

I am a bit confused by your statement that the success key is used for internal errors while result is used for the state’s or module’s result. From looking at the code that usually sets the success field, I got the same impression. However, the saltmod.runner function uses the success field of the runner’s output to set its result field. So there seem to be some inconsistencies there...

Is there any documentation regarding the output format for states and runners? From your comments, I understand that the format also seems to depend on whether the runner is the orchestrate runner (or whether a state is a highstate), but I could not find a specification telling me when to expect which format. If there is no such specification yet, maybe you can help me with a few questions:

  • What is the expected structure for the return value of a state function?
  • What is the expected structure for the return value of a runner?
  • Should the saltmod.* state functions adhere to the same structure as the regular state functions or are they different because they are only expected to be used in the SLS files used for the orchestrate runner?
  • Can I expect a specific sub-structure for the return value of the orchestrate runner (in particular regarding changes)? If so, is this limited to the orchestrate runner, or is there a general guideline for when to expect each structure (e.g. highstate == False or missing means format A and highstate == True means format B)?

This information would really help me with improving the saltmod.parallel_runners function and making its output more readable.

@whiteinge
Contributor

whiteinge commented Mar 6, 2017

the format that you suggest will not work because of the possible duplicate keys.

Great point. Your dict-based suggestion (repeated below) feels the most "Salty" to me. The key is unnecessary, like you pointed out, but your suggestion to use that to label the output for each is a great idea.

             - runners:
                 orch1:
                   - name: state.orchestrate
                   - kwarg:
                      mods: orchestrate_state_1

What is the expected structure for the return value of a state function?

You can only rely on the top-level keys (name, changes, result, comment). The contents of changes are free-form and will not necessarily have old/new keys.

That said, I'd suggest avoiding trying to infer too much from the sub-returns and rather just returning the result verbatim as nested data structures. If you go with the config syntax above you could use that dictionary key instead of the minion name like salt.state does for each return, e.g.: {...snip parallel_runners output..., "changes": {"ret": {"orch1": {...verbatim output from orch1...}}}}. (More on this below.)

What is the expected structure for the return value of a runner?

Runner modules, like execution modules, do not have a defined return structure. They can return whatever is appropriate for the data they're trying to show. If a module is returning the result of a state run (including orchestrate) then the return should be consistent with what the highstate outputter is expecting.

Should the saltmod.* state functions adhere to the same structure as the regular state functions or are they different because they are only expected to be used in the SLS files used for the orchestrate runner?

Yes. This output should be able to be run through the highstate outputter successfully. Expanding on my comment above, I'd suggest something like:

{
    "...parallel_runners return...": {
        "changes": {"ret": {"...name from dict...": {...verbatim return...}, ...}}, 
        # Exception: if all returns are state returns (`out` == `highstate`), then
        # `changes` should be an empty dict `{}` if all the sub-state runs
        # also have empty `changes` dicts.

        "result": ...True if `exit_code` is non-zero for all returns
    }
    "success": ...True if no Salt exceptions...,
    "exit_code": ...0 if all `result`fields are True otherwise 1...
}

the saltmod.runner function uses the success field of the runner’s output to set its result field

That sounds like salt.state is doing it wrong but I could be mistaken. I would expect result to be an all(...other result fields...) and success to be an all(...other success fields...).
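In rough code terms, that expectation is simply (illustrative only, not existing Salt code):

    def _roll_up(sub_returns):
        '''Aggregate per-runner returns as described above.'''
        return {
            # result: every individual state/runner result was True.
            'result': all(sub.get('result', False) for sub in sub_returns),
            # success: every runner finished without internal Salt errors.
            'success': all(sub.get('success', True) for sub in sub_returns),
        }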

@smarsching
Contributor Author

Great, thanks a lot for your answers. I think this gives me the information that I need for improving the function. I will try to implement the changes and get back to you as soon as I have something to be reviewed. However, it might be a few days before I find the time to look into this.

The name parameter in a call to dict.get(...) was accidentally wrapped
in brackets, leading to a TypeError ("unhashable type: 'list'").
@smarsching
Contributor Author

I have not gotten around to working on the discussed issues yet because I first had to fix a race condition that was exposed by the salt.parallel_runners function and caused a lot of trouble for me. This happened when passing individual pillar data to parallel tasks (through the pillar argument of the state.orchestrate function).

PR #39948 provides the fix for this race condition. Maybe you could review this PR (or find someone who might be able to review it). I think it makes sense to merge that PR before this PR because with this race condition present the parallel_runners function will have undefined behavior in certain scenarios.

The configuration format for specifying the list of runners has been
changed so that it matches the format used in other places.
The merging of outputs from the runners has been improved so that the
outputs are correctly passed on regardless of the format used by the
runner.
@smarsching
Contributor Author

@whiteinge I implemented the changes that we discussed. The configuration format now uses a key for identifying each runner and below this key the runner configuration is expected as a list of key-value pairs.

The merging of the output now works like this:

If the outputter key is present in the return value and is equal to highstate for all runners, the code recursively searches the data tree for changes keys. When such a key is found and its value has an out sub-key with the value highstate, the value associated with the ret sub-key is used. If there is no out sub-key, or its value is not highstate, the value associated with the changes key is copied as-is.

If any of the runners' return values does not match the highstate format, the returned changes have a single ret key, below which there is a key-value pair for each runner, where the key is the ID used in the configuration and the value is the return value of the runner.
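For illustration, the merging roughly follows this shape (a simplified sketch with illustrative names, not the exact code in the commit):

    def _merge_changes(runner_returns):
        '''
        Sketch of the merging described above. runner_returns maps the runner
        ID from the SLS configuration to the raw return value of that runner.
        '''
        all_highstate = all(
            isinstance(ret, dict) and ret.get('outputter') == 'highstate'
            for ret in runner_returns.values()
        )
        if not all_highstate:
            # At least one runner did not return highstate data: pass every
            # return value through verbatim, keyed by the configured runner ID.
            return {'ret': dict(runner_returns)}

        def resolve_changes(node):
            # Recursively search the data tree for 'changes' keys.
            if not isinstance(node, dict):
                return node
            if 'changes' in node:
                changes = node['changes']
                if isinstance(changes, dict) and changes.get('out') == 'highstate':
                    # Nested highstate data: use the value of its 'ret' sub-key.
                    return changes.get('ret', {})
                # No 'out' sub-key (or not highstate): copy the changes as-is.
                return changes
            return {key: resolve_changes(value) for key, value in node.items()}

        return {
            runner_id: resolve_changes(ret.get('data', {}))
            for runner_id, ret in runner_returns.items()
        }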

I also made a change to how failed runners are handled: For each failed runner, the complete output of the runner is included in the comment and formatted using the nested outputter. This makes it much easier to identify the cause of a problem.

Contributor

@whiteinge whiteinge left a comment


Awesome work. Solid code and it passed a few local smoke-tests.

@whiteinge
Contributor

Thanks for those changes. Very cool!

There are a couple of related PRs referenced above, and it would be nice to get the highstate outputter working for the nested highstate structures like Orchestrate does (but I don't know how to do that offhand).

IMO, this is ready for merge and we can fine-tune over time.

Contributor

@terminalmage terminalmage left a comment


It looks like, due to the nature of multiple jobs being spawned by this, we won't be able to get the jid into the return like in other states. This information is just there as metadata, though; nothing relies on it. So I think we're OK here.

@cachedout
Contributor

Go Go Jenkins!

smarsching added a commit to smarsching/salt that referenced this pull request Jan 17, 2018
There was a race condition in the salt loader when injecting global
values (e.g. "__pillar__" or "__salt__") into modules. One effect of
this race condition was that in a setup with multiple threads, some
threads may see pillar data intended for other threads or the pillar
data seen by a thread might even change spuriously.

There have been earlier attempts to fix this problem (saltstack#27937, saltstack#29397).
These patches tried to fix the problem by storing the dictionary that
keeps the relevant data in a thread-local variable and referencing this
thread-local variable from the variables that are injected into the
modules.

These patches did not fix the problem completely because they only
work when a module is loaded through a single loader instance.
When there is more than one loader, there is more than one
thread-local variable and the variable injected into a module is
changed to point to another thread-local variable when the module is
loaded again. Thus, the problem resurfaced while working on saltstack#39670.

This patch attempts to solve the problem from a slightly different
angle, complementing the earlier patches: The value injected into the
modules now is a proxy that internally uses a thread-local variable to
decide to which object it points. This means that when loading a module
again through a different loader (possibly passing different pillar
data), the data is actually only changed in the thread in which the
loader is used. Other threads are not affected by such a change.

This means that it will work correctly in the current situation where
loaders are possibly created by many different modules and these
modules do not necessarily know in which context they are executed. Thus
it is much more flexible and reliable than the more explicit approach
used by the two earlier patches.
smarsching added a commit to smarsching/salt that referenced this pull request Mar 10, 2018
There was a race condition in the salt loader when injecting global
values (e.g. "__pillar__" or "__salt__") into modules. One effect of
this race condition was that in a setup with multiple threads, some
threads may see pillar data intended for other threads or the pillar
data seen by a thread might even change spuriously.

There have been earlier attempts to fix this problem (saltstack#27937, saltstack#29397).
These patches tried to fix the problem by storing the dictionary that
keeps the relevant data in a thread-local variable and referencing this
thread-local variable from the variables that are injected into the
modules.

These patches did not fix the problem completely because they only
work when a module is loaded through a single loader instance.
When there is more than one loader, there is more than one
thread-local variable and the variable injected into a module is
changed to point to another thread-local variable when the module is
loaded again. Thus, the problem resurfaced while working on saltstack#39670.

This patch attempts to solve the problem from a slightly different
angle, complementing the earlier patches: The value injected into the
modules now is a proxy that internally uses a thread-local variable to
decide to which object it points. This means that when loading a module
again through a different loader (possibly passing different pillar
data), the data is actually only changed in the thread in which the
loader is used. Other threads are not affected by such a change.

This means that it will work correctly in the current situation where
loaders are possibly created by many different modules and these
modules do not necessarily know in which context they are executed. Thus
it is much more flexible and reliable than the more explicit approach
used by the two earlier patches.

Unfortunately, the standard JSON and Msgpack serialization code cannot
handle proxied objects, so they have to be unwrapped before passing them
to that code.

The salt.utils.json module has been modified to take care of
unwrapping objects that are proxied using the ThreadLocalProxy.

The salt.utils.msgpack module has been added and basically provides the
same functions as the salt.utils.json module, but for msgpack. Like the
json module, it takes care of unwrapping proxies.
smarsching added a commit to smarsching/salt that referenced this pull request Nov 27, 2018
smarsching added a commit to smarsching/salt that referenced this pull request Nov 27, 2018
sbrennan4 pushed a commit to sbrennan4/salt that referenced this pull request Sep 17, 2019
@Akm0d Akm0d mentioned this pull request Nov 8, 2019
Akm0d pushed a commit to Akm0d/salt that referenced this pull request Jan 14, 2020
@smarsching smarsching deleted the feature-parallel-runners branch December 9, 2022 23:16
Labels
ZD The issue is related to a Zendesk customer support ticket.
9 participants