Added the saltmod.parallel_runners state. #39670
Conversation
This new state is intended for use with the orchestrate runner. It is used in a way very similar to saltmod.runner, except that it executes multiple runners in parallel.
@smarsching, thanks for your PR! By analyzing the history of the files in this pull request, we identified @whiteinge, @terminalmage and @thatch45 to be potential reviewers.
This is the right kind of crazy, I like it. It looks like you had some debate about using threads or processes but landed on threads. Should the documentation be updated to reflect the use of threads?
I actually gave the question of processes vs threads some thought. Initially, I was using processes, but I hit a wall because I did not manage to make the function that does the actual per-process work serializable. I am pretty sure this was only due to my lack of expertise in Python (I am mainly working with C++ and Java). I think that, due to some clever aliasing going on in Salt, the serialization code did not accept the function as a top-level function, even though it was actually defined at the top level of the code file. Anyway, I continued with threads and figured out that threads are probably the better choice anyway:
Of course, I am open to a solution using processes if you see any benefits in it. Adapting the code should not be very difficult (only the
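For context, a minimal sketch of a thread-based parallel map along the lines of the _parallel_map helper that shows up in the traceback further down; the names and the exception handling are illustrative, not the exact implementation:

```python
import sys
import threading


def parallel_map(func, inputs):
    '''
    Run func over every element of inputs in its own thread, collect the
    results in order, and re-raise the first exception (if any) in the
    calling thread.
    '''
    outputs = [None] * len(inputs)
    errors = [None] * len(inputs)

    def run_thread(index):
        try:
            outputs[index] = func(inputs[index])
        except Exception:  # pylint: disable=broad-except
            errors[index] = sys.exc_info()

    threads = [threading.Thread(target=run_thread, args=(i,))
               for i in range(len(inputs))]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    for error in errors:
        if error is not None:
            exc_type, exc_value, exc_traceback = error
            raise exc_value.with_traceback(exc_traceback)
    return outputs
```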
The documentation erroneously used the word process in one place where thread would actually have been correct. This commit fixes this issue.
Sorry, I totally missed your point about the documentation mentioning processes (I only saw the places where the term "thread" was already used). I now added a commit that ensures that the term "thread" is used consistently everywhere in the documentation.
Nice addition. This is a generic way to fulfil a few similar asks. Thanks for cross-linking them in the description. One other is probably #33390.
A few notes above.
Also, the following Orchestrate run reports errors despite successful sub-orchestrate runs. I'm not sure why at a quick glance. You can see the full states I'm using in the debug output, but let me know if you have any other questions.
% salt-run -c $VIRTUAL_ENV/etc/salt -l debug state.orch prun
[...snip...]
[DEBUG ] Rendered data from file: /Users/shouse/tmp/venvs/salt/var/cache/salt/master/files/base/prun.sls:
parallel-state:
  salt.parallel_runners:
    - runners:
      - name: state.orch
        kwarg:
          mods: gndn
      - name: state.orch
        kwarg:
          mods: gndn
[...snip...]
[INFO ] Executing state salt.parallel_runners for [parallel-state]
[DEBUG ] Unable to fire args event due to missing __orchestration_jid__
[...snip...]
[DEBUG ] Rendered data from file: /Users/shouse/tmp/venvs/salt/var/cache/salt/master/files/base/gndn.sls:
gndn:
  test.succeed_with_changes
[...snip...]
[INFO ] Completed state [parallel-state] at time 11:24:07.739789 duration_in_ms=3465.108
[DEBUG ] Sending event: tag = salt/run/20170302112401522902/ret; data = {'fun_args': ['prun'], 'jid': '20170302112401522902', 'return': {'outputter': 'highstate', 'data': {'burke.lan_master': {'salt_|-parallel-state_|-parallel-state_|-parallel_runners': {'comment': "Runner 0 was not successful and returned {'outputter': 'highstate', 'data': {'burke.lan_master': {'test_|-gndn_|-gndn_|-succeed_with_changes': {'comment': 'Success!', 'name': 'gndn', 'start_time': '11:24:07.729247', 'result': True, 'duration': 1.607, '__run_num__': 0, '__sls__': u'gndn', 'changes': {'testing': {'new': 'Something pretended to change', 'old': 'Unchanged'}}, '__id__': 'gndn'}}, 'retcode': 0}}.\nRunner 1 was not successful and returned {'outputter': 'highstate', 'data': {'burke.lan_master': {'test_|-gndn_|-gndn_|-succeed_with_changes': {'comment': 'Success!', 'name': 'gndn', 'start_time': '11:24:07.722510', 'result': True, 'duration': 0.735, '__run_num__': 0, '__sls__': u'gndn', 'changes': {'testing': {'new': 'Something pretended to change', 'old': 'Unchanged'}}, '__id__': 'gndn'}}, 'retcode': 0}}.", 'name': 'parallel-state', '__orchestration__': True, 'start_time': '11:24:04.274681', 'result': False, 'duration': 3465.108, '__run_num__': 0, '__sls__': u'prun', 'changes': {'[1][burke.lan_master][test_|-gndn_|-gndn_|-succeed_with_changes][testing]': {'new': 'Something pretended to change', 'old': 'Unchanged'}, '[0][burke.lan_master][test_|-gndn_|-gndn_|-succeed_with_changes][testing]': {'new': 'Something pretended to change', 'old': 'Unchanged'}}, '__id__': 'parallel-state'}}, 'retcode': 1}}, 'success': False, '_stamp': '2017-03-02T18:24:07.740730', 'user': 'shouse', 'fun': 'runner.state.orch'}
burke.lan_master:
----------
ID: parallel-state
Function: salt.parallel_runners
Result: False
Comment: Runner 0 was not successful and returned {'outputter': 'highstate', 'data': {'burke.lan_master': {'test_|-gndn_|-gndn_|-succeed_with_changes': {'comment': 'Success!', 'name': 'gndn', 'start_time': '11:24:07.729247', 'result': True, 'duration': 1.607, '__run_num__': 0, '__sls__': u'gndn', 'changes': {'testing': {'new': 'Something pretended to change', 'old': 'Unchanged'}}, '__id__': 'gndn'}}, 'retcode': 0}}.
Runner 1 was not successful and returned {'outputter': 'highstate', 'data': {'burke.lan_master': {'test_|-gndn_|-gndn_|-succeed_with_changes': {'comment': 'Success!', 'name': 'gndn', 'start_time': '11:24:07.722510', 'result': True, 'duration': 0.735, '__run_num__': 0, '__sls__': u'gndn', 'changes': {'testing': {'new': 'Something pretended to change', 'old': 'Unchanged'}}, '__id__': 'gndn'}}, 'retcode': 0}}.
Started: 11:24:04.274681
Duration: 3465.108 ms
Changes:
----------
[0][burke.lan_master][test_|-gndn_|-gndn_|-succeed_with_changes][testing]:
----------
new:
Something pretended to change
old:
Unchanged
[1][burke.lan_master][test_|-gndn_|-gndn_|-succeed_with_changes][testing]:
----------
new:
Something pretended to change
old:
Unchanged
Summary for burke.lan_master
------------
Succeeded: 0 (changed=1)
Failed: 1
------------
Total states run: 1
Total run time: 3.465 s
retcode:
1
[DEBUG ] LazyLoaded local_cache.prep_jid
[INFO ] Runner completed: 20170302112401522902
[DEBUG ] Runner return: {'outputter': 'highstate', 'data': {'burke.lan_master': {'salt_|-parallel-state_|-parallel-state_|-parallel_runners': {'comment': u"Runner 0 was not successful and returned {'outputter': 'highstate', 'data': {'burke.lan_master': {'test_|-gndn_|-gndn_|-succeed_with_changes': {'comment': 'Success!', 'name': 'gndn', 'start_time': '11:24:07.729247', 'result': True, 'duration': 1.607, '__run_num__': 0, '__sls__': u'gndn', 'changes': {'testing': {'new': 'Something pretended to change', 'old': 'Unchanged'}}, '__id__': 'gndn'}}, 'retcode': 0}}.\nRunner 1was not successful and returned {'outputter': 'highstate', 'data': {'burke.lan_master': {'test_|-gndn_|-gndn_|-succeed_with_changes': {'comment': 'Success!', 'name': 'gndn', 'start_time': '11:24:07.722510', 'result': True, 'duration': 0.735, '__run_num__': 0, '__sls__': u'gndn', 'changes': {'testing': {'new': 'Something pretended to change', 'old': 'Unchanged'}}, '__id__': 'gndn'}},'retcode': 0}}.", 'name': 'parallel-state', '__orchestration__': True, 'start_time': '11:24:04.274681', 'result': False, 'duration': u'3465.108 ms', '__run_num__': 0, '__sls__': u'prun', 'changes': {'[1][burke.lan_master][test_|-gndn_|-gndn_|-succeed_with_changes][testing]': {'new': 'Something pretended to change', 'old': 'Unchanged'}, '[0][burke.lan_master][test_|-gndn_|-gndn_|-succeed_with_changes][testing]': {'new': 'Something pretended to change', 'old': 'Unchanged'}}, '__id__': 'parallel-state'}}, 'retcode': 1}}
salt/states/saltmod.py
Outdated
.. code-block:: yaml

    parallel-state:
      saltext.parallel-runner:
Should be salt.parallel_runners
salt/states/saltmod.py
Outdated
- runners:
  - name: state.orchestrate
    kwarg:
      mods: orchestrate_state_1
The [{"name": "...", "kwarg": {...}]
makes perfect sense as an easy-to-write/easy-to-use data structure...but it's not canonical Salt. I can't think of another place in Salt that has something similar so I think it's worth avoiding solely for that reason. Other suggestions welcome, but the following suggestion matches syntaxes found elsewhere in Salt. You can use repack_dictlist to convert the list of dicts to a dict.
- runners:
  - name: state.orchestrate
  - kwarg:
      mods: orchestrate_state_1
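As a rough illustration of the repacking step suggested above, the stand-in function below mimics what Salt's repack_dictlist utility does for this simple case (in a real state module the Salt helper itself would be used; this standalone version only shows the transformation):

```python
def repack_dictlist(data):
    '''
    Collapse a list of single-key dicts (the YAML form shown above) into a
    single dict. Simplified stand-in for Salt's repack_dictlist utility.
    '''
    result = {}
    for item in data:
        if isinstance(item, dict):
            result.update(item)
    return result


runner_config = repack_dictlist([
    {'name': 'state.orchestrate'},
    {'kwarg': {'mods': 'orchestrate_state_1'}},
])
# -> {'name': 'state.orchestrate', 'kwarg': {'mods': 'orchestrate_state_1'}}
```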
salt/states/saltmod.py
Outdated
- name: state.orchestrate
  kwarg:
    mods: orchestrate_state_1
- name: state.orcestrate
Slight typo: missing the "h".
Related: the error message for fat-fingered function names is a large traceback. I think the lazy-loader has an easy way for us to check those in advance. @cachedout is that correct?
[ERROR ] An exception occurred in this state: Traceback (most recent call last):
File "/Users/shouse/src/salt/salt/salt/state.py", line 1812, in call
**cdata['kwargs'])
File "/Users/shouse/src/salt/salt/salt/loader.py", line 1724, in wrapper
return f(*args, **kwargs)
File "/Users/shouse/src/salt/salt/salt/states/saltmod.py", line 784, in parallel_runners
outputs = _parallel_map(call_runner, runners)
File "/Users/shouse/src/salt/salt/salt/states/saltmod.py", line 102, in _parallel_map
six.reraise(exc_type, exc_value, exc_traceback)
File "/Users/shouse/src/salt/salt/salt/states/saltmod.py", line 90, in run_thread
outputs[index] = func(inputs[index])
File "/Users/shouse/src/salt/salt/salt/states/saltmod.py", line 782, in call_runner
**(runner_config['kwarg']))
File "/Users/shouse/src/salt/salt/salt/modules/saltutil.py", line 1312, in runner
full_return=full_return)
File "/Users/shouse/src/salt/salt/salt/runner.py", line 144, in cmd
full_return)
File "/Users/shouse/src/salt/salt/salt/client/mixins.py", line 226, in cmd
self.functions[fun], arglist, pub_data
File "/Users/shouse/src/salt/salt/salt/loader.py", line 1095, in __getitem__
func = super(LazyLoader, self).__getitem__(item)
File "/Users/shouse/src/salt/salt/salt/utils/lazy.py", line 101, in __getitem__
raise KeyError(key)
KeyError: 'state.orcestrate'
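One hedged way to turn that KeyError traceback into a readable state failure is to catch it around the runner invocation; the sketch below mirrors the call visible in the traceback, but the error handling and the returned dictionary are illustrative additions, not the actual implementation (a pre-check via the lazy loader, as suggested above, would be cleaner than catching KeyError, which could also mask unrelated lookup errors):

```python
def call_runner(runner_config):
    # __salt__ is injected by the loader when this runs inside a state module.
    try:
        return __salt__['saltutil.runner'](
            runner_config['name'],
            full_return=True,
            **runner_config.get('kwarg', {}))
    except KeyError:
        # Raised by the lazy loader when the runner function does not exist.
        return {
            'success': False,
            'return': 'Unknown runner function: {0}'.format(
                runner_config['name']),
        }
```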
salt/states/saltmod.py
Outdated
__orchestration_jid__=jid,
__env__=__env__,
full_return=True,
**(runner_config['kwarg']))
Should account for runners that don't require args. Perhaps runner_config.get('kwarg', {}).
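A minimal illustration of the suggested guard (variable names taken from the diff excerpt above):

```python
# Raises KeyError when the optional 'kwarg' entry is omitted:
kwarg = runner_config['kwarg']

# Safe default, as suggested above:
kwarg = runner_config.get('kwarg', {})
```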
salt/states/saltmod.py
Outdated
lambda x, y: x and y,
[not ('success' in out and not out['success']) for out in outputs],
True)
It looks like this is intended to kick off any Runners in parallel and not just Orchestrate. A good way to detect if it's an orchestrate run or not is to test if each output contains an out key that has the value highstate. If so, it's an orchestrate or state run and will have the usual state return dictionary. If not, then you can short-circuit looking for those fields and just return the outputs verbatim.
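A hedged sketch of that detection, assuming outputs is the list of runner returns from the snippet above and that each orchestrate/state return carries an out key set to highstate:

```python
def is_orchestrate_output(output):
    # Orchestrate/state runs are rendered by the highstate outputter; other
    # runners may return arbitrary data that can be passed through verbatim.
    return isinstance(output, dict) and output.get('out') == 'highstate'


state_outputs = [out for out in outputs if is_orchestrate_output(out)]
other_outputs = [out for out in outputs if not is_orchestrate_output(out)]
```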
salt/states/saltmod.py
Outdated
ret = {
    'name': name,
    'result': success,
The result field in the highstate output and the success field in the job output are separate things and should be handled separately. success indicates whether there were any internal Salt failures, whereas the result field indicates whether any individual state functions failed.
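A hedged sketch of keeping the two signals separate, assuming the per-runner return structure visible in the debug output above (a top-level success flag plus, for state runs, nested per-state dictionaries with their own result fields):

```python
def runner_succeeded(output):
    # 'success' reports internal Salt failures of the runner job itself.
    if not output.get('success', True):
        return False
    # For orchestrate/state runs, also require every individual state's
    # 'result' field to be True. Structure assumed from the debug output:
    # {'data': {<master/minion id>: {<state id>: {...}, 'retcode': 0}}}
    for states in output.get('data', {}).values():
        if not isinstance(states, dict):
            continue
        for state_ret in states.values():
            if isinstance(state_ret, dict) and not state_ret.get('result', False):
                return False
    return True


overall_result = all(runner_succeeded(out) for out in outputs)
```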
@smarsching Are you able to review the latest review feedback from @whiteinge?
The code in saltmod.parallel_runners would fail if the (optional) kwarg argument was missing. This is fixed by using an empty dictionary for kwarg by default.
@whiteinge Thank you for your very valuable comments. Sorry it took me so long to reply. I've simply been quite busy with other stuff. The documentation issues that you found are of course simple typos and I fixed them.

The problem with the runner reporting an error, even though the sub-runners were successful, is actually due to a bug that I fixed with PR #39641. However, this PR was for the 2016.3 branch and to my knowledge has not been forward-ported to the develop branch yet. Once this patch is applied, the

Regarding the format of the arguments, I fully agree that we should aim for a style that is as close to the style used in other places as reasonably possible. In fact, the consistency in style was one of the reasons that made me choose SaltStack over one of its competitors. 😉 However, I think that the format that you suggest will not work because of the possible duplicate keys. Consider your example:
This works fine as long as there is only a single runner (which kind of defeats the purpose of the new
If we wanted to, we could use the order of the key-value pairs and start a new dict every time we see the
The downside of this format is that it is kind of verbose because each of the runner instances needs a different name. On the other hand, this name could serve as an identifier in an improved return value format (in particular regarding changes). One could use the key in place of the

I also thought about limiting the use to the orchestrate runner, which would simplify the format to something like this:
Optionally, the states could take additional key-value parameters for passing arguments (like pillar data):
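The concrete examples from this comment are not reproduced above; a hedged sketch of the simplified, orchestrate-only format being described might look like the following (the state name parallel_orchestrate and all SLS/pillar names are hypothetical):

```yaml
# Hypothetical simplified format limited to the orchestrate runner.
parallel-orchestrations:
  salt.parallel_orchestrate:       # hypothetical state name
    - orchestrations:
      - orchestrate_state_1
      - orchestrate_state_2:
          pillar:
            some_key: some_value
```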
However, for some scenarios it might be interesting to parallelize other runners and not only the orchestrate runner. For this reason, I would rather have a generic solution and then maybe build a simplified version for the orchestrate runner on top of it. In summary, I would prefer the slightly more verbose format, but I would be happy to hear your opinion.

Regarding the other issues that you mention, I am not entirely sure how to proceed. I am quite new to SaltStack and do not fully understand the API for runners yet, in particular when it comes to the format of their return values. I started off by basically copying the logic from

I am a bit confused by your statement that the

Is there any documentation regarding the output format for states and runners? From your comments, I understand that the format also seems to depend on whether the runner is the orchestrate runner (or whether a state is a highstate), but I could not find a specification telling me when to expect which format. If there is no such specification yet, maybe you can help me with a few questions:
This information would really help me with improving the
Great point. Your dict-based suggestion (repeated below) feels the most "Salty" to me. The key is unnecessary, like you pointed out, but your suggestion to use that to label the output for each is a great idea.
You can only rely on the top-level keys (

That said, I'd suggest avoiding trying to infer too much from the sub-returns and rather just returning the result verbatim as nested data structures. If you go with the config syntax above, you could use that dictionary key instead of the minion name, like
Runner modules, like execution modules, do not have a defined return structure. They can return whatever is appropriate for the data they're trying to show. If a module is returning the result of a state run (including orchestrate) then the return should be consistent with what the highstate outputter is expecting.
Yes. This output should be able to be run through the highstate outputter successfully. Expanding on my comment above, I'd suggest something like:
That sounds like
Great, thanks a lot for your answers. I think this gives me the information that I need for improving the function. I will try to implement the changes and get back to you as soon as I have something to be reviewed. However, it might be a few days before I find the time to look into this. |
The name parameter in a call to dict.get(...) was accidentally wrapped in brackets, leading to a TypeError ("unhashable type: 'list'").
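A minimal illustration of that class of bug (keys are illustrative):

```python
runner_config = {'name': 'state.orchestrate'}

# runner_config.get(['name'])   # raises TypeError: unhashable type: 'list'
runner_config.get('name')       # correct: 'state.orchestrate'
```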
I did not get to work on the discussed issues yet because I first had to fix a race condition that was exposed by the

PR #39948 provides the fix for this race condition. Maybe you could review this PR (or find someone who might be able to review it). I think it makes sense to merge that PR before this PR because, with this race condition present, the
The configuration format for specifying the list of runners has been changed so that it matches the format used in other places. The merging of outputs from the runners has been improved so that the outputs are correctly passed on regardless of the format used by the runner.
@whiteinge I implemented the changes that we discussed. The configuration format now uses a key for identifying each runner, and below this key the runner configuration is expected as a list of key-value pairs.

The merging of the output now works like this: If the

If any of the runners' return values does not match the

I also made a change to how failed runners are handled: For each failed runner, the complete output of the runner is included in the
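A hedged example of the resulting configuration format described above (the runner identifiers and orchestration SLS names are illustrative):

```yaml
parallel-state:
  salt.parallel_runners:
    - runners:
        first_orchestration:
          - name: state.orchestrate
          - kwarg:
              mods: orchestrate_state_1
        second_orchestration:
          - name: state.orchestrate
          - kwarg:
              mods: orchestrate_state_2
```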
Awesome work. Solid code and it passed a few local smoke-tests.
Thanks for those changes. Very cool! There are a couple of related PRs referenced above, and it would be nice to get the highstate outputter working for the nested highstate structures like Orchestrate does (but I don't know how to do that offhand). IMO, this is ready for merge and we can fine-tune over time.
It looks like, due to the nature of multiple jobs being spawned by this, we won't be able to get the jid into the return like in other states. This information is just there as metadata though, nothing relies on it. So, I think we're OK here.
Go Go Jenkins!
There was a race condition in the salt loader when injecting global values (e.g. "__pillar__" or "__salt__") into modules. One effect of this race condition was that in a setup with multiple threads, some threads may see pillar data intended for other threads, or the pillar data seen by a thread might even change spuriously.

There have been earlier attempts to fix this problem (saltstack#27937, saltstack#29397). These patches tried to fix the problem by storing the dictionary that keeps the relevant data in a thread-local variable and referencing this thread-local variable from the variables that are injected into the modules. These patches did not fix the problem completely because they only work when a module is loaded through a single loader instance. When there is more than one loader, there is more than one thread-local variable, and the variable injected into a module is changed to point to another thread-local variable when the module is loaded again. Thus, the problem resurfaced while working on saltstack#39670.

This patch attempts to solve the problem from a slightly different angle, complementing the earlier patches: The value injected into the modules is now a proxy that internally uses a thread-local variable to decide which object it points to. This means that when loading a module again through a different loader (possibly passing different pillar data), the data is actually only changed in the thread in which the loader is used. Other threads are not affected by such a change. This means that it will work correctly in the current situation where loaders are possibly created by many different modules and these modules do not necessarily know in which context they are executed. Thus, it is much more flexible and reliable than the more explicit approach used by the two earlier patches.

Unfortunately, the standard JSON and Msgpack serialization code cannot handle proxied objects, so they have to be unwrapped before passing them to that code. The salt.utils.json and salt.utils.msgpack modules have been modified to take care of unwrapping objects that are proxied using the ThreadLocalProxy.
What does this PR do?
This new state function is intended for use with the orchestrate runner. It is used in a way very similar to saltmod.runner, except that it executes multiple runners in parallel.
What issues does this PR fix or reference?
This PR adds a general feature and is not particularly focused on fixing a single issue. However, I believe that at least some of the use cases discussed in #32488 and #32956 are covered by this new feature.
The use case that I wrote it for is orchestrating actions where some can run in parallel and some have to run after each other. In my case, I want to run a system upgrade on all VMs hosted on a number of VM hosts. As running a system upgrade generates a significant I/O load, I only want to upgrade one VM per host at a time. However, upgrades on different VM hosts shall run in parallel.
With the existing features of the orchestrate runner, this was not possible (except for launching several instances of the orchestrate runner from the shell, of course). With the new feature added by this PR, I can now have an SLS file that triggers the execution of a separate runner for each VM host and waits until all of these runners have finished.
The state blocks until all runners triggered by it have finished. Therefore, it is possible to run certain actions in parallel and then continue with another action that depends on all of these actions having finished. I believe that with the addition of this new feature, orchestration scenarios with very complex dependencies between actions can be covered correctly.
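A hedged sketch of the use case described above, combining parallel per-host orchestrations with a follow-up step that only runs after all of them have finished (host names, SLS names, and the follow-up state are illustrative; the runner configuration format matches the one discussed earlier in the conversation):

```yaml
upgrade-all-vm-hosts:
  salt.parallel_runners:
    - runners:
        host_a:
          - name: state.orchestrate
          - kwarg:
              mods: orchestrate_upgrade_host_a
        host_b:
          - name: state.orchestrate
          - kwarg:
              mods: orchestrate_upgrade_host_b

post-upgrade-checks:
  salt.state:
    - tgt: '*'
    - sls: post_upgrade_checks
    - require:
      - salt: upgrade-all-vm-hosts
```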
Previous Behavior
The saltmod.parallel_runners state function did not exist.

New Behavior

The saltmod.parallel_runners state function has been added.

Tests written?
No