
Added the saltmod.parallel_runners state. #39670

Merged
merged 9 commits on Sep 28, 2017

Conversation

smarsching
Contributor

What does this PR do?

This new state function is intended for use with the orchestrate runner. It is used in a way very similar to saltmod.runner, except that it executes multiple runners in parallel.

What issues does this PR fix or reference?

This PR adds a general feature and is not particularly focused on fixing a single issue. However, I believe that at least some of the use cases discussed in #32488 and #32956 are covered by this new feature.

The use case that I wrote it for is orchestrating actions where some can run in parallel and some have to run after each other. In my case, I want to run a system upgrade on all VMs hosted on a number of VM hosts. As running a system upgrade generates a significant I/O load, I only want to upgrade one VM per host at a time. However, upgrades on different VM hosts shall run in parallel.

With the existing features of the orchestrate runner, this was not possible (except for launching several instances of the orchestrate runner from the shell, of course). With the new feature added by this PR, I can now have an SLS file that triggers the execution of a separate runner for each VM host and waits until all of these runners have finished.

The state blocks until all runners triggered by it have finished. It is therefore possible to run certain actions in parallel and then continue with another action that depends on all of these actions having finished. I believe that with this new feature, orchestration scenarios with very complex dependencies between actions can be covered correctly.

Previous Behavior

The saltmod.parallel_runners state function did not exist.

New Behavior

The saltmod.parallel_runners state function has been added.

Tests written?

No

This new state is intended for use with the orchestrate runner. It is
used in a way very similar to saltmod.runner, except that it executes
multiple runners in parallel.
@ghost

ghost commented Feb 26, 2017

@smarsching, thanks for your PR! By analyzing the history of the files in this pull request, we identified @whiteinge, @terminalmage and @thatch45 to be potential reviewers.

Contributor

@thatch45 thatch45 left a comment


This is the right kind of crazy, I like it. It looks like you debated using threads or processes but landed on threads. Should the documentation be updated to reflect the use of threads?

@smarsching
Contributor Author

I actually gave the question of processes vs threads some thought.

Initially, I was using processes, but I hit a wall because I did not manage to make the function that does the actual per-process work serializable. I am pretty sure this was only due to my lack of expertise in Python (I am mainly working with C++ and Java). I think that, due to some clever aliasing going on in Salt, the serialization code did not accept the function as a top-level function, even though it was actually defined at the top level of the code file.

In any case, I continued with threads and figured out that they are probably the better choice anyway:

  • Threads are more lightweight than processes (which could be an issue when running many actions in parallel).
  • The GIL should not be an issue because the threads are going to spend most of their time waiting on I/O anyway.
  • There is only a single process, so pressing Ctrl-C will actually kill the whole thing (this is why I am using daemon threads).

Of course I am open to a solution using processes if you see any benefits in it. Adapting the code should not be very difficult (only the _parallel_map function would need some changes). However, I would need some help with figuring out how to write a function that is serializable and thus actually accepted by the multiprocessing API.
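For illustration, the thread-based helper boils down to something like this (a simplified sketch with illustrative names, not the exact code in this PR):

    import threading


    def _parallel_map(func, inputs):
        '''
        Run func over each element of inputs in a separate daemon thread and
        return the list of outputs. Exceptions raised in a worker thread are
        collected and re-raised in the calling thread.
        '''
        outputs = [None] * len(inputs)
        errors = [None] * len(inputs)

        def run_thread(index):
            try:
                outputs[index] = func(inputs[index])
            except Exception as exc:  # pylint: disable=broad-except
                errors[index] = exc

        threads = []
        for index in range(len(inputs)):
            # Daemon threads so that Ctrl-C on the single process still works.
            thread = threading.Thread(target=run_thread, args=(index,))
            thread.daemon = True
            thread.start()
            threads.append(thread)
        for thread in threads:
            thread.join()
        for error in errors:
            if error is not None:
                raise error
        return outputs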

The documentation erroneously used the word process in one place where thread
would actually have been correct. This commit fixes this issue.
@smarsching
Contributor Author

Sorry, I totally missed your point about the documentation mentioning processes (I only saw the places where the term "thread" was already used). I now added a commit that ensures that the term "thread" is used consistently everywhere in the documentation.

Contributor

@whiteinge whiteinge left a comment


Nice addition. This is a generic way to fulfil a few similar asks. Thanks for cross-linking them in the description. One other is probably #33390.

A few notes inline below.

Also, the following Orchestrate run reports errors despite successful sub-orchestrate runs. I'm not sure why at a quick glance. You can see the full states I'm using in the debug output, but let me know if you have any other questions.

% salt-run -c $VIRTUAL_ENV/etc/salt -l debug state.orch prun
[...snip...]
[DEBUG   ] Rendered data from file: /Users/shouse/tmp/venvs/salt/var/cache/salt/master/files/base/prun.sls:
parallel-state:
   salt.parallel_runners:
     - runners:
       - name: state.orch
         kwarg:
           mods: gndn
       - name: state.orch
         kwarg:
           mods: gndn

[...snip...]
[INFO    ] Executing state salt.parallel_runners for [parallel-state]
[DEBUG   ] Unable to fire args event due to missing __orchestration_jid__
[...snip...]
[DEBUG   ] Rendered data from file: /Users/shouse/tmp/venvs/salt/var/cache/salt/master/files/base/gndn.sls:
gndn:
  test.succeed_with_changes

[...snip...]
[INFO    ] Completed state [parallel-state] at time 11:24:07.739789 duration_in_ms=3465.108
[DEBUG   ] Sending event: tag = salt/run/20170302112401522902/ret; data = {'fun_args': ['prun'], 'jid': '20170302112401522902', 'return': {'outputter': 'highstate', 'data': {'burke.lan_master': {'salt_|-parallel-state_|-parallel-state_|-parallel_runners': {'comment': "Runner 0 was not successful and returned {'outputter': 'highstate', 'data': {'burke.lan_master': {'test_|-gndn_|-gndn_|-succeed_with_changes': {'comment': 'Success!', 'name': 'gndn', 'start_time': '11:24:07.729247', 'result': True, 'duration': 1.607, '__run_num__': 0, '__sls__': u'gndn', 'changes': {'testing': {'new': 'Something pretended to change', 'old': 'Unchanged'}}, '__id__': 'gndn'}}, 'retcode': 0}}.\nRunner 1 was not successful and returned {'outputter': 'highstate', 'data': {'burke.lan_master': {'test_|-gndn_|-gndn_|-succeed_with_changes': {'comment': 'Success!', 'name': 'gndn', 'start_time': '11:24:07.722510', 'result': True, 'duration': 0.735, '__run_num__': 0, '__sls__': u'gndn', 'changes': {'testing': {'new': 'Something pretended to change', 'old': 'Unchanged'}}, '__id__': 'gndn'}}, 'retcode': 0}}.", 'name': 'parallel-state', '__orchestration__': True, 'start_time': '11:24:04.274681', 'result': False, 'duration': 3465.108, '__run_num__': 0, '__sls__': u'prun', 'changes': {'[1][burke.lan_master][test_|-gndn_|-gndn_|-succeed_with_changes][testing]': {'new': 'Something pretended to change', 'old': 'Unchanged'}, '[0][burke.lan_master][test_|-gndn_|-gndn_|-succeed_with_changes][testing]': {'new': 'Something pretended to change', 'old': 'Unchanged'}}, '__id__': 'parallel-state'}}, 'retcode': 1}}, 'success': False, '_stamp': '2017-03-02T18:24:07.740730', 'user': 'shouse', 'fun': 'runner.state.orch'}
burke.lan_master:
----------
          ID: parallel-state
    Function: salt.parallel_runners
      Result: False
     Comment: Runner 0 was not successful and returned {'outputter': 'highstate', 'data': {'burke.lan_master': {'test_|-gndn_|-gndn_|-succeed_with_changes': {'comment': 'Success!', 'name': 'gndn', 'start_time': '11:24:07.729247', 'result': True, 'duration': 1.607, '__run_num__': 0, '__sls__': u'gndn', 'changes': {'testing': {'new': 'Something pretended to change', 'old': 'Unchanged'}}, '__id__': 'gndn'}}, 'retcode': 0}}.
              Runner 1 was not successful and returned {'outputter': 'highstate', 'data': {'burke.lan_master': {'test_|-gndn_|-gndn_|-succeed_with_changes': {'comment': 'Success!', 'name': 'gndn', 'start_time': '11:24:07.722510', 'result': True, 'duration': 0.735, '__run_num__': 0, '__sls__': u'gndn', 'changes': {'testing': {'new': 'Something pretended to change', 'old': 'Unchanged'}}, '__id__': 'gndn'}}, 'retcode': 0}}.
     Started: 11:24:04.274681
    Duration: 3465.108 ms
     Changes:
              ----------
              [0][burke.lan_master][test_|-gndn_|-gndn_|-succeed_with_changes][testing]:
                  ----------
                  new:
                      Something pretended to change
                  old:
                      Unchanged
              [1][burke.lan_master][test_|-gndn_|-gndn_|-succeed_with_changes][testing]:
                  ----------
                  new:
                      Something pretended to change
                  old:
                      Unchanged

Summary for burke.lan_master
------------
Succeeded: 0 (changed=1)
Failed:    1
------------
Total states run:     1
Total run time:   3.465 s
retcode:
    1
[DEBUG   ] LazyLoaded local_cache.prep_jid
[INFO    ] Runner completed: 20170302112401522902
[DEBUG   ] Runner return: {'outputter': 'highstate', 'data': {'burke.lan_master': {'salt_|-parallel-state_|-parallel-state_|-parallel_runners': {'comment': u"Runner 0 was not successful and returned {'outputter': 'highstate', 'data': {'burke.lan_master': {'test_|-gndn_|-gndn_|-succeed_with_changes': {'comment': 'Success!', 'name': 'gndn', 'start_time': '11:24:07.729247', 'result': True, 'duration': 1.607, '__run_num__': 0, '__sls__': u'gndn', 'changes': {'testing': {'new': 'Something pretended to change', 'old': 'Unchanged'}}, '__id__': 'gndn'}}, 'retcode': 0}}.\nRunner 1was not successful and returned {'outputter': 'highstate', 'data': {'burke.lan_master': {'test_|-gndn_|-gndn_|-succeed_with_changes': {'comment': 'Success!', 'name': 'gndn', 'start_time': '11:24:07.722510', 'result': True, 'duration': 0.735, '__run_num__': 0, '__sls__': u'gndn', 'changes': {'testing': {'new': 'Something pretended to change', 'old': 'Unchanged'}}, '__id__': 'gndn'}},'retcode': 0}}.", 'name': 'parallel-state', '__orchestration__': True, 'start_time': '11:24:04.274681', 'result': False, 'duration': u'3465.108 ms', '__run_num__': 0, '__sls__': u'prun', 'changes': {'[1][burke.lan_master][test_|-gndn_|-gndn_|-succeed_with_changes][testing]': {'new': 'Something pretended to change', 'old': 'Unchanged'}, '[0][burke.lan_master][test_|-gndn_|-gndn_|-succeed_with_changes][testing]': {'new': 'Something pretended to change', 'old': 'Unchanged'}}, '__id__': 'parallel-state'}}, 'retcode': 1}}

.. code-block:: yaml

    parallel-state:
      saltext.parallel-runner:
Contributor


Should be salt.parallel_runners

        - runners:
          - name: state.orchestrate
            kwarg:
              mods: orchestrate_state_1
Contributor


The [{"name": "...", "kwarg": {...}}] makes perfect sense as an easy-to-write/easy-to-use data structure...but it's not canonical Salt. I can't think of another place in Salt that has something similar, so I think it's worth avoiding solely for that reason. Other suggestions welcome, but the following suggestion matches syntaxes found elsewhere in Salt. You can use repack_dictlist to convert the list of dicts to a dict.

             - runners:
               - name: state.orchestrate
               - kwarg:
                    mods: orchestrate_state_1
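For reference, repack_dictlist essentially collapses that list of single-key dicts into one dict, roughly like this (a simplified sketch, not the actual Salt helper):

    def repack_dictlist(data):
        '''
        Simplified sketch: collapse a list of single-key dicts (the usual SLS
        argument style) into a single dict.
        '''
        result = {}
        for item in data:
            if isinstance(item, dict):
                result.update(item)
        return result


    # The suggested SLS syntax renders to a structure like this ...
    runner_config = [{'name': 'state.orchestrate'},
                     {'kwarg': {'mods': 'orchestrate_state_1'}}]
    # ... and repacking gives a dict that is easy to use inside the state:
    # {'name': 'state.orchestrate', 'kwarg': {'mods': 'orchestrate_state_1'}}
    print(repack_dictlist(runner_config))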

          - name: state.orchestrate
            kwarg:
              mods: orchestrate_state_1
          - name: state.orcestrate
Contributor


Slight typo: missing the "h".

Related: the error message for fat-fingered function names is a large traceback. I think the lazy-loader has an easy way for us to check those in advance. @cachedout is that correct?

[ERROR   ] An exception occurred in this state: Traceback (most recent call last):
  File "/Users/shouse/src/salt/salt/salt/state.py", line 1812, in call
    **cdata['kwargs'])
  File "/Users/shouse/src/salt/salt/salt/loader.py", line 1724, in wrapper
    return f(*args, **kwargs)
  File "/Users/shouse/src/salt/salt/salt/states/saltmod.py", line 784, in parallel_runners
    outputs = _parallel_map(call_runner, runners)
  File "/Users/shouse/src/salt/salt/salt/states/saltmod.py", line 102, in _parallel_map
    six.reraise(exc_type, exc_value, exc_traceback)
  File "/Users/shouse/src/salt/salt/salt/states/saltmod.py", line 90, in run_thread
    outputs[index] = func(inputs[index])
  File "/Users/shouse/src/salt/salt/salt/states/saltmod.py", line 782, in call_runner
    **(runner_config['kwarg']))
  File "/Users/shouse/src/salt/salt/salt/modules/saltutil.py", line 1312, in runner
    full_return=full_return)
  File "/Users/shouse/src/salt/salt/salt/runner.py", line 144, in cmd
    full_return)
  File "/Users/shouse/src/salt/salt/salt/client/mixins.py", line 226, in cmd
    self.functions[fun], arglist, pub_data
  File "/Users/shouse/src/salt/salt/salt/loader.py", line 1095, in __getitem__
    func = super(LazyLoader, self).__getitem__(item)
  File "/Users/shouse/src/salt/salt/salt/utils/lazy.py", line 101, in __getitem__
    raise KeyError(key)
KeyError: 'state.orcestrate'

__orchestration_jid__=jid,
__env__=__env__,
full_return=True,
**(runner_config['kwarg']))
Contributor


Should account for runners that don't require args. Perhaps runner_config.get('kwarg', {}).

lambda x, y: x and y,
[not ('success' in out and not out['success']) for out in outputs],
True)

Contributor


It looks like this is intended to kick off any Runners in parallel and not just Orchestrate. A good way to detect if it's an orchestrate run or not is to test if each output contains an out key that has the value highstate. If so, it's an orchestrate or state run and will have the usual state return dictionary. If not, then you can short-circuit looking for those fields and just return the outputs verbatim.
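In rough code terms, that check could look like this (a sketch only; the field names follow the debug output above, where the runner return carries an outputter/out value of highstate):

    def _is_state_return(output):
        '''
        Sketch of the suggested check: orchestrate/state runs return a dict
        whose out/outputter field is 'highstate'; anything else should be
        treated as an opaque runner return and passed through verbatim.
        '''
        return isinstance(output, dict) and (
            output.get('out') == 'highstate'
            or output.get('outputter') == 'highstate'
        )


    def _collect(outputs):
        if all(_is_state_return(out) for out in outputs):
            # All returns are state runs: the usual state fields can be used.
            return [out.get('data', {}) for out in outputs]
        # Not state runs: return the outputs verbatim.
        return list(outputs)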


ret = {
'name': name,
'result': success,
Contributor


The result field in the highstate output and the success field in the job output are separate things and should be handled separately. success indicates whether there were any internal Salt failures, whereas the result field indicates whether any individual state functions failed.

@cachedout
Contributor

@smarsching Are you able to review the latest feedback from @whiteinge?

The code in saltmod.parallel_runners would fail if the (optional) kwarg
argument was missing. This is fixed by using an empty dictionary for
kwarg by default.
@smarsching
Contributor Author

@whiteinge Thank you for your very valuable comments. Sorry it took me so long to reply. I've simply been quite busy with other stuff.

The documentation issues that you found are, of course, simple typos, and I fixed them. The problem with the runner reporting an error even though the sub-runners were successful is actually due to a bug that I fixed with PR #39641. However, that PR was for the 2016.3 branch and, to my knowledge, has not been forward-ported to the develop branch yet. Once that patch is applied, the parallel_runners state function should no longer report failure.

Regarding the format of the arguments, I fully agree that we should aim for a style that is as close to the style used in other places as reasonably possible. In fact, the consistency in style was one of the reasons that made me choose SaltStack over one of its competitors. 😉

However, I think that the format that you suggest will not work because of the possible duplicate keys. Consider your example:

             - runners:
               - name: state.orchestrate
               - kwarg:
                    mods: orchestrate_state_1

This works fine as long as there is only a single runner (which kind of defeats the purpose of the new parallel_runners function). When we extend it to two runners running in parallel, it does not work any longer:

             - runners:
               - name: state.orchestrate
               - kwarg:
                    mods: orchestrate_state_1
               - name: state.orchestrate
               - kwarg:
                    mods: orchestrate_state_2

If we wanted to, we could use the order of the key-value pairs and start a new dict every time we see the name key, but this looks very messy to me. A cleaner format could look like this:

             - runners:
                 orch1:
                   - name: state.orchestrate
                   - kwarg:
                      mods: orchestrate_state_1
                 orch2:
                   - name: state.orchestrate
                   - kwarg:
                      mods: orchestrate_state_2

The downside of this format is that it is kind of verbose because each of the runner instances needs a different name. On the other hand, this name could serve as an identifier in an improved return value format (in particular regarding changes). One could use the key in place of the name parameter when it is not specified explicitly. This would then be consistent with other places in Salt, but would hardly be useful here because the most common case is calling the same runner (state.orchestrate) with different parameters.

I also thought about limiting the use to the orchestrate runner, which would simplify the format to something like this:

             - mods:
                 - orchestrate_state_1
                 - orchestrate_state_2

Optionally, the states could take additional key-value parameters for passing arguments (like pillar data):

             - mods:
                 - orchestrate_state_1
                     - pillar:
                         mykey: myvalue
                 - orchestrate_state_2

However, for some scenarios it might be interesting to parallelize other runners and not only the orchestrate runner. For this reason, I would rather have a generic solution and then maybe build a simplified version for the orchestrate runner on top of it.

In summary, I would prefer the slightly more verbose format, but I would be happy to hear your opinion.

Regarding the other issues that you mention, I am not entirely sure how to proceed. I am quite new to SaltStack and do not fully understand the API for runners yet, in particular when it comes to the format of their return values.

I started off by basically copying the logic from saltmod.runner. The output of saltmod.runner seems to differ from the output produced by most states, so I made some adjustments (e.g. regarding the changes key). However, after having used the parallel_runners state function for a few days, I am not happy with the way it handles output: looking at the changes dictionary is confusing at best, and sometimes changes are even missing because the underlying modules did not return them in the expected old/new format.

I am a bit confused by your statement that the success key is used for internal errors while result is used for the state’s or module’s result. From looking at the code that usually sets the success field, I got the same impression. However, the saltmod.runner function uses the success field of the runner’s output to set its result field. So there seem to be some inconsistencies there...

Is there any documentation regarding the output format for states and runners? From your comments, I understand that the format also seems to depend on whether the runner is the orchestrate runner (or whether a state is a highstate), but I could not find a specification telling me when to expect which format. If there is no such specification yet, maybe you can help me with a few questions:

  • What is the expected structure for the return value of a state function?
  • What is the expected structure for the return value of a runner?
  • Should the saltmod.* state functions adhere to the same structure as the regular state functions or are they different because they are only expected to be used in the SLS files used for the orchestrate runner?
  • Can I expect a specific sub-structure for the return value of the orchestrate runner (in particular regarding changes)? If so, is this limited to the orchestrate runner, or is there a general guideline for when to expect each structure (e.g. highstate == False or missing means format A and highstate == True means format B)?

This information would really help me with improving the saltmod.parallel_runners function and making its output more readable.

@whiteinge
Contributor

whiteinge commented Mar 6, 2017

the format that you suggest will not work because of the possible duplicate keys.

Great point. Your dict-based suggestion (repeated below) feels the most "Salty" to me. The key is unnecessary, like you pointed out, but your suggestion to use that to label the output for each is a great idea.

             - runners:
                 orch1:
                   - name: state.orchestrate
                   - kwarg:
                      mods: orchestrate_state_1

What is the expected structure for the return value of a state function?

You can only rely on the top-level keys (name, changes, result, comment). The contents of changes are free-form and will not necessarily have old/new keys.

That said, I'd suggest avoiding trying to infer too much from the sub-returns and rather just returning the result verbatim as nested data structures. If you go with the config syntax above you could use that dictionary key instead of the minion name like salt.state does for each return, e.g.: {...snip parallel_runners output..., "changes": {"ret": {"orch1": {...verbatim output from orch1...}}}}. (More on this below.)

What is the expected structure for the return value of a runner?

Runner modules, like execution modules, do not have a defined return structure. They can return whatever is appropriate for the data they're trying to show. If a module is returning the result of a state run (including orchestrate) then the return should be consistent with what the highstate outputter is expecting.

Should the saltmod.* state functions adhere to the same structure as the regular state functions or are they different because they are only expected to be used in the SLS files used for the orchestrate runner?

Yes. This output should be able to be run through the highstate outputter successfully. Expanding on my comment above, I'd suggest something like:

{
    "...parallel_runners return...": {
        "changes": {"ret": {"...name from dict...": {...verbatim return...}, ...}}, 
        # Exception: if all returns are state returns (`out` == `highstate`), then
        # `changes` should be an empty dict `{}` if all the sub-state runs
        # also have empty `changes` dicts.

        "result": ...True if `exit_code` is non-zero for all returns
    }
    "success": ...True if no Salt exceptions...,
    "exit_code": ...0 if all `result`fields are True otherwise 1...
}

the saltmod.runner function uses the success field of the runner’s output to set its result field

That sounds like salt.state is doing it wrong but I could be mistaken. I would expect result to be an all(...other result fields...) and success to be an all(...other success fields...).
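In rough code terms, that expectation is simply (illustrative only, not existing Salt code):

    def _roll_up(sub_returns):
        '''Aggregate per-runner returns as described above.'''
        return {
            # result: every individual state/runner result was True.
            'result': all(sub.get('result', False) for sub in sub_returns),
            # success: every runner finished without internal Salt errors.
            'success': all(sub.get('success', True) for sub in sub_returns),
        }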

@smarsching
Contributor Author

Great, thanks a lot for your answers. I think this gives me the information that I need for improving the function. I will try to implement the changes and get back to you as soon as I have something to be reviewed. However, it might be a few days before I find the time to look into this.

The name parameter in a call to dict.get(...) was accidentally wrapped
in brackets, leading to a TypeError ("unhashable type: 'list'").
@smarsching
Contributor Author

I have not gotten around to working on the discussed issues yet because I first had to fix a race condition that was exposed by the salt.parallel_runners function and caused a lot of trouble for me. This happened when passing individual pillar data to parallel tasks (through the pillar argument of the state.orchestrate function).

PR #39948 provides the fix for this race condition. Maybe you could review this PR (or find someone who might be able to review it). I think it makes sense to merge that PR before this PR because with this race condition present the parallel_runners function will have undefined behavior in certain scenarios.

The configuration format for specifying the list of runners has been
changed so that it matches the format used in other places.
The merging of outputs from the runners has been improved so that the
outputs are correctly passed on regardless of the format used by the
runner.
@smarsching
Contributor Author

@whiteinge I implemented the changes that we discussed. The configuration format now uses a key for identifying each runner and below this key the runner configuration is expected as a list of key-value pairs.

The merging of the output now works like this:

If the outputter key is present in the return value and is equal to highstate for all runners, the code recursively searches the data tree for changes keys. When such a key is found and its value has an out sub-key with the value highstate, the value associated with the ret sub-key is used. If there is no out sub-key, or its value is not highstate, the value associated with the changes key is copied as-is.

If any of the runners' return values does not match the highstate format, the returned changes have a single ret key, below which there is a key-value pair for each runner, where the key is the ID used in the configuration and the value is the return value of the runner.
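For illustration, the merging roughly follows this shape (a simplified sketch with illustrative names, not the exact code in the commit):

    def _merge_changes(runner_returns):
        '''
        Sketch of the merging described above. runner_returns maps the runner
        ID from the SLS configuration to the raw return value of that runner.
        '''
        all_highstate = all(
            isinstance(ret, dict) and ret.get('outputter') == 'highstate'
            for ret in runner_returns.values()
        )
        if not all_highstate:
            # At least one runner did not return highstate data: pass every
            # return value through verbatim, keyed by the configured runner ID.
            return {'ret': dict(runner_returns)}

        def resolve_changes(node):
            # Recursively search the data tree for 'changes' keys.
            if not isinstance(node, dict):
                return node
            if 'changes' in node:
                changes = node['changes']
                if isinstance(changes, dict) and changes.get('out') == 'highstate':
                    # Nested highstate data: use the value of its 'ret' sub-key.
                    return changes.get('ret', {})
                # No 'out' sub-key (or not highstate): copy the changes as-is.
                return changes
            return {key: resolve_changes(value) for key, value in node.items()}

        return {
            runner_id: resolve_changes(ret.get('data', {}))
            for runner_id, ret in runner_returns.items()
        }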

I also made a change to how failed runners are handled: For each failed runner, the complete output of the runner is included in the comment and formatted using the nested outputter. This makes it much easier to identify the cause of a problem.

Contributor

@whiteinge whiteinge left a comment


Awesome work. Solid code and it passed a few local smoke-tests.

@whiteinge
Contributor

Thanks for those changes. Very cool!

There are a couple of related PRs referenced above, and it would be nice to get the highstate outputter working for the nested highstate structures like Orchestrate does (but I don't know how to do that offhand).

IMO, this is ready for merge and we can fine-tune over time.

Contributor

@terminalmage terminalmage left a comment


It looks like, due to the nature of multiple jobs being spawned by this, we won't be able to get the jid into the return like in other states. This information is just there as metadata, though; nothing relies on it. So I think we're OK here.

@cachedout
Contributor

Go Go Jenkins!

smarsching added a commit to smarsching/salt that referenced this pull request Jan 17, 2018
There was a race condition in the salt loader when injecting global
values (e.g. "__pillar__" or "__salt__") into modules. One effect of
this race condition was that in a setup with multiple threads, some
threads may see pillar data intended for other threads or the pillar
data seen by a thread might even change spuriously.

There have been earlier attempts to fix this problem (saltstack#27937, saltstack#29397).
These patches tried to fix the problem by storing the dictionary that
keeps the relevant data in a thread-local variable and referencing this
thread-local variable from the variables that are injected into the
modules.

These patches did not fix the problem completely because they only
work when a module is loaded through a single loader instance.
When there is more than one loader, there is more than one
thread-local variable and the variable injected into a module is
changed to point to another thread-local variable when the module is
loaded again. Thus, the problem resurfaced while working on saltstack#39670.

This patch attempts to solve the problem from a slightly different
angle, complementing the earlier patches: The value injected into the
modules now is a proxy that internally uses a thread-local variable to
decide to which object it points. This means that when loading a module
again through a different loader (possibly passing different pillar
data), the data is actually only changed in the thread in which the
loader is used. Other threads are not affected by such a change.

This means that it will work correctly in the current situation where
loaders are possibly created by many different modules and these
modules do not necessarily know in which context they are executed. Thus
it is much more flexible and reliable than the more explicit approach
used by the two earlier patches.
smarsching added a commit to smarsching/salt that referenced this pull request Mar 10, 2018
There was a race condition in the salt loader when injecting global
values (e.g. "__pillar__" or "__salt__") into modules. One effect of
this race condition was that in a setup with multiple threads, some
threads may see pillar data intended for other threads or the pillar
data seen by a thread might even change spuriously.

There have been earlier attempts to fix this problem (saltstack#27937, saltstack#29397).
These patches tried to fix the problem by storing the dictionary that
keeps the relevant data in a thread-local variable and referencing this
thread-local variable from the variables that are injected into the
modules.

These patches did not fix the problem completely because they only
work when a module is loaded through a single loader instance.
When there is more than one loader, there is more than one
thread-local variable and the variable injected into a module is
changed to point to another thread-local variable when the module is
loaded again. Thus, the problem resurfaced while working on saltstack#39670.

This patch attempts to solve the problem from a slightly different
angle, complementing the earlier patches: The value injected into the
modules now is a proxy that internally uses a thread-local variable to
decide to which object it points. This means that when loading a module
again through a different loader (possibly passing different pillar
data), the data is actually only changed in the thread in which the
loader is used. Other threads are not affected by such a change.

This means that it will work correctly in the current situation where
loaders are possibly created by many different modules and these
modules do not necessarily know in which context they are executed. Thus
it is much more flexible and reliable than the more explicit approach
used by the two earlier patches.

Unfortunately, the standard JSON and Msgpack serialization code cannot
handle proxied objects, so they have to be unwrapped before passing them
to that code.

The salt.utils.json module has been modified to take care of
unwrapping objects that are proxied using the ThreadLocalProxy.

The salt.utils.msgpack module has been added and basically provides the
same functions as the salt.utils.json module, but for msgpack. Like the
json module, it takes care of unwrapping proxies.
smarsching added a commit to smarsching/salt that referenced this pull request Nov 27, 2018
smarsching added a commit to smarsching/salt that referenced this pull request Nov 27, 2018
sbrennan4 pushed a commit to sbrennan4/salt that referenced this pull request Sep 17, 2019
@Akm0d Akm0d mentioned this pull request Nov 8, 2019
Akm0d pushed a commit to Akm0d/salt that referenced this pull request Jan 14, 2020
@smarsching smarsching deleted the feature-parallel-runners branch December 9, 2022 23:16
Labels
ZD The issue is related to a Zendesk customer support ticket.
9 participants