Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal for runners #536

Merged
merged 1 commit into from
May 10, 2020
Merged

proposal for runners #536

merged 1 commit into from
May 10, 2020

Conversation

dbarrosop
Copy link
Contributor

@dbarrosop dbarrosop commented May 6, 2020

This PR introduces the concept of runners. Prior to this PR the only way to control the execution was via num_workers which basically toggled between using threads or not and configured how many threads to use if workers was greater than 1. With this proposal we can write plugins that act as a "runner", a piece of code that distributes the task across the hosts.

Writing a runner is fairly straightforward (or complex, depending of what you want to do). As an example, the runner that implements parallelization as before is this:

class ParallelRunner:
    """
    ParallelRunner runs the task over each host using threads

    Arguments:
        num_workers: number of threads to use
    """

    def __init__(self, num_workers: int = 20) -> None:
        self.num_workers = num_workers

    def run(self, task: Task, hosts: List[Host]) -> AggregatedResult:
        result = AggregatedResult(task.name)
        futures = []
        with ThreadPoolExecutor(self.num_workers) as pool:
            for host in hosts:
                future = pool.submit(task.copy().start, host)
                futures.append(future)

        for future in futures:
            worker_result = future.result()
            result[worker_result.host.name] = worker_result
        return result

The API to use them is slightly different, num_workers is no longer available to nr.run, instead you can configure the runner with the configuration and/or override it with nr.with_runner:

# use what's set previously or by the configuration, same default as before
nr.run(...) 

# override for this run
nr.with_runner(ParallelRunner(num_workers=100)).run(...)

# set a new runner for a new nornir object
nr_new_runner = nr.with_runner(ParallelRunner(num_workers=100)) 

The equivalent of nr.run(num_workers=1) now would be:

nr.with_runner(SerialRunner()).run(...)

Another example

If you wanted to write a runner that run the task over the hosts in alphabetical order and without threads, you could do something like:

def host_key(host):
    return host.name

class OCDRunner:
    def __init__(self, reverse: bool=False) -> None:
        self.reverse = reverse

    def run(self, task: Task, hosts: List[Host]) -> AggregatedResult:
        result = AggregatedResult(task.name)
        for hostname, host in sorted(hosts, key=host_key, reverse=self.reverse):
            result[host.name] = task.copy().start(host)
        return result

and then set it in the configuration like this:

runner:
    # plugin needs to be registered though, more on this in a more elaborate howto
    plugin: OCDRunner
    options:
         reverse: true

or:

nr.with_runner(OCDRunner(reverse=True)).run(...)

Todo

  1. Add to the tutorial
  2. Write a more detailed how to

@dbarrosop
Copy link
Contributor Author

Fixes #522

@dbarrosop
Copy link
Contributor Author

merging this one as I want to fully revamp the docs in a dedicated PR. If you have feedback, feel free to drop a comment and I will address it later.

@dbarrosop dbarrosop merged commit 5c293af into 3.0.0 May 10, 2020
@dbarrosop dbarrosop added this to the 3.0.0 milestone May 10, 2020
@ktbyers
Copy link
Collaborator

ktbyers commented May 12, 2020

@dbarrosop Am I understanding this correctly that num_workers is no longer available as a configuration option nor can you specify it as an argument to nr.run()?

@dbarrosop
Copy link
Contributor Author

dbarrosop commented May 13, 2020

num_workers is no longer available as a configuration option

It is but in a dedicated section for the runner:

runner:
    plugin: ParallelRunner
    options:
         num_workers: 123

Basically same structure as for configuring the inventory.

nor can you specify it as an argument to nr.run()?

You can't but the runner can now be configured with with_runner like with the processors so you could do this:

nr.with_runner(ParallelRunner(213)).run(...)
nr.with_runner(ParallelRunner(33)).run(...)
nr.with_runner(SerialRunner()).run(...)

Or assign it to a variable and keep reusing:

nr_with_serial = nr.with_runner(SerialRunner()).run(...)
nr_with_333_workers = nr.with_runner(ParallelRunner(333)).run(...)

The rationale for this change is that I wanted this part to be pluggable as well as people could implement their own runners with their own business logic. I.e., "run this in the entire datacenter but only one leaf per rack at a time and abort if two switches in the same rack fail but continue if only one fails". You could implement that in your script via filters but being able to do it in this stage with a plugin gives you more control and makes your script more generic as you can keep your script more straighforward and tweak the behavior via runners (i.e., if you know your config replace operation is only doing small changes you could use the ParallelRunner but if you know you are doing more delicate changes use this other running that does an incremental rollout)

@ktbyers
Copy link
Collaborator

ktbyers commented May 13, 2020

On runners are you open to making a backwards compatible solution so that all of the old num_workers behavior continues to work (num_workers both in config.yaml and as direct argument)?

Basically, this form:

    nr = InitNornir(core={"num_workers": 1})
    nr.run(task=my_task)

And this form in config.yaml:

---
core:
  num_workers: 10

Then dynamically select SerialRunner or ParallelRunner based on this (and set number of threads to be proper-threads).

And consequently make all of the nr.run invocations continue to work properly from a backwards compatibility perspective.

Rational:
The with_runner change is going to cause me huge pain (and big pain for current Nornir users also) as it will break so much existing code. The value of backwards compatibility to me is worth way more than the new format.

It also is going to cause lots of confusion on documentation and when assisting people side.

@dbarrosop
Copy link
Contributor Author

I could be persuaded to put it back raising a deprecationwarning for a certain time (i.e., 6 months) but then I’d want to remove it even without releasing a new major version. Would that work?

@ktbyers
Copy link
Collaborator

ktbyers commented May 13, 2020

Let me look a bit more at embedding things in the config file:

runner:
    plugin: ParallelRunner
    options:
         num_workers: 123

As I didn't experiment with that much.

@dbarrosop
Copy link
Contributor Author

Ok, thanks. That part of the code was very messy with many things conflated so this proposal served for both cleaning those bits (separating concerns in a better way) and also making it pluggable, hence my hesitance for backwards compatibility. However, as mentioned earlier, I guess I could live with some backwards compatibility for a period of time so people could transition.

@ktbyers
Copy link
Collaborator

ktbyers commented May 14, 2020

@dbarrosop Yes, let me keep testing this more with the above section in config.yaml my backwards compatibility concerns are greatly reduced. I.E. you can still execute nr.run() (you just need to specify the relevant parameters in config.yaml).

You can also pretty easily toggle num_workers via adjusting the options.

@wvandeun
Copy link
Contributor

Did I see it correctly that the default behavior (ie no "runner" config and no "with_runner") is still the same as the old behavior (parallel with threads and 20 worker threads)?

@dbarrosop
Copy link
Contributor Author

dbarrosop commented May 15, 2020

no, the default runner is the parallel one (threaded now that it has been renamed):

https://github.com/nornir-automation/nornir/pull/536/files?file-filters%5B%5D=.cfg&file-filters%5B%5D=.py&file-filters%5B%5D=.toml&file-filters%5B%5D=.yaml#diff-4ec7d050f8771494a7c5e0cc75652620R207

So, yes, default behavior is still the same

@ktbyers
Copy link
Collaborator

ktbyers commented May 15, 2020

Okay, I misread David's response the first time...i.e. his initial no threw me.

But then he clarified at the end i.e. the default behavior of nr.run() is the same (20-concurrent connections with the runner now named as threaded).

@dbarrosop
Copy link
Contributor Author

apologies for being unclear :(

@dbarrosop dbarrosop deleted the runners branch June 18, 2020 12:16
dbarrosop added a commit that referenced this pull request Sep 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants