This repository has been archived by the owner on Jun 11, 2024. It is now read-only.

Add support for parallelization #171

Open
lencioni opened this issue Dec 19, 2016 · 1 comment

Comments

@lencioni
Contributor

In large projects the happo run may take too long. Although we have some performance improvements in the pipeline (#62), there is a limit to how fast we can make things. We need to provide a way for examples to be split up among multiple machines and have the results aggregated at the end.

There is a good amount of overlap here with #73, so this might make sense to do at the same time.

To enable this, I think we need to do 2 things:

  1. Add options to happo run to only run on a subset of examples and return/output metadata about the run.
  2. Add a mechanism to happo to aggregate partial results into a single result.
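
The aggregation step in item 2 could look roughly like the sketch below. The `PartialResult` shape, its field names, and the `aggregate` function are all hypothetical, since the metadata format has not been decided; the idea is only that each partial run reports which part it was, so the aggregator can refuse to produce a result when a part is missing or duplicated.

```typescript
// Hypothetical shapes: happo's real result format may differ.
interface PartialResult {
  part: number;        // 1-based index of this partial run
  parts: number;       // total number of partial runs
  diffs: string[];     // examples whose screenshots changed
  newImages: string[]; // examples with no previous screenshot
}

interface AggregatedResult {
  diffs: string[];
  newImages: string[];
}

// Merge partial results into one, failing loudly if any part is
// missing, duplicated, or out of range.
function aggregate(partials: PartialResult[]): AggregatedResult {
  const first = partials[0];
  if (first === undefined) {
    throw new Error("No partial results to aggregate");
  }
  const seen: Record<number, boolean> = {};
  let count = 0;
  const combined: AggregatedResult = { diffs: [], newImages: [] };
  for (const partial of partials) {
    if (partial.parts !== first.parts) {
      throw new Error("Partial results disagree on the number of parts");
    }
    if (partial.part < 1 || partial.part > first.parts) {
      throw new Error(`Part ${partial.part} is out of range`);
    }
    if (seen[partial.part]) {
      throw new Error(`Part ${partial.part} was reported twice`);
    }
    seen[partial.part] = true;
    count += 1;
    combined.diffs.push(...partial.diffs);
    combined.newImages.push(...partial.newImages);
  }
  if (count !== first.parts) {
    throw new Error(`Expected ${first.parts} parts, got ${count}`);
  }
  // Sort so the aggregated output does not depend on arrival order.
  combined.diffs.sort();
  combined.newImages.sort();
  return combined;
}
```

Failing on an incomplete set of parts seems safer than silently reporting a partial run as a full one, but that is a design choice open to discussion.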

In the interest of keeping the API simple, it seems like the arguments we want are the number of split points (i.e. the number of machines to parallelize across) and the split point to run on. For instance, if you have 4 machines, you would end up calling happo run 4 times, with arguments like happo run 1/4, happo run 2/4, happo run 3/4, and happo run 4/4. Of course, the arguments could use more explicit flags as well, something like: happo run --split=1 --of=2 (naming needs to be improved). This will work as long as the order of examples is always deterministic, so that every machine divides the same list in the same way.
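
As a rough illustration of the `1/4` style argument, here is a minimal sketch of how a split could be parsed and applied. The names `parseSplit` and `selectExamples` are hypothetical and not part of happo; sorting by example name is just one assumed way to get the deterministic order mentioned above.

```typescript
// Hypothetical sketch: none of these names exist in happo itself.
interface Split {
  index: number; // 1-based split point to run, e.g. 2 in "2/4"
  total: number; // number of split points (machines)
}

// Parse an argument of the form "2/4" into a Split, rejecting bad input.
function parseSplit(arg: string): Split {
  const match = /^(\d+)\/(\d+)$/.exec(arg);
  if (match === null) {
    throw new Error(`Expected "<index>/<total>", got "${arg}"`);
  }
  const index = Number(match[1]);
  const total = Number(match[2]);
  if (total < 1 || index < 1 || index > total) {
    throw new Error(`Split ${arg} is out of range`);
  }
  return { index, total };
}

// Pick the examples for one split. Sorting a copy first makes the order
// deterministic, so every machine partitions the same list the same way.
function selectExamples(names: string[], split: Split): string[] {
  const sorted = names.slice().sort();
  return sorted.filter((_, i) => i % split.total === split.index - 1);
}

const exampleNames = ["Button", "Avatar", "Modal", "Tooltip", "Card"];
// Split 1 of 2 gets Avatar, Card, Tooltip; split 2 of 2 gets Button, Modal.
console.log(selectExamples(exampleNames, parseSplit("1/2")));
console.log(selectExamples(exampleNames, parseSplit("2/2")));
```

Assigning examples round-robin rather than in contiguous chunks tends to spread neighbouring (and often similarly expensive) examples across machines, though either scheme would satisfy the proposal.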

cc @lelandrichardson

@trotzig
Contributor

trotzig commented Dec 20, 2016

I added some ideas to #73 that apply here as well.
