This repository has been archived by the owner on Jun 11, 2024. It is now read-only.
In large projects, a `happo run` may take too long. Although we have some performance improvements in the pipeline (#62), there is a limit to how fast we can make things. We need to provide a way for examples to be split up among multiple machines and have the results aggregated at the end.
There is a good amount of overlap here with #73, so this might make sense to do at the same time.
To enable this, I think we need to do 2 things:
1. Add options to `happo run` to only run on a subset of examples and return/output metadata about the run.
2. Add a mechanism to happo to aggregate partial results into a single result.
In the interest of keeping the API simple, it seems like the arguments we want are the number of split points (i.e. the number of machines to parallelize across) and the split point to run on. For instance, if you have 4 machines, you would end up calling `happo run` 4 times, with arguments like `happo run 1/4`, `happo run 2/4`, `happo run 3/4`, and `happo run 4/4`. Of course, the arguments could use more explicit flags as well, something like `happo run --split=1 --of=2` (naming needs to be improved). This will work as long as the order of examples is always deterministic.
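To make the idea concrete, here is a rough sketch of how a split argument could map to a deterministic subset of examples, and how partial results might be merged afterwards. It is not happo's implementation: the function names (`parse_split`, `select_examples`, `merge_results`), the round-robin assignment, and the shape of a result (a mapping from example name to some status string) are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def parse_split(arg: str) -> Tuple[int, int]:
    """Parse a hypothetical 'INDEX/TOTAL' argument such as '1/4' (1-based index)."""
    index_text, total_text = arg.split("/")
    index, total = int(index_text), int(total_text)
    if total < 1 or not 1 <= index <= total:
        raise ValueError(f"invalid split: {arg!r}")
    return index, total


def select_examples(names: List[str], index: int, total: int) -> List[str]:
    """Return the examples assigned to split `index` of `total`.

    Sorting first makes the assignment deterministic regardless of the
    order in which the examples were discovered.
    """
    ordered = sorted(names)
    # Round-robin: every `total`-th example, starting at position index - 1.
    return ordered[index - 1::total]


def merge_results(partials: List[Dict[str, str]]) -> Dict[str, str]:
    """Combine per-machine results into one mapping, rejecting overlaps."""
    merged: Dict[str, str] = {}
    for partial in partials:
        overlap = merged.keys() & partial.keys()
        if overlap:
            raise ValueError(
                f"examples ran on more than one machine: {sorted(overlap)}"
            )
        merged.update(partial)
    return merged
```

Because every machine sorts the same list of example names before slicing, the splits are disjoint and together cover every example, which is the determinism requirement mentioned above. Round-robin assignment is only one option; contiguous chunks would work equally well as long as the ordering is stable.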
cc @lelandrichardson