feat(rosetta): improve translation throughput #3083
Conversation
Previously, Rosetta would divide all the examples to translate into `N` equally sized arrays, and spawn `N` workers to translate them all. Experimentation shows that the time required to translate samples is very unequally divided, and many workers used to be idle for half of the time, hurting throughput. Switch to a model where we have `N` workers, and we constantly feed them a small amount of work until all the work is done. This keeps all workers busy until the work is complete, improving the throughput a lot. On my machine, improves a run of Rosetta on the CDK repository with 8 workers from ~30m to ~15m.
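The work-queue model described here can be sketched as follows. This is a minimal illustration of the pattern, not the actual Rosetta implementation, and all names (`runPool`, `run`) are hypothetical; on Node's single-threaded event loop a shared cursor is enough to hand out tasks, whereas the real implementation dispatches to worker threads.

```typescript
// Sketch of the work-queue model (hypothetical names, not the Rosetta source):
// N workers repeatedly claim one small task from a shared queue until it is
// empty, so a worker that finishes a cheap task immediately picks up the next
// one instead of idling while others grind through expensive tasks.
async function runPool<T, R>(
  tasks: T[],
  workerCount: number,
  run: (task: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(tasks.length);
  let next = 0; // shared cursor into the task list

  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++; // claim the next task; safe on a single-threaded event loop
      results[i] = await run(tasks[i]);
    }
  }

  // Start N workers; each keeps pulling work until the queue is drained.
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Contrast this with the old model, which pre-split the task list into `N` fixed chunks: a worker unlucky enough to receive the slowest samples determines the total runtime, while the others finish early and sit idle.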
Copying @RomainMuller's comment from Slack:
Any reason not to use an existing Worker Pool solution, like https://www.npmjs.com/package/workerpool?
Apart from keeping the dependency count low and the control that comes from having the entire implementation available (easy to change parameters, memory settings, logging, etc.), not really. Want me to change it?
I'll defer to @RomainMuller, as he originally raised the point. On the positive side,
It would be better to use an existing solution rather than rolling our own, unless you think there is a significant advantage to having our own implementation. While the initial implementation is already done, there is the question of continued maintenance, which would build up if we go down this path for every new feature we want. The npm module referred to here has 3M weekly downloads, which should be sufficient indication of its reliability and updates.
Switched to using
@@ -192,3 +192,10 @@ Since TypeScript compilation takes a lot of time, much time can be gained by usi
If worker thread support is available, `jsii-rosetta` will use a number of workers equal to half the number of CPU cores,
up to a maximum of 16 workers. This default maximum can be overridden by setting the `JSII_ROSETTA_MAX_WORKER_COUNT`
environment variable.
Lines above mention "if support is available". Update those to match the current impl.
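The documented default could be computed along these lines. This is a sketch under stated assumptions, not the actual `jsii-rosetta` source: the function names are invented, and the interpretation that `JSII_ROSETTA_MAX_WORKER_COUNT` replaces the default maximum of 16 (rather than the computed count) follows the wording of the docs above.

```typescript
// Sketch of the documented default (assumed names, not the jsii-rosetta
// source): half the CPU cores, capped at a maximum of 16, where the maximum
// can be overridden via the JSII_ROSETTA_MAX_WORKER_COUNT env variable.
function maxWorkerCount(env: Record<string, string | undefined>): number {
  const override = env['JSII_ROSETTA_MAX_WORKER_COUNT'];
  return override !== undefined ? parseInt(override, 10) : 16;
}

function workerCount(
  cpuCount: number,
  env: Record<string, string | undefined> = {},
): number {
  // Half the CPU cores (at least 1), capped at the possibly overridden maximum.
  return Math.min(maxWorkerCount(env), Math.max(1, Math.floor(cpuCount / 2)));
}
```

For example, an 8-core machine would get 4 workers, while a 64-core machine would be capped at 16 unless the environment variable raises the limit.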
Thank you for contributing! ❤️ I will now look into making sure the PR is up-to-date, then proceed to try and merge it!
Merging (with squash)...
Previously, Rosetta would divide all the examples to translate into `N` equally sized arrays, and spawn `N` workers to translate them all. Experimentation shows that the time required to translate samples is very unequally divided, and many workers used to be idle for half of the time after having finished their 1/Nth of the samples, hurting throughput.

Switch to a model where we have `N` workers, and we constantly feed them a small amount of work until all the work is done. This keeps all workers busy until the work is complete, improving the throughput a lot. On my machine, improves a run of Rosetta on the CDK repository with 8 workers from ~30m to ~15m.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.