
introduce custom schedulers for parallel iterators #353

Closed
wants to merge 7 commits

Conversation

nikomatsakis
Member

This PR introduces the notion of "custom schedulers". The idea is that you can invoke, for any parallel iterator iter, iter.in_scheduler(S), and it will run with the scheduler S. The scheduler basically takes the place of the (currently hard-coded) "bridge" functions -- it connects a parallel producer and a parallel consumer.
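To make the scheduler's role concrete, here is a minimal sketch written against rayon's public plumbing traits. The trait shape and the sequential example are illustrative assumptions, not the definitions actually used in this PR:

```rust
use rayon::iter::plumbing::{Consumer, Folder, Producer};

// Sketch of the role a scheduler plays, going only by the description above:
// it decides how to split a Producer and feed the pieces to a Consumer, which
// is what the hard-coded bridge helpers do today.
pub trait SchedulerSketch {
    fn bridge<P, C>(&self, len: usize, producer: P, consumer: C) -> C::Result
    where
        P: Producer,
        C: Consumer<P::Item>;
}

/// A trivially sequential "scheduler", just to show the shape: it never
/// splits, it simply folds the whole producer into the consumer.
pub struct SequentialScheduler;

impl SchedulerSketch for SequentialScheduler {
    fn bridge<P, C>(&self, _len: usize, producer: P, consumer: C) -> C::Result
    where
        P: Producer,
        C: Consumer<P::Item>,
    {
        producer.fold_with(consumer.into_folder()).complete()
    }
}
```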

We introduce a Scheduler associated type into the ParallelIterator trait. This is used to ensure that the user cannot invoke in_scheduler() twice on any given parallel iterator. For simplicity, we also forbid custom schedulers on the "inputs" where multiple parallel iterators are "joined" (like zip() or chain()). Hence a.in_scheduler().zip(b) is illegal, but a.zip(b).in_scheduler() is fine. We could alter these rules to permit both (but we want to avoid a.in_scheduler().zip(b.in_scheduler())).
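To make that restriction concrete, here is a minimal sketch of how an associated type can rule out the second call at compile time; the names are illustrative, not this PR's actual definitions:

```rust
use std::marker::PhantomData;

/// Marker meaning "no custom scheduler has been installed yet".
pub struct DefaultScheduler;
/// Marker meaning "a custom scheduler was already installed".
pub struct CustomScheduler<S>(PhantomData<S>);

pub trait ParIterSketch: Sized {
    /// Either DefaultScheduler or CustomScheduler<S>.
    type Scheduler;

    /// Only callable while the scheduler is still the default, so a second
    /// call fails to type-check.
    fn in_scheduler<S>(self, scheduler: S) -> Scheduled<Self, S>
    where
        Self: ParIterSketch<Scheduler = DefaultScheduler>,
    {
        Scheduled { base: self, scheduler }
    }
}

pub struct Scheduled<I, S> {
    pub base: I,
    pub scheduler: S,
}

impl<I: ParIterSketch, S> ParIterSketch for Scheduled<I, S> {
    // The associated type flips here, so calling in_scheduler() again on the
    // result no longer satisfies the bound above.
    type Scheduler = CustomScheduler<S>;
}

// let once  = iter.in_scheduler(my_scheduler);   // ok
// let twice = once.in_scheduler(my_scheduler);   // compile error
```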

In this PR, Rayon exposes three schedulers:

  • DefaultScheduler -- what you get if you don't say anything; also encodes the fact that in_scheduler has never been called. Currently, this delegates to AdaptiveScheduler.
  • AdaptiveScheduler -- the default strategy of "split to depth N, and more if stolen"
  • FixedLenScheduler -- like above, but where we adjust N to try and respect the user's min/max requests
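For intuition, the sketch below spells out the "split to depth N, and more if stolen" heuristic as a small split-budget type. It roughly mirrors rayon's internal splitter, but the exact budget and reset policy shown here are assumptions made for illustration:

```rust
use rayon::current_num_threads;

/// Remaining number of splits we are still willing to perform.
#[derive(Clone, Copy)]
struct AdaptiveSplits {
    splits: usize,
}

impl AdaptiveSplits {
    fn new() -> Self {
        // Start with a budget proportional to the number of worker threads.
        AdaptiveSplits { splits: current_num_threads() }
    }

    /// Decide whether to split the current job again. A stolen job means some
    /// worker was idle, so the budget is replenished and splitting continues;
    /// otherwise we only split while the budget lasts.
    fn try_split(&mut self, stolen: bool) -> bool {
        if stolen {
            self.splits = current_num_threads();
            true
        } else if self.splits > 0 {
            self.splits /= 2;
            true
        } else {
            false
        }
    }
}
```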

Originally, the idea here was to allow you to swap out the Rayon runtime completely. In fact, you can still see this in the Runtime trait introduced by the first commit, though I've removed it by the end -- basically because I don't feel we're ready to commit to this trait yet (and in principle no flexibility is lost). I imagine we'll probably add this back at some point in some form.

Some unknowns:

  • Module structure. I feel like we should move most of these things to iter::scheduler.
  • What should be done with the Scheduler associated type in ParallelIterator?
    • It's kind of weird right now. It sounds like it specifies the scheduler, but really it's always one of two values: DefaultScheduler or CustomScheduler. This is just used to prevent multiple in_scheduler() calls. I think I'd rather just give a semantics to multiple calls in some useful way. This interacts with the next question.
  • Can we support "layering" schedulers?
    • Right now, we prohibit multiple calls to in_scheduler, but I was thinking that it'd also be possible to permit "combining" schedulers (e.g., with_min and with_max). I have to play around with this a bit more.
  • What to do with chain() and other ad-hoc calls to join()?
  • @cuviper recently modified the drive() impl of chain() so that it will drive in parallel, using join(). This kind of goes against the grain of this PR, so we probably want to either extend the scheduler interface with a join method, reframe chain() as a parallel producer/consumer, or something like that.
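For the "extend the scheduler interface with a join method" option, one purely speculative shape would be a hook that ad-hoc join() call sites could route through, mirroring the signature of rayon::join:

```rust
// Speculative sketch only; not part of this PR.
pub trait SchedulerJoin {
    fn join<A, B, RA, RB>(&self, oper_a: A, oper_b: B) -> (RA, RB)
    where
        A: FnOnce() -> RA + Send,
        B: FnOnce() -> RB + Send,
        RA: Send,
        RB: Send;
}

/// A default implementation could simply defer to rayon's global join.
pub struct GlobalJoin;

impl SchedulerJoin for GlobalJoin {
    fn join<A, B, RA, RB>(&self, oper_a: A, oper_b: B) -> (RA, RB)
    where
        A: FnOnce() -> RA + Send,
        B: FnOnce() -> RB + Send,
        RA: Send,
        RB: Send,
    {
        rayon::join(oper_a, oper_b)
    }
}
```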

cc #93 -- this permits a user-provided "backend", though I think ultimately more flexibility might be desired. (This only works for parallel iterators, for one thing.)

@nikomatsakis
Member Author

OK, I see I was not running all the tests and some broke. Bah. Well, I've got to run now, but I'll get them working.

The scheduler gets threaded down through the calls to `drive()` and
`with_producer()`. It replaces the existing "bridge" functions. This
version includes two schedulers -- the adaptive scheduler and the
fixed-length scheduler.

(Note that the existing `min_len()` and `max_len()` combinators are
ignored by this commit; they will be replaced by uses of
`in_scheduler()` in the next commit.)

This is a breaking change because `iter.with_min(N).with_max(X)` must
now be rewritten into `iter.with_min_max(N, X)`. This is because we do
not permit more than one `in_scheduler()` call on any particular
parallel iterator chain.

Also, simplify `Producer` trait by removing `min_len` and `max_len`.
We're not really ready to commit to this interface yet -- not even
remotely. Anyhow, if people can override the scheduler, they can already
do (in principle) everything they need to do to customize the runtime,
though it may involve some copy-and-paste.
@nikomatsakis
Member Author

Rebased and fixed the broken tests. (Well, let's see what Travis has to say.)

@xiaoniu-578fa6bff964d005

xiaoniu-578fa6bff964d005 commented Mar 8, 2020

Yet another custom scheduler (branch and diff). The design is not finalized and we wish to get feedback early.

We tried another design, which diverges in the following ways:

  • Scheduler is not an associated type in the ParallelIterator trait; it belongs only to WithScheduler, which is created by iter.with_scheduler(S).
  • WithScheduler forces the bridge between producer and consumer to happen at this point. That requires the iterator on its left to work in producer mode and the iterator on its right to work in consumer mode (a rough sketch of this wiring follows the list).
  • We distinguish between Scheduler and UnindexedScheduler, since their behaviours are so different.
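Here is a rough sketch of that wiring in terms of rayon's public plumbing traits. The scheduler trait has the same shape as the one sketched earlier in this thread; the names and details are illustrative assumptions, not the branch's actual code:

```rust
use rayon::iter::plumbing::{Consumer, Producer, ProducerCallback, UnindexedConsumer};
use rayon::iter::{IndexedParallelIterator, ParallelIterator};

// Same shape as the scheduler sketched earlier: it bridges a producer and a
// consumer however it likes.
pub trait BridgeScheduler: Send {
    fn bridge<P, C>(&self, len: usize, producer: P, consumer: C) -> C::Result
    where
        P: Producer,
        C: Consumer<P::Item>;
}

// Created by a hypothetical iter.with_scheduler(S); this is the point where
// producer and consumer meet.
pub struct WithScheduler<I, S> {
    base: I,
    scheduler: S,
}

impl<I, S> ParallelIterator for WithScheduler<I, S>
where
    I: IndexedParallelIterator,
    S: BridgeScheduler,
{
    type Item = I::Item;

    fn drive_unindexed<C>(self, consumer: C) -> C::Result
    where
        C: UnindexedConsumer<Self::Item>,
    {
        // Everything to the left must act as a producer: ask the base
        // iterator for its producer and let the scheduler do the bridging.
        let len = self.base.len();
        let callback = Callback { len, scheduler: self.scheduler, consumer };
        return self.base.with_producer(callback);

        struct Callback<S, C> {
            len: usize,
            scheduler: S,
            consumer: C,
        }

        impl<S, C, T> ProducerCallback<T> for Callback<S, C>
        where
            S: BridgeScheduler,
            C: Consumer<T>,
        {
            type Output = C::Result;

            fn callback<P>(self, producer: P) -> C::Result
            where
                P: Producer<Item = T>,
            {
                self.scheduler.bridge(self.len, producer, self.consumer)
            }
        }
    }
}
```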

Some consequences of the design:

  • Non-breaking change (at least for indexed parallel iterators). No need to change the implementation of other existing tools like Map, Zip and MinLen.
  • User-provided "backend" is supported.
  • A scheduler can't be applied before zip, so a.in_scheduler().zip(b) and a.zip(b.in_scheduler()) are both illegal. That is because after in_scheduler(), the iterator can only work in consumer mode, but Zip works in producer mode, and an iterator on its left can only work in producer mode.
  • A scheduler can't be applied twice, so there is no "layering" of schedulers. The reason is similar to the above.

Some minor designs that could change:

  • Use an "empty link" hack to forbid using iter.with_scheduler(S) further in producer mode. When users do that, the program simply won't build; instead it fails with an ugly link error (a minimal illustration follows this list).
    Some alternatives:
  1. Use panic!(). The program will compile, but panic when the bridge happens.
  2. Use a more rigorous type system to separate with_producer() and drive() into two different traits, so we can tell whether a ParallelIterator supports working in producer mode or consumer mode. This would be a breaking change.
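For reference, a minimal form of such a link hack might look like the following; this is an assumption about the technique, not the branch's actual code:

```rust
// Hypothetical illustration of the "empty link" hack: the extern symbol below
// is deliberately never defined anywhere, so any code path that instantiates
// a call to it fails at link time with an undefined-symbol error instead of
// building successfully.
fn with_scheduler_cannot_work_in_producer_mode() -> ! {
    extern "C" {
        fn rayon_with_scheduler_used_in_producer_mode();
    }
    unsafe { rayon_with_scheduler_used_in_producer_mode() };
    unreachable!("this symbol is never defined; reaching this point is a bug")
}
```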

Some obstacles we are facing:

  • The current ParallelIterator doesn't have a with_unindexed_producer() method, so we can't explicitly get the producer. We believe a breaking change is needed to support an unindexed scheduler.
  • Benchmarking is hard...
    We find that merely using a scheduler in a program affects the performance of other parts of the program where just the default bridge function is used. We have no idea where this global effect comes from.

Features we wish to implement in the future:

  • UnindexedScheduler
    We believe a breaking change is needed to obtain the producer from ParallelIterator.
  • Thread Affinity
    We wish to allow users to assign which cores they want to use for a task or for a slice of a ParallelIterator.

@cuviper
Member

cuviper commented Mar 9, 2020

@xiaoniu-578fa6bff964d005

  • Benchmarking is hard...
    We find that merely using a scheduler in a program affects the performance of other parts of the program where just the default bridge function is used. We have no idea where this global effect comes from.

A lot of performance depends on sufficient inlining -- it could just be that your additional code changed some thresholds in the module that caused LLVM to make different inlining decisions.

Apart from raw performance, you could also try to evaluate your changes at a higher level, like @wagnerf42 has been doing with his rayon-adaptive. He has logging infrastructure that can show exactly how the jobs were divided, so differences can be compared and justified.
(Those API changes are more invasive than what you seem to be suggesting.)

@cuviper
Member

cuviper commented Mar 12, 2020

@xiaoniu-578fa6bff964d005

  • The current ParallelIterator doesn't have a with_unindexed_producer() method, so we can't explicitly get the producer. We believe a breaking change is needed to support an unindexed scheduler.

Perhaps a breaking change can be avoided by adding a new trait for scheduling purposes.
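For example, one such trait (purely an assumption about the shape, not an agreed design) could expose the unindexed producer on an opt-in basis, leaving ParallelIterator untouched:

```rust
use rayon::iter::plumbing::UnindexedProducer;

// Hypothetical opt-in trait: iterators that can hand out an unindexed
// producer implement it, so an unindexed scheduler can obtain the producer
// without any change to ParallelIterator.
pub trait IntoUnindexedProducer: Sized {
    type Producer: UnindexedProducer;

    fn into_unindexed_producer(self) -> Self::Producer;
}
```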

Anyway, I'd suggest opening your own issue or pull request to discuss your design further.

@nikomatsakis
Member Author

I'm going to close this PR as being obviously stalled and outdated.
