
introduce custom schedulers for parallel iterators #353

Closed
wants to merge 7 commits

Conversation

nikomatsakis
Member

This PR introduces the notion of "custom schedulers". The idea is that you can invoke, for any parallel iterator iter, iter.in_scheduler(S), and it will run with the scheduler S. The scheduler basically takes the place of the (currently hard-coded) "bridge" functions -- it connects a parallel producer and a parallel consumer.
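To make the scheduler's role concrete, here is a minimal sketch written against rayon's public plumbing traits. The trait shape and the sequential example are illustrative assumptions, not the definitions actually used in this PR:

```rust
use rayon::iter::plumbing::{Consumer, Folder, Producer};

// Sketch of the role a scheduler plays, going only by the description above:
// it decides how to split a Producer and feed the pieces to a Consumer, which
// is what the hard-coded bridge helpers do today.
pub trait SchedulerSketch {
    fn bridge<P, C>(&self, len: usize, producer: P, consumer: C) -> C::Result
    where
        P: Producer,
        C: Consumer<P::Item>;
}

/// A trivially sequential "scheduler", just to show the shape: it never
/// splits, it simply folds the whole producer into the consumer.
pub struct SequentialScheduler;

impl SchedulerSketch for SequentialScheduler {
    fn bridge<P, C>(&self, _len: usize, producer: P, consumer: C) -> C::Result
    where
        P: Producer,
        C: Consumer<P::Item>,
    {
        producer.fold_with(consumer.into_folder()).complete()
    }
}
```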

We introduce a Scheduler associated type into the ParallelIterator trait. This is used to ensure that the user cannot invoke in_scheduler() twice on any given parallel iterator. For simplicity, we also forbid custom schedulers on the "inputs" where multiple parallel iterators are "joined" (like zip() or chain()). Hence a.in_scheduler().zip(b) is illegal, but a.zip(b).in_scheduler() is fine. We could alter these rules to permit both (but we want to avoid a.in_scheduler().zip(b.in_scheduler())).
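To make that restriction concrete, here is a minimal sketch of how an associated type can rule out the second call at compile time; the names are illustrative, not this PR's actual definitions:

```rust
use std::marker::PhantomData;

/// Marker meaning "no custom scheduler has been installed yet".
pub struct DefaultScheduler;
/// Marker meaning "a custom scheduler was already installed".
pub struct CustomScheduler<S>(PhantomData<S>);

pub trait ParIterSketch: Sized {
    /// Either DefaultScheduler or CustomScheduler<S>.
    type Scheduler;

    /// Only callable while the scheduler is still the default, so a second
    /// call fails to type-check.
    fn in_scheduler<S>(self, scheduler: S) -> Scheduled<Self, S>
    where
        Self: ParIterSketch<Scheduler = DefaultScheduler>,
    {
        Scheduled { base: self, scheduler }
    }
}

pub struct Scheduled<I, S> {
    pub base: I,
    pub scheduler: S,
}

impl<I: ParIterSketch, S> ParIterSketch for Scheduled<I, S> {
    // The associated type flips here, so calling in_scheduler() again on the
    // result no longer satisfies the bound above.
    type Scheduler = CustomScheduler<S>;
}

// let once  = iter.in_scheduler(my_scheduler);   // ok
// let twice = once.in_scheduler(my_scheduler);   // compile error
```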

In this PR, Rayon exposes three schedulers:

  • DefaultScheduler -- what you get if you don't say anything; also encodes the fact that in_scheduler has never been called. Currently, this delegates to AdaptiveScheduler.
  • AdaptiveScheduler -- the default strategy of "split to depth N, and more if stolen"
  • FixedLenScheduler -- like above, but where we adjust N to try and respect the user's min/max requests
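For intuition, the sketch below spells out the "split to depth N, and more if stolen" heuristic as a small split-budget type. It roughly mirrors rayon's internal splitter, but the exact budget and reset policy shown here are assumptions made for illustration:

```rust
use rayon::current_num_threads;

/// Remaining number of splits we are still willing to perform.
#[derive(Clone, Copy)]
struct AdaptiveSplits {
    splits: usize,
}

impl AdaptiveSplits {
    fn new() -> Self {
        // Start with a budget proportional to the number of worker threads.
        AdaptiveSplits { splits: current_num_threads() }
    }

    /// Decide whether to split the current job again. A stolen job means some
    /// worker was idle, so the budget is replenished and splitting continues;
    /// otherwise we only split while the budget lasts.
    fn try_split(&mut self, stolen: bool) -> bool {
        if stolen {
            self.splits = current_num_threads();
            true
        } else if self.splits > 0 {
            self.splits /= 2;
            true
        } else {
            false
        }
    }
}
```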

Originally, the idea here was to allow you to swap out the Rayon runtime completely. In fact, you can still see this in the Runtime trait introduced by the first commit, though I've removed it by the end -- basically because I don't feel we're ready to commit to this trait yet (and in principle no flexibility is lost). I imagine we'll probably add this back at some point in some form.

Some unknowns:

  • Module structure. I feel like we should move most of these things to iter::scheduler.
  • What should be done with the Scheduler associated type in ParallelIterator?
    • It's kind of weird right now. It sounds like it specifies the scheduler, but really it's always one of two values: DefaultScheduler or CustomScheduler. This is just used to prevent multiple in_scheduler() calls. I think I'd rather just give a semantics to multiple calls in some useful way. This interacts with the next question.
  • Can we support "layering" schedulers?
    • Right now, we prohibit multiple calls to in_scheduler, but I was thinking that it'd also be possible to permit "combining" schedulers (e.g., with_min and with_max). I have to play around with this a bit more.
  • What to do with chain() and other ad-hoc calls to join()?
  • @cuviper recently modified the drive() impl of chain() so that it will drive in parallel, using join(). This kind of goes against the grain of this PR, so we probably want to either extend the scheduler interface with a join method, reframe chain() as a parallel producer/consumer, or something like that.
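For the "extend the scheduler interface with a join method" option, one purely speculative shape would be a hook that ad-hoc join() call sites could route through, mirroring the signature of rayon::join:

```rust
// Speculative sketch only; not part of this PR.
pub trait SchedulerJoin {
    fn join<A, B, RA, RB>(&self, oper_a: A, oper_b: B) -> (RA, RB)
    where
        A: FnOnce() -> RA + Send,
        B: FnOnce() -> RB + Send,
        RA: Send,
        RB: Send;
}

/// A default implementation could simply defer to rayon's global join.
pub struct GlobalJoin;

impl SchedulerJoin for GlobalJoin {
    fn join<A, B, RA, RB>(&self, oper_a: A, oper_b: B) -> (RA, RB)
    where
        A: FnOnce() -> RA + Send,
        B: FnOnce() -> RB + Send,
        RA: Send,
        RB: Send,
    {
        rayon::join(oper_a, oper_b)
    }
}
```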

cc #93 -- this permits a user-provided "backend", though I think ultimately more flexibility might be desired. (This only works for parallel iterators, for one thing.)

@nikomatsakis
Member Author

OK, I see I was not running all the tests and some broke. Bah. Well, I've got to run now, but I'll get them working.

The scheduler gets threaded down through the calls to `drive()` and
`with_producer()`. It replaces the existing "bridge" functions. This
version includes two schedulers -- the adaptive scheduler and the
fixed-length scheduler.

(Note that the existing `min_len()` and `max_len()` combinators are
ignored by this commit; they will be replaced by uses of
`in_scheduler()` in the next commit.)

This is a breaking change because `iter.with_min(N).with_max(X)` must
now be rewritten into `iter.with_min_max(N, X)`. This is because we do
not permit more than one `in_scheduler()` call on any particular
parallel iterator chain.

Also, simplify `Producer` trait by removing `min_len` and `max_len`.
We're not really ready to commit to this interface yet -- not even
remotely. Anyhow, if people can override the scheduler, they can already
do (in principle) everything they need to do to customize the runtime,
though it may involve some copy-and-paste.
@nikomatsakis
Member Author

Rebased and fixed the broken tests. (Well, let's see what Travis has to say.)

@xiaoniu-578fa6bff964d005

xiaoniu-578fa6bff964d005 commented Mar 8, 2020

Yet another custom scheduler (branch and diff). The design is not finalized and we wish to get feedback early.

We tried another design, which diverges in the following ways:

  • Scheduler is not an associated type in the ParallelIterator trait; it belongs only to WithScheduler, which is created by iter.with_scheduler(S).
  • WithScheduler forces the bridge between producer and consumer to happen at this point. That requires the iterator on its left to work in producer mode and the iterator on its right to work in consumer mode (a rough sketch of this wiring follows the list).
  • We distinguish between Scheduler and UnindexedScheduler, since their behaviours are so different.
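Here is a rough sketch of that wiring in terms of rayon's public plumbing traits. The scheduler trait has the same shape as the one sketched earlier in this thread; the names and details are illustrative assumptions, not the branch's actual code:

```rust
use rayon::iter::plumbing::{Consumer, Producer, ProducerCallback, UnindexedConsumer};
use rayon::iter::{IndexedParallelIterator, ParallelIterator};

// Same shape as the scheduler sketched earlier: it bridges a producer and a
// consumer however it likes.
pub trait BridgeScheduler: Send {
    fn bridge<P, C>(&self, len: usize, producer: P, consumer: C) -> C::Result
    where
        P: Producer,
        C: Consumer<P::Item>;
}

// Created by a hypothetical iter.with_scheduler(S); this is the point where
// producer and consumer meet.
pub struct WithScheduler<I, S> {
    base: I,
    scheduler: S,
}

impl<I, S> ParallelIterator for WithScheduler<I, S>
where
    I: IndexedParallelIterator,
    S: BridgeScheduler,
{
    type Item = I::Item;

    fn drive_unindexed<C>(self, consumer: C) -> C::Result
    where
        C: UnindexedConsumer<Self::Item>,
    {
        // Everything to the left must act as a producer: ask the base
        // iterator for its producer and let the scheduler do the bridging.
        let len = self.base.len();
        let callback = Callback { len, scheduler: self.scheduler, consumer };
        return self.base.with_producer(callback);

        struct Callback<S, C> {
            len: usize,
            scheduler: S,
            consumer: C,
        }

        impl<S, C, T> ProducerCallback<T> for Callback<S, C>
        where
            S: BridgeScheduler,
            C: Consumer<T>,
        {
            type Output = C::Result;

            fn callback<P>(self, producer: P) -> C::Result
            where
                P: Producer<Item = T>,
            {
                self.scheduler.bridge(self.len, producer, self.consumer)
            }
        }
    }
}
```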

Some consequences of the design:

  • Non-breaking change (at least for indexed parallel iterators). No need to change the implementation of other existing tools like Map, Zip and MinLen.
  • User-provided "backend" is supported.
  • A scheduler can't be applied before zip, so a.in_scheduler().zip(b) and a.zip(b.in_scheduler()) are both illegal. That is because after in_scheduler(), the iterator can only work in consumer mode, but Zip works in producer mode, and an iterator on its left can only work in producer mode.
  • A scheduler can't be applied twice, so there is no "layering" of schedulers. The reason is similar to the above.

Some minor designs that could change:

  • Use an "empty link" hack to forbid using iter.with_scheduler(S) further in producer mode. When users do that, the program simply won't build; instead it fails with an ugly link error (a minimal illustration follows this list).
    Some alternatives:
  1. Use panic!(). The program will compile, but panic when the bridge happens.
  2. Use a more rigorous type system to separate with_producer() and drive() into two different traits, so we can tell whether a ParallelIterator supports working in producer mode or consumer mode. This would be a breaking change.
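For reference, a minimal form of such a link hack might look like the following; this is an assumption about the technique, not the branch's actual code:

```rust
// Hypothetical illustration of the "empty link" hack: the extern symbol below
// is deliberately never defined anywhere, so any code path that instantiates
// a call to it fails at link time with an undefined-symbol error instead of
// building successfully.
fn with_scheduler_cannot_work_in_producer_mode() -> ! {
    extern "C" {
        fn rayon_with_scheduler_used_in_producer_mode();
    }
    unsafe { rayon_with_scheduler_used_in_producer_mode() };
    unreachable!("this symbol is never defined; reaching this point is a bug")
}
```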

Some obstacles we are facing:

  • The current ParallelIterator doesn't have a with_unindexed_producer() method, so we can't explicitly get the producer. We believe a breaking change is needed to support an unindexed scheduler.
  • Benchmarking is hard...
    We find that merely using a scheduler in a program affects the performance of other parts of the program where just the default bridge function is used. We have no idea where this global effect comes from.

Features we wish to implement in the future:

  • UnindexedScheduler
    We believe a breaking change is needed to obtain the producer from ParallelIterator.
  • Thread Affinity
    We wish to allow users to assign which cores they want to use for a task or for a slice of a ParallelIterator.

@cuviper
Member

cuviper commented Mar 9, 2020

@xiaoniu-578fa6bff964d005

  • Benchmarking is hard...
    We find that merely using a scheduler in a program affects the performance of other parts of the program where just the default bridge function is used. We have no idea where this global effect comes from.

A lot of performance depends on sufficient inlining -- it could just be that your additional code changed some thresholds in the module that caused LLVM to make different inlining decisions.

Apart from raw performance, you could also try to evaluate your changes at a higher level, like @wagnerf42 has been doing with his rayon-adaptive. He has logging infrastructure that can show exactly how the jobs were divided, so differences can be compared and justified.
(Those API changes are more invasive than what you seem to be suggesting.)

@cuviper
Member

cuviper commented Mar 12, 2020

@xiaoniu-578fa6bff964d005

  • The current ParallelIterator doesn't have a with_unindexed_producer() method, so we can't explicitly get the producer. We believe a breaking change is needed to support an unindexed scheduler.

Perhaps a breaking change can be avoided by adding a new trait for scheduling purposes.
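For example, one such trait (purely an assumption about the shape, not an agreed design) could expose the unindexed producer on an opt-in basis, leaving ParallelIterator untouched:

```rust
use rayon::iter::plumbing::UnindexedProducer;

// Hypothetical opt-in trait: iterators that can hand out an unindexed
// producer implement it, so an unindexed scheduler can obtain the producer
// without any change to ParallelIterator.
pub trait IntoUnindexedProducer: Sized {
    type Producer: UnindexedProducer;

    fn into_unindexed_producer(self) -> Self::Producer;
}
```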

Anyway, I'd suggest opening your own issue or pull request to discuss your design further.

@nikomatsakis
Member Author

I'm going to close this PR as being obviously stalled and outdated.
