Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

idea for further improvement of build scheduling #5125

Closed
matthiaskrgr opened this issue Mar 5, 2018 · 7 comments
Closed

idea for further improvement of build scheduling #5125

matthiaskrgr opened this issue Mar 5, 2018 · 7 comments
Labels
Performance Gotta go fast!

Comments

@matthiaskrgr
Copy link
Member

matthiaskrgr commented Mar 5, 2018

I'm trying to understand #5100
Lets assume we have a dependency tree like this

      ROOT CRATE
    /   |   |   \
   A    B   C    D
 / |    |
E  F    G
|       | \
H       I  J
|          |
K          L
           |
           M

depth annotation would be something like this?

     1ROOT CRATE
    /   |   |   \
  2A   2B   2C  2D
 / |    |
3E 3F  3G
|       | \
4H     4I  4J
|          |
5K        5L
           |
          6M

as far as I understand, build order will be like this:

     14ROOT CRATE
    /   |   |   \
  10A  11B 12C  13D
 / |    |
7E 8F  9G
|       | \
4H     5I  6J
|          |
2K        3L
           |
          1M

It might be useful to not only take depth but also the number of child nodes (in braces) into account, like this:

         1(14)ROOT CRATE--,
        /    |     \     `\
   ,-2(5)A  2(6)B  2(1)C  2(1)D
  /      |    |
3(3)E 3(1)F  3(5)G-,
  |          |      \
4(2)H      4(1)I  4(3)J
  |                 |
5(1)K             5(2)L
                    |
                  6(1)M

If we have several crates of equal depth, we could look at the number of child nodes and prioritize the crate with the most (total) child nodes (assuming that crates with a lot of dependencies are bigger that those with only few dependencies).
New build order:

     14ROOT CRATE
    /   |   |   \
  11A  10B 12C  13D
 / |    |
8E 9F  7G
|       | \
5H     6I  4J
|          |
2K        3L
           |
          1M

Does this sound resonable?

edit: aid for debugging RUST_LOG=cargo::core::compiler::job_queue cargo check

@matthiaskrgr matthiaskrgr changed the title ìdea for further improvement of build scheduling idea for further improvement of build scheduling Mar 5, 2018
@alexcrichton
Copy link
Member

Certainly! Our goal is to always keep all the cpus busy, so anything that optimizes work to get opened up I think looks great as an ordering.

@fabricedesre
Copy link
Contributor

I am totally unfamiliar with the dependency scheduling in cargo, but is a "static" order the best way? Shouldn't we re-compute which crate to build every time we are done with one and a cpu is free?
Naively, we would start with leaf nodes and dynamically move on as the build progresses.

@alexcrichton
Copy link
Member

@fabricedesre yeah that's basically what happens today. The question here is how to prioritize work when there's multiple candidates to be scheduled, the code for the current iteration is at https://github.com/rust-lang/cargo/blob/master/src/cargo/util/dependency_queue.rs

@Eh2406
Copy link
Contributor

Eh2406 commented Sep 20, 2019

So #7390 changed the build order, but not in a way that will affect the OP. #7390 fixes the case

  d
 / \ \ \
c   e f g
|   \ | /
b     I
|
a

Where finishing I freeze up opportunities for parallelism. In practise this proitatizes winapi and syn and so speeds things up. But that is not necessarily true, some other project could be bottleneck on the a->c pilpline instead.

@est31
Copy link
Member

est31 commented Sep 21, 2019

To extend my comment from #7396:

The problem that cargo has to solve is known as the multiprocessor DAG scheduling problem, which is NP complete so we can only apply approximations. The good news is that the problem is well-known in the literature and you can find a wealth of papers about the topic. E.g. this one is fairly recent, or this one.

I suppose it would be a good idea to somehow account for the projected build time of a certain crate. Those are dependent from the feature flags as well as the platform, winapi is quite trivial to build on linux while it takes longer on windows. Issue #7396 is exploring this idea a bit.

A project to find a better scheduling algorithm is a great opportunity for experimentation. One could e.g. parse crater logs to find out how long each crate takes to build and then use that information to build a "virtual" crater where the results of different scheduling algorithms are compared using different CPU numbers etc. Then we'd find the scheduling algorithm that fits the cargo use case best. The final implementation of the scheduling algorithm could be even put into its own crate, for use in the ecosystem, similar to polonius.

@ehuss ehuss added the Performance Gotta go fast! label Sep 21, 2019
@epage
Copy link
Contributor

epage commented May 24, 2023

#11032 made us take priority into account in more cases.

As part of the experiments for that, some other priority tweaks were looked into. See the zulip thread

@epage
Copy link
Contributor

epage commented Oct 18, 2023

Reading more on #11032, it sounds like this is fully implemented

From:

There's already a notion of cost per unit of work in cargo: it's synthethic (always 100) but is used to compute a priority (as the cost of a unit of work and all the units that depend on it). This cost/priority is taken into account when a job is now ready to be built and dequeued from the dependency queue: the highest priority job is dequeued, and transferred to the pending queue.

So the pending queue will contain jobs with different costs, but doesn't use it to choose the next job to build. This experiment uses the cost/priority to sort the pending queue, so that the highest cost/priority jobs are picked first. The hope is that this makes bigger dependency subgraphs processed sooner, and brings more parallelism opportunities are earlier in the pipeline.

If there is something I'm missing, let us know!

@epage epage closed this as completed Oct 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Gotta go fast!
Projects
None yet
Development

No branches or pull requests

7 participants