
Conversation

ranger-ross
Member

@ranger-ross ranger-ross commented Oct 11, 2025

What does this PR try to resolve?

This is an experiment in adding fine-grained locking (at the build unit level) during compilation.
With #15947 merged, we are unblocked to start experimenting with the more granular locking tracked in #4282.

The primary goal of this PR is to evaluate locking schemes and review their trade-offs (i.e. performance, complexity, etc.).

Implementation approach / details

The approach is to add a lock file to each build unit dir (build-dir/<profile>/build/<pkg>/<hash>/lock), acquire an exclusive lock on it during the compilation of that unit, and take shared locks on all of its dependencies. These locks are taken using std::fs::File::{lock, lock_shared}.
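
For illustration, a minimal sketch of that scheme (not the PR's actual code; the paths and the lock_unit helper are made up) using std's file-locking API:

// Minimal sketch of the locking scheme (illustrative only, not the PR's code):
// take an exclusive lock on the unit being compiled and shared locks on all of
// its dependencies, using std's file-locking API.
use std::fs::{File, OpenOptions};
use std::io;
use std::path::Path;

fn lock_unit(unit_dir: &Path, dep_dirs: &[&Path]) -> io::Result<(File, Vec<File>)> {
    // Exclusive lock on the unit we are about to compile
    // (e.g. build-dir/<profile>/build/<pkg>/<hash>/lock).
    let unit_lock = OpenOptions::new()
        .create(true)
        .append(true)
        .open(unit_dir.join("lock"))?;
    unit_lock.lock()?; // blocks until no other holder remains

    // Shared locks on every dependency so they cannot be rebuilt or removed
    // while this unit is compiling.
    let dep_locks = dep_dirs
        .iter()
        .map(|d| {
            let f = OpenOptions::new()
                .create(true)
                .append(true)
                .open(d.join("lock"))?;
            f.lock_shared()?;
            Ok(f)
        })
        .collect::<io::Result<Vec<_>>>()?;

    // The locks are released when the returned handles are dropped
    // after the unit finishes compiling.
    Ok((unit_lock, dep_locks))
}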

For this experiment, I found it easier to create the locking from scratch rather than re-using the existing locking systems in Filesystem and CacheLocker, as their interfaces require gctx, which is not in scope inside the work closure passed to Work::new(). (Plumbing gctx into it, while possible, was a bit annoying due to lifetime issues.)

I encapsulated all of the locking logic into CompilationLock in locking.rs.

Note: For now I simply reused the -Zbuild-dir-new-layout flag to enable fine-grained locking, though we may want a standalone flag for this in the future.

Benchmarking and experimenting

After verifying that compilation still works correctly, I ran some basic benchmarks with hyperfine on a test crate with ~200 total dependencies, to represent a small-to-medium-sized crate. Benchmarks were run on a Fedora Linux x86 machine with a 20-core CPU.

Cargo.toml
[dependencies]
clap = { version = "4.5.48", features = ["derive"] }
syn = "2.0.106"
tokio = { version = "1", features = ["full"]}
actix-web = "4"

(I didn't put a lot of thought into the specific dependencies. I simply grabbed some crates I knew had a good number of transitive dependencies so that I did not need to add a lot of dependencies manually.)

Results:

> hyperfine --runs 10 --prepare 'rm -rf target' '/home/ross/projects/cargo/target/release/cargo build' --prepare 'rm -rf target' '/home/ross/projects/cargo/target/release/cargo -Zbuild-dir-new-layout build'
Benchmark 1: /home/ross/projects/cargo/target/release/cargo build
  Time (mean ± σ):      9.997 s ±  0.078 s    [User: 78.805 s, System: 12.906 s]
  Range (min … max):    9.888 s … 10.122 s    10 runs

Benchmark 2: /home/ross/projects/cargo/target/release/cargo -Zbuild-dir-new-layout build
  Time (mean ± σ):     10.940 s ±  0.167 s    [User: 76.551 s, System: 12.809 s]
  Range (min … max):   10.652 s … 11.157 s    10 runs

Summary
  /home/ross/projects/cargo/target/release/cargo build ran
    1.09 ± 0.02 times faster than /home/ross/projects/cargo/target/release/cargo -Zbuild-dir-new-layout build

From the results above we can see that we are taking a ~10% performance hit due to the locking overhead, which is quite bad IMO...

Out of curiosity, I also tried taking the shared locks in parallel using rayon's .par_iter() to see if that would improve the situation.

Code Change
// src/cargo/core/compiler/locking.rs
// (note: `.par_iter()` requires `use rayon::prelude::*;`)
        let dependency_locks = self
            .dependency_units
            .par_iter() // <------- CHANGED THIS
            .map(|d| {
                let f = OpenOptions::new()
                    .create(true)
                    .write(true)
                    .append(true)
                    .open(d)
                    .unwrap();
                f.lock_shared().unwrap();
                f
            })
            .collect::<Vec<_>>();
> hyperfine --runs 10 --prepare 'rm -rf target' '/home/ross/projects/cargo/target/release/cargo build' --prepare 'rm -rf target' '/home/ross/projects/cargo/target/release/cargo -Zbuild-dir-new-layout build'
Benchmark 1: /home/ross/projects/cargo/target/release/cargo build
  Time (mean ± σ):     10.065 s ±  0.084 s    [User: 78.569 s, System: 12.987 s]
  Range (min … max):    9.945 s … 10.215 s    10 runs

Benchmark 2: /home/ross/projects/cargo/target/release/cargo -Zbuild-dir-new-layout build
  Time (mean ± σ):     10.904 s ±  0.100 s    [User: 75.767 s, System: 12.876 s]
  Range (min … max):   10.758 s … 11.068 s    10 runs

Summary
  /home/ross/projects/cargo/target/release/cargo build ran
    1.08 ± 0.01 times faster than /home/ross/projects/cargo/target/release/cargo -Zbuild-dir-new-layout build

However, we can see this did not really improve things by much, if at all.

Another idea I had was to see if taking a lock on the build unit directory (build-dir/<profile>/build/<pkg>/<hash>) directly, instead of writing a dedicated lock file, would have any effect. However, this also showed minimal, if any, improvement compared to using a standalone lock file.
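
For reference, a sketch of that directory-locking variant might look like the following. This is illustrative only and Unix-oriented, since it relies on being able to open a directory read-only and take an advisory lock on its handle; Windows would likely need a different approach.

// Sketch of locking the build unit directory itself instead of a dedicated
// lock file (illustrative only; assumes a Unix-like platform).
use std::fs::File;
use std::io;
use std::path::Path;

fn lock_unit_dir_exclusive(unit_dir: &Path) -> io::Result<File> {
    // e.g. build-dir/<profile>/build/<pkg>/<hash>
    let dir = File::open(unit_dir)?;
    dir.lock()?; // advisory exclusive lock, held until `dir` is dropped
    Ok(dir)
}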

> hyperfine --runs 10 --prepare 'rm -rf target' '/home/ross/projects/cargo/target/release/cargo build' --prepare 'rm -rf target' '/home/ross/projects/cargo/target/release/cargo -Zbuild-dir-new-layout build'
Benchmark 1: /home/ross/projects/cargo/target/release/cargo build
  Time (mean ± σ):     10.082 s ±  0.055 s    [User: 78.192 s, System: 12.938 s]
  Range (min … max):    9.984 s … 10.183 s    10 runs

Benchmark 2: /home/ross/projects/cargo/target/release/cargo -Zbuild-dir-new-layout build
  Time (mean ± σ):     10.829 s ±  0.104 s    [User: 76.385 s, System: 12.765 s]
  Range (min … max):   10.613 s … 10.987 s    10 runs

Summary
  /home/ross/projects/cargo/target/release/cargo build ran
    1.07 ± 0.01 times faster than /home/ross/projects/cargo/target/release/cargo -Zbuild-dir-new-layout build

I also benchmarked a larger project with ~750 dependencies to see how the changes scale with larger projects.
Note: This is without rayon, using the lock-file setup from the first benchmark above.

Cargo.toml
[dependencies]
clap = { version = "4.5.48", features = ["derive"] }
syn = "2.0.106"
tokio = { version = "1", features = ["full"]}
actix-web = "4"
axum = "0.8"
ratatui = "0.29"
aws-sdk-s3 = "1"
aws-sdk-dynamodb = "1"
serde = { version = "1", features = ["derive"] }
rand = "0.9"
sqlx = { version = "0.8", features = ["runtime-tokio-rustls", "postgres", "mysql", "macros"] }
bevy = "0.17"
> hyperfine --runs 10 --prepare 'rm -rf target' '/home/ross/projects/cargo/target/release/cargo build' --prepare 'rm -rf target' '/home/ross/projects/cargo/target/release/cargo -Zbuild-dir-new-layout build'
Benchmark 1: /home/ross/projects/cargo/target/release/cargo build
  Time (mean ± σ):     63.624 s ±  0.895 s    [User: 645.249 s, System: 77.388 s]
  Range (min … max):   62.818 s … 65.855 s    10 runs

Benchmark 2: /home/ross/projects/cargo/target/release/cargo -Zbuild-dir-new-layout build
  Time (mean ± σ):     70.956 s ±  0.546 s    [User: 563.547 s, System: 69.584 s]
  Range (min … max):   70.090 s … 71.517 s    10 runs

Summary
  /home/ross/projects/cargo/target/release/cargo build ran
    1.12 ± 0.02 times faster than /home/ross/projects/cargo/target/release/cargo -Zbuild-dir-new-layout build

Other observations

  • The penalty appears to scale with project size. For projects with fewer than 30 dependencies, the penalty was generally less than 1%, and it seems to flatten out at around a 10%-15% penalty for larger projects.

I also ran a baseline to make sure the performance loss was not coming from the layout restructuring (as opposed to the locking) by running the same benchmark without the locking changes (built from commit 81c3f77).

> hyperfine --runs 10 --prepare 'rm -rf target' '/home/ross/projects/cargo/target/release/cargo build' --prepare 'rm -rf target' '/home/ross/projects/cargo/target/release/cargo -Zbuild-dir-new-layout build'
Benchmark 1: /home/ross/projects/cargo/target/release/cargo build
  Time (mean ± σ):      9.522 s ±  0.099 s    [User: 73.558 s, System: 11.183 s]
  Range (min … max):    9.332 s …  9.676 s    10 runs

Benchmark 2: /home/ross/projects/cargo/target/release/cargo -Zbuild-dir-new-layout build
  Time (mean ± σ):      9.489 s ±  0.104 s    [User: 73.694 s, System: 11.129 s]
  Range (min … max):    9.291 s …  9.668 s    10 runs

Summary
  /home/ross/projects/cargo/target/release/cargo -Zbuild-dir-new-layout build ran
    1.00 ± 0.02 times faster than /home/ross/projects/cargo/target/release/cargo build

@rustbot rustbot added the A-build-execution (Area: anything dealing with executing the compiler) and A-layout (Area: target output directory layout, naming, and organization) labels on Oct 11, 2025
@ranger-ross
Member Author

ranger-ross commented Oct 11, 2025

After some more digging, I think a large part of the performance regression here is due to the locking causing jobs to wait for both rmeta AND rlibs to be generated before proceeding.

Below is a trace view to illustrate:

[trace view screenshot]

The lock span is the time a job spends waiting to acquire the locks it needs to proceed.

We can see that as soon as the .rmeta is produced, the job queue will allow the next job to run, but since the exclusive lock is not released until the crate is fully compiled, the next job waits because it cannot acquire a shared lock.


We may need to create a more sophisticated locking mechanism, similar to the crate cache, that would allow us to downgrade the exclusive lock to a shared lock, or have dedicated lock states like rmeta_produced
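
Purely to illustrate the "dedicated lock states" idea (this is not part of this PR; the names here are hypothetical, and a real implementation would also need to persist or communicate the state across processes, which this sketch does not attempt), the state model might look roughly like:

#[derive(Clone, Copy, PartialEq, Eq)]
enum UnitLockState {
    // Exclusive lock held; nothing usable on disk yet.
    Building,
    // The .rmeta is on disk; could downgrade so metadata-only dependents may start.
    RmetaProduced,
    // Fully built (.rlib on disk); all dependents may take shared locks.
    Finished,
}

// What a dependent job needs from this unit.
enum Need {
    MetadataOnly,
    FullArtifact,
}

fn dependent_may_proceed(state: UnitLockState, need: Need) -> bool {
    match (state, need) {
        (UnitLockState::Finished, _) => true,
        (UnitLockState::RmetaProduced, Need::MetadataOnly) => true,
        _ => false,
    }
}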

@ehuss
Contributor

ehuss commented Oct 11, 2025

How do you plan to handle deadlocks?

EDIT: Though thinking more... Probably not an issue. I was thinking of cycles, but maybe dev-dep cycles will have a different hash?

@ranger-ross
Member Author

> How do you plan to handle deadlocks?
>
> EDIT: Though thinking more... Probably not an issue. I was thinking of cycles, but maybe dev-dep cycles will have a different hash?

Yeah, my assumption is that there are no cycles in the unit graph, so if a unit is scheduled to run, all of its dependencies have already been built and their locks have already been released.


@rustbot rustbot added the S-waiting-on-author (Status: The marked PR is awaiting some action (such as code changes) from the PR author) label on Oct 13, 2025
@ranger-ross ranger-ross force-pushed the experiment-with-fine-grain-locking branch from 64e5bf3 to f221f5e on October 14, 2025 14:50
@ranger-ross ranger-ross force-pushed the experiment-with-fine-grain-locking branch from f221f5e to 3fc0143 on October 16, 2025 11:08
@ranger-ross
Member Author

I reverted the multiple-locks-per-build-unit approach for now.
I posted a comment on the tracking issue with some design proposals, but we still have not fleshed out the direction we want to take.

I plan to discuss the path forward in the next Cargo team meeting.
