
Implement a garbage collector for tags #2479

Merged: 2 commits, Sep 13, 2022
Conversation

@saethlin (Member) commented Aug 11, 2022

The general approach here is to scan TLS, all locals, and the main memory map for all provenance, accumulating a HashSet of all pointer tags that are stored anywhere (we also have a special case for panic payloads). Then we iterate over every borrow stack and remove tags that are not in said HashSet, unless they are needed to terminate an SRW (SharedReadWrite) block.
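The two phases described above can be sketched as a tiny standalone model. This is a hedged sketch under assumed names: `Tag`, `BorrowStack`, `mark`, and `sweep` are illustrative, not Miri's actual types, and a flat slice of roots stands in for scanning TLS, locals, and the memory map.

```rust
use std::collections::HashSet;

// `Tag` stands in for Miri's SbTag.
type Tag = u64;

struct BorrowStack {
    borrows: Vec<Tag>,
}

// Mark: accumulate every tag that is still stored somewhere.
fn mark(roots: &[Tag]) -> HashSet<Tag> {
    roots.iter().copied().collect()
}

// Sweep: walk every borrow stack and drop items whose tag is dead.
fn sweep(stacks: &mut [BorrowStack], live: &HashSet<Tag>) {
    for stack in stacks {
        stack.borrows.retain(|tag| live.contains(tag));
    }
}

fn main() {
    let live = mark(&[1, 3]);
    let mut stacks = vec![BorrowStack { borrows: vec![1, 2, 3, 4] }];
    sweep(&mut stacks, &live);
    assert_eq!(stacks[0].borrows, vec![1, 3]);
}
```

The real pass additionally keeps some dead tags (the SRW-terminating case), which this sketch omits.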

Runtime of benchmarks decreases by between 17% and 81%.

GC off:

Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/backtraces/Cargo.toml
  Time (mean ± σ):      7.080 s ±  0.249 s    [User: 6.870 s, System: 0.202 s]
  Range (min … max):    6.933 s …  7.521 s    5 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/mse/Cargo.toml
  Time (mean ± σ):      1.875 s ±  0.031 s    [User: 1.630 s, System: 0.245 s]
  Range (min … max):    1.825 s …  1.910 s    5 runs
 
Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/serde1/Cargo.toml
  Time (mean ± σ):      2.785 s ±  0.075 s    [User: 2.536 s, System: 0.168 s]
  Range (min … max):    2.698 s …  2.851 s    5 runs
 
Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/serde2/Cargo.toml
  Time (mean ± σ):      6.267 s ±  0.066 s    [User: 6.072 s, System: 0.190 s]
  Range (min … max):    6.152 s …  6.314 s    5 runs
 
Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/slice-get-unchecked/Cargo.toml
  Time (mean ± σ):      4.733 s ±  0.080 s    [User: 4.177 s, System: 0.513 s]
  Range (min … max):    4.681 s …  4.874 s    5 runs
 
Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/unicode/Cargo.toml
  Time (mean ± σ):      3.770 s ±  0.034 s    [User: 3.549 s, System: 0.211 s]
  Range (min … max):    3.724 s …  3.819 s    5 runs

GC on:

Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/backtraces/Cargo.toml
  Time (mean ± σ):      5.886 s ±  0.054 s    [User: 5.696 s, System: 0.182 s]
  Range (min … max):    5.799 s …  5.937 s    5 runs
 
Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/mse/Cargo.toml
  Time (mean ± σ):     936.4 ms ±   7.0 ms    [User: 815.4 ms, System: 119.6 ms]
  Range (min … max):   925.7 ms … 945.0 ms    5 runs
 
Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/serde1/Cargo.toml
  Time (mean ± σ):      2.126 s ±  0.022 s    [User: 1.979 s, System: 0.146 s]
  Range (min … max):    2.089 s …  2.143 s    5 runs
 
Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/serde2/Cargo.toml
  Time (mean ± σ):      4.242 s ±  0.066 s    [User: 4.051 s, System: 0.160 s]
  Range (min … max):    4.196 s …  4.357 s    5 runs
 
Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/slice-get-unchecked/Cargo.toml
  Time (mean ± σ):     907.4 ms ±   2.4 ms    [User: 788.6 ms, System: 118.2 ms]
  Range (min … max):   903.5 ms … 909.4 ms    5 runs
 
Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/unicode/Cargo.toml
  Time (mean ± σ):      1.821 s ±  0.011 s    [User: 1.687 s, System: 0.133 s]
  Range (min … max):    1.802 s …  1.831 s    5 runs

But much more importantly for me, this drops the peak memory usage of the first minute of running regex's tests from 103 GB to 1.7 GB.

Thanks to @oli-obk for suggesting a while ago that this was possible and @Darksonn for reminding me that we can just search through memory to find Provenance to locate pointers.

Fixes #1367

@saethlin saethlin marked this pull request as draft August 11, 2022 01:18
@saethlin saethlin force-pushed the tag-gc branch 2 times, most recently from d29e1a6 to 65f79f1 Compare August 16, 2022 12:59
@RalfJung (Member) commented:
Wow, those are some very nice wins!

The ideas I have for the next-gen aliasing model pretty much require a GC, so I am very happy to see that this is feasible. :)

@bors (Contributor) commented Aug 21, 2022

☔ The latest upstream changes (presumably #2500) made this pull request unmergeable. Please resolve the merge conflicts.

@RalfJung RalfJung added the S-draft Status: still a draft, not yet ready for review label Aug 22, 2022
@saethlin saethlin force-pushed the tag-gc branch 2 times, most recently from f8f4ca9 to d71ddb4 Compare August 22, 2022 21:47
@bors (Contributor) commented Aug 26, 2022

☔ The latest upstream changes (presumably #2363) made this pull request unmergeable. Please resolve the merge conflicts.

Review thread on src/machine.rs (outdated, resolved)
@saethlin saethlin changed the title WIP: -Zmiri-tag-gc, a garbage collector for tags Implement a garbage collector for tags Aug 27, 2022
@saethlin saethlin marked this pull request as ready for review August 27, 2022 01:13
@saethlin saethlin added S-waiting-on-review Status: Waiting for a review to complete and removed S-draft Status: still a draft, not yet ready for review labels Aug 27, 2022
@saethlin saethlin force-pushed the tag-gc branch 2 times, most recently from 903d0ba to 0dde6f3 Compare September 5, 2022 18:47
@oli-obk oli-obk self-assigned this Sep 5, 2022
@oli-obk (Contributor) commented Sep 5, 2022

the PR itself lgtm (at least I remember reviewing it before and marking everything as read XD, the latest changes didn't change that opinion)

@saethlin (Member, Author) commented Sep 5, 2022

Yeah, I'm just keeping the branch rebased so that it continues to work.

@saethlin (Member, Author) commented Sep 5, 2022

Also, pre-PR, Ralf had some feedback on the maintainability of the code that finds provenance stored in the interpreter runtime, so I'll at least wait until he's back to merge this.

@saethlin (Member, Author) commented Sep 6, 2022

Well that certainly made CI times shorter. I'll see if I can put in something that can reassure me that the GC is actually running...

@@ -25,12 +25,15 @@ jobs:
      - build: linux64
        os: ubuntu-latest
        host_target: x86_64-unknown-linux-gnu
        env: MIRIFLAGS=-Zmiri-tag-gc=1
A Contributor commented:
it's weird that this made the linux runner faster, too. It looks like it is actually better to run the GC after every basic block instead of after every 10k blocks.

saethlin (Member, Author) replied:
It's not, that was a combination of me doing things wrong and CI timing noise.

This is a mark-and-sweep GC for a resource which sometimes disposes of itself, so we benefit doubly from not running the GC too often. There is a fixed cost to the mark stage, but the amount we sweep away grows if we wait. Plus, if an allocation is allocated then deallocated in the time between GC runs, we never have to interact with it at all.

I did consider trying to hack up some kind of generation system, but between just ignoring small stacks and stacks not modified since the last GC run, the overhead of the sweep part is quite small at the default GC interval.
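The two sweep-skipping heuristics mentioned above can be sketched as follows. This is a hedged model under assumed names (`BorrowStack`, `maybe_sweep`, `SMALL_STACK_LEN` are illustrative), though the threshold of 64 and the modified-since-last-GC flag do appear in the PR's code.

```rust
use std::collections::HashSet;

struct BorrowStack {
    borrows: Vec<u64>,
    modified_since_last_gc: bool,
}

// Threshold guesstimated by benchmarking, per the discussion in this PR.
const SMALL_STACK_LEN: usize = 64;

// Returns true if the stack was actually swept.
fn maybe_sweep(stack: &mut BorrowStack, live: &HashSet<u64>) -> bool {
    // Skipping small stacks and stacks untouched since the last run acts
    // as a very crude generational collector: old, quiet stacks are
    // rarely re-examined, keeping sweep overhead low.
    if stack.borrows.len() <= SMALL_STACK_LEN || !stack.modified_since_last_gc {
        return false;
    }
    stack.borrows.retain(|tag| live.contains(tag));
    stack.modified_since_last_gc = false;
    true
}

fn main() {
    let live: HashSet<u64> = (0..10).collect();
    let mut big = BorrowStack { borrows: (0..100).collect(), modified_since_last_gc: true };
    assert!(maybe_sweep(&mut big, &live));
    assert_eq!(big.borrows.len(), 10);

    // A tiny stack is cheap to keep around, so it is skipped entirely.
    let mut small = BorrowStack { borrows: vec![999], modified_since_last_gc: true };
    assert!(!maybe_sweep(&mut small, &live));
}
```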

@saethlin saethlin added S-waiting-on-author Status: Waiting for the PR author to address review comments and removed S-waiting-on-review Status: Waiting for a review to complete labels Sep 10, 2022
@saethlin saethlin removed the S-waiting-on-author Status: Waiting for the PR author to address review comments label Sep 11, 2022
@saethlin saethlin added the S-waiting-on-review Status: Waiting for a review to complete label Sep 11, 2022
@saethlin (Member, Author) commented:
I'm reasonably convinced that I have (finally) managed to only adjust the GC interval to run more often in CI for Linux.

@oli-obk (Contributor) commented Sep 13, 2022

@bors r+

@bors (Contributor) commented Sep 13, 2022

📌 Commit f59605c has been approved by oli-obk

It is now in the queue for this repository.

@bors (Contributor) commented Sep 13, 2022

⌛ Testing commit f59605c with merge a00fa96...

@@ -34,6 +34,10 @@ jobs:
    steps:
      - uses: actions/checkout@v3

      - name: Set the tag GC interval to 1 on linux
        if: runner.os == 'macOS'
A Contributor commented:
the comment refers to Linux, but the condition to macOS; slightly confusing

saethlin (Member, Author) replied:
🤦 perfectly fine content for a second PR

@bors (Contributor) commented Sep 13, 2022

☀️ Test successful - checks-actions
Approved by: oli-obk
Pushing a00fa96 to master...

@bors bors merged commit a00fa96 into rust-lang:master Sep 13, 2022
@RalfJung (Member) commented Sep 13, 2022 via email

pub fn remove_unreachable_tags(&mut self, live_tags: &FxHashSet<SbTag>) {
    if self.modified_since_last_gc {
        for stack in self.stacks.iter_mut_all() {
            if stack.len() > 64 {
A Member commented:
Why 64?

saethlin (Member, Author) replied:
Haha this should definitely be explained in a comment. This is a magic number guesstimated by benchmarking, like the stack-cache length. Skipping over small stacks is a very crude generational garbage collector. I'll definitely make a PR that addresses this later in the day.

let should_keep = match this.perm() {
    // SharedReadWrite is the simplest case, if it's unreachable we can just remove it.
    Permission::SharedReadWrite => tags.contains(&this.tag()),
    // Only retain a Disabled tag if it is terminating a SharedReadWrite block.
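The retention rule quoted above can be modeled standalone. This is a hedged reconstruction, not Miri's exact code: liveness decides for SharedReadWrite items, while a Disabled item is kept only when it caps (terminates) an SRW block directly below it, regardless of whether its tag is live.

```rust
use std::collections::HashSet;

#[derive(PartialEq)]
enum Permission { Unique, SharedReadWrite, Disabled }

struct Item { tag: u64, perm: Permission }

// `below` is the stack item directly beneath `item`, if any.
fn should_keep(item: &Item, below: Option<&Item>, live: &HashSet<u64>) -> bool {
    match item.perm {
        // Unreachable SharedReadWrite items can simply be removed.
        Permission::SharedReadWrite => live.contains(&item.tag),
        // A Disabled item survives only when it terminates an SRW block,
        // so that removing it cannot merge two adjacent SRW blocks.
        // Note liveness is not consulted: live Disabled tags go too.
        Permission::Disabled => {
            matches!(below, Some(b) if b.perm == Permission::SharedReadWrite)
        }
        // Conservative in this sketch: keep every other permission.
        _ => true,
    }
}

fn main() {
    let live = HashSet::new();
    let srw = Item { tag: 1, perm: Permission::SharedReadWrite };
    let dis = Item { tag: 2, perm: Permission::Disabled };
    // A dead Disabled tag survives because it caps an SRW block...
    assert!(should_keep(&dis, Some(&srw), &live));
    // ...but not when the item below it is Unique.
    let uniq = Item { tag: 3, perm: Permission::Unique };
    assert!(!should_keep(&dis, Some(&uniq), &live));
}
```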
A Member commented:
So this is special, we delete them even if they are live! (That's correct of course but I had to read this twice to realize that this match mixes 2 concerns: checking liveness and preventing SRW block merging)

self.borrows.truncate(write_idx);

#[cfg(not(feature = "stack-cache"))]
drop(first_removed); // This is only needed for the stack-cache
A Member commented:
first_removed has a Copy type so I don't think this does anything?

saethlin (Member, Author) replied:
I'm fighting with the fact that this is unused if the stack-cache is off here. The drop just makes dead code detection be quiet. Is there a better approach?

saethlin (Member, Author) commented Sep 21, 2022:

I should mention that we don't have to repair the stack cache. Just tossing the whole thing is a valid approach (it warms up quickly and GC runs are intentionally rare), but I feel like justifying it is awkward. But maybe this issue tips the scales?

A Member replied:

The drop just makes dead code detection be quiet. Is there a better approach?

Ah I see. I would use let _unused = ... for that.
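A minimal illustration of the `let _unused` suggestion: binding a conditionally-needed `Copy` value states the intent more directly than a no-op `drop`. The feature name mirrors the snippet under discussion; the function names are hypothetical.

```rust
// With the cache enabled, `first_removed` would actually be consumed
// here (illustrative stub, only compiled when the feature is on).
#[cfg(feature = "stack-cache")]
fn repair_stack_cache(_first_removed: usize) {}

fn after_truncate(first_removed: usize) {
    #[cfg(feature = "stack-cache")]
    repair_stack_cache(first_removed);

    // Without the feature, the value would otherwise trip the
    // unused-variable lint; `let _unused` silences it without `drop`.
    #[cfg(not(feature = "stack-cache"))]
    let _unused = first_removed;
}

fn main() {
    after_truncate(3);
}
```

`drop` on a `Copy` type is a no-op (the original value remains usable), which is why the lint-silencing binding reads more honestly here.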

Comment on lines +15 to +27
for thread in this.machine.threads.iter() {
    if let Some(Scalar::Ptr(
        Pointer { provenance: Provenance::Concrete { sb, .. }, .. },
        _,
    )) = thread.panic_payload
    {
        tags.insert(sb);
    }
}

self.find_tags_in_tls(&mut tags);
self.find_tags_in_memory(&mut tags);
self.find_tags_in_locals(&mut tags)?;
A Member commented:
Can this be factored to use a more general "visit all the values that exist in the machine" kind of operation?

saethlin (Member, Author) replied:

That sounds like a good idea but I don't know how to implement it.
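One possible shape for such a "visit all values in the machine" operation, sketched under assumed names. This is purely illustrative, not Miri's actual API: a single trait implemented by every piece of machine state would let the mark phase be one uniform walk instead of separate find_tags_* methods.

```rust
use std::collections::HashSet;

type Tag = u64;

// Each piece of machine state reports the tags it holds; composite
// state forwards to its children.
trait VisitTags {
    fn visit_tags(&self, visit: &mut dyn FnMut(Tag));
}

// Toy machine with two flat components standing in for TLS and locals.
struct Machine {
    tls: Vec<Tag>,
    locals: Vec<Tag>,
}

impl VisitTags for Machine {
    fn visit_tags(&self, visit: &mut dyn FnMut(Tag)) {
        for &tag in self.tls.iter().chain(self.locals.iter()) {
            visit(tag);
        }
    }
}

fn main() {
    let machine = Machine { tls: vec![1, 2], locals: vec![2, 3] };
    let mut live = HashSet::new();
    machine.visit_tags(&mut |tag| {
        live.insert(tag);
    });
    assert_eq!(live.len(), 3);
}
```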

@saethlin saethlin deleted the tag-gc branch January 15, 2023 22:03
Labels: S-waiting-on-review Status: Waiting for a review to complete
Successfully merging this pull request may close: Stacked borrows analysis is super-linear (in time and space) (#1367)
4 participants