Implement a garbage collector for tags #2479
Conversation
Force-pushed from d29e1a6 to 65f79f1.
Wow, those are some very nice wins! The ideas I have for the next-gen aliasing model pretty much require a GC, so I am very happy to see that this is feasible. :)
☔ The latest upstream changes (presumably #2500) made this pull request unmergeable. Please resolve the merge conflicts.
Force-pushed from f8f4ca9 to d71ddb4.
☔ The latest upstream changes (presumably #2363) made this pull request unmergeable. Please resolve the merge conflicts.
Force-pushed from 903d0ba to 0dde6f3.
The PR itself LGTM (at least I remember reviewing it before and marking everything as read XD; the latest changes didn't change that opinion).
Yeah, I'm just keeping the branch rebased so that it continues to work.
Also, pre-PR, Ralf had some feedback on the maintainability of the code that finds provenance stored in the interpreter runtime, so I'll at least wait until he's back to merge this.
Well, that certainly made CI times shorter. I'll see if I can put in something that can reassure me that the GC is actually running...
.github/workflows/ci.yml
@@ -25,12 +25,15 @@ jobs:
        - build: linux64
          os: ubuntu-latest
          host_target: x86_64-unknown-linux-gnu
          env: MIRIFLAGS=-Zmiri-tag-gc=1
It's weird that this made the Linux runner faster, too. It looks like it is actually better to run the GC after every basic block instead of after every 10k blocks.
It's not, that was a combination of me doing things wrong and CI timing noise.
This is a mark-and-sweep GC for a resource which sometimes disposes of itself, so we benefit doubly from not running the GC too often. There is a fixed cost to the mark stage, but the amount we sweep away grows if we wait. Plus, if an allocation is allocated and then deallocated in the time between GC runs, we never have to interact with it at all.
I did consider trying to hack up some kind of generational system, but between just ignoring small stacks and stacks not modified since the last GC run, the overhead of the sweep part is quite small at the default GC interval.
I'm reasonably convinced that I have (finally) managed to adjust the GC interval to run more often only in CI for Linux.
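To illustrate the interval mechanism under discussion, here is a minimal standalone sketch of gating a GC pass on a basic-block counter. GcState and maybe_run_gc are invented for the example and are not Miri's actual API; Miri exposes the interval via -Zmiri-tag-gc, with 1 meaning a pass after every basic block.

```rust
// Hypothetical sketch, not Miri's implementation: run the GC only once every
// `interval` basic blocks, so the fixed cost of the mark phase is amortized and
// short-lived allocations often die before a GC pass ever has to look at them.
struct GcState {
    interval: u32,        // e.g. a large default for normal runs, 1 in CI to stress the GC
    blocks_since_gc: u32,
}

impl GcState {
    fn maybe_run_gc(&mut self, run_gc: impl FnOnce()) {
        self.blocks_since_gc += 1;
        if self.blocks_since_gc >= self.interval {
            self.blocks_since_gc = 0;
            run_gc();
        }
    }
}

fn main() {
    let mut gc = GcState { interval: 3, blocks_since_gc: 0 };
    for block in 0..10 {
        gc.maybe_run_gc(|| println!("GC pass after block {block}"));
    }
}
```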
@bors r+
@@ -34,6 +34,10 @@ jobs:
    steps:
      - uses: actions/checkout@v3

      - name: Set the tag GC interval to 1 on linux
        if: runner.os == 'macOS'
The comment refers to Linux, but the condition checks for macOS; slightly confusing.
🤦 perfectly fine content for a second PR
☀️ Test successful - checks-actions
This seems like a change that could have waited until after my vacation... Oh well.
pub fn remove_unreachable_tags(&mut self, live_tags: &FxHashSet<SbTag>) {
    if self.modified_since_last_gc {
        for stack in self.stacks.iter_mut_all() {
            if stack.len() > 64 {
Why 64?
Haha this should definitely be explained in a comment. This is a magic number guesstimated by benchmarking, like the stack-cache length. Skipping over small stacks is a very crude generational garbage collector. I'll definitely make a PR that addresses this later in the day.
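As an illustration of the kind of explanatory comment and skip heuristic being discussed, here is a simplified standalone sketch. The Stack type, the plain retain-based sweep, and the exact threshold handling are made up for the example and are not Miri's actual code.

```rust
use std::collections::HashSet;

// Hypothetical, heavily simplified model of a borrow stack, for illustration only.
struct Stack {
    borrows: Vec<u64>, // tags, oldest first
}

impl Stack {
    fn gc(&mut self, live_tags: &HashSet<u64>) {
        // Magic number guesstimated by benchmarking (like the stack-cache length):
        // sweeping a short stack reclaims little memory but still costs a pass over
        // it, so skipping small stacks acts as a very crude generational heuristic.
        const SMALL_STACK_LEN: usize = 64;
        if self.borrows.len() <= SMALL_STACK_LEN {
            return;
        }
        self.borrows.retain(|tag| live_tags.contains(tag));
    }
}

fn main() {
    let mut stack = Stack { borrows: (0..100).collect() };
    let live: HashSet<u64> = (0..10).collect();
    stack.gc(&live);
    assert_eq!(stack.borrows.len(), 10); // only the live tags remain
}
```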
let should_keep = match this.perm() {
    // SharedReadWrite is the simplest case, if it's unreachable we can just remove it.
    Permission::SharedReadWrite => tags.contains(&this.tag()),
    // Only retain a Disabled tag if it is terminating a SharedReadWrite block.
So this is special: we delete them even if they are live! (That's correct of course, but I had to read this twice to realize that this match mixes two concerns: checking liveness and preventing SRW block merging.)
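To make the two concerns explicit, here is a simplified standalone sketch that computes liveness and block-structure preservation separately before combining them. This is not Miri's actual code, and the "item sits between two SharedReadWrite items" rule is only a plausible illustration of the separator idea, not necessarily the exact rule the PR implements.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, PartialEq)]
enum Permission { Unique, SharedReadWrite, Disabled }

#[derive(Clone, Copy)]
struct Item { tag: u64, perm: Permission }

fn should_keep(items: &[Item], i: usize, live_tags: &HashSet<u64>) -> bool {
    let item = items[i];
    // Concern 1: liveness. Is this tag still stored anywhere in the machine?
    let is_live = live_tags.contains(&item.tag);
    // Concern 2: SRW block structure. Would removing this item merge the
    // SharedReadWrite blocks directly below and above it? (Illustrative rule.)
    let separates_srw_blocks = item.perm != Permission::SharedReadWrite
        && i > 0
        && i + 1 < items.len()
        && items[i - 1].perm == Permission::SharedReadWrite
        && items[i + 1].perm == Permission::SharedReadWrite;
    match item.perm {
        Permission::SharedReadWrite => is_live,
        // Disabled items grant no access, so liveness is irrelevant for them;
        // they are only kept to preserve the block structure.
        Permission::Disabled => separates_srw_blocks,
        Permission::Unique => is_live || separates_srw_blocks,
    }
}

fn main() {
    let items = [
        Item { tag: 0, perm: Permission::SharedReadWrite },
        Item { tag: 1, perm: Permission::Disabled },
        Item { tag: 2, perm: Permission::SharedReadWrite },
        Item { tag: 3, perm: Permission::Unique },
    ];
    let live: HashSet<u64> = HashSet::new();
    // The Disabled item is kept even though it is dead, because it separates
    // two SharedReadWrite items; the dead Unique item at the top is removed.
    assert!(should_keep(&items, 1, &live));
    assert!(!should_keep(&items, 3, &live));
}
```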
self.borrows.truncate(write_idx);

#[cfg(not(feature = "stack-cache"))]
drop(first_removed); // This is only needed for the stack-cache
first_removed has a Copy type, so I don't think this does anything?
I'm fighting with the fact that this is unused if the stack-cache is off here. The drop just makes dead code detection be quiet. Is there a better approach?
I should mention that we don't have to repair the stack cache. Just tossing the whole thing is a valid approach (it warms up quickly and GC runs are intentionally rare), but I feel like justifying it is awkward. But maybe this issue tips the scales?
> The drop just makes dead code detection be quiet. Is there a better approach?

Ah, I see. I would use let _unused = ... for that.
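For illustration, the two ways of silencing the unused-variable warning side by side, in a trivial standalone example (the Option<usize> stand-in is made up; it is not the real type of first_removed):

```rust
fn main() {
    // Stand-in for the value that is only needed when the stack-cache is enabled.
    let first_removed: Option<usize> = Some(3);

    // What the excerpt above does: an explicit drop. Since the type is Copy, this
    // drops a copy and has no runtime effect; it only quiets the unused warning.
    drop(first_removed);

    // The suggested alternative: bind to a deliberately ignored name, which
    // states "unused on purpose" without a no-op drop.
    let _unused = first_removed;
}
```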
for thread in this.machine.threads.iter() {
    if let Some(Scalar::Ptr(
        Pointer { provenance: Provenance::Concrete { sb, .. }, .. },
        _,
    )) = thread.panic_payload
    {
        tags.insert(sb);
    }
}

self.find_tags_in_tls(&mut tags);
self.find_tags_in_memory(&mut tags);
self.find_tags_in_locals(&mut tags)?;
Can this be factored to use a more general "visit all the values that exist in the machine" kind of operation?
That sounds like a good idea but I don't know how to implement it.
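As a rough illustration of what such a generalized traversal could look like, here is a simplified standalone sketch of the mark phase: one visitor feeds every tag it finds into an accumulator, instead of separate find_tags_in_* passes. The Machine struct, visit_all_tags, and collect_live_tags are invented for the example and are not Miri's API.

```rust
use std::collections::HashSet;

// Simplified stand-ins for the places in the machine that can hold pointers.
struct Machine {
    tls: Vec<u64>,      // tags found in thread-local storage
    locals: Vec<u64>,   // tags found in stack frames / locals
    memory: Vec<u64>,   // tags found in pointers stored in memory
}

impl Machine {
    // The "visit all the values that exist in the machine" idea: one traversal
    // that reports every tag it encounters to the provided callback.
    fn visit_all_tags(&self, mut visit: impl FnMut(u64)) {
        for &tag in self.tls.iter().chain(&self.locals).chain(&self.memory) {
            visit(tag);
        }
    }
}

fn collect_live_tags(machine: &Machine) -> HashSet<u64> {
    let mut live = HashSet::new();
    machine.visit_all_tags(|tag| {
        live.insert(tag);
    });
    live
}

fn main() {
    let machine = Machine { tls: vec![1], locals: vec![2, 3], memory: vec![3, 4] };
    let live = collect_live_tags(&machine);
    assert_eq!(live.len(), 4);
    // A sweep over each borrow stack would then retain only tags in `live`,
    // plus the Disabled items needed to keep SharedReadWrite blocks separate.
}
```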
The general approach here is to scan TLS, all locals, and the main memory map for all provenance, accumulating a HashSet of all pointer tags which are stored anywhere (we also have a special case for panic payloads). Then we iterate over every borrow stack and remove tags which are not in said HashSet, or which could be terminating a SRW block.

Runtime of benchmarks decreases by between 17% and 81%.
(Benchmark tables for "GC off" and "GC on" omitted.)
But much more importantly for me, this drops the peak memory usage of the first 1 minute of running regex's tests from 103 GB to 1.7 GB.

Thanks to @oli-obk for suggesting a while ago that this was possible, and @Darksonn for reminding me that we can just search through memory to find Provenance to locate pointers.
Fixes #1367