Skip to content

Isolate MiriMachine memory from Miri's #4343

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

nia-e
Copy link
Contributor

@nia-e nia-e commented May 22, 2025

Based on discussion surrounding #4326, this merges in the (very simple) discrete allocator that the MiriMachine will have. See the design document linked there there for considerations, but in brief: we could pull in an off the shelf allocator for this, but performance isn't a massive worry and doing it this way might make it easier to enable support for doing multi-seeded runs in the future (without a lot more extraneous plumbing, at least)

@nia-e
Copy link
Contributor Author

nia-e commented May 22, 2025

@rustbot ready

@rustbot rustbot added the S-waiting-on-review Status: Waiting for a review to complete label May 22, 2025
@nia-e nia-e force-pushed the discrete-allocator branch 4 times, most recently from 6cbc283 to b53ed38 Compare May 23, 2025 10:39
Copy link
Member

@RalfJung RalfJung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I left some first comments, but this is not a full review. I'd rather not reverse-engineer the invariants of MachineAlloc myself, so I'll wait for you to document them, which will make review a lot easier.

Furthermore, all pub fn in discrete_alloc should have proper doc comments, not just a safety comment. Please also add some basic unit tests -- we don't use them much in Miri, but this is one of the cases where they would make sense.

On Zulip you mentioned some benchmarks. Can you put benchmark results for the variant that you ended up going for here into the PR?

@RalfJung
Copy link
Member

@rustbot author

@rustbot rustbot removed the S-waiting-on-review Status: Waiting for a review to complete label May 24, 2025
@rustbot
Copy link
Collaborator

rustbot commented May 24, 2025

Reminder, once the PR becomes ready for a review, use @rustbot ready.

@rustbot rustbot added the S-waiting-on-author Status: Waiting for the PR author to address review comments label May 24, 2025
@nia-e
Copy link
Contributor Author

nia-e commented May 24, 2025

I'll post benchmarks in a bit! I realised there might be some speed gains to be made with very simple changes, so I'll just experiment a little first. Thanks for the comments ^^

@nia-e nia-e force-pushed the discrete-allocator branch from 4b1f50f to 9f1047e Compare May 24, 2025 15:13
@nia-e
Copy link
Contributor Author

nia-e commented May 24, 2025

Baseline is set to having the allocator fully disabled. It's only marginally slower in most cases, though it struggles with large allocations it seems. I wonder how much work it would be to improve that, but if we go down the "only machines using this will touch it" path I hope it's not too bad? I got a slight (~4%) improvement from calling mmap directly instead of calling alloc::alloc() but I doubt that's worth it. Is Miri built with optimisations on by default when invoking ./miri bench?

Comparison with baseline (relative speed, lower is better for the new results):
  backtraces: 1.36 ± 0.08
  big-allocs: 32.71 ± 1.60
  mse: 1.05 ± 0.10
  range-iteration: 1.06 ± 0.03
  serde1: 1.00 ± 0.03
  serde2: 1.06 ± 0.03
  slice-chunked: 1.03 ± 0.04
  slice-get-unchecked: 0.93 ± 0.08
  string-replace: 1.06 ± 0.04
  unicode: 1.06 ± 0.05
  zip-equal: 1.08 ± 0.02

@RalfJung
Copy link
Member

Is Miri built with optimisations on by default when invoking ./miri bench?

Yes.

A 32x slowdown with big allocations is hefty.^^ Shouldn't those just forward to alloc::alloc anyway as part of the huge allocs treatment? Without looking at the code, what I'd assume happens when the allocation request is close to the page size is: round up to multiple of page size, and then just allocate that and use it directly without any kind of tracking of "which parts of this page are used" or so. That should have basically no overhead.

@nia-e
Copy link
Contributor Author

nia-e commented May 24, 2025

It does mostly do that, which is what's confusing me... I'll try to fix it, I assume I just missed something really obvious.

@nia-e
Copy link
Contributor Author

nia-e commented May 24, 2025

I checked; seems like the big-allocs test specifically calls alloc_zeroed which was the reason for the slowdown (apparently ptr.write_bytes() is a really slow way to zero out bytes). I changed up the logic a bit to make the logic generic over calling alloc::alloc() vs alloc::alloc_zeroed() and these are the (much better!) results:

Comparison with baseline (relative speed, lower is better for the new results):
  backtraces: 1.23 ± 0.06
  big-allocs: 1.03 ± 0.05
  mse: 0.99 ± 0.09
  range-iteration: 1.01 ± 0.03
  serde1: 0.99 ± 0.02
  serde2: 1.03 ± 0.02
  slice-chunked: 0.97 ± 0.04
  slice-get-unchecked: 0.89 ± 0.08
  string-replace: 1.00 ± 0.03
  unicode: 1.01 ± 0.04
  zip-equal: 1.05 ± 0.04

I might be able to squeeze a bit more perf out by actually making the functions generic instead of just passing in a function pointer but shrug, unsure if it's necessary

@RalfJung
Copy link
Member

Ah yes, that is exactly why we added that particular benchmark. :)

@nia-e
Copy link
Contributor Author

nia-e commented May 24, 2025

What kind of unit tests do you think belong here? I assumed functionality is covered by the usual tests, but I'll happily add in some stuff if you think it's relevant

@RalfJung
Copy link
Member

What kind of unit tests do you think belong here? I assumed functionality is covered by the usual tests, but I'll happily add in some stuff if you think it's relevant

Similar to range_map: just add some functions with #[test] in the file with the allocator implementation to test some particular corner cases -- or at least a basic smoke test if there are no corner cases.

@nia-e
Copy link
Contributor Author

nia-e commented May 24, 2025

Openen the PR on the main repo, I'll get to adding tests if everything there is okay it is not okay I need to do more things oops

@nia-e
Copy link
Contributor Author

nia-e commented May 24, 2025

Expecting the build to fail for now since it's adapted to the changes from the PR (but also Miri seems to be having trouble on the current upstream master commit, so I guess it's pending that being fixed too)

@nia-e
Copy link
Contributor Author

nia-e commented May 24, 2025

Tests added :D let me know if there's anything more to do

@nia-e nia-e force-pushed the discrete-allocator branch from 87f2b3f to f7fe286 Compare May 24, 2025 22:52
@rustbot

This comment has been minimized.

@nia-e nia-e force-pushed the discrete-allocator branch 4 times, most recently from c4ad33f to b832def Compare May 25, 2025 16:20
rust-timer added a commit to rust-lang/rust that referenced this pull request May 26, 2025
Rollup merge of #141513 - nia-e:allocbytes-extend, r=RalfJung

interpret: add allocation parameters to `AllocBytes`

Necessary for a better implementation of [rust-lang/miri#4343](rust-lang/miri#4343). Also included here is the code from that PR, adapted to this new interface for the sake of example and so that CI can run on them; the Miri changes can be reverted and merged separately, though.

r? `@RalfJung`
github-actions bot pushed a commit that referenced this pull request May 27, 2025
interpret: add allocation parameters to `AllocBytes`

Necessary for a better implementation of [#4343](#4343). Also included here is the code from that PR, adapted to this new interface for the sake of example and so that CI can run on them; the Miri changes can be reverted and merged separately, though.

r? `@RalfJung`
@rustbot

This comment has been minimized.

@nia-e nia-e force-pushed the discrete-allocator branch from 1c9e998 to a75234c Compare May 27, 2025 14:16
@nia-e
Copy link
Contributor Author

nia-e commented May 27, 2025

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Waiting for a review to complete and removed S-waiting-on-author Status: Waiting for the PR author to address review comments labels May 27, 2025
@nia-e nia-e force-pushed the discrete-allocator branch 4 times, most recently from 3e177a7 to 3f500d4 Compare May 27, 2025 14:34
@nia-e nia-e force-pushed the discrete-allocator branch from 3f500d4 to 6794b61 Compare May 27, 2025 14:37
use std::alloc::{self, Layout};
use std::sync;

static ALLOCATOR: sync::Mutex<IsolatedAlloc> = sync::Mutex::new(IsolatedAlloc::empty());
Copy link
Member

@RalfJung RalfJung May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping we wouldn't need this any more and could instead store this in the Machine and pass it to the allocation via the params... otherwise, the allocations of multiple native-libs machines could still get mixed here.

But that's harder now that parameters are their own associated type. It should still be possible though, AFAIK lifetime generic associated types are stable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I think we can do that; I'll get on it

Copy link
Contributor Author

@nia-e nia-e May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably missing something here but simply changing the associated type to something like an Option<&'p std::cell::RefCell<IsolatedAlloc>> means we have to make the MiriMachine a self-referential struct, no? Since GlobalStateInner holds an FxHashMap<AllocId, MiriAllocBytes<'p>> it needs a lifetime, and therefore also GlobalState needs a lifetime...

Maybe instead we can just pass in something like a MachineId in the params and have the allocator keep track of that? That would be quite easy to do, simply have each page store the corresponding MachineId (or equivalently, store a Vec<(Vec<*mut u8>, MachineId)>). That feels like less of a headahce?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ended up going with a hashmap, but I can change that if you think it's not the right call. I just couldn't see a non-intrusive way to do it outside of either (a) passing a raw pointer instead of a reference (yikes) or (b) using something like Arc<RefCell<IsolatedAllocator>> which I'm not sure is worth the overhead and runtime panic risk

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've looked into making the ptrace bits compatible with multi-seeded Miri and I'm pretty strongly in favour of this approach because it will also enable us to share one single supervisor process for all of the machines (telling them apart by the seed and propagating that). Though if the C library has shared state across all seeds that will be problematic

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I guess now I nerd-sniped you into making the many-seeds thing works for native-libs mode... sorry for that.^^

Just for the allocator, you don't need an ID -- you can entirely avoid even having a global variable here and just store another Rc<...> pointing to the allocator in the machine itself. That would only hold the allocations for that one machine so it can easily list "its" pages (and it also neatly involves having concurrent machines content on the same lock). But I don't know which further constraints are imposed by the ptrace thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, what you described is basically what I am doing. And I am now deadset on doing the many-seeds :D I am insisting that there will be Only One Ffi At A Time because otherwise I cannot begin to imagine how to tell e.g. which ffi call is responsible for a particular syscall, but it at least means I only need to deal with one lock on the main supervisor. Also I think fixing the many-seeds thing will make it so it stops having a chance to randomly hang in CI if the native-lib tests run in parallel, which is... undesirable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you look at that this fixes everything and multi-seeded mode works perfectly.

I'll wait to double-check with the github CI on the ptrace branch but if it's all good (should be) I'll push here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am insisting that there will be Only One Ffi At A Time

That sounds like not supporting many-seeds + native-libs? I am confused now.^^

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I mean many-seeds can coexist but only one can be in "ffi mode" at a time, with the others blocking until that one finishes. Otherwise it becomes borderline impossible to trace properly, and perhaps actually impossible when it comes to e.g. intercepting syscalls

nia-e and others added 3 commits May 27, 2025 17:07
@nia-e
Copy link
Contributor Author

nia-e commented May 27, 2025

@rustbot ready

@nia-e
Copy link
Contributor Author

nia-e commented May 27, 2025

Ah seems like it never got passed back to waiting on author so in case it didn't ping you, cc @RalfJung

@RalfJung
Copy link
Member

Based on #4343 (comment) I assume this is @rustbot author again.

@rustbot rustbot added S-waiting-on-author Status: Waiting for the PR author to address review comments and removed S-waiting-on-review Status: Waiting for a review to complete labels May 28, 2025
@nia-e
Copy link
Contributor Author

nia-e commented May 28, 2025

Did it all via Rc<RefCell<IsolatedAllocator>>s, and it is indeed so much cleaner...

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Waiting for a review to complete and removed S-waiting-on-author Status: Waiting for the PR author to address review comments labels May 28, 2025
Copy link
Member

@RalfJung RalfJung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't look into the actual allocator yet, just the glue code around it.

let slice = slice.into();
let size = slice.len();
let align = align.bytes();
let p_clone = params.clone();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this getting cloned here? That should not be needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not entirely sure why but trying to match on params.clone() complained, but this didn't. I'll look into it, I left that as mostly a placeholder to check what's happening but ty for reminding me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like it's because it needs to get captured in the closure below, so it needs its own copy. Maybe it can also be fnonce, I'll see

let size = size.bytes();
let align = align.bytes();
let p_clone = params.clone();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here regarding cloning.

@@ -905,7 +906,7 @@ pub trait EvalContextExt<'tcx>: crate::MiriInterpCxExt<'tcx> {
interp_ok(MiriAllocBytes::from_bytes(
std::borrow::Cow::Borrowed(bytes),
align,
(),
params.clone(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah damn, this closure should be a FnOnce and not FnMut and then we can avoid a clone here...
Could you make a rustc PR for that?

Comment on lines +645 to +646
let seed = config.seed.unwrap_or(0);
let rng = StdRng::seed_from_u64(seed);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you change this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was for when I was using the seeds as machine ids, oops! Will revert

@@ -532,6 +532,10 @@ pub struct MiriMachine<'tcx> {
/// Needs to be queried by ptr_to_int, hence needs interior mutability.
pub(crate) rng: RefCell<StdRng>,

/// The allocator used for the machine's `AllocBytes`.
#[cfg(target_os = "linux")]
pub(crate) allocator: Rc<RefCell<crate::alloc::isolated_alloc::IsolatedAlloc>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be Option<Rc<...>> since it should only exist in native-libs mode.

Co-authored-by: Ralf Jung <post@ralfj.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Waiting for a review to complete
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants