Isolate MiriMachine memory from Miri's #4343

nia-e · 2025-05-22T12:05:04Z

Based on discussion surrounding #4326, this merges in the (very simple) discrete allocator that the MiriMachine will have. See the design document linked there for considerations, but in brief: we could pull in an off-the-shelf allocator for this, but performance isn't a massive worry, and doing it this way might make it easier to support multi-seeded runs in the future (without a lot more extraneous plumbing, at least).

nia-e · 2025-05-22T12:41:08Z

@rustbot ready

RalfJung

Thanks for the PR! I left some first comments, but this is not a full review. I'd rather not reverse-engineer the invariants of MachineAlloc myself, so I'll wait for you to document them, which will make review a lot easier.

Furthermore, all pub fn in discrete_alloc should have proper doc comments, not just a safety comment. Please also add some basic unit tests -- we don't use them much in Miri, but this is one of the cases where they would make sense.

On Zulip you mentioned some benchmarks. Can you put benchmark results for the variant that you ended up going for here into the PR?

Cargo.toml

src/discrete_alloc.rs

src/machine.rs

RalfJung · 2025-05-24T12:49:16Z

@rustbot author

rustbot · 2025-05-24T12:49:19Z

Reminder, once the PR becomes ready for a review, use @rustbot ready.

nia-e · 2025-05-24T15:05:31Z

I'll post benchmarks in a bit! I realised there might be some speed gains to be made with very simple changes, so I'll just experiment a little first. Thanks for the comments ^^

nia-e · 2025-05-24T15:50:15Z

Baseline is set to having the allocator fully disabled. It's only marginally slower in most cases, though it struggles with large allocations it seems. I wonder how much work it would be to improve that, but if we go down the "only machines using this will touch it" path I hope it's not too bad? I got a slight (~4%) improvement from calling mmap directly instead of calling alloc::alloc() but I doubt that's worth it. Is Miri built with optimisations on by default when invoking ./miri bench?

Comparison with baseline (relative speed, lower is better for the new results):
  backtraces: 1.36 ± 0.08
  big-allocs: 32.71 ± 1.60
  mse: 1.05 ± 0.10
  range-iteration: 1.06 ± 0.03
  serde1: 1.00 ± 0.03
  serde2: 1.06 ± 0.03
  slice-chunked: 1.03 ± 0.04
  slice-get-unchecked: 0.93 ± 0.08
  string-replace: 1.06 ± 0.04
  unicode: 1.06 ± 0.05
  zip-equal: 1.08 ± 0.02

RalfJung · 2025-05-24T15:54:19Z

> Is Miri built with optimisations on by default when invoking ./miri bench?

Yes.

A 32x slowdown with big allocations is hefty.^^ Shouldn't those just forward to alloc::alloc anyway as part of the huge allocs treatment? Without looking at the code, what I'd assume happens when the allocation request is close to the page size is: round up to multiple of page size, and then just allocate that and use it directly without any kind of tracking of "which parts of this page are used" or so. That should have basically no overhead.
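
As an illustration of the rounding described above (not the code from the PR), here is a minimal Rust sketch. It assumes a fixed 4096-byte page size, and `round_up_to_page` and `huge_layout` are hypothetical helper names:

```rust
use std::alloc::Layout;

/// Assumed page size for this sketch; real code would query the OS instead.
pub const PAGE_SIZE: usize = 4096;

/// Rounds `size` up to the next multiple of `PAGE_SIZE`.
/// Returns `None` if the rounded value would overflow `usize`.
pub fn round_up_to_page(size: usize) -> Option<usize> {
    // PAGE_SIZE is a power of two, so masking off the low bits rounds down.
    let mask = PAGE_SIZE - 1;
    size.checked_add(mask).map(|s| s & !mask)
}

/// Builds the layout for a "huge" allocation: whole pages, at least page-aligned.
/// The resulting layout would be handed to the underlying allocator directly,
/// with no per-page bookkeeping of which parts are in use.
pub fn huge_layout(size: usize, align: usize) -> Option<Layout> {
    let rounded = round_up_to_page(size)?;
    Layout::from_size_align(rounded, align.max(PAGE_SIZE)).ok()
}
```

With these assumptions, a 5000-byte request with 8-byte alignment becomes an 8192-byte, 4096-aligned layout.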

nia-e · 2025-05-24T15:55:31Z

It does mostly do that, which is what's confusing me... I'll try to fix it, I assume I just missed something really obvious.

nia-e · 2025-05-24T16:38:01Z

I checked; it seems the big-allocs test specifically calls alloc_zeroed, which was the reason for the slowdown (apparently ptr.write_bytes() is a really slow way to zero out bytes). I changed things up a bit to make the logic generic over calling alloc::alloc() vs alloc::alloc_zeroed(), and these are the (much better!) results:

Comparison with baseline (relative speed, lower is better for the new results):
  backtraces: 1.23 ± 0.06
  big-allocs: 1.03 ± 0.05
  mse: 0.99 ± 0.09
  range-iteration: 1.01 ± 0.03
  serde1: 0.99 ± 0.02
  serde2: 1.03 ± 0.02
  slice-chunked: 0.97 ± 0.04
  slice-get-unchecked: 0.89 ± 0.08
  string-replace: 1.00 ± 0.03
  unicode: 1.01 ± 0.04
  zip-equal: 1.05 ± 0.04

I might be able to squeeze a bit more perf out by actually making the functions generic instead of just passing in a function pointer but shrug, unsure if it's necessary
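
For illustration only (this is not the code from the PR), the function-pointer approach mentioned above could look roughly like the sketch below. `alloc_with`, `huge_alloc`, and `huge_alloc_zeroed` are hypothetical names; only `std::alloc::alloc` and `std::alloc::alloc_zeroed` are real APIs here:

```rust
use std::alloc::{self, Layout};

/// Shared allocation path, parameterised over the primitive to call.
///
/// # Safety
/// `layout` must have a non-zero size, as required by `std::alloc::alloc`.
unsafe fn alloc_with(layout: Layout, alloc_fn: unsafe fn(Layout) -> *mut u8) -> *mut u8 {
    // Any bookkeeping common to both variants would live here.
    unsafe { alloc_fn(layout) }
}

/// Allocates uninitialised memory.
///
/// # Safety
/// Same requirements as `alloc_with`.
pub unsafe fn huge_alloc(layout: Layout) -> *mut u8 {
    unsafe { alloc_with(layout, alloc::alloc) }
}

/// Allocates zeroed memory by delegating to `alloc_zeroed`, so the underlying
/// allocator does the zeroing instead of a separate `write_bytes` pass.
///
/// # Safety
/// Same requirements as `alloc_with`.
pub unsafe fn huge_alloc_zeroed(layout: Layout) -> *mut u8 {
    unsafe { alloc_with(layout, alloc::alloc_zeroed) }
}
```

Making `alloc_with` generic over the callee instead of taking a function pointer would let the compiler monomorphise each variant; whether that is measurable here is an open question in the thread.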

RalfJung · 2025-05-24T16:45:11Z

Ah yes, that is exactly why we added that particular benchmark. :)

nia-e · 2025-05-24T16:46:34Z

What kind of unit tests do you think belong here? I assumed functionality is covered by the usual tests, but I'll happily add in some stuff if you think it's relevant

RalfJung · 2025-05-24T16:50:10Z

> What kind of unit tests do you think belong here? I assumed functionality is covered by the usual tests, but I'll happily add in some stuff if you think it's relevant

Similar to range_map: just add some functions with #[test] in the file with the allocator implementation to test some particular corner cases -- or at least a basic smoke test if there are no corner cases.

nia-e · 2025-05-24T18:40:02Z

Opened the PR on the main repo, ~~I'll get to adding tests if everything there is okay~~ it is not okay I need to do more things oops

nia-e · 2025-05-24T21:34:23Z

Expecting the build to fail for now since it's adapted to the changes from the PR (but also Miri seems to be having trouble on the current upstream master commit, so I guess it's pending that being fixed too)

nia-e · 2025-05-24T22:45:24Z

Tests added :D let me know if there's anything more to do

nia-e · 2025-05-28T15:27:11Z

My takeaway so far is that I need to get better at double-checking myself after a refactor. I hope I addressed everything brought up, though

src/alloc/isolated_alloc.rs

interpret/allocation: Fixup type for `alloc_bytes` This can be `FnOnce`, which helps us avoid an extra clone in rust-lang/miri#4343 r? RalfJung

src/alloc/isolated_alloc.rs

RalfJung · 2025-05-29T08:23:44Z

This changed quite a bit since the last benchmark run -- could you re-run the benchmark to see whether that Rc has a significant cost?

Rollup merge of #141682 - nia-e:fixup-alloc, r=RalfJung interpret/allocation: Fixup type for `alloc_bytes` This can be `FnOnce`, which helps us avoid an extra clone in rust-lang/miri#4343 r? RalfJung

nia-e · 2025-05-29T15:36:35Z

Ok! So there's a small problem, namely that the jemalloc implementation rust uses by default is really slow with page-aligned allocations it seems; the reason perf was good before is that I forgot to actually enforce page-alignment in the alloc_huge calls. I checked; even passing in a constant 4096 as the alignment causes big-allocs to see a very significant slowdown depending on the run (it seems highly variable: it was as low as 8.9x at one point, but I didn't save those results).

However, if we do the huge allocs by directly calling mmap, it drops back down to parity. Here's the old benchmark run:

Comparison with baseline (relative speed, lower is better for the new results):
  backtraces: 1.18 ± 0.05
  big-allocs: 22.30 ± 1.89
  mse: 1.01 ± 0.07
  range-iteration: 1.08 ± 0.02
  serde1: 1.06 ± 0.03
  serde2: 1.04 ± 0.04
  slice-chunked: 0.98 ± 0.04
  slice-get-unchecked: 1.03 ± 0.04
  string-replace: 1.00 ± 0.06
  unicode: 1.02 ± 0.04
  zip-equal: 1.01 ± 0.03

And this is what I got calling mmap directly (albeit this means the pointer will only ever be page aligned at most - unsure if relying on greater-than-page-alignment is even legal given that it's impossible to guarantee?)

Comparison with baseline (relative speed, lower is better for the new results):
  backtraces: 1.18 ± 0.05
  big-allocs: 1.07 ± 0.10
  mse: 1.00 ± 0.07
  range-iteration: 1.03 ± 0.02
  serde1: 1.02 ± 0.04
  serde2: 1.00 ± 0.04
  slice-chunked: 0.97 ± 0.05
  slice-get-unchecked: 0.97 ± 0.02
  string-replace: 0.96 ± 0.08
  unicode: 1.01 ± 0.04
  zip-equal: 1.00 ± 0.03

RalfJung · 2025-05-29T15:56:58Z

> the jemalloc implementation rust uses by default is really slow with page-aligned allocations it seems; the reason perf was good before is that I forgot to actually enforce page-alignment in the alloc_huge calls.

What alignment did you set before?

> relying on greater-than-page-alignment is even legal given that it's impossible to guarantee

It's definitely legal, and it's also possible to guarantee. You "just" have to allocate more pages than needed and then round up the pointer you got to the needed alignment. jemalloc likely does that if you ask for a high alignment. It's non-trivial to implement this correctly though, in particular regarding deallocation, so I'd rather not have this in the codebase.
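
To make the "allocate more than needed, then round up the pointer" idea concrete, here is a rough Rust sketch under loud assumptions: it uses `std::alloc` with page alignment as a stand-in for `mmap`, and `OverAligned`, `alloc_over_aligned`, and `dealloc_over_aligned` are hypothetical names, not anything from Miri or jemalloc. The deallocation difficulty shows up as the need to remember the original base pointer and layout:

```rust
use std::alloc::{self, Layout};

/// Record of one over-aligned allocation (hypothetical type for this sketch).
pub struct OverAligned {
    base: *mut u8,       // pointer returned by the underlying allocator
    base_layout: Layout, // layout of the underlying, padded request
    pub ptr: *mut u8,    // aligned pointer handed out to the caller
}

/// Returns `size` usable bytes aligned to `align`, even if `align > page_size`.
pub fn alloc_over_aligned(size: usize, align: usize, page_size: usize) -> Option<OverAligned> {
    if size == 0 || !align.is_power_of_two() || !page_size.is_power_of_two() {
        return None;
    }
    // Pad by `align` bytes so an aligned address with `size` bytes after it
    // must exist somewhere inside the block.
    let padded = size.checked_add(align)?;
    let base_layout = Layout::from_size_align(padded, page_size).ok()?;
    // SAFETY: `padded` is non-zero because `size` is non-zero.
    let base = unsafe { alloc::alloc(base_layout) };
    if base.is_null() {
        return None;
    }
    // Round the address up to the next multiple of `align`.
    let addr = base as usize;
    let offset = ((addr + align - 1) & !(align - 1)) - addr;
    // SAFETY: `offset < align`, so `offset + size < padded` stays in bounds.
    let ptr = unsafe { base.add(offset) };
    Some(OverAligned { base, base_layout, ptr })
}

/// Frees the block using the *original* pointer and layout, not the aligned one.
pub fn dealloc_over_aligned(a: OverAligned) {
    // SAFETY: `base` and `base_layout` came from the matching `alloc` call.
    unsafe { alloc::dealloc(a.base, a.base_layout) }
}
```

A real `mmap`-based version would additionally have to decide whether to unmap the unused leading and trailing pages, which this sketch ignores.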

Given that the regression only affects native-libs mode, I am also fine with just taking the hit for now and having an issue to track the problem. We can then ping some people there that might know more about this; I am fairly clueless when it comes to allocators.

nia-e · 2025-05-29T16:06:00Z

When refactoring I accidentally just left in whatever align was passed in to us (oops). But if you prefer it, I can just leave it as it was before and take the hit, or try to get the mmap implementation to overallocate like you said but I expect that will lead to some significant extra complexity (or special-case align <= page_size to use mmap?)

RalfJung · 2025-05-29T16:15:09Z

> or special-case max(size, align) <= page_size to use mmap?

I'm pretty sure the offending allocations in big-allocs are the big ones, so that wouldn't help.

I'll be a lot more busy soon than I was recently and thus have less review capacity, so I'd rather land this PR today or tomorrow (in particular the glue code outside the new allocator impl). So I would propose we stick to jemalloc for now, you file an issue for the perf problem, and then if you want to make a follow-up PR that uses mmap you may do so. :) (I thought about it some more and it's not as bad as I thought since the deallocation function has access to the list of all allocations.) I would say that PR should entirely switch the allocator to mmap then, both for small and huge allocs, to avoid mixing the logic for multiple allocators in the same file.

nia-e · 2025-05-29T16:21:31Z

Hopefully this addresses everything then? I also removed the extra clone in concurrency/thread.rs since the rustc change landed

nia-e · 2025-05-29T16:25:31Z

Gah I messed up, one sec

nia-e · 2025-05-29T16:31:37Z

There. Is this fine?

RalfJung · 2025-05-29T16:34:56Z

Please squash, I'll take a look later :)

Update src/alloc/isolated_alloc.rs Co-authored-by: Ralf Jung <post@ralfj.de> allow multiple seeds use bitsets fix xcompile listened to reason and made my life so much easier fmt Update src/machine.rs Co-authored-by: Ralf Jung <post@ralfj.de> fixups avoid some clones Update src/alloc/isolated_alloc.rs Co-authored-by: Ralf Jung <post@ralfj.de> Update src/alloc/isolated_alloc.rs Co-authored-by: Ralf Jung <post@ralfj.de> address review Update src/alloc/isolated_alloc.rs Co-authored-by: Ralf Jung <post@ralfj.de> fixup comment Update src/alloc/isolated_alloc.rs Co-authored-by: Ralf Jung <post@ralfj.de> Update src/alloc/isolated_alloc.rs Co-authored-by: Ralf Jung <post@ralfj.de> address review pt 2 nit rem fn Update src/alloc/isolated_alloc.rs Co-authored-by: Ralf Jung <post@ralfj.de> Update src/alloc/isolated_alloc.rs Co-authored-by: Ralf Jung <post@ralfj.de> address review unneeded unsafe

RalfJung · 2025-05-29T17:42:21Z

I have done some lite refactoring, could you take a look if it makes sense to you?

nia-e · 2025-05-29T17:47:29Z

Everything checks out, ty! Should I squash it also?

RalfJung · 2025-05-29T17:50:53Z

Thanks! No need to :)

rustbot added the S-waiting-on-review Status: Waiting for a review to complete label May 22, 2025

nia-e force-pushed the discrete-allocator branch 4 times, most recently from 6cbc283 to b53ed38 Compare May 23, 2025 10:39

RalfJung reviewed May 24, 2025

rustbot removed the S-waiting-on-review Status: Waiting for a review to complete label May 24, 2025

rustbot added the S-waiting-on-author Status: Waiting for the PR author to address review comments label May 24, 2025

nia-e force-pushed the discrete-allocator branch from 4b1f50f to 9f1047e Compare May 24, 2025 15:13

nia-e mentioned this pull request May 24, 2025

interpret: add allocation parameters to AllocBytes rust-lang/rust#141513

Merged

nia-e force-pushed the discrete-allocator branch from 87f2b3f to f7fe286 Compare May 24, 2025 22:52

nia-e force-pushed the discrete-allocator branch 4 times, most recently from c4ad33f to b832def Compare May 25, 2025 16:20

nia-e force-pushed the discrete-allocator branch from 32672ff to 1539771 Compare May 28, 2025 15:26

RalfJung reviewed May 28, 2025

RalfJung reviewed May 28, 2025

RalfJung reviewed May 29, 2025

nia-e force-pushed the discrete-allocator branch 2 times, most recently from 6b43f8f to 9028dfa Compare May 29, 2025 16:20

nia-e force-pushed the discrete-allocator branch from d51d7cc to a30ad41 Compare May 29, 2025 16:37

some refactoring of the allocator

9a209ef

RalfJung enabled auto-merge May 29, 2025 17:50

RalfJung added this pull request to the merge queue May 29, 2025

Merged via the queue into rust-lang:master with commit 2c35406 May 29, 2025
8 checks passed

nia-e mentioned this pull request May 29, 2025

Poor performance for large allocations in native-lib mode #4357

Open

3 tasks
