Support more ways of using mmap #3605

Open
RalfJung opened this issue May 14, 2024 · 7 comments
Labels
A-linux Area: affects only Linux targets A-shims Area: This affects the external function shims C-enhancement Category: a PR with an enhancement or an issue tracking an accepted enhancement

Comments

@RalfJung
Member

In a lengthy Zulip discussion, it was discovered that there are modes of use of mmap that are perfectly fine within the scope of Rust's memory model, but not supported by Miri's current implementation:

  • do an initial big reservation with MAP_NORESERVE, letting the kernel pick a suitable memory range and reserve that address space. (Yes, MAP_NORESERVE still reserves the address space. Talk about confusing flag names...) This may overcommit if permitted by the kernel (that's what "noreserve" refers to), but the memory is now all read/write accessible.
  • then later do smaller mmap calls within that range that actually "reserve" the memory (no more overcommit). These calls may set flags to get huge pages if possible. They will (may?) also erase the previous contents of the re-mapped ranges, but they don't change the range of memory that is read/write accessible, so this is fine with our current memory model.
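
A minimal sketch of the two steps above, not taken from the linked example code. It assumes 64-bit Linux on x86_64 or aarch64 (the flag values and the `i64` offset type are hard-coded for those targets) and declares `mmap`/`munmap` by hand instead of using the `libc` crate. The helper name `reserve_then_commit` is hypothetical, and using MAP_FIXED for the second call is an assumption about how the smaller mapping is placed inside the reserved range.

```rust
use std::ffi::{c_int, c_void};
use std::ptr;

// Assumed values for Linux on x86_64/aarch64; other targets may differ.
const PROT_READ: c_int = 0x1;
const PROT_WRITE: c_int = 0x2;
const MAP_PRIVATE: c_int = 0x02;
const MAP_FIXED: c_int = 0x10;
const MAP_ANONYMOUS: c_int = 0x20;
const MAP_NORESERVE: c_int = 0x4000;
const MAP_FAILED: *mut c_void = usize::MAX as *mut c_void;

unsafe extern "C" {
    fn mmap(
        addr: *mut c_void,
        len: usize,
        prot: c_int,
        flags: c_int,
        fd: c_int,
        offset: i64,
    ) -> *mut c_void;
    fn munmap(addr: *mut c_void, len: usize) -> c_int;
}

/// Reserves `reserve_len` bytes with MAP_NORESERVE, then re-maps `commit_len`
/// bytes at `offset` inside that range without MAP_NORESERVE.
/// Returns the first byte of the re-mapped range, or `None` if a call failed.
/// `offset` must be a multiple of the page size.
fn reserve_then_commit(reserve_len: usize, offset: usize, commit_len: usize) -> Option<u8> {
    assert!(offset + commit_len <= reserve_len);
    unsafe {
        // Step 1: big reservation; the kernel picks the address range.
        let base = mmap(
            ptr::null_mut(),
            reserve_len,
            PROT_READ | PROT_WRITE,
            MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE,
            -1,
            0,
        );
        if base == MAP_FAILED {
            return None;
        }
        // The whole range is already read/write accessible at this point.
        let target = base.cast::<u8>().add(offset);
        target.write(0xAA);

        // Step 2: smaller mapping placed inside the reserved range.
        let committed = mmap(
            target.cast::<c_void>(),
            commit_len,
            PROT_READ | PROT_WRITE,
            MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
            -1,
            0,
        );
        let result = if committed == MAP_FAILED {
            None
        } else {
            // Read back what is there now; the earlier write may be gone.
            Some(committed.cast::<u8>().read())
        };
        munmap(base, reserve_len);
        result
    }
}
```

For example, `reserve_then_commit(1 << 20, 64 * 1024, 64 * 1024)` reserves 1 MiB and re-maps the second 64 KiB chunk. The second call is the part Miri does not currently support.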

See here for some example code. Thanks to Nils for helping with the exploration here!

@RalfJung RalfJung added C-enhancement Category: a PR with an enhancement or an issue tracking an accepted enhancement A-shims Area: This affects the external function shims A-linux Area: affects only Linux targets labels May 14, 2024
@newpavlov
Contributor

I haven't read the discussion fully, but here are the unsupported cases I have encountered:

  • Read-only mappings created using PROT_READ (i.e. without PROT_WRITE),
  • Mappings populated with MAP_POPULATE,
  • Reserving address space with PROT_NONE,
  • Mappings with MAP_FIXED (to populate address range created with PROT_NONE).

@saethlin
Member

saethlin commented Jul 3, 2024

I think at one point I started working on all of these then backed out. So here are some notes.

> • Mappings populated with MAP_POPULATE,

Is this useful without supporting file mappings? Adding file mappings would be a whole thing in itself; I don't think we can support MAP_SHARED so it would be an odd sort of thing to tell users about because most users in the ecosystem MAP_SHARED their files even if they only want to read from them.

> • Reserving address space with PROT_NONE,

I don't know what semantics people expect when a program tries to access PROT_NONE memory. Making the interpreter halt is probably the only viable option, because otherwise we'd have to... execute the segfault handler? I don't think it would be right to report UB here. Most likely PROT_NONE would have to be implemented in before_memory_read?

> • Mappings with MAP_FIXED (to populate address range created with PROT_NONE).

I tripped over my own feet trying to wire this up before, because of the many ways that MAP_FIXED can be used. But perhaps with the constraint that every mmap call returns a separate allocation it's simpler now.

Can you link your codebase that uses these APIs? That would be quite educational.

@newpavlov
Contributor

> Is this useful without supporting file mappings?

In our case we use MAP_POPULATE to reserve physical memory. Our program allocates one big memory chunk at startup and then works mostly with it. As I understand it, relying on MAP_NORESERVE for this would be a misuse of the flag, since it's primarily about swap space, and by default we disable swap completely for the mapping using mlock.

> I don't know what semantics people expect when a program tries to access PROT_NONE memory.

We use PROT_NONE to reserve one big contiguous chunk of virtual memory, which then gets mapped using MAP_FIXED. Depending on app configuration, one part of the chunk can use huge pages. We also use it for vector-like data structures, which reserve the requested capacity with PROT_NONE and then map it gradually, page by page, using mremap with pages from the common pool.
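
A rough sketch of this kind of reservation, under stated assumptions: 64-bit Linux on x86_64 or aarch64 (flag values are hard-coded), hand-written `mmap`/`munmap` declarations instead of the `libc` crate, and MAP_FIXED | MAP_POPULATE for each step in place of the mremap-based scheme mentioned above. The `Reservation` type and its `grow` method are hypothetical names, not the actual codebase. Each chunk is accessed only through the pointer returned for it, so the sketch stays within the "one allocation per mmap call" model.

```rust
use std::ffi::{c_int, c_void};
use std::ptr;

// Assumed values for Linux on x86_64/aarch64; other targets may differ.
const PROT_NONE: c_int = 0x0;
const PROT_READ: c_int = 0x1;
const PROT_WRITE: c_int = 0x2;
const MAP_PRIVATE: c_int = 0x02;
const MAP_FIXED: c_int = 0x10;
const MAP_ANONYMOUS: c_int = 0x20;
const MAP_POPULATE: c_int = 0x8000;
const MAP_FAILED: *mut c_void = usize::MAX as *mut c_void;

unsafe extern "C" {
    fn mmap(
        addr: *mut c_void,
        len: usize,
        prot: c_int,
        flags: c_int,
        fd: c_int,
        offset: i64,
    ) -> *mut c_void;
    fn munmap(addr: *mut c_void, len: usize) -> c_int;
}

/// A contiguous range of address space, reserved up front as PROT_NONE.
struct Reservation {
    base: *mut u8,
    capacity: usize,
    committed: usize,
}

impl Reservation {
    /// Reserves `capacity` bytes of inaccessible address space.
    fn new(capacity: usize) -> Option<Self> {
        let flags = MAP_PRIVATE | MAP_ANONYMOUS;
        let p = unsafe { mmap(ptr::null_mut(), capacity, PROT_NONE, flags, -1, 0) };
        if p == MAP_FAILED {
            return None;
        }
        Some(Self { base: p.cast(), capacity, committed: 0 })
    }

    /// Makes the next `additional` bytes accessible and pre-faulted.
    /// `additional` must be a multiple of the page size.
    /// Returns a pointer to the start of the newly mapped chunk.
    fn grow(&mut self, additional: usize) -> Option<*mut u8> {
        if additional > self.capacity - self.committed {
            return None;
        }
        let target = unsafe { self.base.add(self.committed) };
        let flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED | MAP_POPULATE;
        let prot = PROT_READ | PROT_WRITE;
        let p = unsafe { mmap(target.cast(), additional, prot, flags, -1, 0) };
        if p == MAP_FAILED {
            return None;
        }
        self.committed += additional;
        Some(p.cast())
    }
}

impl Drop for Reservation {
    fn drop(&mut self) {
        // One munmap over the whole range releases every chunk.
        unsafe { munmap(self.base.cast(), self.capacity) };
    }
}
```

Successive `grow` calls hand out adjacent chunks, so the address of the data never moves. Whether those chunks may be treated as one growing allocation is a separate memory-model question.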

> Can you link your codebase that uses these APIs?

Unfortunately, it's a proprietary product and we do not have plans to open source it in the near future.

@saethlin
Member

saethlin commented Jul 4, 2024

> and we map it gradually page-by-page using mremap

Do you rely on multiple mremap calls extending a single allocation? Or is the address range made available by each mremap call treated as a separate allocation?

I'm asking because the model that Miri implements right now is that mmap and mremap behave like malloc and realloc in the sense that no matter what the address values actually are, you cannot use ptr::offset to walk from one call to realloc to the allocation produced by another realloc call. If you need to be able to do that, we might have a deeper problem.

@newpavlov
Contributor

> Do you rely on multiple mremap calls extending a single allocation?

This one. We effectively implement a custom realloc which guarantees address stability of the allocation.

> If you need to be able to do that, we might have a deeper problem.

Right now we rely on cfg(miri) to map the full capacity at once to work around this restriction. This means that tests under Miri run slightly different code, but it's better than nothing.
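
A hedged illustration of what such a cfg(miri) switch might look like. The function name `initial_commit_len` and its parameters are hypothetical; only the `miri` cfg itself is something Miri actually sets.

```rust
/// Decides how many bytes to map as accessible when the container is created.
/// Under Miri the full capacity is mapped in one call, so the interpreter
/// sees a single allocation; otherwise only the first page is mapped and
/// the rest is added on demand.
fn initial_commit_len(capacity: usize, page_size: usize) -> usize {
    if cfg!(miri) {
        capacity
    } else {
        page_size.min(capacity)
    }
}
```

The trade-off is the one noted above: the incremental growth path is never exercised under Miri.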

@RalfJung
Member Author

RalfJung commented Jul 4, 2024

> This one. We effectively implement a custom realloc which guarantees address stability of the allocation.

There's been a bunch of discussion around that, but the gist is that currently this isn't something LLVM supports. See e.g. this thread. I brought this up with LLVM and I think it's a docs-only change to add support for at least a basic version of this -- but before we allow anything like that in Miri or otherwise consider this a blessed pattern in Rust, we need to get LLVM fixed.

(Also note that realloc, even if the address stays the same, generates a new provenance. Accesses through the old pointer are always UB. So it's not just about address stability, it's about keeping the provenance alive.)
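
The realloc point can be sketched with Rust's global allocator API. This is an illustrative example only; `grow_buffer` is a hypothetical name. After the `realloc` call, every access goes through the returned pointer, regardless of whether its address equals the old one.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, realloc, Layout};

/// Allocates 16 bytes, grows the block to 32 bytes, and returns the first
/// byte (preserved by realloc) and the last byte (written after growing).
fn grow_buffer() -> (u8, u8) {
    let old_layout = Layout::array::<u8>(16).unwrap();
    let new_layout = Layout::array::<u8>(32).unwrap();
    unsafe {
        let old = alloc(old_layout);
        if old.is_null() {
            handle_alloc_error(old_layout);
        }
        old.write(42);

        let new = realloc(old, old_layout, new_layout.size());
        if new.is_null() {
            handle_alloc_error(new_layout);
        }
        // From here on `old` must not be used for accesses, even if
        // `new` happens to have the same address: only `new` carries
        // provenance for the resized block.
        let first = new.read();
        new.add(31).write(7);
        let last = new.add(31).read();

        dealloc(new, new_layout);
        (first, last)
    }
}
```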

This issue is mostly about supporting more things to be done with mmap without having to change the Rust memory model.

@newpavlov
Contributor

> This issue is mostly about supporting more things to be done with mmap without having to change the Rust memory model.

Yes, I understand. This is why I haven't mentioned use of mremap in my initial comment.
