About "unleaking": what is the required pointer provenance in dealloc? #316
Comments
This is a real kicker, because the easiest implementation for libstd to provide the allocator is either |
AIUI, provenance for a pointer tracks (in some unspecified way) how that pointer relates to an underlying allocation. A heap allocation is a call to the Rust memory interface. Therefore, I don't see how it can make sense to talk about the provenance of a pointer on "the other side" of the memory interface: provenance begins when the pointer is returned from Maybe there is some layering system, where each allocator must observe the provenance rules of any allocator it defers to, but at some point there must be some "root allocator" that can slice up memory as it sees fit - either by provenance not existing at that level, or all pointers having the same provenance. In other words, I'm not sure it's really possible to ask questions about what is sound here, when AFAIK we haven't actually specified what rules code on the other side of the memory interface must follow? |
Provenance really doesn't begin or end, it's state attached to a pointer that indicates what allocation it points into and what offset/extent it can access. The allocation itself does, but it makes more sense to say that the allocation's storage duration ends when
This implies that you either cannot write the allocator in Rust (or at all in a way that rust requires you to follow its memory model...), that the allocator cannot access any memory via the deallocated pointer, or that the allocator must know the "original provenance" of all pointers it yields. The latter may seem like a good idea, but in practice, it implies that the allocator only pulls memory from a single contiguous range (for example, a static buffer allocator). Many |
Ok, but that state can't exist until the allocation exists (or how could it indicate the allocation...), and the allocation doesn't exist until
It implies that the rules must work at least slightly differently for code written to implement an allocator. This seems to follow directly from the fact that provenance depends on the concept of an allocation, and for this "root" allocator, there's no such thing as a heap allocation. AFAIK, what these modified rules might be is still unspecified, hence why it is difficult to discuss answers to the questions in this issue. |
ISTM like forbidding the use of C memory allocators for the global allocator is a non-starter. Supporting this should be considered a design constraint on the set of operations we consider to be well-defined behavior, as not supporting it would retroactively make far too much code UB (far more than forbidding unleaking). Not only would this be a massive breaking change that we don't need to make (which would hurt a great deal going forward), it would also prevent the use of Rust in several situations that Rust was designed to be usable in. So, it's really not an option to forbid use of common allocators. That said, the way that SB forbids these container_of-operations to access allocation metadata feels like another of these SB-specific limitations around not allowing access to memory, even if no other reference covers it. IOW, this is a tooling limitation, and probably would not be a problem for other Rust aliasing or provenance models. So, I'm not sure there's a reason to forbid that in order to make unleaking safe. But even if so,
|
Agreed. This is a case of #256.
I'll note that the same is true in the C spec, where container_of has dubious legality, allocators run into issues with type-based aliasing, and there is an ongoing effort to totally overhaul the provenance model. These are mostly theoretical issues rather than practical ones, as there aren't any common compiler implementations that exploit these cases of (potential) UB. But you could say the same about Rust. It would be unfortunate if these UCG discussions scared people away from writing allocators in Rust, though I'm not convinced they actually are doing so. |
The container_of operation being not strictly-conforming because "the standard 'permits an implementation to tailor how it represents pointers to the size of the objects they point at'" is along the lines of them saying TBAA doesn't cause major problems for allocators if they're carefully written, at least in C where you can follow the effective type rules — in C++ you run into the problems described here, but realistically I doubt anybody accepts arguments that you can't use memory that comes from mmap because 'the bytes inside it don't constitute objects' or whatever, please1. And it's worth pointing out, the proposed provenance models in C are all considerably more forgiving than stacked borrows, and the responsible groups have been considerably more hesitant to declare existing code in the wild UB than SB or this group has been.
It's hard to say, I suspect the other concerns are larger, and didn't mention all2 of the reasons it's annoying. But honestly, the ergonomics of raw pointers alone have been mostly enough to prevent from doing anything substantial, despite the fact that I have quite a few that I was very happy with in C++. That said, it's worth noting... the outcome of "technically might be UB but practically well-supported" in C code is different than in Rust code. In Rust code you'll get an advisory filed (which will likely later be mapped to a CVE with a ludicrously high score3). The rust community, for better or worse, is very aggressive about these things. This has been noted before, but has absolutely had an effect on what libraries get written/published. This has probably veered well into the offtopic, though! Footnotes
|
I agree there are unanswered questions around how to implement an allocator inside a language with a high-level provenance model. These questions apply to Rust as much as they apply to C -- I do not think it is currently possible to write an allocator and link it into an application all inside the scope of the C spec. As was said above, the allocator is a "language primitive" that performs some amount of "magic" when it comes to provenance. I am not saying that is great but I do not see Rust being at greater risk than C here -- Rust is just a lot more upfront and explicit about its provenance model, making such issues much more obvious. (IOW, there is no Miri for C.)
I would say there are layers here: there is provenance on the side of the allocator implementation, and provenance on the side of the allocator user, but those are basically separate from each other. When the implementation returns a pointer from If we consider the allocator implementation to be itself implemented in Rust, then I think we have to say that there is some kind of 'magic' that happens as the pointer crosses the allocator boundary: its provenance is changed, somewhat akin to a Stacked Borrows So, we could imagine that when a pointer
When a pointer
I have not fully thought about how this interacts with the more intricate provenance of Stacked Borrows, but I don't see why this would not work there. This might prevent having the same memory region accessed both from inside and outside the allocator implementation, but that seems like a violation of the idea of an allocator anyway: the memory returned from |
The concept of "allocation planes" discussed in #328 might impact this discussion. For at least So answering the OP: the provenance of the pointer given to global (For other allocator implementations, c.f. rust-lang/wg-allocators#101.) |
So, interesting information: The windows implementation of
And
This means that so long as (Away from keyboard atm otherwise I'd try to provoke a UB diagnostic from I really need to fully write up my thoughts on overlapping "rust allocated objects" and putting the AM "allocated object" laundering barrier on |
You can see this happening in rust-lang/miri#2104. |
Sorry if this is a duplicate, but I like the "keywords" I've showcased in this issue. Other related issues:
T too strict? #134 <- canonical issue?
Note that the quite loaded term "provenance" is being used here as described mainly in #134.
Unleaking
The stdlib libs docs currently state, regarding Box::leak:
So, even if there is no code snippet, such a statement is stating that:
is sound, no matter the alloc::Global backing it.
A far-fetched / contrived generalization to any impl Allocator
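For the alloc::Global case, the pattern that the Box::leak documentation allows (wrapping the leaked reference with Box::from_raw so that the memory is released) can be sketched as below. This is a minimal illustration, not the snippet from the original post, and the helper name leak_then_unleak is made up for the example. The generalization to an arbitrary allocator would presumably look the same with Box<T, A>, which needs the unstable allocator_api feature and is therefore left out.

```rust
/// Leaks a boxed value, then "unleaks" it by rebuilding the `Box`
/// from the leaked reference. (Hypothetical helper for illustration.)
fn leak_then_unleak(value: i32) -> i32 {
    // `Box::leak` hands back a reference whose provenance covers only the `i32`.
    let leaked: &'static mut i32 = Box::leak(Box::new(value));
    *leaked += 1;

    // SAFETY (as the `Box::leak` docs currently read): the reference came
    // from `Box::leak` and is not used again after this point.
    // Whether the pointer that eventually reaches `dealloc` carries enough
    // provenance for the allocator is exactly the question of this issue.
    let boxed: Box<i32> = unsafe { Box::from_raw(leaked) };

    // Dropping `boxed` at the end of the function calls the global `dealloc`.
    *boxed
}
```

Calling leak_then_unleak(41) yields 42 and frees the allocation when the rebuilt Box is dropped.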
Which, given Box's implementation, is assuming that if somebody asks an impl GlobalAlloc (or an impl Alloc if generalizing) for memory for a Layout::new::<T>() (through alloc or realloc), and gets back a non-null pointer ptr, then it is legal to give back ptr to that impl Alloc's dealloc (or realloc), but with ptr's provenance having been "shrunk" down to that T's layout (e.g., through ptr = <*mut _>::cast(&mut *ptr.cast::<MaybeUninit<T>>());).
This, in practice, can be quite problematic for many (most?) GlobalAlloc implementations out there, since they often perform some bookkeeping laid out contiguously to the yielded allocation, and such metadata would thus not be accessible from such a returned pointer alone: the allocator would need to keep some extra data / state to be able to get back provenance over the user-facing allocation and the contiguous metadata.
A simplified example
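As a stand-in for this kind of allocator, here is a minimal sketch of a toy alloc / dealloc pair that stores one metadata byte right after the user-visible i32. It is a reconstruction under assumptions, not the code from the original post: the field names, the metadata value 42, and the fact that dealloc returns the metadata byte (so the effect is observable) are all invented for the illustration.

```rust
use std::alloc::Layout;
use std::ptr::NonNull;

/// What the toy allocator really allocates: the user's `i32` plus one
/// trailing metadata byte. (Field names are assumptions.)
#[repr(C)]
struct I32AndMeta {
    value: i32,
    meta: u8,
}

/// Allocates an `I32AndMeta` but only exposes a pointer to its `i32`.
fn alloc() -> Option<NonNull<i32>> {
    let layout = Layout::new::<I32AndMeta>();
    // SAFETY: `I32AndMeta` is not zero-sized.
    let block = NonNull::new(unsafe { std::alloc::alloc(layout) })?.cast::<I32AndMeta>();
    // SAFETY: `block` is non-null, aligned, and valid for this write.
    unsafe { block.as_ptr().write(I32AndMeta { value: 0, meta: 42 }) };
    // The returned pointer still carries provenance over the whole block.
    Some(block.cast::<i32>())
}

/// Frees a block and returns its metadata byte.
///
/// # Safety
/// `ptr` must come from `alloc` above and must still be allowed to access
/// the whole `I32AndMeta`, not just the leading `i32`.
unsafe fn dealloc(ptr: NonNull<i32>) -> u8 {
    let base = ptr.as_ptr().cast::<u8>();
    // Reads the byte right after the `i32`. This is only well-defined if
    // `ptr`'s provenance was not shrunk down to the `i32` alone.
    let meta = unsafe { base.add(std::mem::size_of::<i32>()).read() };
    // SAFETY: `base` points at the start of a block allocated with this layout.
    unsafe { std::alloc::dealloc(base, Layout::new::<I32AndMeta>()) };
    meta
}
```

Passing the pointer returned by alloc straight to dealloc is fine in this sketch; the trouble starts only when the pointer has taken a detour through a reference to the i32.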
The interesting lines here are:
if ptr were to stem from &i32 (e.g., let r: &i32 = …; dealloc(r.into());), even if that &i32 had originated from an alloc()-yielded ptr (let r: &i32 = alloc().unwrap().as_ref();), then the operation reading meta would not be well-defined: r.add(4) would yield a one-past-the-end pointer with respect to the i32 that r is allowed to access, and such a pointer cannot be used to perform a read-dereference.
The two possible workarounds
In a world without any abstraction whatsoever, the answer to this problem is easy: keep a pointer with provenance over that allocated I32AndMeta around (such as the ptr returned by alloc itself), and use it to "launder" the received ptr. But since there is this Alloc / GlobalAlloc boundary, the question remains: who should be responsible for doing this?
Would it be the Allocator, as in:
or would it be the user of the Allocator, by declaring "unleaking" to be a contract violation / by requiring that a pointer with the originally-obtained provenance be the only valid input for a {de,re}alloc call?
This point ought to be clarified, and if going for the latter (or until confirming the former), then the stdlib docs should be updated to actually disincentivize unleaking.
My potentially-obvious two cents
It feels like the "legalized unleaking" approach has the drawback of requiring that extra get_ptr_with_provenance(…) operation, which could come with a non-negligible cost for all GlobalAlloc implementations, only to allow an "unleaking" operation that may well be deemed niche.
But it also feels like the "forbidden unleaking" approach is quite a footgun, if, for instance, even the stdlib docs have gotten it wrong for such a long time.
So this seems like the classic "let's gauge/measure the performance benefits of 'forbidden unleaking' / the performance cost of 'legalized unleaking'" to compare them against the footguns that forbidding it introduces.
Finally, and this is technically beyond Rust's reach, there is also the question of non-Rust pervasive implementations of GlobalAlloc, such as that of libc (malloc, calloc, realloc, free) and whatnot. Such implementations do use metadata, and according to @chorman0773, the cost of a get_ptr_with_provenance(…) operation would be very much non-negligible (and, technically, even more so since Rust cannot go and tweak such an implementation, and would thus have to wrap it in a black-box API kind of fashion).
So, from the looks of it / IIUC, the only practical approach w.r.t. a legalized unleaking would be to ban malloc & friends from being used for GlobalAlloc implementations! But I may very well be wrong; I'll let @chorman0773 (and others) chime in and clarify this hypothetical point (although if this were to be true, then I guess there is really no other choice than forbidding unleaking).
malloc-powered #[global_allocator] to allow free-ing to occur from the C side, but this is yet another topic…
-ing to occur from the C side, but this is yet another topic…The text was updated successfully, but these errors were encountered: