What about: custom allocators #442

Open · RalfJung opened this issue Aug 8, 2023 · 30 comments

@RalfJung
Member

RalfJung commented Aug 8, 2023

Custom allocators (#[global_allocator]) come with all sorts of exciting interactions with the operational semantics:

  • The compiler is allowed to optimize away calls to them even if these calls have side-effects. Is there some operational way to rationalize this?
  • Something exciting happens with provenance of pointers that move in and out of the allocator. But what? See for example this. LLVM has return-position noalias to reflect this magic. That attribute is super strong though.
  • Also, which allocators are even affected by this? Right now, we tell LLVM that the registered global_allocator is magic, but direct calls to an allocator (whether it uses the GlobalAlloc trait or the Allocator trait) are just regular function calls. But maybe we want those to be magic as well for more optimizations? Or maybe we at least want a way for an allocator to opt-in to such magic? An Allocator that returns memory from some previously-allocated buffer (sketched below this list) certainly violates LLVM return-position noalias so we can't give them the full treatment (at least not without first working with LLVM to improve their semantics -- maybe the scope of the return-position noalias only lasts until free is called, not until the end of the program).
  • Also, the first point from this issue shows that dealloc resets memory to "uninit" before the custom allocator function runs.
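
To make the buffer-backed case concrete, here is a minimal sketch of such an allocator (nightly only, since the Allocator trait is unstable behind allocator_api; the Arena type and its details are illustrative, not a real design). Every pointer it returns aliases the buffer it was constructed from:

#![feature(allocator_api)]
use std::alloc::{AllocError, Allocator, Layout};
use std::cell::Cell;
use std::ptr::NonNull;

pub struct Arena {
    base: *mut u8,
    cap: usize,
    next: Cell<usize>,
}

impl Arena {
    // Safety: `buf` must outlive the arena and must not be accessed
    // directly while the arena is in use.
    pub unsafe fn new(buf: &mut [u8]) -> Arena {
        Arena { base: buf.as_mut_ptr(), cap: buf.len(), next: Cell::new(0) }
    }
}

unsafe impl Allocator for Arena {
    fn allocate(&self, layout: Layout) -> Result<NonNull<[u8]>, AllocError> {
        // Bump-allocate out of the pre-existing buffer: every pointer we
        // return aliases `self.base`, which return-position noalias
        // cannot describe.
        let addr = self.base as usize + self.next.get();
        let aligned = addr.next_multiple_of(layout.align());
        let offset = aligned - self.base as usize;
        let end = match offset.checked_add(layout.size()) {
            Some(end) if end <= self.cap => end,
            _ => return Err(AllocError),
        };
        self.next.set(end);
        let p = unsafe { NonNull::new_unchecked(self.base.add(offset)) };
        Ok(NonNull::slice_from_raw_parts(p, layout.size()))
    }

    unsafe fn deallocate(&self, _ptr: NonNull<u8>, _layout: Layout) {
        // Individual frees are no-ops; the buffer is reclaimed as a whole.
    }
}
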
@RalfJung
Member Author

RalfJung commented Aug 8, 2023

maybe the scope of the return-position noalias only lasts until free is called, not until the end of the program

In fact that's a problem already for a #[global_allocator] that reuses memory: the moment a pointer comes out of __rust_alloc, that memory must never ever be used again with any other pointer. But the allocator will of course keep a "root pointer" around somewhere and if this memory range gets freed and reused -- we are technically violating return-position noalias.
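
Spelled out with the standard allocation entry points (a sketch; it assumes the global allocator reuses the freed range):

use std::alloc::{alloc, dealloc, Layout};

unsafe fn reuse_example() {
    let layout = Layout::from_size_align(8, 8).unwrap();
    let p = alloc(layout); // suppose this is carved out of the allocator's own buffer
    dealloc(p, layout);    // the allocator's "root pointer" touches the range here
    let q = alloc(layout); // the same range may be handed out again, so q == p is possible
    // If return-position noalias were scoped to the whole program, the
    // allocator's internal accesses between the two calls would already
    // conflict with the noalias return of the first call.
    let _ = q;
}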

@chorman0773
Contributor

chorman0773 commented Aug 10, 2023

A possible spec (in prose) for __rust_alloc/alloc::alloc::alloc (that allows elision) is:

Non-deterministically calls the alloc function of the currently installed global allocator, or returns some suitable storage that is provided by another existing allocation. If the allocation returned by this function is accessed, non-behaviour (i.e. undefined behaviour) occurs if the storage for that allocation is provided by deallocated memory.

And then __rust_dealloc/alloc::alloc::dealloc:

If the allocation pointed to by p has its storage provided by another allocation, non-behaviour occurs if that allocation has been deallocated. Otherwise, calls the dealloc function of the currently installed global allocator.

@CAD97

CAD97 commented Aug 10, 2023

cc @rust-lang/wg-allocators

The compiler is allowed to optimize away calls to them even if these calls have side-effects. Is there some operational way to rationalize this?

The permissive model I've been using is that global allocations are serviced by the AM, and that separately the AM is permitted to use whatever sound (based on the trait interface) combination of calls into the #[global_allocator] as part of evaluating the code. This isn't particularly operational, though.

An operational version of this model would be roughly that before each AM step there might nondeterministically additionally be a call to a method of the #[global_allocator]. This is extremely permissive and makes guaranteeing nonreentrancy impossible, but something like this is needed if we want inserting spurious allocation to be possible, i.e. stack-to-heap promotion.

This is specifically useful for one thing: it justifies reordering Box::new(T::new()) into placement initialization.

It's a bit scuffed in order to maintain the proper ordering between (potentially unwinding) initialization and handle_alloc_error — initialization needs to still happen on the stack if the allocation fails (or rather, is going to), but this should be sufficient to justify the following reordering:

initialize T to stack (maybe unwind)
try to allocate
if alloc failed: handle_alloc_error
move T from stack to heap

// into

try to allocate
if alloc failed: {
    initialize T to stack (maybe unwind)
    handle_alloc_error
}
if alloc succeeded: {
    (on unwind: deallocate)
    initialize T to heap (maybe unwind)
}

Or maybe we at least want a way for an allocator to opt-in to such magic?

The default behavior of directly using an allocator implementation should do the obvious thing and run the code that's called. This is of course necessary in order to permit code to rely on stronger guarantees provided by a custom allocator.

This becomes even more interesting when code has direct visibility of the static annotated with #[global_allocator].

But it would certainly be unfortunate if using a custom allocator in some scope were to block optimizations (e.g. that instrumenting allocations prevents removing ones otherwise dead).

The usual model has been to have some sort of Magic<A> wrapper which nondeterministically either actually uses the inner allocator or the magic AM-provided allocation. (AIUI, this roughly corresponds to annotating the functions with LLVM alloc-family alloc-kind.) Global would then be able to use this magic around calling into the #[global_allocator].

This also leads to a less permissive operational model that still permits allocation elision: whenever an allocation is requested, nondeterministically choose between calling into the #[global_allocator] and directly servicing the request with some other method. Daemonic nondet means that code must only rely on the basic guarantees of the allocation interface, whereas angelic nondet would allow code to rely on stronger guarantees.
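
A sketch of what that Magic<A> wrapper could look like. The am_* functions are hypothetical stand-ins for AM-level magic (not real APIs, and not actually expressible in Rust), and the Allocator trait itself is still unstable:

#![feature(allocator_api)]
use std::alloc::{AllocError, Allocator, Layout};
use std::ptr::NonNull;

pub struct Magic<A>(pub A);

// Stubs for the hypothetical AM intrinsics.
fn am_choose() -> bool { unimplemented!() }
fn am_allocate(_: Layout) -> NonNull<[u8]> { unimplemented!() }
fn am_owns(_: NonNull<u8>) -> bool { unimplemented!() }
unsafe fn am_deallocate(_: NonNull<u8>, _: Layout) { unimplemented!() }

unsafe impl<A: Allocator> Allocator for Magic<A> {
    fn allocate(&self, layout: Layout) -> Result<NonNull<[u8]>, AllocError> {
        if am_choose() {
            // The AM services the request itself; the wrapped allocator
            // never observes the call.
            Ok(am_allocate(layout))
        } else {
            self.0.allocate(layout)
        }
    }

    unsafe fn deallocate(&self, ptr: NonNull<u8>, layout: Layout) {
        // Deallocation must route back to whichever side served the
        // original request.
        if am_owns(ptr) {
            am_deallocate(ptr, layout)
        } else {
            self.0.deallocate(ptr, layout)
        }
    }
}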

I've wondered if it's possible to permit code to make global reasoning based on the global allocator (e.g. that it always returns an allocation minimally aligned to eight bytes, thus pointer tagging in low bits is free real estate), e.g. saying that if an allocation actually occurs, it is served by the #[global_allocator]. Angelic nondeterminism could do so while still allowing some allocation to be moved to the stack if it's obvious that no guarantees about the allocation are used. Unfortunately, I don't think it's sufficient to ever actually do so, because potentially observable effects aren't limited to just refining the output range.
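
Concretely, the kind of low-bit tagging such a guarantee would bless (a sketch; sound only if the minimum-alignment guarantee actually holds):

fn tag(p: *mut u8, tag: usize) -> *mut u8 {
    // Relies on the allocation being at least 8-aligned, so the low
    // three bits are free for a tag.
    debug_assert!(tag < 8 && p.addr() % 8 == 0);
    p.map_addr(|a| a | tag)
}

fn untag(p: *mut u8) -> (*mut u8, usize) {
    (p.map_addr(|a| a & !7), p.addr() & 7)
}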

It's worth noting that this loses another potential transformation which the freer model permits: allocation merging. Merging (Box<T>, Box<U>) into Box<T, U> is theoretically justifiable, but I'm actually talking about removing a deallocate; allocate pair, e.g. making Box::new(Box::into_inner(b)) into a no-op.

Something exciting happens with provenance of pointers that move in and out of the allocator. But what?

The effect of what happens is essentially the creation of a new Rust Allocated Object, as described by the allocation request (and return value, if extra slack space is provided). This is certainly exciting, because if the allocation is serviced by slicing up some parent RAO, we now have aliasing RAOs. And not the nice kind of address aliasing where they exist on different planes (thus accesses are still disjoint) used to justify resource nonexhaustion.

An overly restrictive attempt at modeling the behavior would be to say that upon returning an allocation, that memory region is effectively deallocated to make space for the new RAO given to the caller. Upon deallocating the child RAO, the memory region is restored to the allocator's RAO (likely uninitialized) and it's given a pointer with the same provenance as the one it originally returned out.

I say it's overly restrictive mainly because it causes trouble for scoped allocation; a scoped allocator wants to be able to bulk free everything at the end of the scope, but if a child allocation hasn't been returned under that scheme, doing so would be UB. However, a similar scheme without this flaw could fairly straightforwardly be constructed using some meta-tag for RAOs on top of the existing borrow tag for references; it doesn't need the complexity of SB/TB, just a simple stack tracking which RAO has access currently.

This probably should only be done when going through the magic allocator barrier; a normal call to a normal function (that just so happens to be an allocator) shouldn't result in extra provenance mucking. wg-alloc has previously said that dealloc requires the same (or at least equally capable) provenance to be given as was returned from alloc, so everything should work out fine (at least given TB's &Header fix over SB) without needing magic to make things work that wouldn't without.

@RalfJung
Member Author

RalfJung commented Aug 13, 2023

@chorman0773 that spec seems to allow a compiler to just entirely ignore #[global_allocator] and always call some other allocator that can provide "suitable storage"? That's way too permissive, we need to give the programmer stronger guarantees than that. (Also "suitable storage" is way too weak of a guarantee, as a programmer I need a guarantee that until I deallocate this, I have exclusive ownership of that memory -- no reads or writes from anyone that I didn't share the pointer with.)

We can say it can non-deterministically choose the actually registered global allocator or the AM-native allocator; that would solve some of these issues -- but not the one where it always picks the AM-native one...

@CAD97

An operational version of this model would be roughly that before each AM step there might nondeterministically additionally be a call to a method of the #[global_allocator].

That doesn't explain why I might call Box::new(0) and then ask the allocator how many times it has been invoked and it says "0".
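
This observation can be written down directly; a minimal counting allocator (stable APIs only) where eliding the Box allocation is observable as a count of 0:

use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

static CALLS: AtomicUsize = AtomicUsize::new(0);

struct Counting;

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        CALLS.fetch_add(1, Ordering::Relaxed);
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static A: Counting = Counting;

fn main() {
    drop(Box::new(0));
    // If the compiler elided the allocation, this prints 0 even though
    // the source performed a Box allocation.
    println!("allocator calls: {}", CALLS.load(Ordering::Relaxed));
}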

The effect of what happens is essentially the creation of a new Rust Allocated Object, as described by the allocation request (and return value, if extra slack space is provided). This is certainly exciting, because if the allocation is serviced by slicing up some parent RAO, we now have aliasing RAOs. And not the nice kind of address aliasing where they exist on different planes (thus accesses are still disjoint) used to justify resource nonexhaustion.

There's also the alternative where this is not a new Rust Allocated Object, but just a new sub-tree of pointer IDs in the old object (a la Tree Borrows).

This probably should only be done when going through the magic allocator barrier

My question you quoted above was also about whether this should also be done when invoking a custom allocator. Currently that's also just a regular function call, after all.

@chorman0773
Contributor

chorman0773 commented Aug 13, 2023

It can't call another allocator, it has to be provided from the storage of an existing allocation.
The actual manner the storage is provided would be unspecified, but the intention is that it's piggy-backed on the existing memory. For example if a global_allocator call is "provided" by a local variable, the actual allocation would be put onto the stack. It could also be put into static storage, or into another dynamic allocation that is no longer being used.

@CAD97

CAD97 commented Aug 14, 2023

That doesn't explain why [...] ask the allocator how many times it has been invoked and it says "0".

That version of the model is relying on Global always being "the AM provides memory", independent of the fact that the way the AM provides memory is via whatever #[global_allocator] is registered.

There's also the alternative where this is not a new Rust Allocated Object, but just a new sub-tree of pointer IDs in the old object (a la Tree Borrows).

This works for specific known allocators where no magic alloc substitution is done. This doesn't work for the global replaceable allocation, at least not while we have operations like GEP inbounds which care about the RAO bounds rather than the provenance bounds.

Having a subtree of pointer tags/IDs which e.g. pointer offsets look at instead of RAO bounds is essentially isomorphic to making an aliasing RAO subtag, which I did mention as an alternative to the more heavyweight deallocation model. This is because everything which currently references RAO would now instead reference the new concept of RAO tag.

So yes, having a tree of RAO (sub)views is the model which I was (attempting to) lay out.

This holds as necessary even in the face of switching pointer offsets to GEP nowrap btw: if I know my concrete allocator always provides addresses within some region, I'm permitted to do an offset, knowing it won't wrap, which would wrap were the allocation to be substituted for magic AM provided memory in a different memory region.

My question you quoted above was also about whether this should also be done when invoking a custom allocator. Currently that's also just a regular function call, after all.

And the answer I attempted to make prior in the post is that directly calling into an allocator (even the #[global_allocator] static item) should just be a regular function call, and that a magic allocator wrapper be used to add on the replaceable semantics.

that spec seems to allow a compiler to just entirely ignore #[global_allocator] and always call some other allocator that can provide "suitable storage"? That's way too permissive, we need to give the programmer stronger guarantees than that. (Also "suitable storage" is way too weak of a guarantee,

I don't see how it could be; suitable storage means satisfying all guaranteed properties of the allocator API. FWIW, wg-alloc did IIRC decide that relying on any property of the configured global allocator when using the global allocation functions is always unsound, even when you know what the global allocator is.

To provide a stronger guarantee than "suitable storage" out of the Global allocator is to forbid substitution of the allocator. Which, I suppose, is a question we do still need to answer: do we permit substitution of global general-purpose allocations, or exclusively just removal of allocations? If the program observes that an allocation actually exists (e.g. observes the address), does that mean that the allocation must exist in the #[global_allocator]? Is the following rewrite permissible?:

// original
fn main() {
    let mut global = Box::new(GlobalData::new());
    real_main(&mut *global);
    drop(global);
}

// transformed to:
#[start] // guaranteed nonreentrancy
unsafe fn _start() {
    static mut GLOBAL: MaybeUninit<GlobalData> = MaybeUninit::uninit();
    let global = GLOBAL.write(GlobalData::new());
    real_main(&mut *global);
    GLOBAL.assume_init_drop();
}
// and provide the same main if it might be called

This substitutes a source allocation that's known to exist for the life of the program for a static allocation that doesn't require using the dynamic allocator.

Both my spec and Connor's are written assuming that this is permitted and intended to be permitted. In theory, yes, the compiler could serve all global allocation by some substitution instead of using the #[global_allocator], and I don't know of any way to allow allocation elision that doesn't also permit allocation substitution.

The compiler not substituting allocations arbitrarily is a QOI issue. Rustc+LLVM doesn't seem to ever do heap-to-stack alloc substitution, but AIUI it should be permitted to.

@RalfJung
Member Author

RalfJung commented Aug 22, 2023

That version of the model is relying on Global always being "the AM provides memory", independent from that the way the AM provides memory is via whatever #[global_allocator] is provided.

Please use less confusing terminology. Global is literally defined to be whatever #[global_allocator] provides (plus magic wrappers).

This holds as necessary even in the face of switching pointer offsets to GEP nowrap btw: if I know my concrete allocator always provides addresses within some region, I'm permitted to do an offset, knowing it won't wrap, which would wrap were the allocation to be substituted for magic AM provided memory in a different memory region.

This just goes to show that you wouldn't be permitted to do the offset, since you cannot rely on your allocator actually being called. So with GEP nowrap I don't think there would be any barrier to reusing the aliasing tree/stack.

do we permit substitution of global general-purpose allocations, or exclusively just removal of allocations? If the program observes that an allocation actually exists (e.g. observes the address), does that mean that the allocation must exist in the #[global_allocator]?

I would have expected the answer to be "no we do not permit substitution; yes it must be in the global allocator."

@CAD97

CAD97 commented Aug 23, 2023

There are two separate models I've considered:

  • "Always Magic"
    • Using the global general-purpose allocation APIs std::alloc::alloc, std::alloc::dealloc, std::alloc::realloc, std::alloc::Global as GlobalAlloc, and std::alloc::Global as Allocator provides memory satisfying the allocation request as described via an unspecified manner.
    • Evaluation of Rust code is permitted to non-deterministically include arbitrary calls into the registered #[global_allocator] that are sound per the API contract.
    • That global allocation occurs on the #[global_allocator] is a quality-of-implementation concern. However, access to system allocation is also via #[global_allocator]. E.g. Miri could ignore #[global_allocator] and always use its builtin VM allocation, but rustc could not just ignore #[global_allocator] and always use the system allocator.
    • The purpose is for very strong permission for the compiler to do allocation elision, allocation substitution, as well as even spurious allocation.
    • This is essentially a formalization of the informal "allocation is not an observable event;" it entirely decouples when calls to the global allocator happen from the evaluation of source.
  • "Nondet substitution"
    • There logically exists some magic Replaceable<A: Allocator> which services allocation by non-deterministically either servicing the allocation request itself (via AM builtin magic) or by forwarding the request to the wrapped allocator.
    • Thought that I've put in so far has been an essentially daemonic non-determinism; the caller is never allowed to assume anything specific about the wrapped allocator, and must only rely on the minimal API guarantees or be unsound.
    • std::alloc::Global is logically implemented as using Replaceable<GA>, where GA is a "ZST reference" to the registered #[global_allocator] static item.
    • This model permits allocation elision and heap-to-stack substitution, but does not permit introducing spurious allocation such as stack-to-heap substitution.

This just goes to show that you wouldn't be permitted to do the offset, since you cannot reply on your allocator actually being called. So with GEP nowrap I don't think there would be any barrier to reusing the aliasing tree/stack.

I would have expected the answer to be "no we do not permit substitution; yes [observably performed allocations] must be in the global allocator."

The main underlying issue is that I have no idea how to justify these two at the same time. Using "angelic nondet substitution" will never be able to justify removal, because the #[global_allocator] side can have arbitrary effects which removal does not preserve. Using "daemonic nondet substitution" still permits every allocation to be serviced by some other AM-managed static and never use the #[global_allocator].

If you want nondet to be forced to use the #[global_allocator], you need some definition for when the allocation is "observed to actually exist," and you need to restrict that to observations exclusively on this side of the allocator bridge if you want to be able to ever determine that it isn't. The point about GEP is that if GEP is ever allowed to offset out of the requested allocation size, then it can serve as an observation of whether the concrete allocator was used. I believe the allowable operations in a "magic" allocation are read, write, and GEP inbounds of the magic allocation; any other operation on the pointer (or a derived pointer) is a potential observation of whether the allocation is "real" and in the #[global_allocator].

A concrete example: is it allowed to remove the heap allocation in this example? rustc+LLVM currently does not remove it (by default). Returning a value not derived from the pointer value does remove the allocation.

use std::alloc::{alloc, dealloc, Layout};
use std::mem::transmute;

pub unsafe fn alloc_weirdness(layout: Layout) -> usize {
    let p = alloc(layout);
    if !p.is_null() {
        dealloc(p, layout);
    }
    transmute(p)
}

A smaller secondary issue is that it's been my impression that we do want to permit heap-to-stack substitution, at least. I'd also not be super comfortable committing to a model that forbids substitution without LLVM documenting what the optimization semantics of "alloc-family"-annotated functions are.

@comex

comex commented Aug 23, 2023

A concrete example: is it allowed to remove the heap allocation in this example? LLVM currently does not remove it. Returning a value not derived from the pointer value does remove the allocation.

If you take your example and make the layout constant, remove the null check, and add -C passes=attributor, then LLVM will remove it (edit: by which I mean it will replace the heap allocation with an actual stack allocation).

Not sure how relevant that is… there are probably much dodgier things you can get LLVM to do by enabling non-default passes… but, well, I saw a reference to it in a mailing list post and was curious.

@CAD97

CAD97 commented Aug 24, 2023

I'll agree that adding in LLVM passes that aren't enabled by default by the rustc codegen backend is at best dodgy, but the fact that LLVM does have a pass to optimize "alloc-family" functions to allocas does indicate fairly strongly that LLVM considers such substitution valid (even if not desirable by default). If we want to forbid "real substitution" of our global allocation functions, we'd need the LLVM codegen backend to

  • not use the "alloc-family"="FAMILY" annotations if there's any chance of the heap2stack pass being enabled, or
  • use newly introduced allockind("KIND") annotations to indicate what removal/substitution/unification/spurious/reordering semantics are permitted for the alloc family.

I see good reason for rustc to provide a QOI-style guarantee that there isn't a bonus general-purpose allocator that can get used instead of the #[global_allocator]. I don't see a good reason that the spec for Replaceable<A> (the allocator magic wrapper) should restrict the semantics any further than a nondeterministic choice per allocated object whether to use the wrapped allocator or some other source. Especially when (AIUI) replacing/substituting allocations is the typical reasonably well understood justification path for removing allocations. (It may just be a gap in my knowledge, but repeating, I don't know of any model which justifies removal but not heap2stack substitution; even the simple "allocation isn't an observable event" does.)

But one guarantee I think we can and definitely should provide is that a replaced allocation will succeed, and that the only way allocation fails (returns null) is by calling the underlying allocator. Replacing allocations with a failed allocation is both degenerate and nondesired. If a sanitizer wants to inject failed allocations it can do so by instrumenting the global allocator instead of the standard allocation replacement.

transformation examples

All examples assume no unwinding. Unwinding-friendly examples can be constructed by including the dealloc on unwind. Examples also assume allocation never fails (and/or "real" dealloc accepts null), for simplicity, and elide the obvious std::alloc, core::mem, and core::ptr imports.

Removal:

unsafe fn src() {
    let layout = Layout::new::<i32>();
    let p = alloc(layout);
    dealloc(p, layout);
}

unsafe fn tgt() {
    // this function purposely left blank
}

Substitution (heap2stack, addr-used):

unsafe fn src() -> usize {
    let layout = Layout::new::<i32>();
    let p = alloc(layout);
    dealloc(p, layout);
    p.addr()
}

unsafe fn tgt() -> usize {
    let alloca = MaybeUninit::<i32>::uninit();
    let p = addr_of!(alloca);
    p.addr()
}

Substitution (heap2stack, arbitrary-used):

unsafe fn src() {
    let layout = Layout::new::<i32>();
    let p = alloc(layout);
    black_box(p.cast::<i32>());
    dealloc(p, layout);
}

unsafe fn tgt() {
    let mut alloca = MaybeUninit::<i32>::uninit();
    let p = addr_of_mut!(alloca);
    black_box(p.cast::<i32>());
}

Substitution (heap2static, arbitrary-used):

unsafe fn src() {
    static mut REENTRANT: bool = false;
    if replace(&mut REENTRANT, true) {
        unreachable_unchecked()
    }

    let layout = Layout::new::<i32>();
    let p = alloc(layout);
    black_box(p.cast::<i32>());
    dealloc(p, layout);
}

unsafe fn tgt() {
    static mut ALLOCA: MaybeUninit<i32> = MaybeUninit::uninit();

    let p = addr_of_mut!(ALLOCA);
    black_box(p.cast::<i32>());
}

Unification (object-used):

unsafe fn src(p: *mut u8) -> *mut u8 {
    let layout = Layout::new::<i32>();
    dealloc(p, layout);
    alloc(layout)
}

unsafe fn tgt(p: *mut u8) -> *mut u8 {
    p
}

Spurious (via code motion, arbitrary-used):

unsafe fn src(b: bool) {
    if b {
        let layout = Layout::new::<i32>();
        let p = alloc(layout);
        black_box(p.cast::<i32>());
        dealloc(p, layout);
    }
}

unsafe fn tgt(b: bool) {
    let layout = Layout::new::<i32>();
    let p = alloc(layout);
    if b {
        black_box(p.cast::<i32>());
    }
    dealloc(p, layout);
}

Reordering (alloc order):

unsafe fn src() {
    let layout1 = Layout::new::<i32>();
    let layout2 = Layout::new::<i64>();

    let p1 = alloc(layout1);
    let p2 = alloc(layout2);
    dealloc(p2, layout2);
    dealloc(p1, layout1);
}

unsafe fn tgt() {
    let layout1 = Layout::new::<i32>();
    let layout2 = Layout::new::<i64>();

    let p2 = alloc(layout2);
    let p1 = alloc(layout1);
    dealloc(p1, layout1);
    dealloc(p2, layout2);
}

Reordering (over arbitrary):

unsafe fn src() {
    let layout = Layout::new::<i32>();
    let p = alloc(layout);
    dealloc(p, layout);
    black_box();
}

unsafe fn tgt() {
    let layout = Layout::new::<i32>();
    let p = alloc(layout);
    black_box();
    dealloc(p, layout);
}

I'll see if I can try sticking equivalents into the godbolt compiler explorer's alive2 when I get a chance.


a small thought on nondeterminism

Angelic nondeterminism means that if some available choice makes the execution have defined behavior (DB), such a choice is chosen. Daemonic nondeterminism means that if some available choice makes the execution UB, such a choice is chosen.

Allocator replacement nondeterminism kind of doesn't want either? It wants nondeterminism, but arbitrary. If you rely on some allocation happening on the global allocator for absence of UB, that doesn't make your evaluation always UB (as it would with daemonic) or force the global allocator to be used (as it would with angelic); it "just" relies on an arbitrary property and makes the code unsound.

Though on the other hand, it "working" is a valid outcome of UB, so I guess daemonic choice would apply. This just feels closer to an implementation choice (implicitly relying on it works without poisoning the entire program semantics with UB if the assumption is correct, even if ill advised) than to daemonic nondeterminism (relying on it is just UB and invalidates all reasoning about program behavior, even the independently localized kind).

I don't know if this is a distinction the spec/opsem can make, let alone if it would want to.

Full sanitization Miri would always directly serve Replaceable<A> allocations with freshly minted Miri-managed objects. (This suggests daemonic ndet.) But "make more code work" mode Miri would always use the underlying allocator (e.g. Global calling into the #[global_allocator] static if present, and only providing System with Miri-managed objects). Maybe default mode would sometimes use one and sometimes the other. This switch feels vaguely similar to symbolic alignment: reject some potentially valid patterns to more thoroughly catch some invalid assumptions.


@RalfJung
Member Author

RalfJung commented Aug 25, 2023

I agree that heap-to-stack transformation is a reasonable transformation that we shouldn't close the door on. So I guess you are right, it is allowed to use the AM-native allocator instead of calling the declared global one. To me this clearly sounds like demonic non-det; I don't understand why you consider that problematic?

I am less convinced that we want to allow spurious heap allocations. Having the compiler invoke the allocator arbitrarily any time seems really hard to reason about. Some code really cares about not invoking the allocator in certain sections (such as in signal handlers, interrupt handlers, or the implementation of the allocator itself) -- how would that be guaranteed?

For what happens with the allocations, one model I was considering (assuming that we have non-contiguous allocations) is that when memory is returned from a custom allocator, we "punch a hole" into the allocation it comes from and generate a fresh AllocId for that same address range. This behaves like a regular fresh AM allocation until it is passed to the corresponding deallocation operation, which ends the new AllocId and puts that memory range back into the allocation it came from. On both of these "copies", the contents of that memory get reset to Uninit.
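
As a toy sketch over miri-style interpreter state (Memory, AllocId, and the method names here are made up for illustration; resetting contents to Uninit is noted in comments rather than modeled):

use std::collections::HashMap;
use std::ops::Range;

type AllocId = u64;
type Addr = u64;

struct Memory {
    next_id: AllocId,
    // Which ranges of each parent allocation are currently "punched out".
    holes: HashMap<AllocId, Vec<(Range<Addr>, AllocId)>>,
}

impl Memory {
    // Carve `range` out of `parent`: a fresh AllocId now owns that address
    // range, with its contents reset to Uninit.
    fn magic_alloc(&mut self, parent: AllocId, range: Range<Addr>) -> AllocId {
        let child = self.next_id;
        self.next_id += 1;
        self.holes.entry(parent).or_default().push((range, child));
        child
    }

    // End the child AllocId and return its range to `parent`, again with
    // contents reset to Uninit on the parent side.
    fn magic_dealloc(&mut self, parent: AllocId, child: AllocId) {
        if let Some(hs) = self.holes.get_mut(&parent) {
            hs.retain(|(_, c)| *c != child);
        }
    }
}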

But one guarantee I think we can and definitely should provide is that a replaced allocation will succeed, and that the only way allocation fails (returns null) is by calling the underlying allocator. Replacing allocations with a failed allocation is both degenerate and nondesired. If a sanitizer wants to inject failed allocations it can do so by instrumenting the global allocator instead of the standard allocation replacement.

If we do heap-to-stack transform then allocation can fail with a stack overflow. So I think we do have to allow the allocation to fail without actually calling the underlying allocator. But the native AM allocator will never return null, that much we can indeed guarantee.

@CAD97

CAD97 commented Sep 3, 2023

If we do heap-to-stack transform then allocation can fail with a stack overflow. So I think we do have to allow the allocation to fail without actually calling the underlying allocator. But the native AM allocator will never return null, that much we can indeed guarantee.

To be explicit, this is what I meant by replaced allocation being infallible — that it won't ever return a null pointer to user code. Stack overflow is "usual" resource exhaustion which can occur essentially arbitrarily.

I am less convinced that we want to allow spurious heap allocations. Having the compiler invoke the allocator arbitrarily any time seems really hard to reason about. Some code really cares about not invoking the allocator in certain sections (such as in signal handlers, interrupt handlers, or the implementation of the allocator itself) -- how would that be guaranteed?

It's very heavy-handed, but it's necessarily the case that no_std code without alloc doesn't contain any allocation. It would be possible, if not quite desirable, to gate access to spurious allocation behind a compiler flag and/or an (attribute-controllable) target feature.

Another option, less principled but useful for actual code motion, is to say that spurious allocation may only be introduced when allocation is potentially reachable by at least one potential branch in the scope. The result is that moving allocation across branch points is allowed, but introducing truly spurious allocation isn't.

However, that every performed allocation corresponds to an actual evaluated source allocation is a useful property to maintain.

To me this clearly sounds like demonic non-det, I don't understand why you consider that problematic?

It's not; the problem I see is with trying to combine that with any amount of guarantee that the concrete allocator is ever actually used, and that it isn't entirely replaced with a bundled static allocator. I'm comfortable leaving that to QOI so long as others are.

@RalfJung
Member Author

RalfJung commented Sep 4, 2023

Another less principled but useful to actual code motion option is to say that spurious allocation may only be introduced when allocation is potentially reachable by at least one potential branch in the scope. The result is that moving allocation across branch points is allowed, but introducing truly spurious allocation isn't.

Making transformations depend on dead code is a doomed approach IMO. It makes it a lot harder to reason about optimization correctness since the usual correctness notion used formally (contextual refinement) does not suffice any more. So I am rather fundamentally opposed to any approach of this sort.

@CAD97

CAD97 commented Nov 1, 2023

Another optimization case which came up on IRLO is merging alloc+realloc to just a single alloc [source] and merging dealloc+alloc into a reuse when the layout matches [source]. If the opsem of optimizable allocators only does ndet choosing between "magic" and "real" allocation providers, I don't think either of these are justified.

I don't see a way that can justify these optimizations without decoupling source allocator calls from evaluated allocator calls. But we could still constrain the actually performed allocation somewhat: specify it roughly as

Each call to an Allocator method of Magic<A> non-deterministically performs zero or one arbitrary sound (per the trait definition) method calls to an Allocator method of A. This call will often be pass-through, but doesn't need to be; for example, a call to realloc may correspond to a call to alloc instead, if a prior call to alloc was optimized out.

This prevents code motion like reordering Box::new(Data::new()) to Box::new_uninit().init_with(Data::new), but permits a whole class of optimizations that merely ndet doing the call or using an alternative memory source can't.

@RalfJung
Member Author

RalfJung commented Nov 2, 2023

Your spec doesn't allow these optimizations either though, or does it? E.g. if I first allocate 8 bytes, and then later realloc to 16 bytes, maybe I want to optimize this to allocate 16 bytes from the start -- but the allocator could then see the request for 16 bytes rather than 8 bytes. So that would violate the property that every performed allocation corresponds to a source allocation. (The realloc might not actually be reached since the program might stop before it gets there.)

Or do you mean that you allow altering argument values at allocator function call sites as well? This feels rather unprincipled, TBH -- I would expect we run into fairly arbitrary-seeming limitations with this approach.

@CAD97

CAD97 commented Nov 3, 2023

It does mean to allow arbitrary argument values in the calls to the underlying allocator. The only thing controlled for is the timing of calls to the underlying allocator, which must match the timing of a source allocator call; everything else is allowed to be arbitrary.

I believe this is necessary in order to allow merging realloc(alloc(layout::<[u8; 16]>()), layout::<[u8; 32]>()) to alloc(layout::<[u8; 16]>()) but still prohibit adding allocation calls where none exist prior. This timing-only restriction may result in some seemingly arbitrary limitations, but I think this is almost[1] the weakest the spec can be, and removing the timing restriction means allocations can be introduced spuriously, which I think we want to avoid.
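
Written out in the style of the earlier transformation examples (illustrative; the layouts are assumptions, and as before allocation is assumed to succeed):

use std::alloc::{alloc, realloc, Layout};

unsafe fn src() -> *mut u8 {
    let small = Layout::new::<[u8; 16]>();
    let p = alloc(small);  // first source allocator call
    realloc(p, small, 32)  // second source allocator call
}

unsafe fn tgt() -> *mut u8 {
    // One real call; under the timing-only spec it is attributed to the
    // source realloc, with its argument values changed.
    alloc(Layout::array::<u8>(32).unwrap())
}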

The restriction is in the abstract a principled one, as it is basically stating that how <Magic<A> as Allocator>'s implementation calls into <A as Allocator> is unspecified. It might be arbitrarily nondeterministic, but it will satisfy the Allocator trait, and it will only use the underlying allocator in a sound manner (so long as it is used in a sound manner).

Interesting side observation: whatever the choice is here will interact with allocations in const context if/when those are allowed, as those certainly aren't using the registered #[global_allocator].

Footnotes

  1. The given spec permits "zero or one" calls to the underlying allocator at each source call. Weaker would also allow more calls. Then the only restriction is the one-to-many timing restriction.

@farnz

farnz commented Nov 3, 2023

Does the following (atop @CAD97's existing wording) make sense as a route towards restricting argument values on calls to an Allocator method of A?

The argument values in the call to a method on A must be taken from a call to a method on Magic<A>.

This has the possible extension of:

Magic<A> can use each argument from a call to its methods at most once in all calls to the Allocator methods on A.

This then allows the merging of a realloc and an alloc, since the newly produced alloc will take its size argument from the realloc's size and the original alloc's smaller request size will be left unused, but it doesn't allow the optimizer to ask for a surprisingly large amount of memory, since Magic<A> can't give you a call with a surprisingly large allocation to forward to the "real" allocator unless there is a call in the source that it's been able to elide.

@RalfJung
Member Author

RalfJung commented Nov 4, 2023

The restriction is in the abstract a principled one, as it is basically stating that how <Magic<A> as Allocator>'s implementation calls into <A as Allocator> is unspecified. It might be arbitrarily nondeterministic, but it will satisfy the Allocator trait, and it will only use the underlying allocator in a sound manner (so long as it is used in a sound manner).

Thanks for that, this helps. I haven't seen anything like this before and the consequences seem hard to gauge, but it doesn't feel completely arbitrary any more.

The given spec permits "zero or one" calls to the underlying allocator at each source call. Weaker would also allow more calls. Then the only restriction is the one-to-many timing restriction.

FWIW, the restriction to "at most one" does feel somewhat arbitrary, so I feel like we might as well remove it.

@RalfJung
Member Author

An interesting question came up here: if deallocate is written to not actually free the underlying memory, is it UB to access that memory after calling deallocate?

Given the fact that these functions are "magic", I think I would expect that the AllocId that is synthesized when allocate returns becomes invalid in deallocate no matter what.

@RalfJung
Member Author

RalfJung commented Oct 5, 2024

#534 demonstrates that dealloc must have special semantics in another regard as well: the contents of the memory are replaced with "uninit" before the custom deallocator runs, making sure it cannot observe the data stored in there.
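
In example form (a sketch; Spy is a hypothetical instrumenting allocator):

use std::alloc::{GlobalAlloc, Layout, System};

struct Spy;

unsafe impl GlobalAlloc for Spy {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // Under the semantics above, the memory was already reset to
        // uninit before this function ran, so this read is UB even
        // though the deallocation itself is valid.
        let _peek = *ptr;
        System.dealloc(ptr, layout)
    }
}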

@Diggsey

Diggsey commented Oct 5, 2024

You mentioned on the other issue:

Though maybe we can reasonably apply the restriction that the memory that this is "carved out" from must not be stack memory?

I'm not sure that's tenable? I think people would expect to be able to create a fixed size array on the stack and then bump-allocate memory from that array. Also crates like https://crates.io/crates/stackalloc exist, although it's unclear if that's a problem given that it doesn't go through the normal allocation functions.

@RalfJung
Member Author

RalfJung commented Oct 5, 2024

I think people would expect to be able to create a fixed size array on the stack and then bump-allocate memory from that array.

How would that be done in practice, though? The stack frame of the allocator function disappears when it returns... so it'd have to be the stack frame of some outer, surrounding function that is always assumed to exist, or so?

@VorpalBlade

I think people would expect to be able to create a fixed size array on the stack and then bump-allocate memory from that array.

How would that be done in practice, though? The stack frame of the allocator function disappears when it returns... so it'd have to be the stack frame of some outer, surrounding function that is always assumed to exist, or so?

I'm on my phone so I won't write code, just a general outline:

  • The global allocator has a static mut containing an option of a pointer to the slice of memory to use. If None, just panic that memory allocation failed. Otherwise allocate from this slice.
  • main allocates a slice on the stack and registers it into the aforementioned static mut. Then it calls a diverging function to do the actual program work. (A sketch follows the next paragraph.)

For embedded this seems fine to me. Why would you want this? To not have to worry about the heap and stack corrupting each other perhaps (traditionally they grow to meet in the middle, without memory protection that can be very nasty to debug). If you don't even have a heap that prevents that failure mode. (This can also be accomplished with linker script tweaks to swap the order of the stack and heap so they grow away from each other, but I don't know that it works on all architectures)
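
A sketch of that outline (details assumed; a single-threaded embedded target is assumed too, since on a real target the unsynchronized statics would need a critical section):

use std::alloc::{GlobalAlloc, Layout};
use std::ptr;

struct StackArena;

static mut REGION: Option<(*mut u8, usize)> = None; // (base, capacity)
static mut NEXT: usize = 0;

unsafe impl GlobalAlloc for StackArena {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // No arena registered yet: fail the allocation. (GlobalAlloc
        // methods must not unwind, so returning null stands in for the
        // "panic" in the outline above.)
        let Some((base, cap)) = REGION else { return ptr::null_mut() };
        let aligned = (base as usize + NEXT).next_multiple_of(layout.align());
        let offset = aligned - base as usize;
        if offset + layout.size() > cap {
            return ptr::null_mut();
        }
        NEXT = offset + layout.size();
        base.add(offset)
    }
    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        // Bump arena: individual frees are no-ops.
    }
}

#[global_allocator]
static A: StackArena = StackArena;

fn main() {
    let mut arena = [0u8; 64 * 1024];
    unsafe { REGION = Some((arena.as_mut_ptr(), arena.len())) };
    real_main() // diverges, so `arena`'s stack frame stays alive
}

fn real_main() -> ! {
    loop {}
}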

@RalfJung
Member Author

RalfJung commented Oct 6, 2024

I see, that makes sense. However, we have to be careful that LLVM does not assume that the memory returned by the global allocator is completely fresh memory that was inaccessible before -- it is fresh from a provenance perspective, but not from an address equality perspective.

Cc @nikic

@nikic

nikic commented Oct 6, 2024

@RalfJung If we're talking about just noalias, my mental model would be that it gives out fresh provenance, and that's it. Now, in practice, LLVM does end up associating more optimizations with noalias for historical reasons -- some of those should be using allockind instead.

As for address equality, the answer is, as usual, "it's a mess". You can find the relevant code here: https://github.com/llvm/llvm-project/blob/b3e0bd3d284dec705386b1efcae40dd51b763010/llvm/lib/Analysis/InstructionSimplify.cpp#L2799-L2864 This code is known to be incorrect, in particular the part using CustomCaptureTracker. The intent here is that the address returned by an allocator is unpredictable in the same sense as an alloca address. That is, we can fold away comparisons against arbitrary addresses as long as we maintain consistency (but the current implementation fails to maintain consistency).

So even if the allocator may return the same pointer multiple times (which most reasonable allocators can...), LLVM may pretend that any two allocations are at different addresses. Similarly, if you know that two consecutive allocations will have consecutive addresses (for, say, a bump pointer allocator), LLVM may pretend that A + sizeof(A) != B etc.

(I do think that this should be based on something like allockind("unpredictable") rather than noalias.)

@RalfJung
Member Author

RalfJung commented Oct 6, 2024

I am not sure what the full set of attributes is that we attach to __rust_alloc. I think this is determined here.

The intent here is that the address returned by an allocator is unpredictable in the same sense as an alloca address. That is, we can fold away comparisons against arbitrary addresses as long as we maintain consistency (but the current implementation fails to maintain consistency).

Wait, it does that not just for malloc but for any noalias-returning function? Uh, that's not good.

LLVM may pretend that any two allocations are at different addresses.

For allocations that have disjoint lifetimes, this seems unsound to me, since the program can also easily observe that the two pointers are the same.

@nikic

nikic commented Oct 6, 2024

For allocations that have disjoint lifetimes, this seems unsound to me, since the program can also easily observe that the two pointers are the same.

If the folding is consistent, how can the program observe they are the same?

@VorpalBlade

For allocations that have disjoint lifetimes, this seems unsound to me, since the program can also easily observe that the two pointers are the same.

If the folding is consistent, how can the program observe they are the same?

Can it realistically be consistent though? If you do ptr2int casts, or send the two addresses to some non-inlined function and it compares on your behalf?

@nikic

nikic commented Oct 6, 2024

For allocations that have disjoint lifetimes, this seems unsound to me, since the program can also easily observe that the two pointers are the same.

If the folding is consistent, how can the program observe they are the same?

Can it realistically be consistent though? If you do ptr2int casts, or send the two addresses to some non-inlined function and it compares on your behalf?

Yes, the consistency requirement obviously precludes folding comparisons for allocations that are captured.

@RalfJung
Member Author

RalfJung commented Oct 8, 2024

If the folding is consistent, how can the program observe they are the same?

By sharing state with the allocator.
I don't think we can realistically make it UB to share such state; it's unclear to me what that would even mean.
