What about: custom allocators #442
In fact that's a problem already for a […]

A possible spec (in prose) for […]

And then […]
cc @rust-lang/wg-allocators
The permissive model I've been using is that global allocations are serviced by the AM, and that separately the AM is permitted to use whatever sound (based on the trait interface) combination of calls into the registered global allocator.

An operational version of this model would be roughly that before each AM step there might nondeterministically additionally be a call to a method of the registered global allocator.

This is specifically useful for one thing: it makes reordering […]
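One way to read that operational version (my own pseudocode, not any implementation):

```text
loop forever:
    if nondet():
        # optionally perform some sound sequence of calls into the
        # registered allocator, e.g. acquiring memory that the AM may
        # later hand out when servicing `alloc`
        interact_with_registered_allocator()
    step_AM()
```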
@chorman0773 that spec seems to allow a compiler to just entirely ignore the registered global allocator. We can say it can non-deterministically choose the actually registered global allocator or the AM-native allocator; that would solve some of these issues -- but not the one where it always picks the AM-native one...
That doesn't explain why I might call
There's also the alternative where this is not a new Rust Allocated Object, but just a new sub-tree of pointer IDs in the old object (a la Tree Borrows).
My question you quoted above was also about whether this should also be done when invoking a custom allocator. Currently that's also just a regular function call, after all.

It can't call another allocator, it has to be provided from the storage of an existing allocation.
That version of the model is relying on
This works for specific known allocators where no magic alloc substitution is done. This doesn't work for the global replaceable allocator, at least not while we have operations like […].

Having a subtree of pointer tags/IDs which e.g. pointer offsets look at instead of RAO bounds is essentially isomorphic to making an aliasing RAO subtag, which I did mention as an alternative to the more heavyweight deallocation model. This is because everything which currently references the RAO would now instead reference the new concept of a RAO tag. So yes, having a tree of RAO (sub)views is the model which I was (attempting to) lay out. This holds as necessary even in the face of switching pointer offsets to […].
And the answer I attempted to make earlier in the post is that directly calling into an allocator (even the […]).
I don't see how it could be; suitable storage means satisfying all guaranteed properties of the allocator API. FWIW, wg-alloc did IIRC decide that relying on any property of the configured global allocator when using the global allocation functions is always unsound, even when you know what the global allocator is. To provide a stronger guarantee than "suitable storage" out of the global allocation functions would, for example, forbid this transformation:

```rust
// original
fn main() {
    let mut global = Box::new(GlobalData::new());
    real_main(&mut *global);
    drop(global);
}

// transformed to:
#[start] // guaranteed nonreentrancy
unsafe fn _start() {
    static mut GLOBAL: MaybeUninit<GlobalData> = MaybeUninit::uninit();
    let global = GLOBAL.write(GlobalData::new());
    real_main(&mut *global);
    GLOBAL.assume_init_drop();
}
// and provide the same main if it might be called
```

This substitutes a source allocation that's known to exist for the life of the program with a static allocation that doesn't require using the dynamic allocator. Both my spec and Connor's are written assuming that this is permitted and wanted to be permitted.

In theory, yes, the compiler could serve all global allocation by some substitution instead of using the registered global allocator. The compiler not substituting allocations arbitrarily is a QOI issue. Rustc+LLVM doesn't seem to ever do heap-to-stack alloc substitution, but AIUI it should be permitted to.
Please use less confusing terminology.
This just goes to show that you wouldn't be permitted to do the offset, since you cannot rely on your allocator actually being called. So with […]
I would have expected the answer to be "no we do not permit substitution; yes it must be in the global allocator."
There are two separate models I've considered: […]
The main underlying issue is that I have no idea how to justify these two at the same time. Using "angelic nondet substitution" will never be able to justify removal, because the […]. If you want nondet to be forced to use the […].

A concrete example: is it allowed to remove the heap allocation in this example? rustc+LLVM currently does not remove it (by default). Returning a value not derived from the pointer value does remove the allocation.

```rust
pub unsafe fn alloc_weirdness(layout: Layout) -> usize {
    let p = alloc(layout);
    if !p.is_null() {
        dealloc(p, layout);
    }
    transmute(p)
}
```

A smaller secondary issue is that it's been my impression that we do want to permit heap-to-stack substitution, at least. I'd also not be super comfortable committing to a model that forbids substitution without LLVM documenting what the optimization semantics of […] are.
If you take your example and make the layout constant, remove the null check, and add […]. Not sure how relevant that is… there are probably much dodgier things you can get LLVM to do by enabling non-default passes… but, well, I saw a reference to it in a mailing list post and was curious.
I'll agree that adding in LLVM passes that aren't enabled by default by the rustc codegen backend is at best dodgy, but the fact that LLVM does have a pass to optimize […]
I see good reason for rustc to provide a QOI-style guarantee that there isn't a bonus general-purpose allocator that can get used instead of the registered global allocator. But one guarantee I think we can and definitely should provide is that a replaced allocation will succeed, and that the only way allocation fails (returns null) is by calling the underlying allocator. Replacing allocations with a failed allocation is both degenerate and nondesired. If a sanitizer wants to inject failed allocations, it can do so by instrumenting the global allocator instead of the standard allocation replacement.

**transformation examples**

All examples assume no unwinding. Unwinding-friendly examples can be constructed by including the dealloc on unwind. Examples also assume allocation never fails (and/or "real" dealloc accepts null), for simplicity.

Removal:

```rust
unsafe fn src() {
    let layout = Layout::new::<i32>();
    let p = alloc(layout);
    dealloc(p, layout);
}
unsafe fn tgt() {
    // this function purposely left blank
}
```

Substitution (heap2stack, addr-used):

```rust
unsafe fn src() -> usize {
    let layout = Layout::new::<i32>();
    let p = alloc(layout);
    dealloc(p, layout);
    p.addr()
}
unsafe fn tgt() -> usize {
    let alloca = MaybeUninit::<i32>::uninit();
    let p = addr_of!(alloca);
    p.addr()
}
```

Substitution (heap2stack, arbitrary-used):

```rust
unsafe fn src() {
    let layout = Layout::new::<i32>();
    let p = alloc(layout);
    black_box(p.cast::<i32>());
    dealloc(p, layout);
}
unsafe fn tgt() {
    let mut alloca = MaybeUninit::<i32>::uninit();
    let p = addr_of_mut!(alloca);
    black_box(p.cast::<i32>());
}
```

Substitution (heap2static, arbitrary-used):

```rust
unsafe fn src() {
    static mut REENTRANT: bool = false;
    if replace(&mut REENTRANT, true) {
        unreachable_unchecked()
    }
    let layout = Layout::new::<i32>();
    let p = alloc(layout);
    black_box(p.cast::<i32>());
    dealloc(p, layout);
}
unsafe fn tgt() {
    static mut ALLOCA: MaybeUninit<i32> = MaybeUninit::uninit();
    let p = addr_of_mut!(ALLOCA);
    black_box(p.cast::<i32>());
}
```

Unification (object-used):

```rust
unsafe fn src(p: *mut u8) -> *mut u8 {
    let layout = Layout::new::<i32>();
    dealloc(p, layout);
    alloc(layout)
}
unsafe fn tgt(p: *mut u8) -> *mut u8 {
    p
}
```

Spurious (via code motion, arbitrary-used):

```rust
unsafe fn src(b: bool) {
    if b {
        let layout = Layout::new::<i32>();
        let p = alloc(layout);
        black_box(p.cast::<i32>());
        dealloc(p, layout);
    }
}
unsafe fn tgt(b: bool) {
    let layout = Layout::new::<i32>();
    let p = alloc(layout);
    if b {
        black_box(p.cast::<i32>());
    }
    dealloc(p, layout);
}
```

Reordering (alloc order):

```rust
unsafe fn src() {
    let layout1 = Layout::new::<i32>();
    let layout2 = Layout::new::<i64>();
    let p1 = alloc(layout1);
    let p2 = alloc(layout2);
    dealloc(p2, layout2);
    dealloc(p1, layout1);
}
unsafe fn tgt() {
    let layout1 = Layout::new::<i32>();
    let layout2 = Layout::new::<i64>();
    let p2 = alloc(layout2);
    let p1 = alloc(layout1);
    dealloc(p1, layout1);
    dealloc(p2, layout2);
}
```

Reordering (over arbitrary):

```rust
unsafe fn src() {
    let layout = Layout::new::<i32>();
    let p = alloc(layout);
    dealloc(p, layout);
    black_box();
}
unsafe fn tgt() {
    let layout = Layout::new::<i32>();
    let p = alloc(layout);
    black_box();
    dealloc(p, layout);
}
```

I'll see if I can try sticking equivalents into the godbolt compiler explorer's alive2 when I get a chance.

**a small thought on nondeterminism**

Angelic nondeterminism means that if a choice that can make the execution DB is available, (one of) that choice is chosen. Daemonic nondeterminism means that if a choice that can make the execution UB is available, (one of) that choice is chosen. Allocator replacement nondeterminism kind of doesn't want either? It wants nondeterminism, but arbitrary. If you rely on some allocation happening on the global allocator for absence of UB, that doesn't make your evaluation always UB (as it would with daemonic) or force the global allocator to be used (as it would with angelic); it "just" relies on an arbitrary property and makes the code unsound. Though on the other hand, it "working" is a valid outcome of UB, so I guess daemonic choice would apply.

This just feels closer to an implementation choice (implicitly relying on it works without poisoning the entire program semantics with UB if correct, even if ill advised) than daemonic nondeterminism (relying on it is just UB and invalidates all program behavior reasoning, even the independently localized). I don't know if this is a distinction the spec/opsem can make, let alone if it would want to.

Full sanitization Miri would always directly serve […].
I agree that heap-to-stack transformation is a reasonable transformation that we shouldn't close the door on. So I guess you are right, it is allowed to use the AM-native allocator instead of calling the declared global one. To me this clearly sounds like demonic non-det, I don't understand why you consider that problematic?

I am less convinced that we want to allow spurious heap allocations. Having the compiler invoke the allocator arbitrarily at any time seems really hard to reason about. Some code really cares about not invoking the allocator in certain sections (such as in signal handlers, interrupt handlers, or the implementation of the allocator itself) -- how would that be guaranteed?

For what happens with the allocations, one model I was considering (assuming that we have non-contiguous allocations) is that when memory is returned from a custom allocator, we "punch a hole" into the allocation it comes from and generate a fresh AllocId for that same address range. This behaves like a regular fresh AM allocation until it is passed to the corresponding deallocation operation, which ends the new AllocId and puts that memory range back into the allocation it came from. On both of these "copies", the contents of that memory get reset to uninit.
If we do heap-to-stack transform then allocation can fail with a stack overflow. So I think we do have to allow the allocation to fail without actually calling the underlying allocator. But the native AM allocator will never return null, that much we can indeed guarantee.
To be explicit, this is what I meant by replaced allocation being infallible — that it won't ever return a null pointer to user code. Stack overflow is "usual" resource exhaustion which can occur essentially arbitrarily.
It's very heavy-handed, but it's necessarily the case that `no_std` code without `alloc` doesn't contain any allocation. It would be possible, if not quite desirable, to have access to spurious allocation as a compiler flag and/or an (attribute controllable) target feature.

Another less principled option, but one more useful to actual code motion, is to say that spurious allocation may only be introduced when allocation is potentially reachable by at least one potential branch in the scope. The result is that moving allocation across branch points is allowed, but introducing truly spurious allocation isn't. However, that every performed allocation corresponds to an actual evaluated source allocation is a useful property to maintain.
It's not; the problem I see is with trying to combine that with any amount of guarantee that the concrete allocator is ever actually used, and that it isn't entirely replaced with a bundled static allocator. I'm comfortable leaving that to QOI so long as others are.
Making transformations depend on dead code is a doomed approach IMO. It makes it a lot harder to reason about optimization correctness since the usual correctness notion used formally (contextual refinement) does not suffice any more. So I am rather fundamentally opposed to any approach of this sort.
Another optimization case which came up on IRLO is merging alloc+realloc into just a single alloc [source] and merging dealloc+alloc into a reuse when the layout matches [source]. If the opsem of optimizable allocators only does ndet choosing between "magic" and "real" allocation providers, I don't think either of these is justified. I don't see a way to justify these optimizations without decoupling source allocator calls from evaluated allocator calls. But we could still constrain the actually performed allocation somewhat: specify it roughly as […]
This prevents code motion like reordering […]
Your spec doesn't allow these optimizations either though, or does it? E.g. if I first allocate 8 bytes, and then later realloc to 16 bytes, maybe I want to optimize this to allocate 16 bytes from the start -- but the allocator could then see the request for 16 bytes rather than 8 bytes. So that would violate the property that every performed allocation corresponds to a source allocation. (The […].)

Or do you mean that you allow altering argument values at allocator function call sites as well? This feels rather unprincipled, TBH -- I would expect we run into fairly arbitrary-seeming limitations with this approach.
It does mean to allow arbitrary argument values in the calls to the underlying allocator. The only thing controlled for is the timing of calls to the underlying allocator, which must match the timing of a source allocator call; everything else is allowed to be arbitrary. I believe this is necessary in order to allow merging […].

The restriction is in the abstract a principled one, as it is basically stating that how […].

Interesting side observation: whatever the choice is here will interact with allocations in […].
Does the following (atop @CAD97's existing wording) make sense as a route towards restricting argument values on calls to an […]?
This has the possible extension of:
This then allows the merging of a […].
Thanks for that, this helps. I haven't seen anything like this before and the consequences seem hard to gauge, but it doesn't feel completely arbitrary any more.
FWIW, the restriction to "at most one" does feel somewhat arbitrary, so I feel like we might as well remove it.
An interesting question came up here: if […].

Given the fact that these functions are "magic", I think I would expect that the AllocId that is synthesized when […].
#534 demonstrates that […].
You mentioned on the other issue:
I'm not sure that's tenable? I think people would expect to be able to create a fixed size array on the stack and then bump-allocate memory from that array. Also crates like https://crates.io/crates/stackalloc exist, although it's unclear if that's a problem given that it doesn't go through the normal allocation functions.
How would that be done in practice, though? The stack frame of the allocator function disappears when it returns... so it'd have to be the stack frame of some outer, surrounding function that is always assumed to exist, or so?
I'm on my phone so I won't write code, just a general outline:
For embedded this seems fine to me. Why would you want this? Perhaps to not have to worry about the heap and stack corrupting each other (traditionally they grow to meet in the middle; without memory protection that can be very nasty to debug). If you don't even have a heap, that prevents that failure mode. (This can also be accomplished with linker script tweaks to swap the order of the stack and heap so they grow away from each other, but I don't know that it works on all architectures.)
I see, that makes sense. However, we have to be careful that LLVM does not assume that the memory returned by the global allocator is completely fresh memory that was inaccessible before -- it is fresh from a provenance perspective, but not from an address equality perspective. Cc @nikic
@RalfJung If we're talking about just […].

As for address equality, the answer is, as usual, "it's a mess". You can find the relevant code here: https://github.com/llvm/llvm-project/blob/b3e0bd3d284dec705386b1efcae40dd51b763010/llvm/lib/Analysis/InstructionSimplify.cpp#L2799-L2864

This code is known to be incorrect, in particular the part using CustomCaptureTracker. The intent here is that the address returned by an allocator is unpredictable in the same sense as an alloca address. That is, we can fold away comparisons against arbitrary addresses as long as we maintain consistency (but the current implementation fails to maintain consistency). So even if the allocator may return the same pointer multiple times (which most reasonable allocators can...) LLVM may pretend that any two allocations are at different addresses. Similarly, if you know that two consecutive allocations will have consecutive addresses (for, say, a bump pointer allocator), LLVM may pretend that […].

(I do think that this should be based on something like […].)
I am not sure what the full set of attributes is that we attach to […].
Wait, it does that not just for […]?
For allocations that have disjoint lifetimes, this seems unsound to me, since the program can also easily observe that the two pointers are the same. |
If the folding is consistent, how can the program observe they are the same?
Can it realistically be consistent though? If you do ptr2int casts, or send the two addresses to some non-inlined function and it compares on your behalf? |
Yes, the consistency requirement obviously precludes folding comparisons for allocations that are captured. |
By sharing state with the allocator.
Custom allocators (`#[global_alloc]`) come with all sorts of exciting interactions with the operational semantics:

- The global allocation functions are magic; we emit `noalias` to reflect this magic. That attribute is super strong though.
- Calls to custom allocators (through the `GlobalAlloc` trait or the `Allocator` trait) are just regular function calls. But maybe we want those to be magic as well for more optimizations? Or maybe we at least want a way for an allocator to opt-in to such magic? An `Allocator` that returns memory from some previously-allocated buffer certainly violates LLVM return-position `noalias`, so we can't give them the full treatment (at least not without first working with LLVM to improve their semantics -- maybe the scope of the return-position `noalias` only lasts until `free` is called, not until the end of the program).
- `dealloc` resets memory to "uninit" before the custom allocator function runs.