Alloca for Rust #1808

- Feature Name: alloca
- Start Date: 2016-12-01
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary
[summary]: #summary

Add a builtin `fn core::mem::reserve<'a, T>(elements: usize) -> StackSlice<'a, T>` that reserves space for the given
number of elements on the stack and returns a `StackSlice<'a, T>` to it which derefs to `&'a [T]`.

# Motivation
[motivation]: #motivation

Some algorithms (e.g. sorting, regular expression search) need a one-time backing store for a number of elements only
known at runtime. Reserving space on the heap always takes a performance hit, and the resulting deallocation can
increase memory fragmentation, possibly slightly degrading allocation performance further down the road.

If Rust included this zero-cost abstraction, more of these algorithms could run at full speed, and they would be
available on systems without an allocator, e.g. embedded or soft-real-time systems. The option of using a fixed-size
slice up to a certain size and a heap-allocated slice otherwise (as afforded by
[SmallVec](https://crates.io/crates/smallvec)-like types) has the drawback of decreasing memory locality when only a
small part of the fixed-size allocation is used, and even those implementations could potentially benefit from the
increased memory locality.
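
The fixed-or-heap fallback described above can be sketched in today's Rust. This is a hedged illustration only:
`with_scratch` and `MAX_ON_STACK` are hypothetical names invented for this example, and the element type is fixed to
`u64` for brevity.

```rust
// Illustrative sketch of the SmallVec-style fixed-or-heap fallback.
// `with_scratch` and `MAX_ON_STACK` are hypothetical, not part of this RFC.
const MAX_ON_STACK: usize = 64;

fn with_scratch<U>(elems: usize, f: impl FnOnce(&mut [u64]) -> U) -> U {
    if elems <= MAX_ON_STACK {
        // Small request: borrow a prefix of a fixed-size stack array.
        let mut buf = [0u64; MAX_ON_STACK];
        f(&mut buf[..elems])
    } else {
        // Large request: fall back to a heap allocation.
        let mut buf = vec![0u64; elems];
        f(&mut buf)
    }
}

fn main() {
    let len = with_scratch(10, |scratch| {
        scratch[9] = 42;
        scratch.len()
    });
    assert_eq!(len, 10);
}
```

Note how the stack branch leaves most of its 64-element buffer untouched for small requests, which is exactly the
locality cost the paragraph above describes.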

As a (flawed) benchmark, consider the following C program:

```c
#include <stdlib.h>

int main(int argc, char **argv) {
    int n = argc > 1 ? atoi(argv[1]) : 1;
    int x = 1;
    char foo[n];    /* variable-length array: n bytes on the stack */
    foo[n - 1] = 1; /* touch the far end of the allocation */
    (void)x;
}
```

Running `time nice -n 20 ionice ./dynalloc 1` returns almost instantly (0.0001s), whereas `time nice -n 20 ionice
./dynalloc 200000` takes 0.033 seconds. It appears that merely forcing the second write further away from the first
slows down the program (though this benchmark is actually quite unfair: by reducing the process' priority, we invite
the kernel to schedule a different process instead, which is very probably the major cause of the slowdown here).

Still, even with the flaws in this benchmark,
[The Myth of RAM](http://www.ilikebigbits.com/blog/2014/4/21/the-myth-of-ram-part-i) argues quite convincingly for the
benefits of memory frugality.

# Detailed design
[design]: #detailed-design

The standard library function can simply `panic!(..)` within the body of `reserve(_)`, as the call will be replaced
when translating to MIR. The `StackSlice` type can be implemented as follows:

```rust
use core::ops::Deref;

/// A slice of data on the stack
pub struct StackSlice<'a, T: 'a> {
    slice: &'a [T],
}

impl<'a, T: 'a> Deref for StackSlice<'a, T> {
    type Target = [T];

    fn deref(&self) -> &[T] {
        self.slice
    }
}
```

`StackSlice`'s embedded lifetime ensures that the stack allocation may never leave its scope. Thus the borrow checker
can uphold the contract that LLVM's `alloca` requires.
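
As a runnable illustration of that contract, the sketch below pairs the `StackSlice` type with an ordinary fixed-size
array as the backing store (the `reserve` builtin is only proposed here, so a real dynamic stack allocation cannot be
shown):

```rust
use std::ops::Deref;

// Same shape as the RFC's StackSlice; the backing store is a fixed-size
// array rather than a dynamic alloca, since `reserve` does not exist yet.
pub struct StackSlice<'a, T: 'a> {
    slice: &'a [T],
}

impl<'a, T: 'a> Deref for StackSlice<'a, T> {
    type Target = [T];
    fn deref(&self) -> &[T] {
        self.slice
    }
}

fn main() {
    let backing = [0u8; 16];
    let s = StackSlice { slice: &backing };
    assert_eq!(s.len(), 16);
    // Returning `s` from a function that owns `backing` would be rejected
    // by the borrow checker; that scoping is the contract alloca needs.
}
```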

MIR level: we need a way to represent the dynamic stack allocation with both the number of elements and the concrete
element type. While building the MIR, we then need to replace the corresponding HIR `Call`s with it.

Low level: LLVM has the `alloca` instruction to allocate memory on the stack. We simply need to extend trans to emit
it with a dynamic `<NumElements>` argument when encountering the aforementioned MIR.

With an LLVM extension to un-allocate the stack slice, we could even restrict the stack space reservation to the
lifetime of the allocated value, thus increasing locality over C code that uses `alloca` (which so far is suboptimally
implemented by some compilers, especially with regard to inlining).

# How to teach this

Add the following documentation to libcore:

```
*** WARNING *** Stay away from this feature unless you absolutely need it.
Using it will destroy your ability to statically reason about stack size.

Apart from that, this works much like an unboxed array, except the size is
determined at runtime. Since the memory resides on the stack, be careful
not to exceed the stack limit (which depends on your operating system);
otherwise the resulting stack overflow will at best kill your program. You
have been warned.

Valid uses for this are mostly within embedded systems without heap
allocation.
```

Also add an example (perhaps a sort algorithm that uses some scratch space that will be heap-allocated with `std` and
stack-allocated with `#[no_std]`), noting that the function would not be available on no-std systems at all were it
not for this feature.

Do not `pub use` it from `std::mem`, to drive the point home.

# Drawbacks
[drawbacks]: #drawbacks

- Even more stack usage means the dreaded stack limit will probably be reached even sooner. Overflowing the stack
space leads to segfaults at best and undefined behavior at worst. On unices, the stack can usually be extended at
runtime, whereas on Windows the stack size is set at link time (defaulting to 1MB).

> Review comments on this drawback:
>
> - On the flip side, this might reduce stack usage for users of […]
> - Will we not be able to correctly probe the stack for space when alloca-ing?
> - It should be possible to do stack probes. If they aren't used, though, this must be marked `unsafe`, as it's
>   trivial to corrupt memory without them. Speaking of which, are stack probes still not actually in place for
>   regular stack allocations? I just tried compiling a test program (that allocates a large array on the stack) with
>   rustc nightly on my system and didn't see any probes in the asm output.
> - Can't one already overflow the stack with recursion?
> - Right, which is why stack probes are inserted so that the runtime can detect that and abort with a […]
> - Except they actually aren't, since it hasn't been implemented yet (or maybe it only works on Windows?). OSes
>   normally provide a 4kb guard page below the stack, so most stack overflows will crash anyway (and trigger that
>   error, which comes from a signal handler), but a function with a big enough stack frame can skip past that page,
>   and I think it may actually be possible in practice to exploit some Rust programs that way... I should try.
> - cc @whitequark
> - I am working on integrating stack probes, which are required for robustness on our embedded system (ARTIQ). I
>   expect to implement them in a way generic enough to be used on all bare-metal platforms, including no-MMU, as
>   well as to allow for easy shimming on currently unsupported OSes.

- Adding this will increase implementation complexity and require support from possible alternative implementations /
backends (e.g. Cretonne, WebAssembly).

# Alternatives
[alternatives]: #alternatives

- Do nothing. Rust works well without it (notwithstanding the issue mentioned in the "Motivation" section).
`SmallVec`s work well enough and have the added benefit of limiting stack usage.

- `mem::with_alloc<T, U, F: Fn([T]) -> U>(elems: usize, code: F) -> U`. This has the benefits of reducing API surface
and introducing rightward drift, which makes overuse less likely. However, it also needs to be monomorphized for each
calling function (instead of only for each target type), which will increase compile times.

> Review comment: I think this is the only solution that will really work, since otherwise you can always use the […]
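
A heap-backed emulation of this closure-based alternative can be sketched as follows. The real proposal would back the
slice by `alloca`'d stack memory rather than a `Vec`, and the `&mut [T]` parameter shape is an assumption here, since
a closure cannot take an unsized `[T]` by value:

```rust
// Hedged sketch: emulates the proposed `with_alloc` on the heap. The closure
// receives a mutable scratch slice that cannot escape its scope.
fn with_alloc<T: Default + Clone, U>(elems: usize, code: impl FnOnce(&mut [T]) -> U) -> U {
    let mut storage = vec![T::default(); elems];
    code(&mut storage)
}

fn main() {
    // The scratch space is only usable inside the closure, mirroring the
    // scoping the RFC wants the borrow checker to enforce.
    let sum: u32 = with_alloc(4, |scratch: &mut [u32]| {
        for (i, slot) in scratch.iter_mut().enumerate() {
            *slot = i as u32;
        }
        scratch.iter().sum()
    });
    assert_eq!(sum, 6);
}
```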

- Dynamically sized arrays are a potential solution; however, those would need a numerical type that is only fully
known at runtime, requiring complex type system gymnastics.

> Review comment: The generalization of this feature is to allow unsized locals (which simply allocate stack space for
> the size given by the memory which they are moved out of). To prevent accidental usage of this, a language item […]

- Use a macro instead of a function (analogous to `print!(..)`), which could insert the LLVM `alloca` builtin.

- Mark the function as `unsafe` due to the potential stack overflow problem.

- Copy the design of C's `alloca()`, possibly wrapping it later.

- Use escape analysis to determine which allocations could be moved to the stack. This could potentially benefit even
more programs, because they would gain allocation speed without needing any change. The deal-breaker here is that we
would lose the ability to avoid the listed drawback, making programs crash without recourse. The compiler would also
become somewhat more complex (though a simple, incomplete escape analysis implementation already exists in
[clippy](https://github.com/Manishearth/rust-clippy)).

# Unresolved questions
[unresolved]: #unresolved-questions

> Review comments: C's […]
>
> My understanding is that that limitation exists primarily to allow naive single-pass compilers to exist (along with
> some "interesting" ways of implementing […])

- Could we return the slice directly (reducing visible complexity)?

> Review comments:
>
> - I may have missed it, but I'm not sure I understand why we wouldn't be able to just return the slice?
> - My guess: because the size of the returned value could be variable, it would need to be copied (or preserved on
>   the stack), and I'm not sure Rust has anything that does that right now. Supposing it could be done, it could just
>   become a detail of the calling convention. The caller can't allocate unless it knows how much memory to allocate,
>   and making it aware of the memory needed by the callee could complicate things (and would likely be impossible in
>   some cases without running the callee twice). If these stack allocations were parameterized with type-level
>   numbers, it would be fairly straightforward (ignoring all the general complexity of type-level numbers), but this
>   RFC doesn't propose that.

- Bikeshedding: Can we find a better name?

> Review comments:
>
> - I don't quite see why the type of `reserve` would prevent `StackSlice` from escaping the scope where `reserve` is
>   called. More precisely, it seems like the user could define […] which would be invalid if it is not inlined.
> - Yes, the signature proposed here is completely unsound. Since `'a` is a lifetime parameter not constrained by
>   anything, any call to `reserve` can simply pick `'a = 'static`.