From 168857dd09e4919e88a13b1820dcbbf143dedec2 Mon Sep 17 00:00:00 2001 From: Andre Bogus Date: Wed, 7 Dec 2016 00:20:03 +0100 Subject: [PATCH 1/8] new RFC: Alloca for Rust --- text/0000-alloca.md | 147 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 147 insertions(+) create mode 100644 text/0000-alloca.md diff --git a/text/0000-alloca.md b/text/0000-alloca.md new file mode 100644 index 00000000000..16cf723a774 --- /dev/null +++ b/text/0000-alloca.md @@ -0,0 +1,147 @@
+- Feature Name: alloca
+- Start Date: 2016-12-01
+- RFC PR: (leave this empty)
+- Rust Issue: (leave this empty)
+
+# Summary
+[summary]: #summary
+
+Add a builtin `fn core::mem::reserve<'a, T>(elements: usize) -> StackSlice<'a, T>` that reserves space for the given
+number of elements on the stack and returns a `StackSlice<'a, T>` to it which derefs to `&'a [T]`.
+
+# Motivation
+[motivation]: #motivation
+
+Some algorithms (e.g. sorting, regular expression search) need a one-time backing store for a number of elements only
+known at runtime. Reserving space on the heap always takes a performance hit, and the resulting deallocation can
+increase memory fragmentation, possibly slightly degrading allocation performance further down the road.
+
+If Rust included this zero-cost abstraction, more of these algorithms could run at full speed – and would be available
+on systems without an allocator, e.g. embedded, soft-real-time systems. The option of using a fixed slice up to a
+certain size and using a heap-allocated slice otherwise (as afforded by
+[SmallVec](https://crates.io/crates/smallvec)-like classes) has the drawback of decreasing memory locality if only a
+small part of the fixed-size allocation is used – and even those implementations could potentially benefit from the
+increased memory locality.
+
+As a (flawed) benchmark, consider the following C program:
+
+```C
+#include <stdlib.h>
+
+int main(int argc, char **argv) {
+ int n = argc > 1 ?
atoi(argv[1]) : 1;
+ int x = 1;
+ char foo[n];
+ foo[n - 1] = 1;
+}
+```
+
+Running `time nice -n 20 ionice ./dynalloc 1` returns almost instantly (0.0001s), whereas using `time nice -n 20 ionice
+./dynalloc 200000` takes 0.033 seconds. As such, it appears that just forcing the second write further away from the
+first slows down the program (this benchmark is actually completely unfair, because by reducing the process' priority,
+we invite the kernel to swap in a different process instead, which is very probably the major cause of the slowdown
+here).
+
+Still, even with the flaws in this benchmark,
+[The Myth of RAM](http://www.ilikebigbits.com/blog/2014/4/21/the-myth-of-ram-part-i) argues quite convincingly for the
+benefits of memory frugality.
+
+
+# Detailed design
+[design]: #detailed-design
+
+The standard library function can simply `panic!(..)` within the `reserve(_)` method, as it will be replaced when
+translating to MIR. The `StackSlice` type can be implemented as follows:
+
+```Rust
+/// A slice of data on the stack
+pub struct StackSlice<'a, T: 'a> {
+    slice: &'a [T],
+}
+
+impl<'a, T: 'a> Deref for StackSlice<'a, T> {
+    type Target = [T];
+
+    fn deref(&self) -> &[T] {
+        return self.slice;
+    }
+}
+```
+
+`StackSlice`'s embedded lifetime ensures that the stack allocation may never leave its scope. Thus the borrow checker
+can uphold the contract that LLVM's `alloca` requires.
+
+MIR Level: We need a way to represent the dynamic stack `alloca`tion with both the number of elements and the concrete
+type of elements. Then while building the MIR, we need to replace the `Calls` from HIR with it.
+
+Low-level: LLVM has the `alloca` instruction to allocate memory on the stack. We simply need to extend trans to emit it
+with a dynamic `<NumElements>` argument when encountering the aforementioned MIR.
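As an editorial aside: the `StackSlice` type sketched above compiles today when backed by ordinary stack memory. The following self-contained example is not part of the patch (the proposed builtin `reserve` does not exist), but it shows how the embedded lifetime keeps borrows from escaping, which is the contract claimed above:

```rust
// Editorial sketch: an ordinary fixed-size array stands in for the
// dynamically `alloca`ted memory the RFC proposes. The point is that a
// `StackSlice<'a, T>` can only be used while its backing storage lives,
// which is exactly what LLVM's `alloca` contract needs.
use std::ops::Deref;

/// A slice of data on the stack (copied from the RFC text above).
pub struct StackSlice<'a, T: 'a> {
    slice: &'a [T],
}

impl<'a, T: 'a> Deref for StackSlice<'a, T> {
    type Target = [T];

    fn deref(&self) -> &[T] {
        self.slice
    }
}

fn main() {
    let backing = [1u8, 2, 3]; // stand-in for the reserved stack space
    let s = StackSlice { slice: &backing };
    // Deref lets the wrapper be used like an ordinary slice.
    assert_eq!(s.len(), 3);
    assert_eq!(*s, [1u8, 2, 3]);
    // Returning `s` from a function that owns `backing` is rejected by
    // the borrow checker, as the RFC text argues.
}
```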
+
+With an LLVM extension to un-allocate the stack slice we could even restrict the stack space reservation to the lifetime
+of the allocated value, thus increasing locality over C code that uses alloca (which so far is suboptimally implemented
+by some compilers, especially with regards to inlining).
+
+# How to teach this
+
+Add the following documentation to libcore:
+
+```
+*** WARNING *** stay away from this feature unless you absolutely need it.
+Using it will destroy your ability to statically reason about stack size.
+
+Apart from that, this works much like an unboxed array, except the size is
+determined at runtime. Since the memory resides on the stack, be careful
+not to exceed the stack limit (which depends on your operating system),
+otherwise the resulting stack overflow will at best kill your program. You
+have been warned.
+
+Valid uses for this are mostly within embedded systems without heap allocation.
+```
+
+Also add an example (perhaps a sort algorithm that uses some scratch space that will be heap-allocated with `std` and
+stack-allocated with `#[no_std]`), noting that the function would not be available on no-std systems at all were it not
+for this feature.
+
+Do not `pub use` it from `std::mem` to drive the point home.
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+- Even more stack usage means the dreaded stack limit will probably be reached even sooner. Overflowing the stack space
+leads to segfaults at best and undefined behavior at worst. On unices, the stack can usually be extended at runtime,
+whereas on Windows stack size is set at link time (defaults to 1MB).
+
+- Adding this will increase implementation complexity and require support from possible alternative implementations /
+backends (e.g. Cretonne, WebASM).
+
+# Alternatives
+[alternatives]: #alternatives
+
+- Do nothing. Rust works well without it (there's the issue mentioned in the "Motivation" section though).
`SmallVec`s
+work well enough and have the added benefit of limiting stack usage.
+
+- `mem::with_alloc<T, U, F: Fn(&mut [T]) -> U>(elems: usize, code: F) -> U` This has the benefit of reducing API surface, and
+of introducing rightward drift, which makes it less likely to be overused. However, it also needs to be
+monomorphized for each function (instead of only for each target type), which will increase compile times.
+
+- dynamically sized arrays are a potential solution, however, those would need to have a numerical type that is only
+fully known at runtime, requiring complex type system gymnastics.
+
+- use a macro instead of a function (analogous to `print!(..)`), which could insert the LLVM alloca builtin.
+
+- mark the function as `unsafe` due to the potential stack overflow problem.
+
+- Copy the design from C `fn alloca()`, possibly wrapping it later.
+
+- Use escape analysis to determine which allocations could be moved to the stack. This could potentially benefit even
+more programs, because they would benefit from increased allocation speed without the need for change. The deal-breaker
+here is that we would also lose control to avoid the listed drawback, making programs crash without recourse. Also the
+compiler would become somewhat more complex (though a simple incomplete escape analysis implementation is already in
+[clippy](https://github.com/Manishearth/rust-clippy)).
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+- Could we return the slice directly (reducing visible complexity)?
+
+- Bikeshedding: Can we find a better name?
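As an editorial illustration between patches: the `SmallVec`-style alternative discussed above (fixed slice up to a certain size, heap fallback otherwise) can be approximated with std alone. The function name `scratch_sum` and the capacity of 64 are invented for this sketch and are not part of the RFC:

```rust
// Sketch of the "fixed slice up to a certain size, heap otherwise"
// pattern, using only std (no SmallVec crate).

fn scratch_sum(n: usize) -> usize {
    const STACK_CAP: usize = 64;
    let mut stack_buf = [0usize; STACK_CAP];
    let mut heap_buf; // only allocated when n exceeds the stack capacity
    let scratch: &mut [usize] = if n <= STACK_CAP {
        &mut stack_buf[..n]
    } else {
        heap_buf = vec![0usize; n];
        &mut heap_buf
    };
    // Use the scratch space: fill it with 0..n and sum it up.
    for (i, slot) in scratch.iter_mut().enumerate() {
        *slot = i;
    }
    scratch.iter().sum()
}

fn main() {
    assert_eq!(scratch_sum(4), 6); // fits in the fixed-size stack buffer
    assert_eq!(scratch_sum(100), (0..100).sum::<usize>()); // heap fallback
}
```

This illustrates the locality drawback the Motivation section mentions: for small `n`, most of the 64-element buffer goes unused.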
From ba65ef31dfe1affcdc5a1c6e3f55c811f3fe222c Mon Sep 17 00:00:00 2001 From: Andre Bogus Date: Tue, 13 Dec 2016 12:54:59 +0100 Subject: [PATCH 2/8] rewrote design section based on recent comments --- text/0000-alloca.md | 117 +++++++++++++++++++++++++------------------- 1 file changed, 66 insertions(+), 51 deletions(-) diff --git a/text/0000-alloca.md b/text/0000-alloca.md index 16cf723a774..0a3e75cc998 100644 --- a/text/0000-alloca.md +++ b/text/0000-alloca.md @@ -6,8 +6,9 @@ # Summary [summary]: #summary
-Add a builtin `fn core::mem::reserve<'a, T>(elements: usize) -> StackSlice<'a, T>` that reserves space for the given
-number of elements on the stack and returns a `StackSlice<'a, T>` to it which derefs to `&'a [T]`.
+Add a builtin `alloca!(type, number_of_elements)` macro that reserves space for the given number of elements of type
+`T` on the stack and returns a slice over the reserved memory. The memory's lifetime is artificially restricted to the
+current function's scope, so the borrow checker can ensure that the memory is no longer used when the function returns.

# Motivation
[motivation]: #motivation
@@ -46,63 +47,73 @@ Still, even with the flaws in this benchmark,
[The Myth of RAM](http://www.ilikebigbits.com/blog/2014/4/21/the-myth-of-ram-part-i) argues quite convincingly for the
benefits of memory frugality.
-
# Detailed design
[design]: #detailed-design

-The standard library function can simply `panic!(..)` within the `reserve(_)` method, as it will be replaced when
-translating to MIR. The `StackSlice` type can be implemented as follows:
+There are a few constraints we have to keep: First, we want to allow for mostly free usage of the memory, while keeping
+borrows to it limited to the current function's scope – this makes it possible to use it in a loop, increasing its
+usefulness. The macro should include a check in debug mode to ensure the stack limit is not exceeded.
Actually, it
+should arguably check this in release mode, too (which would be feasible without giving up performance using stack
+probes, which have not been available from LLVM despite being hailed as LLVM's preferred solution to stack overflow
+problems), but writing this within Rustc would duplicate work that LLVM is poised to do anyway.

-```Rust
-/// A slice of data on the stack
-pub struct StackSlice<'a, T: 'a> {
-    slice: &'a [T],
-}
+This feature would be available via a builtin macro `stack!(..)` taking any of the following arguments:

-impl<'a, T: 'a> Deref for StackSlice<'a, T> {
-    type Target = [T];
+- `stack![x; <num>]` reserves an area large enough for *num* (where num is an expression evaluating to a `usize`) `x`
+instances on the stack, fills it with `x` and returns a slice to it; this requires that `x` be of a `Copy`able type

-    fn deref(&self) -> &[T] {
-        return self.slice;
-    }
-}
-```
+- `stack![x, y, z, ..]` (analogous to `vec![..]`). This is not actually needed as current arrays do mostly the same
+thing, but will likely reduce the number of frustrated users

-`StackSlice`'s embedded lifetime ensures that the stack allocation may never leave its scope. Thus the borrow checker
-can uphold the contract that LLVM's `alloca` requires.
+- `stack![for <iter>]` (where iter is an expression that returns an `std::iter::ExactSizeIterator`)

-MIR Level: We need a way to represent the dynamic stack `alloca`tion with both the number of elements and the concrete
-type of elements. Then while building the MIR, we need to replace the `Calls` from HIR with it.
+- `unsafe { stack![Ty * num] }` reserves an uninitialized area large enough for *num* elements of the given type `Ty`,
+giving people seeking performance a cheap dynamically sized scratch space for their algorithms

-Low-level: LLVM has the `alloca` instruction to allocate memory on the stack. We simply need to extend trans to emit it
-with a dynamic `<NumElements>` argument when encountering the aforementioned MIR.
+
+All variants return a slice to the reserved stack space which will live until the end of the current function (same as
-With an LLVM extension to un-allocate the stack slice we could even restrict the stack space reservation to the lifetime
-of the allocated value, thus increasing locality over C code that uses alloca (which so far is suboptimally implemented
-by some compilers, especially with regards to inlining).
+C's `alloca(..)` builtin). Because this is a compiler-builtin, we can make use of the type of the values in determining
+the type of the expression, so we don't need to restate the type (unless it's not available, as in the unsafe version).

-# How to teach this
+The macro will expand to a newly introduced `DynArray{ ty: Ty, num: Expr }` `rustc::hir::ExprKind` variant (plus some
+extra code to put the values in the reserved space, depending on variant) that will be mapped to an `alloca` operation
+in MIR and LLVM IR. The type of the expression will be rigged in HIR to have a lifetime until the function body ends.

-Add the following documentation to libcore:
+The iterator version will return a shorter slice than reserved if the iterator returns `None` early. Should the iterator
+panic, the macro will `forget(_)` all values inserted so far and re-raise the panic.

-```
-*** WARNING *** stay away from this feature unless you absolutely need it.
-Using it will destroy your ability to statically reason about stack size.
+If the macro is invoked with unsuitable input (e.g. `stack![Ty]`, `stack![]`, etc.), it should at least report an error
+outlining the valid modes of operation. If we want to improve the ergonomics, we could try to guess which one the user
+has actually attempted and offer a suggestion to that effect.

+Translating the MIR to LLVM IR will produce the corresponding `alloca` operation with the given type and number
+expression.

-Apart from that, this works much like an unboxed array, except the size is
-determined at runtime.
Since the memory resides on the stack, be careful
-not to exceed the stack limit (which depends on your operating system),
-otherwise the resulting stack overflow will at best kill your program. You
-have been warned.
+# How we teach this
+[teaching]: #how-we-teach-this
+
+The doc comments for the macro should contain text like the following:

-Valid uses for this are mostly within embedded systems without heap allocation.
-```
+```Rust
+/// *** WARNING *** stay away from this feature unless you absolutely need it.
+/// Using it will destroy your ability to statically reason about stack size.
+///
+/// Apart from that, this works much like an unboxed array, except the size is
+/// determined at runtime. Since the memory resides on the stack, be careful
+/// not to exceed the stack limit (which depends on your operating system),
+/// otherwise the resulting stack overflow will at best kill your program. You
+/// have been warned.
+///
+/// Valid uses for this are mostly within embedded systems without heap allocation
+/// to claim some scratch space for algorithms, e.g. in sorting, traversal, etc.
+///
+/// This macro has four modes of operation:
+/// ..
+```

-Also add an example (perhaps a sort algorithm that uses some scratch space that will be heap-allocated with `std` and
-stack-allocated with `#[no_std]`), noting that the function would not be available on no-std systems at all were it not
-for this feature.
+The documentation should be sufficient to explain the use of the feature. Also the book should be extended with
+examples of all modes of operation. Once stabilized, the release log should advertise the new feature. Blogs will rave
+about it, trumpets will chime, and the world will be a little brighter than before.

# Drawbacks
[drawbacks]: #drawbacks
@@ -111,6 +122,12 @@ Do not `pub use` it from `std::mem` to drive the point home.
leads to segfaults at best and undefined behavior at worst.
On unices, the stack can usually be extended at runtime,
whereas on Windows stack size is set at link time (defaults to 1MB).
+
+- With this functionality, we lose the ability to statically reason about stack space. Worse, since it can be used to
+reserve space arbitrarily, it can blow past the guard page that operating systems usually employ to secure programs
+against stack overflow. Hilarity ensues. However, it can be argued that static stack reservations (e.g.
+`let _ = [0u64; 9999999999];`) already suffice to do this. Perhaps someone should write a lint against this. It
+certainly won't be allowed in MISRA Rust, if such a thing ever happens to come into existence.
+
- Adding this will increase implementation complexity and require support from possible alternative implementations /
backends (e.g. Cretonne, WebASM).
@@ -118,20 +135,18 @@ backends (e.g. Cretonne, WebASM).
- Do nothing. Rust works well without it (there's the issue mentioned in the "Motivation" section though). `SmallVec`s
-work well enough and have the added benefit of limiting stack usage.
-
-- `mem::with_alloc<T, U, F: Fn(&mut [T]) -> U>(elems: usize, code: F) -> U` This has the benefit of reducing API surface, and
-of introducing rightward drift, which makes it less likely to be overused. However, it also needs to be
-monomorphized for each function (instead of only for each target type), which will increase compile times.
+work well enough and have the added benefit of limiting stack usage. Except, no, they turn into hideous assembly that
+makes you wonder if using a `Vec` wouldn't have been the better option.

- dynamically sized arrays are a potential solution, however, those would need to have a numerical type that is only
fully known at runtime, requiring complex type system gymnastics.

-- use a macro instead of a function (analogous to `print!(..)`), which could insert the LLVM alloca builtin.
+- use a function instead of a macro.
This would be more complex for essentially no gain.

-- mark the function as `unsafe` due to the potential stack overflow problem.
+- mark the use of the macro as `unsafe` regardless of values given due to the potential stack overflow problem.

-- Copy the design from C `fn alloca()`, possibly wrapping it later.
+- Copy the design from C `fn alloca()`, possibly wrapping it later. This doesn't work in Rust because the returned
+slice could leave the scope, giving rise to unsoundness.

- Use escape analysis to determine which allocations could be moved to the stack. This could potentially benefit even
more programs, because they would benefit from increased allocation speed without the need for change. The deal-breaker
here is that we would also lose control to avoid the listed drawback, making programs crash without recourse. Also the
compiler would become somewhat more complex (though a simple incomplete escape analysis implementation is already in
[clippy](https://github.com/Manishearth/rust-clippy)).

# Unresolved questions
[unresolved]: #unresolved-questions

-- Could we return the slice directly (reducing visible complexity)?
+- Is the feature as defined above ergonomic? Should it be?

- Bikeshedding: Can we find a better name?

From 9ad827c6b96698201dec32d0bd3f707fc134e164 Mon Sep 17 00:00:00 2001 From: Andre Bogus Date: Wed, 14 Dec 2016 19:06:21 +0100 Subject: [PATCH 3/8] rename to alloca\!, added rewritten comment from whitequark, some stack probe verbiage --- text/0000-alloca.md | 63 ++++++++++++++++++++++++++++----------------- 1 file changed, 39 insertions(+), 24 deletions(-) diff --git a/text/0000-alloca.md b/text/0000-alloca.md index 0a3e75cc998..6c9e7a7bce7 100644 --- a/text/0000-alloca.md +++ b/text/0000-alloca.md @@ -57,17 +57,17 @@ should arguably check this in release mode, too (which would be feasible without
probes, which have not been available from LLVM despite being hailed as LLVM's preferred solution to stack overflow
problems), but writing this within Rustc would duplicate work that LLVM is poised to do anyway.
-This feature would be available via a builtin macro `stack!(..)` taking any of the following arguments:
+This feature would be available via a builtin macro `alloca!(..)` taking any of the following arguments:

-- `stack![x; <num>]` reserves an area large enough for *num* (where num is an expression evaluating to a `usize`) `x`
+- `alloca![x; <num>]` reserves an area large enough for *num* (where num is an expression evaluating to a `usize`) `x`
instances on the stack, fills it with `x` and returns a slice to it; this requires that `x` be of a `Copy`able type

-- `stack![x, y, z, ..]` (analogous to `vec![..]`). This is not actually needed as current arrays do mostly the same
+- `alloca![x, y, z, ..]` (analogous to `vec![..]`). This is not actually needed as current arrays do mostly the same
thing, but will likely reduce the number of frustrated users

-- `stack![for <iter>]` (where iter is an expression that returns an `std::iter::ExactSizeIterator`)
+- `alloca![for <iter>]` (where iter is an expression that returns an `std::iter::ExactSizeIterator`)

-- `unsafe { stack![Ty * num] }` reserves an uninitialized area large enough for *num* elements of the given type `Ty`,
+- `unsafe { alloca![Ty * num] }` reserves an uninitialized area large enough for *num* elements of the given type `Ty`,
giving people seeking performance a cheap dynamically sized scratch space for their algorithms

All variants return a slice to the reserved stack space which will live until the end of the current function (same as
@@ -79,15 +79,19 @@ exertions to put the values in the reserved space, depending on variant) that wi
in MIR and LLVM IR. The type of the expression will be rigged in HIR to have a lifetime until the function body ends.

The iterator version will return a shorter slice than reserved if the iterator returns `None` early. Should the iterator
-panic, the macro will `forget(_)` all values inserted so far and re-raise the panic.
+panic, all values inserted so far will be dropped.
This makes it useful for things like file descriptors, where the
+drop implementation carries out additional cleanup tasks.

-If the macro is invoked with unsuitable input (e.g. `stack![Ty]`, `stack![]`, etc.), it should at least report an error
+If the macro is invoked with unsuitable input (e.g. `alloca![Ty]`, `alloca![]`, etc.), it should at least report an error
outlining the valid modes of operation. If we want to improve the ergonomics, we could try to guess which one the user
has actually attempted and offer a suggestion to that effect.

Translating the MIR to LLVM IR will produce the corresponding `alloca` operation with the given type and number
expression.

+Because LLVM currently lacks the ability to insert stack probes, the safety of this feature cannot be guaranteed. It is
+thus advisable to keep this feature unstable until Rust has a working stack probe implementation.
+
# How we teach this
[teaching]: #how-we-teach-this
@@ -95,17 +99,23 @@ The doc comments for the macro should contain text like the following:

```Rust
-/// *** WARNING *** stay away from this feature unless you absolutely need it.
-/// Using it will destroy your ability to statically reason about stack size.
-///
-/// Apart from that, this works much like an unboxed array, except the size is
-/// determined at runtime. Since the memory resides on the stack, be careful
-/// not to exceed the stack limit (which depends on your operating system),
-/// otherwise the resulting stack overflow will at best kill your program. You
-/// have been warned.
-///
-/// Valid uses for this are mostly within embedded systems without heap allocation
-/// to claim some scratch space for algorithms, e.g. in sorting, traversal, etc.
+/// **Warning:** the Rust runtime currently does not reliably check for
+/// stack overflows. Use of this feature, even in safe code, may result in
+/// undefined behavior and exploitable bugs.
Until the Rust runtime is fixed,
+/// do not use this feature unless you understand the implications extremely
+/// well.
+///
+/// The `stack!` macro works much like an unboxed array, except the size
+/// is determined at runtime. The allocated memory resides on the thread stack;
+/// when allocating, be careful not to exceed the size of the stack, or
+/// the *entire process* will crash. The stack size of the main thread
+/// is operating system dependent, and stack size of newly spawned threads
+/// can be set using `std::thread::Builder::stack_size`.
+///
+/// The `stack!` macro is primarily useful on embedded systems where heap
+/// allocation is either impossible or too costly, and where it can be used
+/// to obtain scratch space for algorithms, e.g. in sorting, traversal,
+/// parsing, etc.
///
/// This macro has four modes of operation:
/// ..
@@ -119,17 +129,20 @@ about it, trumpets will chime, and the world will be a little brighter than befo
[drawbacks]: #drawbacks

- Even more stack usage means the dreaded stack limit will probably be reached even sooner. Overflowing the stack space
-leads to segfaults at best and undefined behavior at worst. On unices, the stack can usually be extended at runtime,
-whereas on Windows stack size is set at link time (defaults to 1MB).
+leads to segfaults at best and undefined behavior at worst (at least until the aforementioned stack probes are in
+place). On unices, the stack can usually be extended at runtime, whereas on Windows main thread stack size is set at
+link time (defaults to 1MB). The `thread::Builder` API has a method to set the stack size for spawned threads, however.

- With this functionality, we lose the ability to statically reason about stack space. Worse, since it can be used to
reserve space arbitrarily, it can blow past the guard page that operating systems usually employ to secure programs
-against stack overflow. Hilarity ensues. However, it can be argued that static stack reservations (e.g.
-`let _ = [0u64; 9999999999];`) already suffice to do this. Perhaps someone should write a lint against this. It
-certainly won't be allowed in MISRA Rust, if such a thing ever happens to come into existence.
+against stack overflow. Hilarity ensues. However, it can be argued that static stack reservations (e.g. `let _ = [0u64;
+9999999999];`) already suffice to do this. Perhaps someone should write a
+[clippy](https://github.com/Manishearth/rust-clippy) lint against this. It certainly won't be allowed in MISRA Rust, if
+such a thing ever happens to come into existence.

- Adding this will increase implementation complexity and require support from possible alternative implementations /
-backends (e.g. Cretonne, WebASM).
+backends (e.g. MIRI, Cretonne, WebASM). However, as all of them have C frontend support, they'll want to implement such
+a feature anyway.

# Alternatives
[alternatives]: #alternatives
@@ -159,4 +172,6 @@ compiler would become somewhat more complex (though a simple incomplete escape a

- Is the feature as defined above ergonomic? Should it be?

+- How do we deal with the current lack of stack probes?
+
- Bikeshedding: Can we find a better name?

From 0d8462fb911534cf45569e8655e541c1878bb6ab Mon Sep 17 00:00:00 2001 From: Andre Bogus Date: Wed, 14 Dec 2016 19:20:31 +0100 Subject: [PATCH 4/8] incorporated more of whitequark's suggestions --- text/0000-alloca.md | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/text/0000-alloca.md b/text/0000-alloca.md index 6c9e7a7bce7..5fc60920067 100644 --- a/text/0000-alloca.md +++ b/text/0000-alloca.md @@ -18,8 +18,8 @@ known at runtime. Reserving space on the heap always takes a performance hit, an
increase memory fragmentation, possibly slightly degrading allocation performance further down the road.

If Rust included this zero-cost abstraction, more of these algorithms could run at full speed – and would be available
-on systems without an allocator, e.g.
embedded, soft-real-time systems. The option of using a fixed slice up to a
-certain size and using a heap-allocated slice otherwise (as afforded by
+on systems without an allocator, e.g. embedded, soft- or hard-real-time systems. The option of using a fixed slice up
+to a certain size and using a heap-allocated slice otherwise (as afforded by
[SmallVec](https://crates.io/crates/smallvec)-like classes) has the drawback of decreasing memory locality if only a
small part of the fixed-size allocation is used – and even those implementations could potentially benefit from the
increased memory locality.
@@ -105,14 +105,14 @@ The doc comments for the macro should contain text like the following:
/// do not use this feature unless you understand the implications extremely
/// well.
///
-/// The `stack!` macro works much like an unboxed array, except the size
+/// The `alloca!` macro works much like an unboxed array, except the size
/// is determined at runtime. The allocated memory resides on the thread stack;
/// when allocating, be careful not to exceed the size of the stack, or
/// the *entire process* will crash. The stack size of the main thread
/// is operating system dependent, and stack size of newly spawned threads
/// can be set using `std::thread::Builder::stack_size`.
///
-/// The `stack!` macro is primarily useful on embedded systems where heap
+/// The `alloca!` macro is primarily useful on embedded systems where heap
/// allocation is either impossible or too costly, and where it can be used
/// to obtain scratch space for algorithms, e.g. in sorting, traversal,
/// parsing, etc.
@@ -133,12 +133,10 @@ leads to segfaults at best and undefined behavior at worst (at least until the a
place). On unices, the stack can usually be extended at runtime, whereas on Windows main thread stack size is set at
link time (defaults to 1MB). The `thread::Builder` API has a method to set the stack size for spawned threads, however.
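The doc text above points readers at `std::thread::Builder::stack_size` for controlling spawned-thread stacks. As an editorial aside, a minimal runnable illustration of that real std API (the 4 MiB figure is an arbitrary choice for the example):

```rust
// Spawning a thread with an explicitly enlarged stack, per the
// `std::thread::Builder::stack_size` API mentioned in the doc comment.
use std::thread;

fn main() {
    let handle = thread::Builder::new()
        .stack_size(4 * 1024 * 1024) // 4 MiB instead of the platform default
        .spawn(|| {
            // A large local buffer is safer with the enlarged stack.
            let buf = [0u8; 1024 * 1024];
            buf.len()
        })
        .expect("failed to spawn thread");
    assert_eq!(handle.join().unwrap(), 1024 * 1024);
}
```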
-- With this functionality, we lose the ability to statically reason about stack space. Worse, since it can be used to
-reserve space arbitrarily, it can blow past the guard page that operating systems usually employ to secure programs
-against stack overflow. Hilarity ensues. However, it can be argued that static stack reservations (e.g. `let _ = [0u64;
-9999999999];`) already suffice to do this. Perhaps someone should write a
-[clippy](https://github.com/Manishearth/rust-clippy) lint against this. It certainly won't be allowed in MISRA Rust, if
-such a thing ever happens to come into existence.
+- With this functionality, trying to statically reason about stack usage, even in an approximate way, gains a new
+degree of complexity, as maximum stack depth now depends not only on control flow, which can sometimes be
+predictable, but also on arbitrary computations. It certainly won't be allowed in MISRA Rust, if such a thing ever
+happens to come into existence.

- Adding this will increase implementation complexity and require support from possible alternative implementations /
backends (e.g. MIRI, Cretonne, WebASM). However, as all of them have C frontend support, they'll want to implement such

From ac6b65f2e77bd23f222e9c12f3ed60ca50e45c33 Mon Sep 17 00:00:00 2001 From: Andre Bogus Date: Thu, 15 Dec 2016 19:52:31 +0100 Subject: [PATCH 5/8] Clarify macro location --- text/0000-alloca.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/text/0000-alloca.md b/text/0000-alloca.md index 5fc60920067..2ee949fb963 100644 --- a/text/0000-alloca.md +++ b/text/0000-alloca.md @@ -74,6 +74,8 @@ All variants return a slice to the reserved stack space which will live until th
C's `alloca(..)` builtin). Because this is a compiler-builtin, we can make use of the type of the values in determining
the type of the expression, so we don't need to restate the type (unless it's not available, as in the unsafe version).
+The macro should live in `core::mem`, be reexported from `std::mem`, and possibly be imported into the prelude.
+
The macro will expand to a newly introduced `DynArray{ ty: Ty, num: Expr }` `rustc::hir::ExprKind` variant (plus some
extra code to put the values in the reserved space, depending on variant) that will be mapped to an `alloca` operation
in MIR and LLVM IR. The type of the expression will be rigged in HIR to have a lifetime until the function body ends.

From 64a5da30b77cd7f028b35da1b1addf768f410bb1 Mon Sep 17 00:00:00 2001 From: Andre Bogus Date: Tue, 20 Dec 2016 08:22:11 +0100 Subject: [PATCH 6/8] added lifetime ascription/dynamic arrays as alternatives --- text/0000-alloca.md | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/text/0000-alloca.md b/text/0000-alloca.md index 2ee949fb963..83fdb678a0a 100644 --- a/text/0000-alloca.md +++ b/text/0000-alloca.md @@ -144,6 +144,10 @@ happens to come into existence.
backends (e.g. MIRI, Cretonne, WebASM). However, as all of them have C frontend support, they'll want to implement such
a feature anyway.

+- The special type *and* lifetime couple two different concerns together, which may trip up people trying to follow
+the code. Alternative designs like lifetime ascription and dynamic arrays would keep them apart, leading to a more
+elegant, orthogonal design.
+
# Alternatives
[alternatives]: #alternatives
@@ -152,7 +156,13 @@ work well enough and have the added benefit of limiting stack usage. Except, no,
makes you wonder if using a `Vec` wouldn't have been the better option.

- dynamically sized arrays are a potential solution, however, those would need to have a numerical type that is only
-fully known at runtime, requiring complex type system gymnastics.
+fully known at runtime. It would be possible (and indeed not straying too far from this proposal) to allow for the
+syntax `[t; n]` (with two expressions) for dynamically sized arrays.
The type of those arrays would be `[T]: !Sized`,
+that is without known size at compile time. Also those types would still be bound by their scope lifetime, unless...
+
+- lifetime ascription is the idea that we use labels as lifetimes (they are denoted the same anyway) and allow them
+also on plain blocks (as in `'foo: { .. }`). This gives us a way to easily extend the lifetime of a value, unify the
+concepts of labels and lifetimes and also be an awesome teaching device.

 - use a function instead of a macro. This would be more complex for essentially no gain.

From b09ffce3a19c6ec1624a268d55902ef3e9cc25e3 Mon Sep 17 00:00:00 2001
From: Andre Bogus
Date: Tue, 3 Jan 2017 07:40:16 +0100
Subject: [PATCH 7/8] reduced scope to VLAs

---
 text/0000-alloca.md | 113 ++++++++++----------------------------------
 1 file changed, 25 insertions(+), 88 deletions(-)

diff --git a/text/0000-alloca.md b/text/0000-alloca.md
index 83fdb678a0a..fed0d218591 100644
--- a/text/0000-alloca.md
+++ b/text/0000-alloca.md
@@ -6,9 +6,7 @@
 # Summary
 [summary]: #summary

-Add a builtin `alloca!(type, number_of_elements)` macro that reserves space for the given number of elements of type
-`T` on the stack and returns a slice over the reserved memory. The memories' lifetime is artifically restricted to the
-current function's scope, so the borrow checker can ensure that the memory is no longer used when the method returns.
+Add variable-length arrays to the language.

 # Motivation
 [motivation]: #motivation
@@ -50,46 +48,20 @@
 # Detailed design
 [design]: #detailed-design

-There are a few constraints we have to keep: First, we want to allow for mostly free usage of the memory, while keeping
-borrows to it limited to the current function's scope – this makes it possible to use it in a loop, increasing its
-usefulness. The macro should include a check in debug mode to ensure the stack limit is not exceeded.
Actually, it
-should arguably check this in release mode, too (which would be feasible without giving up performance using stack
-probes, which have not been available from LLVM despite being hailed LLVM's preferred solution to stack overflow
-problems), but writing this within Rustc would duplicate work that LLVM is poised to do anyway.
+So far, the `[T]` type could not be constructed in valid Rust code. It will now represent compile-time unsized (also
+known as "variable-length") arrays. The syntax to construct them could simply be `[t; n]` where `t` is a valid value of
+the type (or `mem::uninitialized`) and `n` is an expression whose result is of type `usize`. Type ascription can be used
+to disambiguate cases where the type could either be `[T]` or `[T; n]` for some value of `n`.

-This feature would be available via a builtin macro `alloca!(..)` taking any of the following arguments:
+The AST for the unsized array will be simply `syntax::ast::ExprKind::Repeat(..)`, but removing the assumption that the
+second expression is a constant value. The same applies to `rustc::hir::Expr_::ExprRepeat(..)`.

-- `alloca![x; <num>]` reserves an area large enough for *num* (where num is an expression evaluating to a `usize`) `x`
-instances on the stack, fills it with `x` and returns a slice to it; this requires that `x` be of a `Copy`able type
-
-- `alloca![x, y, z, ..]` (analogous to `vec![..]`). This is not actually needed as current arrays do mostly the same
-thing, but will likely reduce the number of frustrated users
-
-- `alloca![for <iter>]` (where iter is an expression that returns an `std::iter::ExactSizeIterator`)
-
-- `unsafe { alloca![Ty * num] }` reserves an uninitialized area large enough for *num* elements of the given type `Ty`,
-giving people seeking performance a cheap dynamically sized scratch space for their algorithms
-
-All variants return a slice to the reserved stack space which will live until the end of the current function (same as
-C's `alloca(..)` builtin).
Because this is a compiler-builtin, we can make use of the type of the values in determining -the type of the expression, so we don't need to restate the type (unless it's not available, as in the unsafe version). - -The macro should live in `core::mem` and be reexported from `std::mem` and may be imported in the prelude. - -The macro will expand to a newly introduced `DynArray{ ty: Ty, num: Expr }` `rustc::hir::ExprKind` variant (plus some -exertions to put the values in the reserved space, depending on variant) that will be mapped to an `alloca` operation -in MIR and LLVM IR. The type of the expression will be rigged in HIR to have a lifetime until the function body ends. - -Te iterator version will return a shorter slice than reserved if the iterator returns `None` early. SHould the iterator -panic, all values inserted so far will be dropped. This makes it useful for things like file descriptors, where the -drop implementation carries out additional cleanup tasks. - -If the macro is invoked with unsuitable input (e.g. `alloca![Ty]`, `alloca![]`, etc., it should at least report an error -outlining the valid modes of operation. If we want to improve the ergonomics, we could try to guess which one the user -has actually attempted and offer a suggestion to that effect. +Type inference should – in the best case – apply the sized type where applicable, only resorting to the unsized type +where necessary to fulfil the requirements. We could implement traits like `IntoIterator` for unsized arrays, which +may allow us to improve the ergonomics of arrays in general. Translating the MIR to LLVM bytecode will produce the corresponding `alloca` operation with the given type and number -expression. +expression. It will also require alignment inherent to the type (which is done via a third argument). Because LLVM currently lacks the ability to insert stack probes, the safety of this feature cannot be guaranteed. 
It is thus advisable to keep this feature unstable until Rust has a working stack probe implementation. @@ -97,35 +69,12 @@ thus advisable to keep this feature unstable until Rust has a working stack prob # How we teach this [teaching]: #how-we-teach-this -The doc comments for the macro should contain text like the following: - - -```Rust -/// **Warning:** the Rust runtime currently does not reliably check for -/// stack overflows. Use of this feature, even in safe code, may result in -/// undefined behavior and exploitable bugs. Until the Rust runtime is fixed, -/// do not use this feature unless you understand the implications extremely -/// well. -/// -/// The `alloca!` macro works much like an unboxed array, except the size -/// is determined at runtime. The allocated memory resides on the thread stack; -/// when allocating, be careful not to exceed the size of the stack, or -/// the *entire process* will crash. The stack size of the main thread -/// is operating system dependent, and stack size of newly spawned threads -/// can be set using `std::thread::Builder::stack_size`. -/// -/// The `alloca!` macro is primarily useful on embedded systems where heap -/// allocation is either impossible or too costly, where it can be used -/// to obtain scratch space for algorithms, e.g. in sorting, traversal, -/// parsing, etc. -/// -/// This macro has four modes of operation: -/// .. -``` +We need to extend the book to cover the distinction between sized and unsized arrays and especially the cases where +type ascription is required. Having good error messages in case of type error around the sizedness of arrays will also +help people to learn the correct use of the feature. -The documentation should be sufficient to explain the use of the feature. Also the book should be extended with -examples of all modes of operation. Once stabilized, the release log should advertise the new feature. 
Blogs will rave
-about it, trumpets will chime, and the world will be a little brighter than before.
+WHile stack probes remain unimplemented on some platforms, the documentation for this feature should warn of possible
+dire consequences of stack overflow.

 # Drawbacks
 [drawbacks]: #drawbacks
@@ -141,13 +90,9 @@ predictable, but also on arbitrary computations. It certainly won't be allowed i
 happens to come into existence.

 - Adding this will increase implementation complexity and require support from possible alternative implementations /
-backends (e.g. MIRI, Cretonne, WebASM). However, as all of them have C frontend support, they'll want to implement such
+backends (e.g. MIRI, Cretonne, WebASM). However, as all of them have C frontend support, they'll need to implement such
 a feature anyway.

-- The special type *and* lifetime couples two different concerns together, which may trip up people trying to follow
-the code. Alternative designs like lifetime ascription and dynamic arrays would keep them apart, leading to a more
-elegant, orthogonal design.
-
 # Alternatives
 [alternatives]: #alternatives

@@ -155,20 +100,15 @@ work well enough and have the added benefit of limiting stack usage. Except, no, they turn into hideous assembly that
 makes you wonder if using a `Vec` wouldn't have been the better option.

-- dynamically sized arrays are a potential solution, however, those would need to have a numerical type that is only
-fully known at runtime. It would be possible (and indeed not straying too far from this proposal) to allow for the
-syntax `[t; n]` (with two expressions) for dynamically sized arrays. The type of those arrays would be `[T]: !Sized`,
-that is without known size at compile time. Also those types would still be bound by their scope lifetime, unless...
+- make the result's lifetime function-scope bound (which is what C's `alloca()` does). This is mingling two concerns
+together that should be handled separately.
A `'fn` lifetime will, however, be suggested in a sibling RFC.

-- lifetime ascription is the idea that we use labels as lifetimes (they are denoted the same anyway) and allow them
-also on plain blocks (as in `'foo: { .. }`). This gives us a way to easily extend the lifetime of a value, unify the
-concepts of labels and lifetimes and also be an awesome teaching device.
+- use a special macro or function to initialize the arrays. Both seem like hacks compared to the suggested syntax.

-- use a function instead of a macro. This would be more complex for essentially no gain.
+- mark the use of unsized arrays as `unsafe` regardless of values given due to the potential stack overflow problem.
+The author of this RFC does not deem this necessary if the feature gate is documented with a stern warning.

-- mark the use of the macro as `unsafe` regardless of values given due to the potential stack overflowing problem.
-
-- Copy the design from C `fn alloca()`, possibly wrapping it later. This doesn't work in Rust because the returned
+- Copy the design from C `alloca()`, possibly wrapping it later. This doesn't work in Rust because the returned
 slice could leave the scope, giving rise to unsoundness.

 - Use escape analysis to determine which allocations could be moved to the stack. This could potentially benefit even
@@ -180,8 +120,5 @@ compiler would become somewhat more complex (though a simple incomplete escape a
 # Unresolved questions
 [unresolved]: #unresolved-questions

-- Is the feature as defined above ergonomic? Should it be?
-
-- How do we deal with the current lack of stack probes?
-
-- Bikeshedding: Can we find a better name?
+- does the MIR need to distinguish between arrays of statically-known size and unsized arrays (apart from the type
+information)?
From c83ce1869443d264ff8282c21caadc163c6eec27 Mon Sep 17 00:00:00 2001
From: Andre Bogus
Date: Fri, 6 Jan 2017 18:11:10 +0100
Subject: [PATCH 8/8] Don't rely on inference

---
 text/0000-alloca.md | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/text/0000-alloca.md b/text/0000-alloca.md
index fed0d218591..86e061abafa 100644
--- a/text/0000-alloca.md
+++ b/text/0000-alloca.md
@@ -50,15 +50,14 @@
 So far, the `[T]` type could not be constructed in valid Rust code. It will now represent compile-time unsized (also
 known as "variable-length") arrays. The syntax to construct them could simply be `[t; n]` where `t` is a valid value of
-the type (or `mem::uninitialized`) and `n` is an expression whose result is of type `usize`. Type ascription can be used
+the type (or `mem::uninitialized`) and `n` is an expression whose result is of type `usize`. Type ascription must be used
 to disambiguate cases where the type could either be `[T]` or `[T; n]` for some value of `n`.

 The AST for the unsized array will be simply `syntax::ast::ExprKind::Repeat(..)`, but removing the assumption that the
 second expression is a constant value. The same applies to `rustc::hir::Expr_::ExprRepeat(..)`.

-Type inference should – in the best case – apply the sized type where applicable, only resorting to the unsized type
-where necessary to fulfil the requirements. We could implement traits like `IntoIterator` for unsized arrays, which
-may allow us to improve the ergonomics of arrays in general.
+Type inference must apply the sized type unless otherwise ascribed. We should implement traits like `IntoIterator` for
+unsized arrays, which may allow us to improve the ergonomics of arrays in general.

 Translating the MIR to LLVM bytecode will produce the corresponding `alloca` operation with the given type and number
 expression. It will also require alignment inherent to the type (which is done via a third argument).
@@ -73,7 +72,7 @@ We need to extend the book to cover the distinction between sized and unsized ar
 type ascription is required. Having good error messages in case of type error around the sizedness of arrays will also
 help people to learn the correct use of the feature.

-WHile stack probes remain unimplemented on some platforms, the documentation for this feature should warn of possible
+While stack probes remain unimplemented on some platforms, the documentation for this feature should warn of possible
 dire consequences of stack overflow.

 # Drawbacks
@@ -108,6 +107,9 @@ together that should be handled separately. A `'fn` lifetime will be however sug
 - mark the use of unsized arrays as `unsafe` regardless of values given due to the potential stack overflow problem.
 The author of this RFC does not deem this necessary if the feature gate is documented with a stern warning.

+- allow for some type inference with regards to sizedness. This is likely to lead to surprises when some value ends up
+unsized when a sized one was expected.
+
 - Copy the design from C `alloca()`, possibly wrapping it later. This doesn't work in Rust because the returned
 slice could leave the scope, giving rise to unsoundness.