Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alloca for Rust #1808

Closed
wants to merge 8 commits into from
Closed

Alloca for Rust #1808

wants to merge 8 commits into from

Conversation

llogiq
Copy link
Contributor

@llogiq llogiq commented Dec 6, 2016

[drawbacks]: #drawbacks

- Even more stack usage means the dreaded stack limit will probably be reached even sooner. Overflowing the stack space
leads to segfaults at best and undefined behavior at worst. On unices, the stack can usually be extended at runtime,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we not be able to correctly probe the stack for space when alloca-ing?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be possible to do stack probes. If they aren't used, though, this must be marked unsafe, as it's trivial to corrupt memory without them.

Speaking of which, are stack probes still not actually in place for regular stack allocations? Just tried compiling a test program (that allocates a large array on the stack) with rustc nightly on my system and didn't see any probes in the asm output.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be possible to do stack probes. If they aren't used, though, this must be marked unsafe, as it's trivial to corrupt memory without them.

Can't one already overflow the stack with recursion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, which is why stack probes are inserted so that the runtime can detect that and abort with a fatal runtime error: stack overflow message rather than just run off into whatever memory is below.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except they actually aren't, since it hasn't been implemented yet (or maybe it only works on Windows?). OSes normally provide a 4kb guard page below the stack, so most stack overflows will crash anyway (and trigger that error, which comes from a signal handler), but a function with a big enough stack frame can skip past that page, and I think it may actually be possible in practice to exploit some Rust programs that way... I should try.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am working on integrating stack probes, which are required for robustness on our embedded system (ARTIQ). I expect to implement them in a way generic enough to be used on all bare-metal platforms, including no-MMU, as well as allow for easy shimming on currently unsupported OSes.

# Unresolved questions
[unresolved]: #unresolved-questions

- Could we return the slice directly (reducing visible complexity)?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may have missed it, but I'm not sure I understand why we wouldn't be able to just return the slice?

Copy link

@codyps codyps Dec 7, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess:

Because the size of the returned value could be variable and would need to be copied (or preserved on the stack). Not sure rust has anything that does that right now. Suppose it could be done, could just become a detail of the calling convention. Can't have caller allocate unless it knows how much memory to allocate, and making it aware of the memory needed by the callee could complicate things (and would likely be impossible to do in some cases without running the callee twice).

If these stack allocations were parameterized with type-level numbers, it would be fairly straight forward (ignoring all the general complexity of type level numbers), but this RFC doesn't propose that.

compiler would become somewhat more complex (though a simple incomplete escape analysis implementation is already in
[clippy](https://github.com/Manishearth/rust-clippy).

# Unresolved questions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C's alloca can't be used inside of a function argument list - would we need the same restriction or would we handle that properly?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that that limitation exists primarily to allow naive single pass compilers to exist (along with some "interesting" ways of implementing alloca()). I don't think that concern would apply to rust.

@eddyb
Copy link
Member

eddyb commented Dec 7, 2016

Using a function for this would be a mistake IMO. It's not a true function in C either, for various reasons.

&[x; n] producing &[T] for runtime values of n is more appealing from an implementation perspective.
That is, LLVM's stack variables (its alloca instruction, which is poorly named but whatever) have an optional "dynamic arity", which can be considered to be 1 for regular variables.

This model is closer to VLAs than alloca, and it could work well with box [x; n] too (which would evaluate n first, allocate, then initialize the slice with copies of x, and produce Box<[T]>, Rc<[T]> etc).

@steveklabnik
Copy link
Member

I do know some people are very wary of alloca/vlas in rust, but I can't remember who....

@burdges
Copy link

burdges commented Dec 7, 2016

It's worth being careful to avoid accidentally considering any constant n as runtime at the same time though, so fixing issues like rust-lang/rust#34344 and worrying about more complex constant expressions. As otherwise Foo<&[x; n]> becomes messy and fixing it risks breakage.

@eddyb
Copy link
Member

eddyb commented Dec 7, 2016

@burdges We already have an analysis which treats rust-lang/rust#34344 correctly - the problem there is the evaluator (which is AST-based and pre-typeck, this will change next year hopefully), not the qualifier.

A simpler and far more problematic example would be something like if false { 0 } else { 1 } - and I have to agree here, as much as I'd like box [x; n] for runtime n (even w/o VLAs).

@eddyb
Copy link
Member

eddyb commented Dec 7, 2016

@steveklabnik That might've been @cmr.

@withoutboats withoutboats added T-lang Relevant to the language team, which will review and decide on the RFC. T-libs-api Relevant to the library API team, which will review and decide on the RFC. labels Dec 7, 2016
# Drawbacks
[drawbacks]: #drawbacks

- Even more stack usage means the dreaded stack limit will probably be reached even sooner. Overflowing the stack space
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the flip side, this might reduce stack usage for users of ArrayVec and those manually allocating overly-large arrays on the stack (I occasionally do this when reading small files).

@japaric
Copy link
Member

japaric commented Dec 7, 2016

Could we get an example of how one might use this reserve function in the RFC? Because I can't tell just from its signature. It basically returns a &[T] so you can't mutate it? (unless it's supposed to only be used with Cell types?). And it seems to me that function lets you read unitialized memory (mem::reserve::<Droppable>(2)[0]) but it's not marked as unsafe?

```

`StackSlice`'s embedded lifetime ensures that the stack allocation may never leave its scope. Thus the borrow checker
can uphold the contract that LLVM's `alloca` requires.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite see why the type of reserve would prevent StackSlice to escape the scope where reserve is called. More precisely, It seems like the user could define:

fn reserve_and_zero<'a>(elements : usize) -> &'a[u32] {
    let s = *reserve(usize);
    for x in s.iter_mut() {  *x = 0 }
    return s
}

Which would be invalid if it is not inlined.

Copy link

@hanna-kruppe hanna-kruppe Dec 7, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the signatures proposed here is completely unsound. Since 'a is a lifetime parameter not constrained by anything, any call to reserve can simply pick 'a = 'static.


- `mem::with_alloc<T, F: Fn([T]) -> U>(elems: usize, code: F) -> U` This has the benefit of reducing API surface, and
introducing rightwards drift, which makes it more unlikely to be used too much. However, it also needs to be
monomorphized for each function (instead of only for each target type), which will increase compile times.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the only solution that will really work, since otherwise you can always use the reserve intrinsic to create a slice with a 'static lifetime

monomorphized for each function (instead of only for each target type), which will increase compile times.

- dynamically sized arrays are a potential solution, however, those would need to have a numerical type that is only
fully known at runtime, requiring complex type system gymnastics.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The generalization of this features is to allow unsized locals (which simply allocate stack space for the size given by the memory which they are moved out of)

To prevent accidental usage of this, a language item StackBox<T: !Sized> could be created, which alloca's all its memory depending on the size_of_val. The problem is that this would need to run code at moving, which is unprecedented in Rust

@eddyb
Copy link
Member

eddyb commented Dec 7, 2016

@burdges Oh I remember now, it's just that it's been a while since I thought about this.
The solution is to only do this if the expected type is &(mut) [T] to begin with and still require constants otherwise. This is one of the rarer cases where type ascription makes a noticeable difference, I suppose.

@tomaka
Copy link

tomaka commented Dec 7, 2016

SmallVecs work well enough and have the added benefit of limiting stack usage.

As I have already said in an issue somewhere, the assembly generated by SmallVec is very disappointing and saying that it "works well enough" is not really true.

For example this Rust code:

extern {
    fn foo(data: *const u32);
}

pub fn test() {
    let mut a: SmallVec<[u32; 4]> = SmallVec::new();
    a.push(5);
    a.push(12);
    a.push(94);
    a.push(1);

    unsafe { foo(a.as_ptr()) }
}

...generates the assembly below:
(compiled with cargo rustc --release -- --emit=asm with the 2016-11-05 nightly and smallvec v0.3.1)

	pushq	%rbp
	subq	$80, %rsp
	leaq	80(%rsp), %rbp
	movq	$-2, -8(%rbp)
	xorps	%xmm0, %xmm0
	movups	%xmm0, -36(%rbp)
	movaps	%xmm0, -48(%rbp)
	leaq	-48(%rbp), %rcx
	movl	$5, %edx
	callq	_ZN36_$LT$smallvec..SmallVec$LT$A$GT$$GT$4push17hcb9707aa0afc3e57E
	leaq	-48(%rbp), %rcx
	movl	$12, %edx
	callq	_ZN36_$LT$smallvec..SmallVec$LT$A$GT$$GT$4push17hcb9707aa0afc3e57E
	leaq	-48(%rbp), %rcx
	movl	$94, %edx
	callq	_ZN36_$LT$smallvec..SmallVec$LT$A$GT$$GT$4push17hcb9707aa0afc3e57E
	leaq	-48(%rbp), %rcx
	movl	$1, %edx
	callq	_ZN36_$LT$smallvec..SmallVec$LT$A$GT$$GT$4push17hcb9707aa0afc3e57E
	cmpl	$1, -40(%rbp)
	leaq	-36(%rbp), %rcx
	cmoveq	-32(%rbp), %rcx
	callq	foo
	cmpl	$1, -40(%rbp)
	jne	.LBB4_5
	movq	-24(%rbp), %rdx
	testq	%rdx, %rdx
	je	.LBB4_8
	movq	-32(%rbp), %rcx
	shlq	$2, %rdx
	movl	$4, %r8d
	callq	__rust_deallocate
	jmp	.LBB4_8
	leaq	-32(%rbp), %rax
	movl	$1, -40(%rbp)
	xorps	%xmm0, %xmm0
	movups	%xmm0, (%rax)
	addq	$80, %rsp
	popq	%rbp
	retq

And this is the _ZN36_$LT$smallvec..SmallVec$LT$A$GT$$GT$4push17hcb9707aa0afc3e57E function that is called four times:

	pushq	%r15
	pushq	%r14
	pushq	%rsi
	pushq	%rdi
	pushq	%rbp
	pushq	%rbx
	subq	$40, %rsp
	movl	%edx, %r14d
	movq	%rcx, %rsi
	movl	8(%rsi), %ebx
	cmpl	$1, %ebx
	movl	$4, %ebp
	cmoveq	24(%rsi), %rbp
	movq	(%rsi), %rax
	cmpq	%rbp, %rax
	jne	.LBB1_1
	leaq	(%rbp,%rbp), %rax
	cmpq	$1, %rax
	movl	$1, %r15d
	cmovaq	%rax, %r15
	cmpq	%r15, %rbp
	ja	.LBB1_11
	movl	$4, %ecx
	movq	%r15, %rax
	mulq	%rcx
	jo	.LBB1_12
	movl	$1, %edi
	testq	%rax, %rax
	je	.LBB1_6
	movl	$4, %edx
	movq	%rax, %rcx
	callq	__rust_allocate
	movq	%rax, %rdi
	testq	%rdi, %rdi
	je	.LBB1_13
	leaq	12(%rsi), %rdx
	cmpl	$1, %ebx
	cmoveq	16(%rsi), %rdx
	shlq	$2, %rbp
	movq	%rdi, %rcx
	movq	%rbp, %r8
	callq	memcpy
	cmpl	$1, 8(%rsi)
	jne	.LBB1_9
	movq	24(%rsi), %rdx
	testq	%rdx, %rdx
	je	.LBB1_9
	movq	16(%rsi), %rcx
	shlq	$2, %rdx
	movl	$4, %r8d
	callq	__rust_deallocate
	movl	$1, 8(%rsi)
	movq	%rdi, 16(%rsi)
	movq	%r15, 24(%rsi)
	movq	(%rsi), %rax
	jmp	.LBB1_10
	leaq	12(%rsi), %rdi
	cmpl	$1, %ebx
	cmoveq	16(%rsi), %rdi
	movl	%r14d, (%rdi,%rax,4)
	incq	(%rsi)
	addq	$40, %rsp
	popq	%rbx
	popq	%rbp
	popq	%rdi
	popq	%rsi
	popq	%r14
	popq	%r15
	retq

(I stripped out the labels in order to make it more readable).

As you can see this is really suboptimal, both in terms of performances (number of instructions executed) and binary size. As this point I'm even wondering if using a Vec is not faster.
Instead a solution that uses alloca would just do a sub esp followed with four mov and you're done. Five instructions.

If it remained to be proven, this is for me the main motivation for having alloca in Rust.

@whitequark
Copy link
Member

SmallVecs work well enough and have the added benefit of limiting stack usage.

I am unconvinced this is a benefit. When properly implemented, alloca is safe, extremely cheap, has nice locality and aliasing guarantees and even avoids an extern crate alloc;. In functional-ish languages you usually have a generational GC where the young heap has most of these nice properties but I do not really expect to see one in Rust any time soon, much less an efficient implementation.

@burdges
Copy link

burdges commented Dec 7, 2016

Interesting @eddyb so fn g() { let mut x : &[T] = &[T::new(); f()]; ... } should not compile without the : &[T] if the ... only uses x for scratch space and does not return it or anything?

Actually I suppose it'd eventually compile if f() is constant somehow.

@llogiq
Copy link
Contributor Author

llogiq commented Dec 7, 2016

Thanks everyone for the great comments! I'm going to follow @oli-obk 's advice and change to the closure based version. Need some time to do it, though... 😄

@eddyb
Copy link
Member

eddyb commented Dec 7, 2016

@llogiq I'm trying to tell you that anything using functions is unnecessarily suboptimal from an implementation perspective. The feature LLVM has is effectively VLAs. clang implements alloca(n) as something like &((char[n]) {})[0] (not sure if that literally works in C. char a[n]; in any case).

if the ... only uses x for scratch space and does not return it or anything

@burdges I'm not sure what you mean here, you can never return a stack local and rvalue promotion is backwards compatible in that expanding the set of constants only allows more code to compile.

@whitequark
Copy link
Member

@llogiq Apart from @eddyb's concerns, which I also support, using a closure will make this feature significantly less useful. For example then you could not call alloca in a loop while accumulating the results, which I would like to rely on for writing zero-heap-allocation parsers.

Also, in general I am opposed to deliberately decreasing usability in arbitrary ways because of your (or anyone else's) subjective opinion of whether it is "considered bad". You are not making a tradeoff with some other feature, as it usually goes in language design, you are simply making my life harder for no reason at all.

@whitequark
Copy link
Member

Now, a constructive suggestion: it looks like there is no existing way to reflect the lifetime of alloca exactly, and I agree with other commenters that introducing VLAs seems like an excessively invasive change for a fairly niche use case. How about making it a built-in macro?

let n = 10; for i in 1..n { let ary: &[u8] = alloca![0; n]; }

This is nicely similar to the existing case of vec![] and I believe it could return a value with any lifetime it wants, especially given that in this case the lifetime is simply fully lexical (the entire containing function).

@eddyb
Copy link
Member

eddyb commented Dec 8, 2016

@whitequark While that could work, doing it with an intrinsic is more problematic than having it first-class.

Hmm, we can make the macro encapsulate the unsafety by using a helper function that takes a reference to some other stack local and equates the lifetimes, only problem being that I'm not sure how to create something like that which lives long enough to be useful. May need to abuse some of the rvalue lifetime rules to get it.

EDIT: I suppose there's always the possibility of making the intrinsic generic over a lifetime that the compiler restricts artificially.

@whitequark
Copy link
Member

I suppose there's always the possibility of making the intrinsic generic over a lifetime that the compiler restricts artificially.

Yes, that's what I was thinking about. The intrinsic would merely translate to alloca of the right type directly, and the lifetime will be restricted early (you could do it right after you introduce lifetimes into the AST, really). I think this is a very nonintrusive solution.

@llogiq
Copy link
Contributor Author

llogiq commented Dec 8, 2016

@eddyb Ok, so it would need to be a compiler-internal macro. That's a neat solution to both the lifetime problem and the problem of extending the compiler with as little fuss as possible.

One thing I should probably add – because this is low-level, performance-über-alles, we'd likely not want to initialize the memory, so we could expand to mem::uninitialized(), rigging the type as necessary, but the whole thing would be unsafe and should be marked as such. I'm not too deep into macros; do we have unsafe macros?

@briansmith
Copy link

briansmith commented Dec 8, 2016

One thing I should probably add – because this is low-level, performance-über-alles, we'd likely not want to initialize the memory, so we could expand to mem::uninitialized(), rigging the type as necessary, but the whole thing would be unsafe and should be marked as such. I'm not too deep into macros; do we have unsafe macros?

No, this isn't necessarily the case. There are different motivations for using alloca: performance, avoiding using the heap (because there might not be one), maybe others. I want to use this alloca facility but I still zero the memory first and in particular I don't want to have to use unsafe for it as avoiding unsafe is a key selling point for what I'm buliding.

I'd suggest instead that the feature should allow the user to somehow specify the value to fill the memory with. Then the user could pass unsafe { mem::uninitialized() } if it doesn't want the memory to be zeroed and if it is OK with using unsafe, or it could use 0u32 or whatever the element type is, if it wants it to be zeroed.

@rustonaut
Copy link

rustonaut commented Dec 8, 2016

I agree with @briansmith on the safety aspect. I think that a safe interface should be provided as main interface with a additional way to have it uninitialized.

Through passing in mem::uninitalized() might not work depending on how it is normally initialized. (initializing through a closure would call mem::uninitalized() ntimes and copy the result to the place in the stack (should get optimized away, I guess??) and initializing with Clone would try to clone uninitialized memory, lastly initializing with Copy might be to restrictive...)

So maybe alloc![...] and unsafe alloc_unitialized![...].

@comex
Copy link

comex commented Dec 9, 2016

Ideally the initializing version would take an iterator so it could be initialized with something other than copies of the same value.

@whitequark
Copy link
Member

whitequark commented Feb 23, 2017

@aturon Worth mentioning that these RFCs are not equivalent. Unsized rvalues aren't equivalent to alloca and aren't useful for the embedded use cases. That said I imagine #1909, in some form, will be a prerequisite for an eventual implementation of alloca.

@joshtriplett
Copy link
Member

@whitequark Can you elaborate on cases that this would solve but VLAs wouldn't?

@whitequark
Copy link
Member

@joshtriplett Allocating in a loop.

@briansmith
Copy link

unnsized rvalues aren't equivalent to alloca and aren't useful for the embedded use cases

This is overstating things a bit. There's one particular thing that you want to do, which that proposal (nor the current form of this one, IIUC), doesn't support. It's not clear that that particular thing really needs to be supported, and that thing doesn't comprise all or even most “embedded” use cases.

However, I think there is a more serious problem with #1909, and maybe also this RFC: The proposed syntax for VLAs hint that they can be used intuitively like arrays [T; n] and vectors Vec<T>, but the restrictions on their usage are quite severe and counterintuitive. This RFC (#1808), reduced to supporting only VLAs as local variables (i.e. not embedded in structs like struct { x: [T] }) and giving the variables slice types, e.g. x: &[i32] = &[0i32; dynamic n]; makes more sense to me than #1909 does.

@plietar
Copy link

plietar commented Feb 23, 2017

reduced to supporting only VLAs as local variables (i.e. not embedded in structs like struct { x: [T] })

This is already supported AFAIK. The last field of a struct can be a DST, and is itself thus a DST.

@briansmith
Copy link

briansmith commented Feb 23, 2017

reduced to supporting only VLAs as local variables (i.e. not embedded in structs like struct Foo { x: [T] })

This is already supported AFAIK. The last field of a struct can be a DST, and is itself thus a DST.

I understand that. But, adding support for stack-allocating values of such structs is, IMO, counterproductive, because of the severe limitations mentioned in #1909: They can't be passed by value (copied/moved) nor returned from functions. Given those limitations, is stack allocation of these types widely useful? Those limitations are dealbreakers for my code. Maybe other people have other use cases for which they'd work, in which case it would be useful to see them.

@joshtriplett
Copy link
Member

@whitequark Hmmm. You could certainly allocate a VLA in a loop. But as @briansmith notes, you can't copy/move them, which means you can't accumulate them in a loop.

Fair point. That seems worth considering in the discussions between this issue and #1909.

@arielb1
Copy link
Contributor

arielb1 commented Feb 24, 2017

@whitequark

Accumulating alloca (aka 'fn locals) is quite separate in its problems, uses and implications from VLAs (aka this RFC and #1909). Please move that discussion to #1847. If you come up with a sufficiently organized proposal, we might adopt it (or I might write up such a proposal myself when I have the time).

@arielb1
Copy link
Contributor

arielb1 commented Feb 24, 2017

@briansmith

With #1909, you can definitely move rvalues (e.g., you should be able to call Box::new on a local). If we adopt the "unsized ADT expressions" option, you could also put unsized values in structs. If we adopt the Copy: ?Sized option, you could also copy rvalues.

@briansmith
Copy link

briansmith commented Feb 24, 2017

With #1909, you can definitely move rvalues (e.g., you should be able to call Box::new on a local). If we adopt the "unsized ADT expressions" option, you could also put unsized values in structs. If we adopt the Copy: ?Sized option, you could also copy rvalues.

That sounds better than I understood from the discussion. Could we return them from functions? In particular, given #[derive(Clone, Copy)] struct Elem { value: [usize] }, can we follow idiomatic patterns like this?:

`#[derive(Clone, Copy)]
struct Elem {
    value: [usize]
}

impl Elem {
    pub fn new(...) -> Self { ... }
}

pub fn elem_product(x: &Elem, y: &Elem, m: &Modulus) -> Elem { ... }

pub fn elem_pow(base: Elem, e: i32, m: &Modulus) -> Elem {
    let mut table = [Elem { value: [0; dynamic base.value.len()] }; 5];
    ...
}

If VLAs look like arrays then people will expect all of that kind of stuff to work, especially since it works fine for arrays. If that stuff doesn't work then a syntax and semantics that looks and feels more like "alloca(), but safe and Rustic" makes more sense, IMO—especially if the reason for deferring returning VLAs from functions or having arrays of VLA-containing structures are implementation concerns (difficulty modifying LLVM to support them).

@arielb1
Copy link
Contributor

arielb1 commented Feb 24, 2017

There's also a possible way of returning them from functions, but the ABI implications are... quite interesting in a possibly non-portable way (either the called function must allocate stack on the caller's stack, or the caller needs to access memory after the called function returned). #[derive] is probably not going to work because of that, but if your types are Copy you can implement them yourself.

@rfcbot
Copy link
Collaborator

rfcbot commented Feb 28, 2017

🔔 This is now entering its final comment period, as per the review above. 🔔

1 similar comment
@rfcbot
Copy link
Collaborator

rfcbot commented Feb 28, 2017

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labels Feb 28, 2017
@leonardo-m
Copy link

It's nice to see Rust take more care of stack allocation. A VLA is usable as brick to build more complex stack-allocated data structures.

@rfcbot
Copy link
Collaborator

rfcbot commented Mar 10, 2017

The final comment period is now complete.

@llogiq
Copy link
Contributor Author

llogiq commented Mar 11, 2017

It is not yet clear to me whether #1909 or this RFC is the way forward, as both seem incomplete to me.

@burdges
Copy link

burdges commented Mar 11, 2017

Ain't clear if you can avoid some incompleteness here since it'd be nice to future-proof against some rather far out stuff.

As an extreme example, [x; n] could make sense with n being defined at runtime by using dependent types ala #1930 (comment) In essence, our runtime value n could itself be viewed as type level information, but n viewed as an immutable value must outlive n viewed as a type, and n viewed as a type must outlive any type involving it, like [T; n]. If you have fn foo<n: usize>() -> [u8; n] then n gets promoted to a type in the caller which creates the space with alloca and passes the buffer to foo along with n as an implicit argument.

We clearly want stuff like raw-ish alloca well before anything like this. And I think the syntax let x : &[T] = &[T::default(); n]; is actually forwards compatible with "promote n to a type, make a [T; n], and cast its reference to a slice". I thought this made a nice example though.

@Ericson2314
Copy link
Contributor

@budges yeah that's roughly what we eventually want in principle, but there's so many subtleties in the design I'd be extremely wary about declaring something forward comparable now.

@nikomatsakis
Copy link
Contributor

Per the FCP, I am going to close this RFC, in favor of unsized rvalues. Thanks @llogiq for the work you put into it. =)

@burdges
Copy link

burdges commented May 13, 2018

Is it worth experimenting with making dyn Trait : Sized when the compiler can statically determine all T : Trait? It could provides auto-generated sum types #2414 that address many iterator/future adapter situations and even allow returning function when only Sized types arise their bodies:

fn foo(a: bool) -> impl FnOnce() {
    if a {
        || { ... } as dyn FnOnce()
    } else {
        || { ... } as dyn FnOnce() 
    }
}

@eternaleye
Copy link

@burdges: If a dependency then impls Trait on an unsized type, then that's a semver break. This would basically make unsized types implicitly #[fundamental], as adding an impl may now force semver breaks on dependents.

"If none do X, then Y" - negative reasoning - is a thorny thicket to dive into.

@burdges
Copy link

burdges commented May 13, 2018

Oops, I misspoke, but you read into my mistake more sense than existed. ;) There is no way the compiler could statically determine all types T satisfying FnOnce()!

It requires way more hackery to make this make sense:

fn foo(a: bool) -> impl FnOnce() {
    let f = || { ... };
    let g = || { ... };
    trait MyFn : FnOnce() {}
    impl MyFn for type_of<f> {}
    impl MyFn for type_of<g> {}
    if a {
        f as dyn MyFn
    } else {
        g as dyn MyFn
    }
}

In this code, we know all T satiating MyFn because no other crate can impl MyFn, so the compiler really could declare dyn MyFn : Sized, but..

I can now answer my own question: We lack a convenient enough way to impl traits for anonymous types to make this an ergonomic solution for any problems under consideration. If the calling convention ever allows returning VLAs then a much better feature falls out naturally without the hackery. If that calling convention incurs any runtime costs, then the compiler could compile appropriate cases to sized returns under the hood.

@eternaleye
Copy link

eternaleye commented May 13, 2018

@burdges: It's more that I then applied the same reasoning to dyn Trait in argument position, which I failed to adequately state. That issue applies even to your expansion. Sizedness is a matter of API, and implementing a Foo trait on struct Bar { a: [u8] } would thus be a "take-back" of all impls of Foo being sized. If Foo and Bar are in different crates, this can get very messy.

@burdges
Copy link

burdges commented May 14, 2018

A trait exported from any crate could never benefit. I read you point two ways: First, adding impls can now break things within the same crate. Second, even if Trait is not exported, we might still impl Trait for Foreign where Foreign comes from another crate, and hence Foreign might stop being sized in future. Yes, you'd need an explicit bound, ala

trait MyFn : FnOnce()+Sized {}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-lang Relevant to the language team, which will review and decide on the RFC. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.