Zero allocation futures #437

leoyvens · 2017-04-03T22:43:48Z

This PR shows that it's possible to have zero (extra) allocation futures, by generalizing Arc to Unpark + Clone + 'static. Unpark no longer requires Sync and this opens way to a bare bones version of tasks that work on no_std.

Gets rid of Arc in the poll methods, taking &Unpark + Clone instead. ~~replacing with a UnparkHandle~~. Since trait objects can't be Clone this implements a custom trait object that is Clone by limiting the maximum size of the object. Upon park, instead of giving Arc references we clone the contents of the UnparkHandle, this generalizes the behaviour of cloning the Arc.

We need to figure out the intersection with the overhaul in #436. I think this is compatible with the extension to Unpark porposed, but doing both things may be overkill since they are both targeted at reducing the number of allocations in the tokio/futures interface. I now think this generalizes #436, see below. Hopefully this is compatible with the UnparkContext stuff in #436 though I haven't really taken a look at that part.

There is unsafe in the implementation of the trait object, I think it's correct and I got it to not crash any tests, but the more eyes on it the better!

The downsides is that this is a (small) breaking change and there are subtle interactions around inner mutability. The limitation on the size of the type may also be annoying. But you can always wrap your struct in an Arc and get exactly the same semantics we have today.

Of course the real motivation of this PR is to surprise @alexcrichton as per #432.

Fixes #430. Fixes #432. Supercedes #433.

Rufflewind · 2017-04-04T03:32:45Z

Unpark no longer requires Sync

Does this also open up a way to implement single-threaded executors without the penalty of atomic operations?

leoyvens · 2017-04-04T15:13:15Z

@Rufflewind This opens way to no_std tasks regardless of the Sync requirement, those generalizations independently fallout of no hard dependency on Arc.

It depends on what your single threaded executor looks like. Executors commonly need to share values with inner mutability. This PR allows you to, for example, clone a reference to a static mut instead of an Arc, and not care about synchronizing the access to the static mut because you are in a single thread. But this is only safe if you know unpark won't be called from multiple threads.

Making a special version of Unpark and Task that aren't Send may be easier after this PR, but it's not obvious if we want that.

alexcrichton · 2017-04-04T19:06:38Z

Thanks for the PR!

I feel though that having restrictions on the size of types is a little draconian, would it be possible to perhaps relax that? My thinking is that the only reason we require std today is b/c Arc is hardcoded, but could we perhaps rely on a trait and raw pointer to avoid hard coding that? I'm thinking something like:

trait Unpark: Send {
    fn unpark(&self);
    fn clone(&self) -> *mut Unpark;
    unsafe fn drop(&mut self);
}

And that way the vtable is buried in *mut Unpark automatically and we just be sure to call the drop portion at the appropriate time (and clone when creating a new handle).

@Rufflewind that will be a very difficult restriction to get rid of unfortunately, a Task is today Send + Sync, and I think we'll always want it to be at least Send

leoyvens · 2017-04-04T20:13:39Z

@alexcrichton Agree the restriction sucks. We could auto-fallback to Arc, with the downside that we need two UnparkHandle constructors, one where the user doesn't care because Arc doesn't change his semantics, and another where he cares and wants to decide for himself between Arc or no Arc.
Edit: When we get a const system we can at least move the "type too large" error from panic to compilation error.

Let me see if I follow. In your suggestion clone returns a pointer to the heap. It does solve tokio's problem. We may technically drop Sync and go no_std. We lose the zero allocation motivation, which no_std may need in practice. The safety of the whole thing relies on clone being implemented correctly. I worry about pushing unsafety concerns onto the implementor of Unpark.

leoyvens · 2017-04-04T21:54:06Z

I also need to make UnparkHandle not Sync and therefore Task not Sync, had forgotten that! Meanwhile I pushed a commit to simplify things a bit by taking the unpark by value in UnparkHandle::new.

alexcrichton · 2017-04-12T18:41:15Z

I personally feel that between this and #436 the latter (#436) has more potential in terms of being a route forward. In #436 the task system is generalized to only require one Arc per executor rather than per future. (that's done by passing around ids). That seems more flexible to me and achieving similar goals without having the downside of placing a dynamic restriction on the size of types passed in perhaps?

carllerche · 2017-04-12T18:58:59Z

I think I will try to hide the Arc from the public API though (as mentioned in the PR) to future proof.

leoyvens · 2017-04-13T21:55:18Z

~~Agree #436 is better for many use cases in particular tokio.~~ I think this is a generalization of #436, see comment below. If we do this in addition to #436 we the upsides:

Task in no_std.
Support for non-Sync unparks.
Possible optimization for executors that can't take advantage of the index approach or that need non-u64 indexes.

Downsides:

API complexity with Clone vs Arc semantics and the weird type size thing.
Task is no longer Sync.

I don't have concrete use cases for the upsides. ~~I understand if this is closed for being undermotivated considering #436 is better works fine l for the concrete use case at hand which is tokio.~~ But I'm unconvinced on #436 being clearly better than this. Of course we still want UnparkContext from #436 which seems orthogonal.

leoyvens · 2017-04-15T21:15:40Z

Actually, I've realized this is a generalization of #436. Consider the following usage of the #436 API:

// core is an Arc<Core>, idx is an u64
spawn.poll_future(&core, idx)

Is equivalent to the following with the API in this PR:

struct IndexedUnpark {
    exec : Arc<Core>,
    index : u64
}

impl Clone for IndexedUnpark { /* ref_inc behaviour goes here */ } 
impl Drop for IndexedUnpark { /* ref_dec behaviour goes here */ }
impl Unpark for IndexedUnpark { /* unpark by index behaviour goes here */ }

// When polling a spawn
let unpark = UnparkHandle::new(IndexedUnpark { exec : core.clone(), index : idx }); 
spawn.poll_future(&unpark);

We don't need the ref_inc(&self, id: u64) or ref_dec(&self, id: u64) because that behaviour goes in Clone and Drop respectively. Rather than hardcoding u64 indexes, we allow indexes of any type to be used. The Unpark trait dosen't need any extra methods or arguments.

Hardcoding max type size in this PR may feel like a hack, but #436 also has its clunkiness with harcoded u64 indexes and the ref_inc and ref_dec methods. If I'm correct #436 ends up being a specialized version of this.

Get rid of `Arc` in the `poll` methods, replacing with a `UnparkHandle`. Implement a custom trait object that can clone itself by limiting the maximum size of the object. Upon `park`, instead of giving `Arc` references we clone the contents of the `UnparkHandle`, this generalizes the behaviour of cloning the `Arc`.

Simplifies implementation and usage of `UnparkHandle`

…ot Sync. MAX_UNPARK_BYTES is now exposed and configurable at compile-time by setting the FUTURES_MAX_UNPARK_BYTES env var. UnparkHandle is marked non-Sync therefore Task is non-Sync.

leoyvens · 2017-04-16T17:42:17Z

I've made the maximum size of the unpark configurable through an env var (at compile-time) using standard build.rs trickery.

This is more correct because we take the maximum between the MAX_UNPARK_BYTES set by the dependencies and the app.

leoyvens · 2017-04-17T18:45:04Z

Changed it to use cargo features instead of env var + build.rs. This is more correct because it uses the maximum of the values set by dependencies and root. Downside is we need a feature for each supported value.

leoyvens · 2017-04-18T13:22:21Z

I now realize this has potential to do unaligned loads. I think it's fixable though.
Edit: Fixed.

burdges · 2017-04-18T15:02:38Z

src/task_impl/unpark_handle.rs

@@ -1,5 +1,5 @@
 use core::ptr;
-use core::mem::{forget, size_of};
+use core::mem::{self, forget, size_of};


What does importing self do?

use core::mem::{self} is equivalent to use core::mem.

leoyvens · 2017-04-19T18:13:21Z

The max size default and configurations are arbitrary right now, but thanks to @camlorn we have data to make an informed decison. I did a simple analysis over this collection of type sizes from rustc and the ecosystem. Disregarding types smaller than 8 bytes, we get the following numbers:

25% of the types are exactly 8 bytes.
50% of the types are under 24 bytes.
75% are under 32 bytes.
90% are under 64 bytes.
97% are under 128 bytes.
99% are under 256 bytes.

So 64 looks like an appropriate default. Supporting 128 and 256 as features makes sense. Any larger than that is unlikely to ever be needed. We can always add more features if someone comes up with an use case for sizes larger than 256.

leoyvens · 2017-04-25T21:04:00Z

I've changed things to hide UnparkHandle from the public API, instead taking &Unpark + Clone such as in:
fn poll_future<U : Unpark + Clone>(&mut self, unpark: &U)
This is less verbose and more straightforward to read and use. It is also better for performance, if nothing else it saves cloning the Arc around, being better at that than #433. This is now less of a breaking change, this is even less of a breaking change than #433.

I also moved the 'static bound to Unpark for convenience, so currently Unpark: Send + 'static.

Hide UnparkHandle from the public API. Take &Unpark + Clone in poll methods of Spawn. Move the 'static bound to the Unpark trait. Also remove MaxUnparkBytes512 and MaxUnparkBytes1024, it's unlikely that someone will use those.

leoyvens · 2017-04-26T17:33:15Z

I caught up with @carllerche on gitter today to get some feedback on this, which he kindly provided!

I was unable to convince him that this API is nicer and he was unable to convince me that the alternative API is nicer. This API exposes no methods that are unsafe to implement while the alternative avoids unsafe in the internals. In practice the executors may need unsafe even with this API, we would have to try out and see if this can really reduce the overall amount of unsafe. Also in this API the user dosen't have to create a trait object which avoids a clone when using Arc. In rayon for example, I guess we could at least kill fn make_unpark and the unpark field of contents, see here. If you are reading this and you've written an executor, I'd be curious to know what you think!

However, he gave me one argument that kills the approach in this PR. A goal that I was not aware of is to get the size of Task to a minimum, because it gets moved around so much. We could make the buffer size configurable down to a minimum of say 16 bytes, but still we have the vtable that is 32 bytes right now, so UnparkObj takes up quite a bit of stack space and is bloated in comparison to the alternative. I don't know any way to get the vtable out of UnparkObj (the compiler can put it in a global, unfortunately we cannot). So if Task bloat is a problem it's a dead end for this PR.

I had a lot of fun and learned a lot implementing and discussing this, thanks to everyone who gave feedback!

leoyvens force-pushed the zero-allocation-futures branch 2 times, most recently from 0399bf3 to cf2fd8c Compare April 4, 2017 02:03

leoyvens force-pushed the zero-allocation-futures branch 2 times, most recently from 4512bf3 to 898e6db Compare April 16, 2017 17:11

leoyvens added 2 commits April 16, 2017 14:22

Take the unpark by value in UnparkHandle

dd341f1

Simplifies implementation and usage of `UnparkHandle`

leoyvens force-pushed the zero-allocation-futures branch from 898e6db to 61fb69c Compare April 16, 2017 17:26

Make MAX_UNPARK_BYTES configurable and expose it. Make UnparkHandle n…

0f12f1d

…ot Sync. MAX_UNPARK_BYTES is now exposed and configurable at compile-time by setting the FUTURES_MAX_UNPARK_BYTES env var. UnparkHandle is marked non-Sync therefore Task is non-Sync.

leoyvens force-pushed the zero-allocation-futures branch from 61fb69c to 0f12f1d Compare April 16, 2017 17:33

Use cargo features instead of env var to configure MAX_UNPARK_BYTES

987b7f4

This is more correct because we take the maximum between the MAX_UNPARK_BYTES set by the dependencies and the app.

Fix unaligned memory reads

d588ffc

burdges reviewed Apr 18, 2017

View reviewed changes

leoyvens force-pushed the zero-allocation-futures branch from cf06946 to 8910b62 Compare April 25, 2017 23:18

Take &Unpark + Clone in poll methods

ba1182c

Hide UnparkHandle from the public API. Take &Unpark + Clone in poll methods of Spawn. Move the 'static bound to the Unpark trait. Also remove MaxUnparkBytes512 and MaxUnparkBytes1024, it's unlikely that someone will use those.

leoyvens force-pushed the zero-allocation-futures branch from 8910b62 to ba1182c Compare April 26, 2017 13:13

leoyvens closed this Apr 26, 2017

Zero allocation futures #437

Zero allocation futures #437

Uh oh!

Conversation

leoyvens commented Apr 3, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Rufflewind commented Apr 4, 2017

Uh oh!

leoyvens commented Apr 4, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alexcrichton commented Apr 4, 2017

Uh oh!

leoyvens commented Apr 4, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

leoyvens commented Apr 4, 2017

Uh oh!

alexcrichton commented Apr 12, 2017

Uh oh!

carllerche commented Apr 12, 2017

Uh oh!

leoyvens commented Apr 13, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

leoyvens commented Apr 15, 2017

Uh oh!

leoyvens commented Apr 16, 2017

Uh oh!

leoyvens commented Apr 17, 2017

Uh oh!

leoyvens commented Apr 18, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

burdges Apr 18, 2017

Choose a reason for hiding this comment

Uh oh!

leoyvens Apr 18, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

leoyvens commented Apr 19, 2017

Uh oh!

leoyvens commented Apr 25, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

leoyvens commented Apr 26, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

leoyvens commented Apr 3, 2017 •

edited

Loading

leoyvens commented Apr 4, 2017 •

edited

Loading

leoyvens commented Apr 4, 2017 •

edited

Loading

leoyvens commented Apr 13, 2017 •

edited

Loading

leoyvens commented Apr 18, 2017 •

edited

Loading

leoyvens Apr 18, 2017 •

edited

Loading

leoyvens commented Apr 25, 2017 •

edited

Loading

leoyvens commented Apr 26, 2017 •

edited

Loading