Skip to content

Conversation

@leoyvens
Copy link
Contributor

@leoyvens leoyvens commented Apr 3, 2017

This PR shows that it's possible to have zero (extra) allocation futures, by generalizing Arc to Unpark + Clone + 'static. Unpark no longer requires Sync and this opens way to a bare bones version of tasks that work on no_std.

Gets rid of Arc in the poll methods, taking &Unpark + Clone instead. replacing with a UnparkHandle. Since trait objects can't be Clone this implements a custom trait object that is Clone by limiting the maximum size of the object. Upon park, instead of giving Arc references we clone the contents of the UnparkHandle, this generalizes the behaviour of cloning the Arc.

We need to figure out the intersection with the overhaul in #436. I think this is compatible with the extension to Unpark porposed, but doing both things may be overkill since they are both targeted at reducing the number of allocations in the tokio/futures interface. I now think this generalizes #436, see below. Hopefully this is compatible with the UnparkContext stuff in #436 though I haven't really taken a look at that part.

There is unsafe in the implementation of the trait object, I think it's correct and I got it to not crash any tests, but the more eyes on it the better!

The downsides is that this is a (small) breaking change and there are subtle interactions around inner mutability. The limitation on the size of the type may also be annoying. But you can always wrap your struct in an Arc and get exactly the same semantics we have today.

Of course the real motivation of this PR is to surprise @alexcrichton as per #432.

Fixes #430. Fixes #432. Supercedes #433.

@leoyvens leoyvens force-pushed the zero-allocation-futures branch 2 times, most recently from 0399bf3 to cf2fd8c Compare April 4, 2017 02:03
@Rufflewind
Copy link
Contributor

Unpark no longer requires Sync

Does this also open up a way to implement single-threaded executors without the penalty of atomic operations?

@leoyvens
Copy link
Contributor Author

leoyvens commented Apr 4, 2017

@Rufflewind This opens way to no_std tasks regardless of the Sync requirement, those generalizations independently fallout of no hard dependency on Arc.

It depends on what your single threaded executor looks like. Executors commonly need to share values with inner mutability. This PR allows you to, for example, clone a reference to a static mut instead of an Arc, and not care about synchronizing the access to the static mut because you are in a single thread. But this is only safe if you know unpark won't be called from multiple threads.

Making a special version of Unpark and Task that aren't Send may be easier after this PR, but it's not obvious if we want that.

@alexcrichton
Copy link
Member

Thanks for the PR!

I feel though that having restrictions on the size of types is a little draconian, would it be possible to perhaps relax that? My thinking is that the only reason we require std today is b/c Arc is hardcoded, but could we perhaps rely on a trait and raw pointer to avoid hard coding that? I'm thinking something like:

trait Unpark: Send {
    fn unpark(&self);
    fn clone(&self) -> *mut Unpark;
    unsafe fn drop(&mut self);
}

And that way the vtable is buried in *mut Unpark automatically and we just be sure to call the drop portion at the appropriate time (and clone when creating a new handle).

@Rufflewind that will be a very difficult restriction to get rid of unfortunately, a Task is today Send + Sync, and I think we'll always want it to be at least Send

@leoyvens
Copy link
Contributor Author

leoyvens commented Apr 4, 2017

@alexcrichton Agree the restriction sucks. We could auto-fallback to Arc, with the downside that we need two UnparkHandle constructors, one where the user doesn't care because Arc doesn't change his semantics, and another where he cares and wants to decide for himself between Arc or no Arc.
Edit: When we get a const system we can at least move the "type too large" error from panic to compilation error.

Let me see if I follow. In your suggestion clone returns a pointer to the heap. It does solve tokio's problem. We may technically drop Sync and go no_std. We lose the zero allocation motivation, which no_std may need in practice. The safety of the whole thing relies on clone being implemented correctly. I worry about pushing unsafety concerns onto the implementor of Unpark.

@leoyvens
Copy link
Contributor Author

leoyvens commented Apr 4, 2017

I also need to make UnparkHandle not Sync and therefore Task not Sync, had forgotten that! Meanwhile I pushed a commit to simplify things a bit by taking the unpark by value in UnparkHandle::new.

@alexcrichton
Copy link
Member

I personally feel that between this and #436 the latter (#436) has more potential in terms of being a route forward. In #436 the task system is generalized to only require one Arc per executor rather than per future. (that's done by passing around ids). That seems more flexible to me and achieving similar goals without having the downside of placing a dynamic restriction on the size of types passed in perhaps?

@carllerche
Copy link
Contributor

I think I will try to hide the Arc from the public API though (as mentioned in the PR) to future proof.

@leoyvens
Copy link
Contributor Author

leoyvens commented Apr 13, 2017

Agree #436 is better for many use cases in particular tokio. I think this is a generalization of #436, see comment below. If we do this in addition to #436 we the upsides:

  1. Task in no_std.
  2. Support for non-Sync unparks.
  3. Possible optimization for executors that can't take advantage of the index approach or that need non-u64 indexes.

Downsides:

  1. API complexity with Clone vs Arc semantics and the weird type size thing.
  2. Task is no longer Sync.

I don't have concrete use cases for the upsides. I understand if this is closed for being undermotivated considering #436 is better works fine l for the concrete use case at hand which is tokio. But I'm unconvinced on #436 being clearly better than this. Of course we still want UnparkContext from #436 which seems orthogonal.

@leoyvens
Copy link
Contributor Author

Actually, I've realized this is a generalization of #436. Consider the following usage of the #436 API:

// core is an Arc<Core>, idx is an u64
spawn.poll_future(&core, idx)

Is equivalent to the following with the API in this PR:

struct IndexedUnpark {
    exec : Arc<Core>,
    index : u64
}

impl Clone for IndexedUnpark { /* ref_inc behaviour goes here */ } 
impl Drop for IndexedUnpark { /* ref_dec behaviour goes here */ }
impl Unpark for IndexedUnpark { /* unpark by index behaviour goes here */ }

// When polling a spawn
let unpark = UnparkHandle::new(IndexedUnpark { exec : core.clone(), index : idx }); 
spawn.poll_future(&unpark);

We don't need the ref_inc(&self, id: u64) or ref_dec(&self, id: u64) because that behaviour goes in Clone and Drop respectively. Rather than hardcoding u64 indexes, we allow indexes of any type to be used. The Unpark trait dosen't need any extra methods or arguments.

Hardcoding max type size in this PR may feel like a hack, but #436 also has its clunkiness with harcoded u64 indexes and the ref_inc and ref_dec methods. If I'm correct #436 ends up being a specialized version of this.

@leoyvens leoyvens force-pushed the zero-allocation-futures branch 2 times, most recently from 4512bf3 to 898e6db Compare April 16, 2017 17:11
Get rid of `Arc` in the `poll` methods, replacing with a
`UnparkHandle`. Implement a custom trait object that can clone itself
by limiting the maximum size of the object. Upon `park`, instead of
giving `Arc` references we clone the contents of the `UnparkHandle`,
this generalizes the behaviour of cloning the `Arc`.
Simplifies implementation and usage of `UnparkHandle`
@leoyvens leoyvens force-pushed the zero-allocation-futures branch from 898e6db to 61fb69c Compare April 16, 2017 17:26
…ot Sync.

MAX_UNPARK_BYTES is now exposed and configurable at compile-time by
setting the FUTURES_MAX_UNPARK_BYTES env var. UnparkHandle is marked
non-Sync therefore Task is non-Sync.
@leoyvens leoyvens force-pushed the zero-allocation-futures branch from 61fb69c to 0f12f1d Compare April 16, 2017 17:33
@leoyvens
Copy link
Contributor Author

I've made the maximum size of the unpark configurable through an env var (at compile-time) using standard build.rs trickery.

This is more correct because we take the maximum between the
MAX_UNPARK_BYTES set by the dependencies and the app.
@leoyvens
Copy link
Contributor Author

Changed it to use cargo features instead of env var + build.rs. This is more correct because it uses the maximum of the values set by dependencies and root. Downside is we need a feature for each supported value.

@leoyvens
Copy link
Contributor Author

leoyvens commented Apr 18, 2017

I now realize this has potential to do unaligned loads. I think it's fixable though.
Edit: Fixed.

@@ -1,5 +1,5 @@
use core::ptr;
use core::mem::{forget, size_of};
use core::mem::{self, forget, size_of};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does importing self do?

Copy link
Contributor Author

@leoyvens leoyvens Apr 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use core::mem::{self} is equivalent to use core::mem.

@leoyvens
Copy link
Contributor Author

The max size default and configurations are arbitrary right now, but thanks to @camlorn we have data to make an informed decison. I did a simple analysis over this collection of type sizes from rustc and the ecosystem. Disregarding types smaller than 8 bytes, we get the following numbers:

  • 25% of the types are exactly 8 bytes.
  • 50% of the types are under 24 bytes.
  • 75% are under 32 bytes.
  • 90% are under 64 bytes.
  • 97% are under 128 bytes.
  • 99% are under 256 bytes.

So 64 looks like an appropriate default. Supporting 128 and 256 as features makes sense. Any larger than that is unlikely to ever be needed. We can always add more features if someone comes up with an use case for sizes larger than 256.

@leoyvens
Copy link
Contributor Author

leoyvens commented Apr 25, 2017

I've changed things to hide UnparkHandle from the public API, instead taking &Unpark + Clone such as in:
fn poll_future<U : Unpark + Clone>(&mut self, unpark: &U)
This is less verbose and more straightforward to read and use. It is also better for performance, if nothing else it saves cloning the Arc around, being better at that than #433. This is now less of a breaking change, this is even less of a breaking change than #433.

I also moved the 'static bound to Unpark for convenience, so currently Unpark: Send + 'static.

@leoyvens leoyvens force-pushed the zero-allocation-futures branch from cf06946 to 8910b62 Compare April 25, 2017 23:18
Hide UnparkHandle from the public API. Take &Unpark + Clone in poll
methods of Spawn. Move the 'static bound to the Unpark trait.

Also remove MaxUnparkBytes512 and MaxUnparkBytes1024, it's unlikely
that someone will use those.
@leoyvens leoyvens force-pushed the zero-allocation-futures branch from 8910b62 to ba1182c Compare April 26, 2017 13:13
@leoyvens
Copy link
Contributor Author

leoyvens commented Apr 26, 2017

I caught up with @carllerche on gitter today to get some feedback on this, which he kindly provided!

I was unable to convince him that this API is nicer and he was unable to convince me that the alternative API is nicer. This API exposes no methods that are unsafe to implement while the alternative avoids unsafe in the internals. In practice the executors may need unsafe even with this API, we would have to try out and see if this can really reduce the overall amount of unsafe. Also in this API the user dosen't have to create a trait object which avoids a clone when using Arc. In rayon for example, I guess we could at least kill fn make_unpark and the unpark field of contents, see here. If you are reading this and you've written an executor, I'd be curious to know what you think!

However, he gave me one argument that kills the approach in this PR. A goal that I was not aware of is to get the size of Task to a minimum, because it gets moved around so much. We could make the buffer size configurable down to a minimum of say 16 bytes, but still we have the vtable that is 32 bytes right now, so UnparkObj takes up quite a bit of stack space and is bloated in comparison to the alternative. I don't know any way to get the vtable out of UnparkObj (the compiler can put it in a global, unfortunately we cannot). So if Task bloat is a problem it's a dead end for this PR.

I had a lot of fun and learned a lot implementing and discussing this, thanks to everyone who gave feedback!

@leoyvens leoyvens closed this Apr 26, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants