Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update stream rfc #15

Merged
merged 26 commits into from
Jul 29, 2020
Merged

Conversation

nellshamrell
Copy link
Contributor

No description provided.

Signed-off-by: Nell Shamrell <nellshamrell@gmail.com>
Signed-off-by: Nell Shamrell <nellshamrell@gmail.com>
Signed-off-by: Nell Shamrell <nellshamrell@gmail.com>
…treams

Signed-off-by: Nell Shamrell <nellshamrell@gmail.com>
Signed-off-by: Nell Shamrell <nellshamrell@gmail.com>
Signed-off-by: Nell Shamrell <nellshamrell@gmail.com>
Copy link
Member

@yoshuawuyts yoshuawuyts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really into these changes! -- I've left some notes on some of the code examples; but overall this is looking great!

rfc-drafts/stream.md Outdated Show resolved Hide resolved
rfc-drafts/stream.md Outdated Show resolved Hide resolved
rfc-drafts/stream.md Outdated Show resolved Hide resolved
rfc-drafts/stream.md Outdated Show resolved Hide resolved
rfc-drafts/stream.md Outdated Show resolved Hide resolved
rfc-drafts/stream.md Outdated Show resolved Hide resolved
rfc-drafts/stream.md Outdated Show resolved Hide resolved
rfc-drafts/stream.md Show resolved Hide resolved
rfc-drafts/stream.md Outdated Show resolved Hide resolved
nellshamrell and others added 7 commits July 7, 2020 10:53
Co-authored-by: Yoshua Wuyts <yoshuawuyts+github@gmail.com>
Co-authored-by: Yoshua Wuyts <yoshuawuyts+github@gmail.com>
Co-authored-by: Yoshua Wuyts <yoshuawuyts+github@gmail.com>
Co-authored-by: Yoshua Wuyts <yoshuawuyts+github@gmail.com>
Co-authored-by: Yoshua Wuyts <yoshuawuyts+github@gmail.com>
Co-authored-by: Yoshua Wuyts <yoshuawuyts+github@gmail.com>
Signed-off-by: Nell Shamrell <nellshamrell@gmail.com>
@nellshamrell nellshamrell marked this pull request as ready for review July 7, 2020 19:47
@skade
Copy link

skade commented Jul 7, 2020

Rendered, for convenience.

Signed-off-by: Nell Shamrell <nellshamrell@gmail.com>
Copy link
Contributor

@nikomatsakis nikomatsakis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few comments from a quick skim

rfc-drafts/stream.md Outdated Show resolved Hide resolved
Copy link
Member

@tmandry tmandry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Left a few comments.

rfc-drafts/stream.md Outdated Show resolved Hide resolved
rfc-drafts/stream.md Outdated Show resolved Hide resolved
rfc-drafts/stream.md Outdated Show resolved Hide resolved
[TO BE ADDED]
```rust
pub trait FromStream<A> {
async fn from_stream<T>(stream: T) -> Self
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One very open-ended question I haven't thought too hard about yet: Would it ever be possible to automatically implement this for all T where T: FromIterator? It seems like, in most cases, the implementation should be logically equivalent, with the (rather large) wrinkle that now calling next() suspends you :).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely think that's possible. To clarify, are you thinking of implementing this same trait for both Streams and Iterators, or are you thinking of having separate traits with very similar functionality?

Copy link
Member

@tmandry tmandry Jul 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm saying that FromStream (as defined here) and the existing FromIterator trait are separate, but with very similar functionality.

Thinking more about it, FromStream is actually more general, because an await point is allowed to suspend execution of the current function but doesn't actually have to. So many, but not all, existing impls of FromIterator would work for FromStream as well. If not for coherence with existing impls, we could do impl<T: FromStream<Item = I>> FromIterator<I> for T, and switch a bunch of FromIterator impls to be FromStream impls without having to repeat ourselves.

This is a somewhat similar problem to the LendingStream/Stream relationship below, but the relationship is reversed (the trait impls we're starting with are allowed to assume too much – that there are no suspend points – rather than too little).

This might be another possible bullet point for the lang team discussion, @nikomatsakis? (cc rust-lang/lang-team#34)

Copy link
Member

@yoshuawuyts yoshuawuyts Jul 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way we solved this in async-std is by manually copying all implementations of FromIterator from std (ref). When we choose to introduce std::stream::FromStream we could pretty much copy-paste all of the impls from async-std. I forget whether there were coherence issues (likely), but the main reason why we took this approach is because doing general conversions between sync and async code is hard to get right in practice:

  • a generic conversion from sync -> async means operations may quietly block in the background which can lead to perf issues in runtimes
  • a generic conversion from async -> sync means using methods such as task::block_on which is hard to tune, or risk deadlocks when nested (both futures-rs and async-std have had issues with this)

To help with the conversion from Iterator -> Stream we introduced async_std::stream::from_iter. To my knowledge all of this together has covered people's needs quite well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the way currently used in async-std is the ideal way. That way means that all third-party crates that provide the FromIterator implementation must be patched in order to fully support.

As far as I know, this is why futures implements collect using existing traits (Default, Extend) rather than its own. The way used in futures cannot currently optimize based on the size-hint of the stream, but this will be possible once rust-lang/rust#72631 stabilizes.

If we want the collection to extend asynchronously, you can use Sink or use a loop.

To help with the conversion from Iterator -> Stream we introduced async_std::stream::from_iter. To my knowledge all of this together has covered people's needs quite well.

This seems copy of futures::stream::iter?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want the collection to extend asynchronously, you can use Sink or use a loop.

I don't think that's a compelling outcome; this means that patterns from sync Rust wouldn't be transferable to async Rust. Where possible I think it's incredibly valuable to aim for parity between the two. Though this kind of gets into "design philosophy", and at least in the context of this PR we don't need to find consensus.

That way means that all third-party crates that provide the FromIterator implementation must be patched in order to fully support.

That's accurate, and I think that's the right outcome. In some cases FromIterator implementations perform blocking IO which I think is I think enough reason to not introduce a blanket conversion. For example if a function were to use impl FromStream<Result<fs::Metadata>> as an input, then both an iterator constructed from std::fs::ReadDir and a stream constructed from async_std::fs::ReadDir would be valid*. However the synchronous variant would likely interfere with the scheduler.

*Admittedly it's a bit of a contrived example, but hopefully it still serves to illustrate the point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and at least in the context of this PR we don't need to find consensus.

Agreed, it actually requires real async trait method to add this to std, and afaik all of the existing approaches have some problems.

In some cases FromIterator implementations perform blocking IO which I think is I think enough reason to not introduce a blanket conversion.

Well, this problem also exists in futures::stream::iter/async_std::stream::from_iter.


Aside from it's preferable or not, even a crate that is obviously unrelated to async can't be used as a return type for collect without implementing it, which is necessary information when explaining/considering-adding it (I mainly think about collections).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from it's preferable or not, even a crate that is obviously unrelated to async can't be used as a return type for collect without implementing it, which is necessary information when explaining/considering-adding it (I mainly think about collections).

One solution I was thinking about for this is, leave futures's current collect method (as collect2 or another name), even after the FromStream based collect is implemented.
(This is not an ideal solution, but it means third-party crates can provide mitigation for this, even if std can't solve this problem.)

rfc-drafts/stream.md Outdated Show resolved Hide resolved
rfc-drafts/stream.md Outdated Show resolved Hide resolved
rfc-drafts/stream.md Outdated Show resolved Hide resolved
rfc-drafts/stream.md Outdated Show resolved Hide resolved
nellshamrell and others added 7 commits July 9, 2020 13:14
Co-authored-by: Tyler Mandry <tmandry@gmail.com>
Co-authored-by: Tyler Mandry <tmandry@gmail.com>
Co-authored-by: Tyler Mandry <tmandry@gmail.com>
Co-authored-by: Tyler Mandry <tmandry@gmail.com>
Co-authored-by: Tyler Mandry <tmandry@gmail.com>
Co-authored-by: Tyler Mandry <tmandry@gmail.com>
Signed-off-by: Nell Shamrell <nellshamrell@gmail.com>
@nikomatsakis
Copy link
Contributor

Reading the rendered form

image

I think those bullets were meant to be expanded upon -- and in fact I think we have, so they can just be removed.

(I can't figure out how to leave this as a comment on the diff, since they're not part of the diff...)

// Convenience methods (covered later on in the RFC):
fn next(&mut self) -> Next<'_, Self>
where
Self: Unpin;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can somebody (maybe @withoutboats or @cramertj) explain to me why Self: Unpin is here and why that's not a crushing limitation? I feel like I'm missing something super basic. I guess the answer is maybe that we expect to be operating with Box<S> where S: Stream, and Box<S> is always Unpin?

And in terms of why it is needed, I guess it is because: you can invoke next and that creates a &mut references to self, but once you've consumed the item, you want to be able to move self somewhere else .. I don't know. I can see why it requires Unpin (we are invoking Pin::new in the next method on Next), but I'm having a hard time with the "intuition" here.

I guess the alternative would be the next takes Pin<&mut Self> and not &mut self? Why would that be bad?

Anyway, I think we should have some text justifying it.

Copy link

@withoutboats withoutboats Jul 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, the reason why its needed is pretty clear if you look at the definition of Next:

struct Next<'a, S: Stream> {
    stream: &'a mut S
}

Since Stream::poll_next takes a pinned reference, the next future pretty clearly needs S to be Unpin to safely construct a Pin<&mut S> from &mut S. The alternative would be for next to take Pin<&mut S>.

You're basically right about operating on Box<S>, but remember that we can also operate on a stack-pinned S a lot of the time (as long as you don't need to return the stream out of this scope), so it doesn't necessarily require an allocation.

I think the current approach is strictly better than requiring the method to be pinned, because that would just require pinning it even when the type is unpin, whereas the current situation requires pinning it only when the type is not unpin.

Right now, no one cares because we have no !Unpin streams in practice. But once we have async generators, that's going to be a big annoyance with using them. The problem is even worse with generator/Iterator, because generators won't implement Iterator if they have to be pinned to run, instead only pinned references to them will (in essence, the same problem applies to Iterator, except that iterator's definition doesn't even allow for !Unpin iterators). I'm not sure what we're going to do about this when generators exist, this was one of the big papercuts we hadn't looked into yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so the problem will be that if async gen just returns an impl Stream (and not, crucially, an impl Stream + Unpin), you'll have to explicitly do some pinning operation to turn them into a "usable" stream (i.e., one you can call next one?).

The same seems to apply to a gen fn, but as you say, the situation is almost worse.

It does make me wonder if Stream should have a poll_next(&mut self) method instead of poll_next(Pin<&mut self>), and we have the expectation that a gen fn should require some kind of "explicit operation" to make a Stream (which would basically be pinning it). The main benefit of this would be that Iterator kind of requires that. Basically, the argument to me for "let's just stabilize Stream" has relied in part on the idea that all the problems we face for Stream, we also face for Iterator, but actually in this situation the two designs diverge slightly (with Iterator potentially having "another problem", depending on your POV).

I guess that the idea would be that a gen fn actually doesn't return an Iterator but rather an impl IntoIterator, and an async gen fn returns an impl IntoStream, where the "into" part of it does the pinning required to make &mut self reasonable.

This seems like an important detail, I'm glad we're digging into it a bit more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, no one cares because we have no !Unpin streams in practice.

To make sure I'm following, this is because people are hand-writing their streams, and basically here the Pin-machinery is just kind of boilerplate.

Copy link
Member

@taiki-e taiki-e Jul 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is an example when removing the Unpin bounds and using unsafe: playground
(this is unsound because you can move the underlying stream after Next dropped, and example is crashed as 'moved between poll calls')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pinning also applies to the design of AsyncRead/AsyncWrite, which currently use Pin even though there's no clear plan to make them implemented with generator type syntax. In essence the asyncification of a singature is currently understood as pinned reciever + context arg + return poll.

Yes. I'm trying to push a bit on this and make sure I understand why we think that is the correct approach, I guess. =)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So: I don't think that this conversation has to block this PR from landing, and it probably shouldn't even block the RFC itself, but I stll feel like there is some missing text around the next method that at minimum explains why Unpin is needed and perhaps tries to cover some of the reasoning above about why poll_next should take a pinned argument. Maybe the right place for some of this text is in the coverage of generator syntax, since we can discuss some of the difficulties that will have to be resolved in making generator syntax (and async generator syntax) work.

Copy link

@withoutboats withoutboats Jul 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With respect to generators that cannot borrow across a yield, I think it's not going to work. I imagine a lot of people are going to want to write code like this

See unless you write gen move, this is not a self-referential type at all, because it captures myvec by reference. :) Users writing futures combinators immediately ran into the problem of them being best expressed by self-referential types (and ended up having to Arc<Mutex> all kinds of stuff to share ownership across yield points), but users writing iterator combinators never have that problem. I think this is worth considering more seriously.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, ok, that's a good point. There are some differences in the setting, given that iterators tend to carry a lifetime. It may indeed be true that iterator combinators are less likely to require borrows across yields. Interesting.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key to the difference is that futures are ultimately passed to some executor API like spawn, which expects a static future; to achieve that the futures contain all the state they need, and references are internal to that state. Iterators are almost never required to be 'static by the APIs that consume them.

However, it's still the case that it will be far more natural to end up in self-referential situations with a generator than it is with a manually implemented iterator, because you will be able to implicitly build that state using the member variables of a function. So the restriction might still limit users from creating code in a natural way. And it's a bit confusing for the restriction to exist only in non-async generators.

So we'll have to get more experience to figure it out.

Signed-off-by: Nell Shamrell <nellshamrell@gmail.com>
rfc-drafts/stream.md Outdated Show resolved Hide resolved
rfc-drafts/stream.md Outdated Show resolved Hide resolved
nellshamrell and others added 2 commits July 10, 2020 10:56
Co-authored-by: Jake Goulding <shepmaster@mac.com>
Signed-off-by: Nell Shamrell <nellshamrell@gmail.com>
rfc-drafts/stream.md Outdated Show resolved Hide resolved
nellshamrell and others added 2 commits July 14, 2020 11:21
Co-authored-by: Dominik Stolz <voidc@users.noreply.github.com>
Signed-off-by: Nell Shamrell <nellshamrell@gmail.com>
@nellshamrell nellshamrell mentioned this pull request Jul 17, 2020
@nikomatsakis
Copy link
Contributor

I'm going to go ahead and merge these edits, since I think they've been at a stable point for some time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants