-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update stream rfc #15
Update stream rfc #15
Conversation
Signed-off-by: Nell Shamrell <nellshamrell@gmail.com>
Signed-off-by: Nell Shamrell <nellshamrell@gmail.com>
Signed-off-by: Nell Shamrell <nellshamrell@gmail.com>
…treams Signed-off-by: Nell Shamrell <nellshamrell@gmail.com>
Signed-off-by: Nell Shamrell <nellshamrell@gmail.com>
Signed-off-by: Nell Shamrell <nellshamrell@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really into these changes! -- I've left some notes on some of the code examples; but overall this is looking great!
Co-authored-by: Yoshua Wuyts <yoshuawuyts+github@gmail.com>
Co-authored-by: Yoshua Wuyts <yoshuawuyts+github@gmail.com>
Co-authored-by: Yoshua Wuyts <yoshuawuyts+github@gmail.com>
Co-authored-by: Yoshua Wuyts <yoshuawuyts+github@gmail.com>
Co-authored-by: Yoshua Wuyts <yoshuawuyts+github@gmail.com>
Co-authored-by: Yoshua Wuyts <yoshuawuyts+github@gmail.com>
Signed-off-by: Nell Shamrell <nellshamrell@gmail.com>
Rendered, for convenience. |
Signed-off-by: Nell Shamrell <nellshamrell@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few comments from a quick skim
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Left a few comments.
[TO BE ADDED] | ||
```rust | ||
pub trait FromStream<A> { | ||
async fn from_stream<T>(stream: T) -> Self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One very open-ended question I haven't thought too hard about yet: Would it ever be possible to automatically implement this for all T where T: FromIterator
? It seems like, in most cases, the implementation should be logically equivalent, with the (rather large) wrinkle that now calling next()
suspends you :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely think that's possible. To clarify, are you thinking of implementing this same trait for both Streams and Iterators, or are you thinking of having separate traits with very similar functionality?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm saying that FromStream
(as defined here) and the existing FromIterator
trait are separate, but with very similar functionality.
Thinking more about it, FromStream
is actually more general, because an await point is allowed to suspend execution of the current function but doesn't actually have to. So many, but not all, existing impls of FromIterator
would work for FromStream
as well. If not for coherence with existing impls, we could do impl<T: FromStream<Item = I>> FromIterator<I> for T
, and switch a bunch of FromIterator
impls to be FromStream
impls without having to repeat ourselves.
This is a somewhat similar problem to the LendingStream
/Stream
relationship below, but the relationship is reversed (the trait impls we're starting with are allowed to assume too much – that there are no suspend points – rather than too little).
This might be another possible bullet point for the lang team discussion, @nikomatsakis? (cc rust-lang/lang-team#34)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way we solved this in async-std
is by manually copying all implementations of FromIterator
from std (ref). When we choose to introduce std::stream::FromStream
we could pretty much copy-paste all of the impls from async-std
. I forget whether there were coherence issues (likely), but the main reason why we took this approach is because doing general conversions between sync and async code is hard to get right in practice:
- a generic conversion from
sync -> async
means operations may quietly block in the background which can lead to perf issues in runtimes - a generic conversion from
async -> sync
means using methods such astask::block_on
which is hard to tune, or risk deadlocks when nested (both futures-rs and async-std have had issues with this)
To help with the conversion from Iterator
-> Stream
we introduced async_std::stream::from_iter
. To my knowledge all of this together has covered people's needs quite well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the way currently used in async-std
is the ideal way. That way means that all third-party crates that provide the FromIterator
implementation must be patched in order to fully support.
As far as I know, this is why futures
implements collect
using existing traits (Default
, Extend
) rather than its own. The way used in futures
cannot currently optimize based on the size-hint of the stream, but this will be possible once rust-lang/rust#72631 stabilizes.
If we want the collection to extend asynchronously, you can use Sink
or use a loop.
To help with the conversion from Iterator -> Stream we introduced async_std::stream::from_iter. To my knowledge all of this together has covered people's needs quite well.
This seems copy of futures::stream::iter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want the collection to extend asynchronously, you can use Sink or use a loop.
I don't think that's a compelling outcome; this means that patterns from sync Rust wouldn't be transferable to async Rust. Where possible I think it's incredibly valuable to aim for parity between the two. Though this kind of gets into "design philosophy", and at least in the context of this PR we don't need to find consensus.
That way means that all third-party crates that provide the
FromIterator
implementation must be patched in order to fully support.
That's accurate, and I think that's the right outcome. In some cases FromIterator
implementations perform blocking IO which I think is I think enough reason to not introduce a blanket conversion. For example if a function were to use impl FromStream<Result<fs::Metadata>>
as an input, then both an iterator constructed from std::fs::ReadDir
and a stream constructed from async_std::fs::ReadDir
would be valid*. However the synchronous variant would likely interfere with the scheduler.
*Admittedly it's a bit of a contrived example, but hopefully it still serves to illustrate the point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and at least in the context of this PR we don't need to find consensus.
Agreed, it actually requires real async trait method to add this to std, and afaik all of the existing approaches have some problems.
In some cases
FromIterator
implementations perform blocking IO which I think is I think enough reason to not introduce a blanket conversion.
Well, this problem also exists in futures::stream::iter
/async_std::stream::from_iter
.
Aside from it's preferable or not, even a crate that is obviously unrelated to async can't be used as a return type for collect
without implementing it, which is necessary information when explaining/considering-adding it (I mainly think about collections).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from it's preferable or not, even a crate that is obviously unrelated to async can't be used as a return type for
collect
without implementing it, which is necessary information when explaining/considering-adding it (I mainly think about collections).
One solution I was thinking about for this is, leave futures
's current collect
method (as collect2
or another name), even after the FromStream
based collect
is implemented.
(This is not an ideal solution, but it means third-party crates can provide mitigation for this, even if std
can't solve this problem.)
Co-authored-by: Tyler Mandry <tmandry@gmail.com>
Co-authored-by: Tyler Mandry <tmandry@gmail.com>
Co-authored-by: Tyler Mandry <tmandry@gmail.com>
Co-authored-by: Tyler Mandry <tmandry@gmail.com>
Co-authored-by: Tyler Mandry <tmandry@gmail.com>
Co-authored-by: Tyler Mandry <tmandry@gmail.com>
Signed-off-by: Nell Shamrell <nellshamrell@gmail.com>
// Convenience methods (covered later on in the RFC): | ||
fn next(&mut self) -> Next<'_, Self> | ||
where | ||
Self: Unpin; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can somebody (maybe @withoutboats or @cramertj) explain to me why Self: Unpin
is here and why that's not a crushing limitation? I feel like I'm missing something super basic. I guess the answer is maybe that we expect to be operating with Box<S>
where S: Stream
, and Box<S>
is always Unpin
?
And in terms of why it is needed, I guess it is because: you can invoke next
and that creates a &mut
references to self, but once you've consumed the item, you want to be able to move self
somewhere else .. I don't know. I can see why it requires Unpin
(we are invoking Pin::new
in the next
method on Next
), but I'm having a hard time with the "intuition" here.
I guess the alternative would be the next
takes Pin<&mut Self>
and not &mut self
? Why would that be bad?
Anyway, I think we should have some text justifying it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, the reason why its needed is pretty clear if you look at the definition of Next:
struct Next<'a, S: Stream> {
stream: &'a mut S
}
Since Stream::poll_next
takes a pinned reference, the next future pretty clearly needs S
to be Unpin
to safely construct a Pin<&mut S>
from &mut S
. The alternative would be for next to take Pin<&mut S>
.
You're basically right about operating on Box<S>
, but remember that we can also operate on a stack-pinned S
a lot of the time (as long as you don't need to return the stream out of this scope), so it doesn't necessarily require an allocation.
I think the current approach is strictly better than requiring the method to be pinned, because that would just require pinning it even when the type is unpin, whereas the current situation requires pinning it only when the type is not unpin.
Right now, no one cares because we have no !Unpin
streams in practice. But once we have async generators, that's going to be a big annoyance with using them. The problem is even worse with generator/Iterator, because generators won't implement Iterator if they have to be pinned to run, instead only pinned references to them will (in essence, the same problem applies to Iterator, except that iterator's definition doesn't even allow for !Unpin iterators). I'm not sure what we're going to do about this when generators exist, this was one of the big papercuts we hadn't looked into yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, so the problem will be that if async gen
just returns an impl Stream
(and not, crucially, an impl Stream + Unpin
), you'll have to explicitly do some pinning operation to turn them into a "usable" stream (i.e., one you can call next
one?).
The same seems to apply to a gen fn
, but as you say, the situation is almost worse.
It does make me wonder if Stream
should have a poll_next(&mut self)
method instead of poll_next(Pin<&mut self>)
, and we have the expectation that a gen fn
should require some kind of "explicit operation" to make a Stream
(which would basically be pinning it). The main benefit of this would be that Iterator
kind of requires that. Basically, the argument to me for "let's just stabilize Stream
" has relied in part on the idea that all the problems we face for Stream, we also face for Iterator, but actually in this situation the two designs diverge slightly (with Iterator potentially having "another problem", depending on your POV).
I guess that the idea would be that a gen fn
actually doesn't return an Iterator
but rather an impl IntoIterator
, and an async gen fn
returns an impl IntoStream
, where the "into" part of it does the pinning required to make &mut self
reasonable.
This seems like an important detail, I'm glad we're digging into it a bit more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, no one cares because we have no
!Unpin
streams in practice.
To make sure I'm following, this is because people are hand-writing their streams, and basically here the Pin
-machinery is just kind of boilerplate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is an example when removing the Unpin
bounds and using unsafe: playground
(this is unsound because you can move the underlying stream after Next
dropped, and example is crashed as 'moved between poll calls')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pinning also applies to the design of AsyncRead/AsyncWrite, which currently use Pin even though there's no clear plan to make them implemented with generator type syntax. In essence the asyncification of a singature is currently understood as pinned reciever + context arg + return poll.
Yes. I'm trying to push a bit on this and make sure I understand why we think that is the correct approach, I guess. =)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So: I don't think that this conversation has to block this PR from landing, and it probably shouldn't even block the RFC itself, but I stll feel like there is some missing text around the next
method that at minimum explains why Unpin
is needed and perhaps tries to cover some of the reasoning above about why poll_next
should take a pinned argument. Maybe the right place for some of this text is in the coverage of generator syntax, since we can discuss some of the difficulties that will have to be resolved in making generator syntax (and async generator syntax) work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With respect to generators that cannot borrow across a yield, I think it's not going to work. I imagine a lot of people are going to want to write code like this
See unless you write gen move
, this is not a self-referential type at all, because it captures myvec
by reference. :) Users writing futures combinators immediately ran into the problem of them being best expressed by self-referential types (and ended up having to Arc<Mutex>
all kinds of stuff to share ownership across yield points), but users writing iterator combinators never have that problem. I think this is worth considering more seriously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, ok, that's a good point. There are some differences in the setting, given that iterators tend to carry a lifetime. It may indeed be true that iterator combinators are less likely to require borrows across yields. Interesting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The key to the difference is that futures are ultimately passed to some executor API like spawn
, which expects a static future; to achieve that the futures contain all the state they need, and references are internal to that state. Iterators are almost never required to be 'static
by the APIs that consume them.
However, it's still the case that it will be far more natural to end up in self-referential situations with a generator than it is with a manually implemented iterator, because you will be able to implicitly build that state using the member variables of a function. So the restriction might still limit users from creating code in a natural way. And it's a bit confusing for the restriction to exist only in non-async generators.
So we'll have to get more experience to figure it out.
Signed-off-by: Nell Shamrell <nellshamrell@gmail.com>
Co-authored-by: Jake Goulding <shepmaster@mac.com>
Signed-off-by: Nell Shamrell <nellshamrell@gmail.com>
Co-authored-by: Dominik Stolz <voidc@users.noreply.github.com>
Signed-off-by: Nell Shamrell <nellshamrell@gmail.com>
I'm going to go ahead and merge these edits, since I think they've been at a stable point for some time. |
No description provided.