Proposing new `AsyncRead` / `AsyncWrite` traits #1744
Conversation
This replaces the `[u8]` byte slice arguments to `AsyncRead` and `AsyncWrite` with `dyn Buf` and `dyn BufMut` trait objects.
```rust
let res = ready!(self.as_mut().get_pin_mut().poll_read(cx, buf));
self.discard_buffer();
return Poll::Ready(res);
}
unimplemented!()
```
Was the intention to finish this before merging?
Uh, ahem, yes. Nothing to see here.
(I commented some things out when I started the branch to chip away at the insane number of compiler errors, and then may have forgotten where I left some of these...)
@seanmonstar from the PR description:
It looks like the
Initial thoughts from a quick skim:
Edit: Oh, and I really like this idea overall.
Good point, I'd thought about that as well. I'm in favor of changing the return types of
I haven't been bothered by the return type myself, so it wasn't something that I personally wanted to experiment with, and I was scared away by the supposed dragons mentioned in the original std-io reform RFC. But I don't mean to scare away others from experimenting as well :)
Fair enough, I can draft a follow-up once this is in.
WRT uninitialized buffers, I have a half-written thing discussing the path forward for std that I'm going to try to finish up this week: https://paper.dropbox.com/doc/IO-Buffer-Initialization--AoGf~cBSiAGi3mjGAJe9fW9pAQ-MvytTgjIOTNpJAS6Mvw38 - please add comments if you'd like! It does seem to me like this is a nice and clean approach, but there's no way we can adjust
The downsides of generic methods seem to be mainly the additional complexity required to use a trait object (via secondary provided object-safe methods, or a secondary trait with a blanket impl). In exchange, they offer guaranteed static dispatch. I'm not sure either side is a clear win, particularly since virtual dispatch likely pales in comparison to the actual cost of doing I/O. One case where the inlining afforded by monomorphization might be important is implementations of
I can re-check the description later, but it might not be clear: this proposes passing
Yeah, just giving some initial thoughts on that tradeoff in response to @carllerche's request.
That could still be accomplished with an extra trait.
Yeah, I think the original text is what generated the confusion here, along with the various mentions of
I think there is something to this. I don't think virtual dispatches matter when reading from the real TCP socket. But often buffered readers are implemented on top of these - which are then used by deserialization libraries. When those read things in 1-8 byte chunks from the buffer the dispatch might have an impact. But that should be benchmarkable. I guess there are libraries out there which are exactly doing this (h2?). Is the ergonomics reason such a big thing in practice? I'm typically just doing
Is nesting and

For application code, I think interacting with those things via Future adapters will likely become the norm. Here people do not interact with the low-level AsyncRead/AsyncWrite methods anyway. E.g. we write things like:

```rust
async fn do_with_stream(stream: &mut (dyn AsyncWrite + Unpin)) -> io::Result<()> {
    // `my_data` is some byte buffer in scope
    let rem = &my_data[..];
    stream.write_all(rem).await
}
```
Same as

In my personal experience, I definitely hit the issue where, in order to have the most efficient impl possible, I needed to implement both

For me, the biggest factor is the "safety" aspect and uninitialized memory. Being required to zero out memory is suboptimal, and I don't think the current strategy for modeling safety & uninitialized memory is ideal. WDYT?
Also, to note, taking

Object safety can also be achieved by doing something like (I forgot the exact incantation):

```rust
trait AsyncRead {
    fn poll_read(&mut self, dst: impl BufMut) -> Poll<io::Result<usize>>
    where
        Self: Sized;

    #[doc(hidden)]
    fn poll_read_dyn(&mut self, dst: &mut dyn BufMut) -> Poll<io::Result<usize>> {
        self.poll_read(dst)
    }
}

impl AsyncRead for Box<dyn AsyncRead> {
    fn poll_read(&mut self, mut dst: impl BufMut) -> Poll<io::Result<usize>> {
        self.poll_read_dyn(&mut dst)
    }
}
```
Right. But I would think a lot of those just go back into either being a buffered reader around another stream, or they have an internal buffer which actually doesn't benefit from vectored IO. Maybe @Nemo157 has some insight from the implementation of async-compression?
I'm guilty too :-) Although the only situation I really could recall where I could forward some vectored IO was in the case of some buffered reader/writer operation - when the buffer was exhausted and bypassed.
I definitely agree that zeroing out memory is a bit annoying from a performance perspective, since it is typically not necessary for the correctness of the program. I don't yet understand what the impact of simply not doing it would be, since I don't have the historic context of APIs like
I think being able to pass

Object safety was one of the main reasons why directly implementing IO as
Being able to easily see how many bytes were written/read during the current call is very useful, at a minimum to be able to check for EOF, but also when performing other operations on the data in parallel to writing/reading it.

EDIT: It is possible to get this data without returning it, but it becomes a very painful pattern after the fifth time of writing it:

```rust
let prior_len = buf.bytes().len();
writer.poll_write(&mut buf)?;
// `Buf::bytes()` shrinks as the cursor advances, so the number of bytes
// written is the difference the other way around:
let written = prior_len - buf.bytes().len();
```

and looking at
Nope, I have been ignoring the vectored IO methods so far. One complication is that I'm using

If a user is passing vectored IO buffers into the encoder/decoder then there would be a performance advantage to having it loop over all the buffers within the context of a single operation, to avoid extra calls into the underlying IO per-buffer. This seems easy to implement, and I could do the inverse of the default implementation and just have
Are there any plans on having generic support for various underlying types? I.e.

To be concrete, I have a practical use case example. I'm using the code that extends the
The "read to uninitialized memory" is a really thorny problem, and I'm glad to see it's being addressed! However, some important aspects of it are not clear to me from the description:
Returning a single slice does not seem to be possible with the current release of

Also,
@sfackler I've been giving a lot of thought to encapsulating uninit buffers, incl. in the Read trait. I'll leave comments on the doc in the next few days. You can also catch me in
That's the behavior of

Definitely interested in your thoughts on Read + uninit buffers!
This is an interesting thought;
Just wondering, given the number of times

Otherwise, the arguments sound reasonable.
@baloo I believe a
@hawkw virtual dispatch has runtime overhead, and it might not be desirable for high-performance use. Using Buf* is certainly convenient since the user can provide more than a single concrete type, but we should note the potential performance hit.
My comment is divided into two sections: first, a relatively small proposal for how the organization in this API should change. Second, some commentary on the divergence between this API and the futures crate, and my opinion about the future of IO interfaces in std.

First, I think it would be advisable to move the

It is much easier for users to navigate breakages of libraries that define types than libraries that define traits, because traits are a contract between multiple users (for example, between tokio and its users), and everyone must agree on the definition of the contract. In contrast, users can have multiple definitions of the Bytes types in their application without significant difficulties.

Second, I think that the standard library is pretty likely to adopt the AsyncRead/AsyncWrite traits from futures essentially as is, and I don't think it's likely that it will adopt these changes. I want to explain why I have that view. Before I do, I should say I think it's fine for tokio to diverge: the bridge from std types to tokio will be cheap, because it's just a virtualization of the argument, whereas the bridge from tokio to std will probably require a deep copy. But I expect the bridge to mainly work in the std -> tokio direction (i.e. types implementing std's traits used with interfaces expecting tokio's traits), so hopefully that's not such a big deal. My point is that I'm not trying to convince you to adopt futures' definitions in tokio, just explain why I think those will be the more likely choice for std.

Vectored operations

In my opinion, the solution exposed by futures is adequate: have an additional

There's a funnel effect to encountering this bug: the user has to both want to perform a vectored operation and be using middleware which is implemented incorrectly. This makes the problem, in my opinion, quite niche, and the best solution is to send PRs to bad libraries that aren't supporting vectored operations.

In extremis, we could do exactly what we did with flush already: make the vectored ops required to implement, to encourage users to forward them correctly. I'm not convinced even this is necessary, but I do think it would be sufficient.

In contrast, I'm not enthusiastic about the solution which bypasses the vectored methods. For example, to get 1024 vectored buffers for your rope, the leaf AsyncWrite would have to allocate space on the stack for 1024 iovecs - several pages of space. It would have to do this unconditionally, meaning a huge stack frame (multiple pages) for every write. Or it would have to choose a smaller number, and then that rope is again not as optimal as it could be.

Though I don't know the whole backstory here, it sounds to me like someone hit a first-mover problem: no one had been doing vectored IO in Rust, and so no one had written code to handle it properly. This resulted in frustrating performance problems. But that's a sign that our ecosystem is still maturing; once we have a robust ecosystem that has been battle-tested in production by many users, these problems will be increasingly fleeting. I see no reason this problem can't be solved over time with PRs and issues in key crates.

Uninitialized Memory

This one is heavier. I have two opinions here; I expect they're a bit controversial, each with different groups of people. And maybe I'm wrong.

My first claim is that the cost of zeroing memory is insignificant for any well-designed long-running network application, which should have a buffer pool that amortizes the cost of initializing memory into insignificance over the lifetime of the application. I can see this being untrue if you have extremely spiky loads with sudden brief periods of intense activity, and you don't want to maintain enough resident memory to handle those spikes at all times, but I don't think that's normal.

My second claim is that we are eventually going to provide an API for LLVM freeze.

This has a certain air of inevitability to me, despite the risks of using freeze. Yes, it's true that if you freeze an uninitialized buffer it could leak secrets, but Read implementations that don't just write into the buffer are pathologically wrong and not being used in practice. Therefore, simply freezing an uninitialized buffer before passing it to an AsyncRead interface is sufficient, for people who can't afford to zero their memory, to solve the problem. No change to the trait interface needed.

Downsides of this API

I want to highlight downsides of the API that are reasons I would be uncomfortable implementing this API in std. These are not blockers, and with enough upside I could be convinced to override them, but as I've elaborated, I think the arguments in favor of this API over the one in futures are weak.

Conclusion

Pulling all of that together, I'm more optimistic about the API in futures than one which uses a different buffer type argument. In general, I'd say I'm very bullish that the API we have in futures now is the one we will add to std.

(I'm also not enthusiastic about other changes people are considering from the futures trait, like changes to make it possible to implement read/write with an async fn, or changes to share buffers with completion-based IO, but I've left those out of this post since they aren't the proposal here.)
Thanks @withoutboats for the well-thought post!
We've considered this, and it very well could happen! If worth considering more, we should open a separate issue to discuss the trade-offs.
I can offer my own experience: even though I have a very deep understanding of how the vectored APIs work, I have been bitten by this problem, and thus I'm motivated to fix the API. It currently is too easy for even a veteran to mess up, so I must assume others would also have this problem. I've written a lot of middle layers that implement
We could add
My experience leads me to believe differently.
It seems in many cases, using something like
I get that, it makes plenty of sense! I lean more towards picking the best design for the most common cases, even if

I think my personal preference is to make sure there is a design that allows addressing these issues, even if
@withoutboats Thanks for the thoughtful response. I will take more time to digest it (lot going on today). I do want to add a quick note for your consideration re: vectored operations. While "forgetting" to implement the vectored functions is a problem, I believe the bigger problem is that in cases where the implementation cannot implement the function (no vectored capabilities), simply forwarding the call to the non-vectored function is not sufficient. I go into more detail in this comment.
As far as forgetting to forward methods goes, I still think a warn-by-default lint would be a fine solution. I would prefer it to go directly into rustc though, otherwise you just move the problem to remembering to set up Clippy. The other reason I think it belongs in rustc is so it can be applied to, e.g.

I'd gladly take charge on writing and implementing an RFC if we decide that's a good way to go.
I want to vehemently disagree with this approach for a few reasons:
That is to say, while accessing the contents of a reused buffer may be safe in a Rust-the-language sense (i.e. does not exhibit or permit any memory-unsafety violations), it is not safe in a general systems sense. After all, this is famously the cause of both Heartbleed and Cloudbleed.

Note also in particular that Heartbleed went undetected for so long because OpenSSL implemented its own buffer pool. I suspect that the full degree of Heartbleed (i.e. leaking private keys) is near impossible in Rust, as it seems unlikely that anyone would be loading those via such a buffer pool, but it still offers lots of ability to leak other private data.

I feel APIs should be designed to be misuse-resistant. Ideally, the route of least resistance is not just memory-safe, but also makes it as difficult as possible to accidentally cause other kinds of security vulnerabilities. It worries me that with the
There is no need to cater to cases where the user intentionally passes uninitialized memory, though.
Note that, as far as memory-backed buffers go, this is not just a matter of providing an API to something LLVM already has. The

The issue with leaking secrets is less the

I am a bit worried about implicitly deciding to adopt freeze as a side effect of API design, as opposed to consciously making that decision.
I probably won't be able to engage much more over the next few weeks, but I want to emphasize that I think it's fine for tokio to have traits that implement a different interface which you think better suits your users. My arguments mainly have to do with std - and there, I think the symmetry with the

So I'd mainly encourage you with that in mind to think about the costs of the conversion between the two APIs and plan for the possibility that users will have to do that conversion, hopefully as cheaply as possible.
I would personally expect the opposite. General middleware libraries (e.g. compression, parsers, etc.) would be defined generically over std's traits, then used in applications that pass them Tokio's implementations. But that also meshes well with your expectation that this direction is the cheap conversion, so maybe I'm somehow reading your parenthetical expansion backwards?
In the

This isn't actually about uninitialised memory as such, but rather about arbitrarily initialised memory which potentially contains secrets that need to be overwritten.
More like I wrote it backwards! This gets confusing 😅
The proposed API necessitates a redundant copy of all IO bytes because it cannot guarantee that the user-provided buffers are page-aligned and locked to physical memory, as is required for DMA transfers. For maximum performance, either the IO layer should be responsible for allocating and managing buffers according to its requirements for DMA, or these requirements should be communicated to the user so that the buffers can be properly allocated, and perhaps initialized, by the IO layer. If this proposal aims to maximize possible performance, something like this would be more appropriate:

The IOBuffers in this case would be a number of buffers (probably a linked list) allocated by the IO layer's internal buffer pool, and returned to that pool on

WinSock Registered I/O provides an API for "pre-registering" buffers and thus reducing overhead.
I've also been thinking about this. I'm afraid the general reaction will be that this completely inverts the very much adopted pattern in Rust that producers always take a reference to a buffer. It's a consumer-driven pattern where the consumer can choose to reuse that buffer once done with the data.

I have been thinking about this when intermediate layers change the type of the data. A websocket message in tungstenite is an enum with a variant (for binary data):

The current design of
There is a trait available for IO-layer allocated buffers:

Has there been any investigation into whether these changes should also result in changes to the
Is there the same need for heterogeneous buffer types in I/O-layer-allocation schemes? |
Why can’t we just split the vectored API into another set of traits? Like |
Given that Rust aims to be a systems programming language, it's always hard for me to see an API like the current
Any updates on this? And where do we currently stand on using

See actix/examples#228 (comment) and https://github.com/actix/examples/pull/228/files#diff-a8ac72c878c236b5b5236b768a9e9c1aR17
Thanks for all the discussion. Things have changed since this PR so I am going to close it and work on opening a new proposal factoring in this discussion. |
Introduce new `AsyncRead` / `AsyncWrite` traits

This PR introduces new versions of the `AsyncRead` / `AsyncWrite` traits. The proposed changes aim to improve:
Overview
The PR changes the `AsyncRead` and `AsyncWrite` traits to accept `T: Buf` and `T: BufMut` values instead of `&[u8]` and `&mut [u8]`. Because `&[u8]` implements `Buf` and `&mut [u8]` implements `BufMut`, the same calling patterns used today are still possible. Additionally, any type that implements `Buf` and `BufMut` may be used. This includes `Cursor<&[u8]>`, `Bytes`, ...
Improvement in ergonomics
Calls to `read` and `write` accept buffers, but do not necessarily use up the entirety of the buffer. Both functions return a `usize` representing the number of bytes read / written. Because of this, it is common to write loops such as:
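The original snippet is not shown in this extract. A minimal synchronous sketch of the pattern being described (using `std::io::Write` rather than the async traits, with a hypothetical helper name) might look like:

```rust
use std::io::Write;

// Sketch (assumed shape, not the PR's exact code): the caller must track
// the cursor itself, advancing it by the returned byte count.
fn write_all_manual<W: Write>(dst: &mut W, data: &[u8]) -> std::io::Result<()> {
    let mut pos = 0;
    while pos < data.len() {
        // `write` may consume only part of the buffer...
        let n = dst.write(&data[pos..])?;
        // ...so the caller must remember to advance the position itself.
        pos += n;
    }
    Ok(())
}
```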
The key point to notice is having to use the return value to update the position in the cursor. This is both common and error prone. The `Buf` / `BufMut` traits aim to ease this by building the cursor concept directly into the buffer. By using these traits with `AsyncRead` / `AsyncWrite`, the above loop can be simplified as:
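The simplified loop is also missing from this extract. A sketch using a minimal stand-in for the `Buf` idea (a hypothetical `SliceBuf`, not the real `bytes::Buf`) shows how the cursor moves into the buffer:

```rust
use std::io::Write;

// Hypothetical Buf-like type: the cursor lives inside the buffer itself.
struct SliceBuf<'a> {
    data: &'a [u8],
}

impl<'a> SliceBuf<'a> {
    fn has_remaining(&self) -> bool {
        !self.data.is_empty()
    }
    fn bytes(&self) -> &[u8] {
        self.data
    }
    fn advance(&mut self, n: usize) {
        self.data = &self.data[n..];
    }
}

fn write_all_buf<W: Write>(dst: &mut W, buf: &mut SliceBuf<'_>) -> std::io::Result<()> {
    while buf.has_remaining() {
        let n = dst.write(buf.bytes())?;
        // With the proposed traits, even this call moves into the leaf
        // implementation, so the caller never touches a position at all.
        buf.advance(n);
    }
    Ok(())
}
```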
A small reduction in code, but it removes an error-prone bit of logic that must often be repeated.
Integration of vectored operations
In the `AsyncRead` / `AsyncWrite` traits provided by `futures-io`, vectored operations are covered using separate fns: `poll_read_vectored` and `poll_write_vectored`. These two functions have default implementations that call the non-vectored operations.

This has a drawback: when implementing `AsyncRead` / `AsyncWrite`, usually as a layer on top of a type such as `TcpStream`, the implementor must not forget to implement these two additional functions. Otherwise, the implementation will not be able to use vectored operations even if the underlying `TcpStream` supports them. Secondly, it requires duplication of logic: one `poll_read` implementation and one `poll_read_vectored` implementation. It is possible to implement one in terms of the other, but this can result in sub-optimal implementations.
Imagine a situation where a rope data structure is being written to a socket. This structure is comprised of many smaller byte slices (perhaps thousands). To write it efficiently to the socket, avoiding copying data is preferable. To do this, the byte slices need to be loaded in an `IoSlice`. Since modern Linux systems support a max of 1024 slices, we initialize an array of 1024 slices, iterate the rope to populate this array, and call `poll_write_vectored`.

The problem is that, as the caller, we don't know if the `AsyncWrite` type supports vectored operations or not; `poll_write_vectored` is called optimistically. However, the implementation "forgot" to proxy its function to `TcpStream`, so `poll_write` is called w/ the first entry in the `IoSlice`. The problem is, for each call to `poll_write_vectored`, we must iterate 1024 nodes in our rope to only have one chunk written at a time.
By using `T: Buf` as the argument, the decision of whether or not to use vectored operations is left up to the leaf `AsyncWrite` type. Intermediate layers only implement `poll_write` w/ `T: Buf` and pass it along to the inner stream. The `TcpStream` implementation will know that it supports vectored operations, know how many slices it can write at a time, and do "the right thing".
Working with uninitialized byte slices
When passing buffers to `AsyncRead`, it is desirable to pass in uninitialized memory which the `poll_read` call will write to. This avoids the expensive step of zeroing out the memory (doing so has measurable impact at the macro level). The problem is that uninitialized memory is "unsafe", and as such, care must be taken.

Tokio initially attempted to handle this by adding a `prepare_uninitialized_buffer` function. `std` is investigating adding a similar though improved variant of this API. However, over the years, we have learned that the `prepare_uninitialized_buffer` API is suboptimal for multiple reasons.
First, the same problem applies as with vectored operations. If an implementation "forgets" to implement `prepare_uninitialized_buffer`, then all slices must be zeroed out before being passed to `poll_read`, even if the implementation does "the right thing" (does not read from uninitialized memory). In practice, most implementors end up forgetting to implement this function, resulting in memory being zeroed out.
Secondly, implementations of `AsyncRead` that should not require `unsafe` to implement now must add `unsafe` simply to avoid having memory zeroed out.
Switching the argument to `T: BufMut` solves this problem via the `BufMut` trait. First, `BufMut` provides low-level functions that return `&mut [MaybeUninit<u8>]`. Second, it provides utility functions that provide safe APIs for writing to the buffer (`put_slice`, `put_u8`, ...). Again, only the leaf `AsyncRead` implementations (`TcpStream`) must use the unsafe APIs. All layers may take advantage of uninitialized memory without the associated unsafety.
Drawbacks
The primary drawback is genericizing the `AsyncRead` and `AsyncWrite` traits. This adds complexity. We feel that the added benefits discussed above outweigh the drawbacks, but only trying it out will validate it.
Relation to `futures-io`, `std`, and roadmap
The relationship between `tokio`'s I/O traits and `futures-io` has come up a few times in the past. Tokio has historically maintained its own traits. `futures-io` has just been released with a simplified version of the traits. There is also talk of standardizing `AsyncRead` and `AsyncWrite` in `std`. Because of this, we believe that now is the perfect time to experiment with these traits. This will allow us to gain more experience before committing to traits in `std`.
The next version of Tokio will not be 1.0. This allows us to
experiment with these traits and remove them for 1.0 if they do not pan
out.
Once `AsyncRead` / `AsyncWrite` are added to `std`, Tokio will provide implementations for its types. Until then, `tokio-util` will provide a compatibility layer between Tokio types and `futures-io`.