I/O splicing #160

dead-claudia · 2023-01-09T10:25:22Z

Proposal

Problem statement

std::io::copy provides a way to do a complete copy, but blocks indefinitely until it's completed. There's no way to portably do partial copies within the standard library, and it's not extensible to other types.

Motivation, use-cases

std::io::copy, is very useful for quick and dirty copying from a readable to a writable. However, it does have some serious limitations:

You can intercept cases where writes would fail, but signal interrupts are ignored. This is specifically bad when using file descriptor polling outside async runtimes, because it breaks signal handling.
The current source code is hard-coded to optimize only for native file handles and (as of RFC 2930) BufRead. It's not extensible to other types, and more importantly, userland types can't optimize for each other, either.
As it loops indefinitely until completed, it cannot be used in async runtimes.

There's also a number of outstanding performance gaps:

Copying from a Vec<u8> or VecDeque<u8> to a file or socket currently involves an intermediate copy, one that's very easily avoidable.
VecDeque<u8> itself only tries to copy one half at a time, when in many cases it could just use a single vectored write. This would make it a lot more viable as a general byte buffer when paired with a &[u8] reader or &mut [u8] writer.

The RFC draft linked below goes into detail about potential use cases, including some sample code. It's admittedly a bit complex of a proposal.

Solution sketches

Option 1: a Splicer trait. This is the primary one due to concerns around whether adding the needed "type" field to std::fs::File (and possibly std::net::TcpStream) is possible, or if too many packages break due to someone transmuting between those and raw file descriptors somewhere in the dependency chain.

// Exported from `std::io`
pub trait Splicer<'a, R, W> {
    fn new(reader: &'a mut R, writer: &'a mut W) -> io::Result<Self>;
    fn splice(&mut self) -> io::Result<usize>;
    fn reader(&mut self) -> &'a mut R;
    fn writer(&mut self) -> &'a mut W;
}

pub trait Read {
    // ...
    type Splicer<'a, W: io::Write>: Splicer<'a, Self, W> = DefaultSplicer<'a, Self, W>;
    fn splicer_to<'a, W: io::Write>(
        &'a mut self,
        dest: &'a mut W
    ) -> io::Result<Self::Splicer<'a, W>> {
        Self::Splicer::new(self, dest)
    }
    fn can_efficiently_splice_to(&self) -> bool;
}

pub trait Write {
    // ...
    type Splicer<'a, R: io::Read>: Splicer<'a, R, Self> = R::Splicer<'a>;
    fn splicer_from<'a, R: io::Read>(
        &'a mut self,
        src: &'a mut R
    ) -> io::Result<Self::Splicer<'a, R>> {
        Self::Splicer::new(self, dest)
    }
    fn can_efficiently_splice_from(&self) -> bool;
}

pub struct DefaultSplicer<'a, R, W> { ... }
impl<'a, R: io::Read, W: io::Write> Splicer<'a, R, W> for DefaultSplicer<'a, R, W> { ... }

pub struct OptimalSplicer<'a, R, W> { ... }
impl<'a, R: io::Read, W: io::Write> Splicer<'a, R, W> for OptimalSplicer<'a, R, W> { ... }

pub fn splicer<'a, R: io::Read, W: io::Write>(reader: &'a mut R, writer: &'a mut W) -> OptimalSplicer<'a, R, W>;

Option 2: splice_to/splice_from methods, preferred if adding those fields to std::fs::File/possibly std::net::TcpStream won't create compatibility concerns.

// Exported from `std::io`
trait Read {
    // ...
    fn splice_to<W: io::Write>(&mut self, dest: &mut W) -> io::Result<usize>;
    fn can_efficiently_splice_to(&self) -> bool;
}

trait Write {
    // ...
    fn splice_from<R: io::Read>(&mut self, src: &mut R) -> io::Result<usize> {
        src.splice_to(self)
    }
    fn can_efficiently_splice_from(&self) -> bool;
}

pub fn splice<R: io::Read, W: io::Write>(reader: &mut R, writer: &mut W) -> io::Result<usize>;

And in either case, std::io::copy just splicing instead of a complicated copy in its loop, with the read → write copy fallback being moved to the default splicer.

Links and related work

Drafted an RFC here that goes into much gorier detail: https://github.com/dead-claudia/rust-rfcs/blob/io-splice/text/0000-io-splice.md

Decided to file it here first rather than go straight into the RFC process. It's complex enough it'll probably go through an RFC anyways just because of all the cross-cutting concerns.

What happens now?

This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals in its weekly meeting. You should receive feedback within a week or two.

The text was updated successfully, but these errors were encountered:

the8472 · 2023-01-09T11:14:35Z

The proposed DefaultSplice impl can't work for non-blocking file descriptors because it would drop data on the floor every time a WouldBlock happens.

I also don't understand why those methods are default fn, we're currently not exposing specialization as part of std APIs.

dead-claudia · 2023-01-09T11:44:21Z

@the8472 Thanks for pointing those out. Those were just oversights and mistakes on my part. Some of it was leftovers from other stuff like where I was exploring specialization. Sorry for the confusion.

I yanked the implementation of both built-in splicers out and clarified what they should do at a higher level - probably should've done that from the beginning.

I also went through and removed a bunch of other crud left over from previous iterations in the RFC draft itself.

Should all be fixed now.

the8472 · 2023-01-09T12:33:49Z

Can you sketch how it would construct a splicer that turns R = Take<BufReader<File>>, W = BufWriter<TcpSocket> into a sendfile + fallbacks (as the current specializations do)? I'm not quite seeing how it would avoid a combinatoric explosion or unnecessary work.

As a note, several of the motivating issues could be addressed in io::copy by a few additional specializations. Supporting non-blocking file descriptors could be done after specialization is stabilized by adding an API guarantee that no data gets dropped on WouldBlock if you wrap the writer side in a BufWriter.

dead-claudia · 2023-01-13T12:59:09Z

@the8472 I could see it done this way, after adding an extra read.get_read_fd_handle()/write.get_write_fd_handle() method (returning Option<&dyn Read + AsFd> and Option<&dyn Write + AsFd> respectively*) to check for possible underlying file descriptors:

Take starts off by reading up to N bytes, just to consume it, then just delegates to its inner reader's or writer's splicer according to std::io::splicer.
BufWriter would return its inner write.get_write_fd_handle() as its own provided it doesn't have any buffered data, and it'd just delegate to its reader's splicer (as is the default).
BufReader would check for its inner read.get_read_fd_handle() as well as the splice target's write.get_write_fd_handle(). As soon as it's called with both available (need not be from the start), it'd drain its existing buffer then delegate to the returned inner handles. Until then, it'd just take its usual fallback path.
File would first grab its write.get_write_fd_handle() within its splicer, and then dispatch their inner handles using more or less what they do today, just within the splicer instead of in a specialized loop. They would not require specialization to do this, only polymorphic access to their underlying descriptor. The inner state machine for the splicer could be a simple enum, as is already the case today.
TcpStream would simply implement its .get_write_fd_handle() to return self, and File would likewise return self for .get_read_fd_handle(). They both implement the needed traits, so their implementations can be trivial.

This requires zero specialization at all, and should avoid combinatorial explosion. Generics can expand to optimal code, and &dyn Read + &dyn Write pairs can still get to the most optimal copy path. A trait object is ultimately used, but it's only used when you're ready to optimize for file descriptors, and almost all that time will be spent waiting for the syscall anyways.

Hopefully, this explains my intuition.

As a note, several of the motivating issues could be addressed in io::copy by a few additional specializations. Supporting non-blocking file descriptors could be done after specialization is stabilized by adding an API guarantee that no data gets dropped on WouldBlock if you wrap the writer side in a BufWriter.

The mention of non-blocking descriptors and such was in relation to libraries and frameworks like Tokio and async-std, not about the standard library's own use of them. It's intended to be a use case for this extensibility outside the standard library.

the8472 · 2023-01-13T20:15:11Z

read.get_read_fd_handle()/write.get_write_fd_handle()

Well, those weren't covered in your initial proposal. That suggests to me that there may be other missing pieces to make this work. Also note that AsFd is platform-specific. So if those are part of the public API then we'd now need std::os extension traits for those types and the optimized Splicer impls need to be platform-specific too.
This is of course already true for the currently linux-specific optimization (although it could be extended to some other unixes) but still should be explained in the proposal since it would not only affect the implementation but also the API.

Take starts off by reading up to N bytes, just to consume it, then just delegates to its inner reader's or writer's splicer according to std::io::splicer.

That would be a hypothetical Skip, not Take. Take limits the amount that can be read; that needs to be tracked and updated properly. Take a look at the API needed by existing specializations.

BufWriter would return its inner write.get_write_fd_handle() as its own provided it doesn't have any buffered data,

Well, the current implementation supports the optimizations even when it is non-empty by draining it first and only then doing the optimized transfer.

Additionally I'm still not quite seeing how this would work on the type level. Which chain of splicer_to and splicer_from gets called, what's the ultimate splicer returned by all that and how does it have the knowledge about the entire type structure to prepare it at the beginning and bring it into a clean state at the end.

Granted, the current implementation leans a lot on having access to internals of all the types as needed because they're all part of the standard library and a generic implementation might have to give up a bit here and there. But that kind of consideration needs to be included if you propose to existing io::copy implementation.

Imo the API is complex enough that it makes sense to prototype it outside std, as happened with the io-lifetimes crate prototyping the AsFd API.

dead-claudia · 2023-01-15T04:32:26Z

@the8472 I'm focusing on the high level functionality here, since it's such a complex flow (and really, unavoidably so).

I'll see about for now, prototyping this in an external crate, and then coming back once I've actually got something, since you seem to want details almost at the concrete code level before considering anything down that vein.

the8472 · 2023-01-15T10:41:27Z

Note that I'm not in charge of approving or rejecting ACPs. But I am familiar with those particular APIs. So as far as process goes you should only take my critique as advisory.

For existing ACPs of this complexity we usually had a near complete API surface definition. An implementation is not required but that something like get_read_fd_handle was missing makes me wonder how incomplete the proposed API is or whether it would be able to replicate the existing behavior at all and whether it would do so ergonomically.

If it works and is ergonomic then it would seem like a useful abstraction to me that may have its place in the standard library as a common API between libraries.

The mention of non-blocking descriptors and such was in relation to libraries and frameworks like Tokio and async-std, not about the standard library's own use of them. It's intended to be a use case for this extensibility outside the standard library.

If hypothetically specialization were stable and if we supported WouldBlock in io::copy then maybe we could provide most of the benefit by specializing on AsFd and libraries would have to implement AsFd for their types and no new API surface would be needed.
But not much progress has been made on full specialization so this might take a long time to realize.

dead-claudia added api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api labels Jan 9, 2023

dead-claudia closed this as completed Jan 15, 2023

the8472 mentioned this issue Apr 3, 2023

Support for non-blocking and best-effort zero-copy io::copy #202

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

I/O splicing #160

I/O splicing #160

dead-claudia commented Jan 9, 2023 •

edited

Loading

the8472 commented Jan 9, 2023

dead-claudia commented Jan 9, 2023

the8472 commented Jan 9, 2023 •

edited

Loading

dead-claudia commented Jan 13, 2023 •

edited

Loading

the8472 commented Jan 13, 2023

dead-claudia commented Jan 15, 2023

the8472 commented Jan 15, 2023

I/O splicing #160

I/O splicing #160

Comments

dead-claudia commented Jan 9, 2023 • edited Loading

Proposal

Problem statement

Motivation, use-cases

Solution sketches

Links and related work

What happens now?

the8472 commented Jan 9, 2023

dead-claudia commented Jan 9, 2023

the8472 commented Jan 9, 2023 • edited Loading

dead-claudia commented Jan 13, 2023 • edited Loading

the8472 commented Jan 13, 2023

dead-claudia commented Jan 15, 2023

the8472 commented Jan 15, 2023

dead-claudia commented Jan 9, 2023 •

edited

Loading

the8472 commented Jan 9, 2023 •

edited

Loading

dead-claudia commented Jan 13, 2023 •

edited

Loading