Commit 9197039

committedJan 13, 2015
Amend RFC 517: Revisions to reader/writer, core::io and std::io
1 parent e0999be commit 9197039

1 file changed

+399
-6
lines changed

‎text/0517-io-os-reform.md

+399-6
Original file line numberDiff line numberDiff line change
@@ -42,13 +42,23 @@ follow-up PRs against this RFC.
4242
* [Relation to the system-level APIs]
4343
* [Platform-specific opt-in]
4444
* [Proposed organization]
45-
* [Revising `Reader` and `Writer`] (stub)
45+
* [Revising `Reader` and `Writer`]
46+
* [Nonatomic results]
47+
* [Reader]
48+
* [Writer]
4649
* [String handling] (stub)
4750
* [Deadlines] (stub)
4851
* [Splitting streams and cancellation] (stub)
4952
* [Modules]
50-
* [core::io] (stub)
51-
* [The std::io facade] (stub)
53+
* [core::io]
54+
* [Adapters]
55+
* [Seeking]
56+
* [Buffering]
57+
* [MemReader and MemWriter]
58+
* [The std::io facade]
59+
* [Errors]
60+
* [Channel adapters]
61+
* [stdin, stdout, stderr]
5262
* [std::env] (stub)
5363
* [std::fs] (stub)
5464
* [std::net] (stub)
@@ -447,7 +457,173 @@ counts, arguments to `main`, and so on).
447457
## Revising `Reader` and `Writer`
448458
[Revising `Reader` and `Writer`]: #revising-reader-and-writer
449459

450-
> To be added in a follow-up PR.
460+
The `Reader` and `Writer` traits are the backbone of IO, representing
461+
the ability to (respectively) pull bytes from and push bytes to an IO
462+
object. The core operations provided by these traits follow a very
463+
long tradition for blocking IO, but they are still surprisingly subtle
464+
-- and they need to be revised.
465+
466+
* **Atomicity and data loss**. As discussed
467+
[above](#atomicity-and-the-reader-writer-traits), the `Reader` and
468+
`Writer` traits currently expose methods that involve multiple
469+
actual reads or writes, and data is lost when an error occurs after
470+
some (but not all) operations have completed.
471+
472+
The proposed strategy for `Reader` operations is to return the
473+
already-read data together with an error. For writers, the main
474+
change is to make `write` only perform a single underlying write
475+
(returning the number of bytes written on success), and provide a
476+
separate `write_all` method.
477+
478+
* **Parsing/serialization**. The `Reader` and `Writer` traits
479+
currently provide a large number of default methods for
480+
(de)serialization of various integer types to bytes with a given
481+
endianness. Unfortunately, these operations pose atomicity problems
482+
as well (e.g., a read could fail after reading two of the bytes
483+
needed for a `u32` value).
484+
485+
Rather than complicate the signatures of these methods, the
486+
(de)serialization infrastructure is removed entirely -- in favor of
487+
instead eventually introducing a much richer
488+
parsing/formatting/(de)serialization framework that works seamlessly
489+
with `Reader` and `Writer`.
490+
491+
Such a framework is out of scope for this RFC, but the
492+
endian-sensitive functionality will be provided elsewhere
493+
(likely out of tree).
494+
495+
* **The error type**. The traits currently use `IoResult` in their
496+
return types, which ties them to `IoError` in particular. Besides
497+
being an unnecessary restriction, this type prevents `Reader` and
498+
`Writer` (and various adapters built on top of them) from moving to
499+
`libcore` -- `IoError` currently requires the `String` type.
500+
501+
With associated types, there is essentially no downside in making
502+
the error type generic.
503+
504+
With those general points out of the way, let's look at the details.
505+
506+
### Nonatomic results
507+
[Nonatomic results]: #nonatomic-results
508+
509+
To clarify dealing with nonatomic operations and improve their
510+
ergonomics, we introduce some new types into `std::error`:
511+
512+
```rust
513+
// The progress so far (T) paired with an err (Err)
514+
struct PartialResult<T, Err>(T, Err);
515+
516+
// An operation that may fail after having made some progress:
517+
// - S is what's produced on complete success,
518+
// - T is what's produced if an operation fails part of the way through
519+
type NonatomicResult<S, T, Err> = Result<S, PartialResult<T, Err>>;
520+
521+
// Ergonomically throw out the partial result
522+
impl<T, Err> FromError<PartialResult<T, Err>> for Err { ... }
523+
```
524+
525+
The `NonatomicResult` type (which could use a shorter name)
526+
encapsulates the common pattern of operations that may fail after
527+
having made some progress. The `PartialResult` type then returns the
528+
progress that was made along with the error, but with a `FromError`
529+
implementation that makes it trivial to throw out the partial result
530+
if desired.
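
As a rough illustration of how these types might be used, here is a minimal sketch in present-day Rust syntax. It assumes `From` in place of `FromError`, and it writes the conversion for one concrete error type rather than generically. `DemoError`, `collect_chunks`, and `total_len` are hypothetical names for this example only and are not part of the RFC.

```rust
// The progress so far (T) paired with an error (E).
#[derive(Debug, PartialEq)]
struct PartialResult<T, E>(T, E);

type NonatomicResult<S, T, E> = Result<S, PartialResult<T, E>>;

// Hypothetical error type used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
struct DemoError(&'static str);

// Throw out the partial result, keeping only the error.
impl<T> From<PartialResult<T, DemoError>> for DemoError {
    fn from(partial: PartialResult<T, DemoError>) -> DemoError {
        partial.1
    }
}

// Concatenates chunks until one fails; on failure, returns what was
// gathered so far together with the error.
fn collect_chunks(
    chunks: &[Result<&[u8], DemoError>],
) -> NonatomicResult<Vec<u8>, Vec<u8>, DemoError> {
    let mut out = Vec::new();
    for chunk in chunks {
        match chunk {
            Ok(bytes) => out.extend_from_slice(bytes),
            Err(e) => return Err(PartialResult(out, e.clone())),
        }
    }
    Ok(out)
}

// A caller that does not care about partial progress: `?` applies the
// `From` impl above and discards the partial data.
fn total_len(chunks: &[Result<&[u8], DemoError>]) -> Result<usize, DemoError> {
    let data = collect_chunks(chunks)?;
    Ok(data.len())
}
```

A caller that does want the partial data can match on `Err(PartialResult(data, err))` directly instead of using `?`.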
531+
532+
### `Reader`
533+
[Reader]: #reader
534+
535+
The updated `Reader` trait (and its extension) is as follows:
536+
537+
```rust
538+
trait Reader {
539+
type Err; // new associated error type
540+
541+
// unchanged except for error type
542+
fn read(&mut self, buf: &mut [u8]) -> Result<uint, Err>;
543+
544+
// these all return partial results on error
545+
fn read_to_end(&mut self) -> NonatomicResult<Vec<u8>, Vec<u8>, Err> { ... }
546+
fn read_to_string(&mut self) -> NonatomicResult<String, Vec<u8>, Err> { ... }
547+
fn read_at_least(&mut self, min: uint, buf: &mut [u8]) -> NonatomicResult<uint, uint, Err> { ... }
548+
}
549+
550+
// extension trait needed for object safety
551+
trait ReaderExt: Reader {
552+
fn bytes(&mut self) -> Bytes<Self> { ... }
553+
fn chars<'r>(&'r mut self) -> Chars<'r, Self, Err> { ... }
554+
555+
... // more to come later in the RFC
556+
}
557+
impl<R: Reader> ReaderExt for R {}
558+
```
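
To make the associated error type concrete, the following is a small sketch in present-day Rust syntax (`usize` rather than `uint`, `Self::Err` spelled out). It covers only the `read` method. `SliceReader` and `drain` are hypothetical names for this example, and choosing `Infallible` as the error type is an assumption made for illustration, not something the RFC prescribes.

```rust
use std::convert::Infallible;

// Cut-down version of the proposed trait: only `read`, with the
// associated error type.
trait Reader {
    type Err;
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, Self::Err>;
}

// Hypothetical in-memory reader whose reads can never fail.
struct SliceReader<'a> {
    rest: &'a [u8],
}

impl<'a> Reader for SliceReader<'a> {
    type Err = Infallible;

    fn read(&mut self, buf: &mut [u8]) -> Result<usize, Infallible> {
        let n = buf.len().min(self.rest.len());
        buf[..n].copy_from_slice(&self.rest[..n]);
        self.rest = &self.rest[n..];
        Ok(n)
    }
}

// Generic code names the error type through the trait, so it works
// for any reader without mentioning a specific IO error type.
fn drain<R: Reader>(r: &mut R) -> Result<Vec<u8>, R::Err> {
    let mut out = Vec::new();
    let mut chunk = [0u8; 4];
    loop {
        let n = r.read(&mut chunk)?;
        if n == 0 {
            return Ok(out);
        }
        out.extend_from_slice(&chunk[..n]);
    }
}
```

Nothing in this sketch needs `String` or any other allocation-dependent error type, which is the property that lets such a trait live in `libcore`.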
559+
560+
#### Removed methods
561+
562+
The proposed `Reader` trait is much slimmer than today's. The vast
563+
majority of removed methods are parsing/deserialization, which were
564+
discussed above.
565+
566+
The remaining methods (`read_exact`, `push`, `push_at_least`) were
567+
removed largely because they are *not memory safe*: they involve
568+
extending a vector's capacity, and then *passing in the resulting
569+
uninitialized memory* to the `read` method, which is not marked
570+
`unsafe`! Thus the current design can lead to undefined behavior in
571+
safe code.
572+
573+
The solution is to instead extend `Vec<T>` with a useful unsafe method:
574+
575+
```rust
576+
unsafe fn with_extra(&mut self, n: uint) -> &mut [T];
577+
```
578+
579+
This method is equivalent to calling `reserve(n)` and then providing a
580+
slice to the memory starting just after `len()` entries. Using this
581+
method, clients of `Reader` can easily recover the above removed
582+
methods, but they are explicitly marking the unsafety of doing so.
583+
584+
(Note: `read_to_end` is currently not memory safe for the same reason,
585+
but is considered a very important convenience. Thus, we will continue
586+
to provide it, but will zero the slice beforehand.)
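
The zeroing approach mentioned in the note can be sketched as follows, using today's `std::io::Read` rather than the trait proposed here. `read_chunk_zeroed` is a hypothetical helper, not part of this RFC. It trades the cost of initializing the new region for staying entirely in safe code.

```rust
use std::io::{self, Read};

// Appends up to `n` bytes from `reader` to `buf` without ever exposing
// uninitialized memory: the new region is zero-filled before the read,
// and the vector is truncated to what was actually read afterwards.
fn read_chunk_zeroed<R: Read>(
    reader: &mut R,
    buf: &mut Vec<u8>,
    n: usize,
) -> io::Result<usize> {
    let start = buf.len();
    buf.resize(start + n, 0);
    match reader.read(&mut buf[start..]) {
        Ok(read) => {
            buf.truncate(start + read);
            Ok(read)
        }
        Err(e) => {
            // Leave the vector exactly as it was before the call.
            buf.truncate(start);
            Err(e)
        }
    }
}
```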
587+
588+
### `Writer`
589+
[Writer]: #writer
590+
591+
The `Writer` trait is cut down to even smaller size:
592+
593+
```rust
594+
trait Writer {
595+
type Err;
596+
fn write(&mut self, buf: &[u8]) -> Result<uint, Err>;
597+
598+
fn write_all(&mut self, buf: &[u8]) -> NonatomicResult<(), uint, Err> { ... }
599+
fn write_fmt(&mut self, fmt: &fmt::Arguments) -> Result<(), Err> { ... }
600+
fn flush(&mut self) -> Result<(), Err> { ... }
601+
}
602+
```
603+
604+
The biggest change here is to the semantics of `write`. Instead of
605+
repeatedly writing to the underlying IO object until all of `buf` is
606+
written, it attempts a *single* write and on success returns the
607+
number of bytes written. This follows the long tradition of blocking
608+
IO, and is a more fundamental building block than the looping write we
609+
currently have.
610+
611+
For convenience, `write_all` recovers the behavior of today's `write`,
612+
looping until either the entire buffer is written or an error
613+
occurs. In the latter case, however, it now also yields the number of
614+
bytes that had been written prior to the error.
615+
616+
The `write_fmt` method, like `write_all`, will loop until its entire
617+
input is written or an error occurs. However, it does not return a
618+
`NonatomicResult` because the number of bytes written cannot be
619+
straightforwardly interpreted -- the actual byte sequence written is
620+
determined by the formatting system.
621+
622+
The other methods include endian conversions (covered by
623+
serialization) and a few conveniences like `write_str` for other basic
624+
types. The latter, at least, is already uniformly (and extensibly)
625+
covered via the `write!` macro. The other helpers, as with `Reader`,
626+
should migrate into a more general (de)serialization library.
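
The relationship between a single-shot `write` and a looping `write_all` that reports partial progress can be sketched as below. This uses today's `std::io::Write` (whose `write` already performs a single write) and a plain tuple in place of `PartialResult`. `FlakyWriter` and `write_all_counted` are hypothetical names, and treating a zero-length write as an error is an assumption of this sketch.

```rust
use std::io::{self, Write};

// Hypothetical writer: accepts at most 2 bytes per call and fails once
// its total byte budget is used up.
struct FlakyWriter {
    data: Vec<u8>,
    budget: usize,
}

impl Write for FlakyWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        if self.budget == 0 {
            return Err(io::Error::new(io::ErrorKind::Other, "budget exhausted"));
        }
        let n = buf.len().min(2).min(self.budget);
        self.data.extend_from_slice(&buf[..n]);
        self.budget -= n;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

// Loops over single writes until `buf` is fully written. On failure,
// returns the number of bytes written so far along with the error.
fn write_all_counted<W: Write>(
    w: &mut W,
    mut buf: &[u8],
) -> Result<(), (usize, io::Error)> {
    let mut written = 0;
    while !buf.is_empty() {
        match w.write(buf) {
            Ok(0) => {
                let e = io::Error::new(io::ErrorKind::WriteZero, "wrote zero bytes");
                return Err((written, e));
            }
            Ok(n) => {
                written += n;
                buf = &buf[n..];
            }
            Err(e) => return Err((written, e)),
        }
    }
    Ok(())
}
```

With the byte count in hand, a caller can resume from `buf[written..]` after handling the error rather than guessing how much output was lost.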
451627

452628
## String handling
453629
[String handling]: #string-handling
@@ -473,12 +649,229 @@ throughout IO, we can go on to explore the modules in detail.
473649
### `core::io`
474650
[core::io]: #coreio
475651

476-
> To be added in a follow-up PR.
652+
The `io` module is split into the parts that can live in `libcore`
653+
(most of it) and the parts that are added in the `std::io`
654+
facade. Being able to move components into `libcore` at all is made
655+
possible through the use of
656+
[associated error types](#revising-reader-and-writer) for `Reader` and
657+
`Writer`.
658+
659+
#### Adapters
660+
[Adapters]: #adapters
661+
662+
The current `std::io::util` module offers a number of `Reader` and
663+
`Writer` "adapters". This RFC refactors the design to more closely
664+
follow `std::iter`. Along the way, it generalizes the `by_ref` adapter:
665+
666+
```rust
667+
trait ReaderExt: Reader {
668+
// already introduced above
669+
fn bytes(&mut self) -> Bytes<Self> { ... }
670+
fn chars<'r>(&'r mut self) -> Chars<'r, Self, Err> { ... }
671+
672+
// Reify a borrowed reader as owned
673+
fn by_ref<'a>(&'a mut self) -> ByRef<'a, Self> { ... }
674+
675+
// Read everything from `self`, then read from `next`
676+
fn chain<R: Reader>(self, next: R) -> Chain<Self, R> { ... }
677+
678+
// Adapt `self` to yield only the first `limit` bytes
679+
fn take(self, limit: u64) -> Take<Self> { ... }
680+
681+
// Whenever reading from `self`, push the bytes read to `out`
682+
fn tee<W: Writer>(self, out: W) -> Tee<Self, W> { ... }
683+
}
684+
impl<T: Reader> ReaderExt for T {}
685+
686+
trait WriterExt: Writer {
687+
// Reify a borrowed writer as owned
688+
fn by_ref<'a>(&'a mut self) -> ByRef<'a, Self> { ... }
689+
690+
// Whenever bytes are written to `self`, write them to `other` as well
691+
fn carbon_copy<W: Writer>(self, other: W) -> CarbonCopy<Self, W> { ... }
692+
}
693+
impl<T: Writer> WriterExt for T {}
694+
695+
// An adapter converting an `Iterator<u8>` to a `Reader`.
696+
pub struct IterReader<T> { ... }
697+
```
698+
699+
As with `std::iter`, these adapters are object unsafe and hence placed
700+
in an extension trait with a blanket `impl`.
701+
702+
Note that the same `ByRef` type is used for both `Reader` and `Writer`
703+
-- and this RFC proposes to use it for `std::iter` as well. The
704+
insight is that there is no difference between the *type* used for
705+
by-ref adapters in any of these cases; what changes is just which
706+
trait defers through it. So, we propose to add the following to `core::borrow`:
707+
708+
```rust
709+
pub struct ByRef<'a, Sized? T:'a> {
710+
pub inner: &'a mut T
711+
}
712+
```
713+
714+
which will allow `impl`s like the following in `core::io`:
715+
716+
```rust
717+
impl<'a, W: Writer> Writer for ByRef<'a, W> {
718+
#[inline]
719+
fn write(&mut self, buf: &[u8]) -> Result<uint, W::Err> { self.inner.write(buf) }
720+
721+
#[inline]
722+
fn flush(&mut self) -> Result<(), W::Err> { self.inner.flush() }
723+
}
724+
```
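
For a feel of how `by_ref`, `take`, and `chain` compose, here is a short sketch written against today's `std::io::Read`, which offers adapters with these names. Note that in today's standard library `by_ref` returns `&mut Self` rather than the shared `ByRef` struct proposed above, so this shows the usage pattern only, not this RFC's exact types. `split_header` and `concat` are hypothetical helpers.

```rust
use std::io::{self, Read};

// Reads exactly the first `header_len` bytes through a borrowed,
// limited view of the reader, then keeps using the same reader for
// the remainder.
fn split_header<R: Read>(
    mut input: R,
    header_len: u64,
) -> io::Result<(Vec<u8>, Vec<u8>)> {
    let mut header = Vec::new();
    // `by_ref` lends the reader out, so `take` does not consume `input`.
    input.by_ref().take(header_len).read_to_end(&mut header)?;

    let mut body = Vec::new();
    input.read_to_end(&mut body)?;
    Ok((header, body))
}

// Reads everything from `first`, then everything from `second`.
fn concat<A: Read, B: Read>(first: A, second: B) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    first.chain(second).read_to_end(&mut out)?;
    Ok(out)
}
```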
725+
726+
#### Free functions
727+
[Free functions]: #free-functions
728+
729+
The current `std::io::util` module also includes a number of primitive
730+
readers and writers, as well as `copy`. These are updated as follows:
731+
732+
```rust
733+
// A reader that yields no bytes
734+
fn empty() -> Empty;
735+
736+
// A reader that yields `byte` repeatedly (generalizes today's ZeroReader)
737+
fn repeat(byte: u8) -> Repeat;
738+
739+
// A writer that ignores the bytes written to it (/dev/null)
740+
fn sink() -> Sink;
741+
742+
// Copies all data from a Reader to a Writer
743+
pub fn copy<E, R, W>(r: &mut R, w: &mut W) -> NonatomicResult<(), uint, E> where
744+
R: Reader<Err = E>,
745+
W: Writer<Err = E>;
746+
```
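
Today's `std::io` provides functions named `empty`, `repeat`, `sink`, and `copy`, so their interplay can be sketched directly. One difference from the signature above is an assumption worth flagging: today's `copy` returns `io::Result<u64>` (the number of bytes copied) rather than a `NonatomicResult`. `demo_free_functions` is a hypothetical name.

```rust
use std::io::{self, Read};

// Returns (bytes copied from an empty reader, bytes copied from a
// limited repeating reader, the bytes that were collected).
fn demo_free_functions() -> io::Result<(u64, u64, Vec<u8>)> {
    // An empty reader yields nothing; the sink discards whatever it gets.
    let drained = io::copy(&mut io::empty(), &mut io::sink())?;

    // `repeat` never ends on its own, so it is bounded with `take`.
    let mut zeros = Vec::new();
    let copied = io::copy(&mut io::repeat(0).take(4), &mut zeros)?;

    Ok((drained, copied, zeros))
}
```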
747+
748+
#### Seeking
749+
[Seeking]: #seeking
750+
751+
The seeking infrastructure is largely the same as today's, except that
752+
`tell` is renamed to follow the RFC's design principles and the `seek`
753+
signature is refactored with more precise types:
754+
755+
```rust
756+
pub trait Seek {
757+
type Err;
758+
fn position(&self) -> Result<u64, Err>;
759+
fn seek(&mut self, pos: SeekPos) -> Result<(), Err>;
760+
}
761+
762+
pub enum SeekPos {
763+
FromStart(u64),
764+
FromEnd(u64),
765+
FromCur(i64),
766+
}
767+
```
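
One way to read the "more precise types" is that each variant only admits offsets that make sense for it. The sketch below resolves a `SeekPos` to an absolute position. It assumes that `FromEnd(n)` means "`n` bytes before the end", which the text above does not spell out, and `resolve` is a hypothetical helper rather than part of the proposed API.

```rust
pub enum SeekPos {
    FromStart(u64),
    FromEnd(u64),
    FromCur(i64),
}

// Computes the absolute position for a seek, given the current
// position and the total length. Returns `None` when the target would
// fall before the start or overflow.
pub fn resolve(pos: SeekPos, cur: u64, len: u64) -> Option<u64> {
    match pos {
        SeekPos::FromStart(n) => Some(n),
        // Assumption: the offset counts backwards from the end.
        SeekPos::FromEnd(n) => len.checked_sub(n),
        SeekPos::FromCur(delta) => {
            if delta >= 0 {
                cur.checked_add(delta as u64)
            } else {
                cur.checked_sub(delta.unsigned_abs())
            }
        }
    }
}
```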
768+
769+
#### Buffering
770+
[Buffering]: #buffering
771+
772+
The current `Buffer` trait will be renamed to `BufferedReader` for
773+
clarity (and to open the door to `BufferedWriter` at some later
774+
point):
775+
776+
```rust
777+
pub trait BufferedReader: Reader {
778+
fn fill_buf(&mut self) -> Result<&[u8], Self::Err>;
779+
fn consume(&mut self, amt: uint);
780+
781+
// This should perhaps yield an iterator
782+
fn read_until(&mut self, byte: u8) -> NonatomicResult<Vec<u8>, Vec<u8>, Self::Err> { ... }
783+
}
784+
785+
pub trait BufferedReaderExt: BufferedReader {
786+
fn lines(&mut self) -> Lines<Self, Self::Err> { ... }
787+
}
788+
```
789+
790+
In addition, `read_line` is removed in favor of the `lines` iterator,
791+
and `read_char` is removed in favor of the `chars` iterator (now on
792+
`ReaderExt`). These iterators will be changed to yield
793+
`NonatomicResult` values.
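
To show how `fill_buf` and `consume` support a `read_until` that keeps partial data, here is a sketch built on today's `std::io::BufRead`, which has both methods. A plain tuple stands in for `PartialResult`, and `read_until_partial` is a hypothetical name. Today's own `read_until` has a different signature.

```rust
use std::io::{self, BufRead};

// Reads up to and including `delim` (or to end of input). If a read
// fails part of the way through, the bytes gathered so far are
// returned alongside the error instead of being dropped.
fn read_until_partial<R: BufRead>(
    r: &mut R,
    delim: u8,
) -> Result<Vec<u8>, (Vec<u8>, io::Error)> {
    let mut out = Vec::new();
    loop {
        let available = match r.fill_buf() {
            Ok(buf) => buf,
            Err(e) => return Err((out, e)),
        };
        let (done, used) = match available.iter().position(|&b| b == delim) {
            Some(i) => {
                out.extend_from_slice(&available[..=i]);
                (true, i + 1)
            }
            None => {
                out.extend_from_slice(available);
                // An empty buffer signals end of input.
                (available.is_empty(), available.len())
            }
        };
        // Tell the reader how many buffered bytes were actually used.
        r.consume(used);
        if done {
            return Ok(out);
        }
    }
}
```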
794+
795+
The `BufferedReader`, `BufferedWriter` and `BufferedStream` types stay
796+
essentially as they are today, except that for streams and writers the
797+
`into_inner` method yields any errors encountered when flushing,
798+
together with the remaining data:
799+
800+
```rust
801+
// If flushing fails, you get the unflushed data back
802+
fn into_inner(self) -> NonatomicResult<W, Vec<u8>, W::Err>;
803+
```
804+
805+
#### `MemReader` and `MemWriter`
806+
[MemReader and MemWriter]: #memreader-and-memwriter
807+
808+
The various in-memory readers and writers available today will be
809+
consolidated into just `MemReader` and `MemWriter`:
810+
811+
`MemReader` (like today's `BufReader`)
812+
- construct from `&[u8]`
813+
- implements `Seek`
814+
815+
`MemWriter`
816+
- construct freshly, or from a `Vec`
817+
- implements `Seek`
818+
819+
Both will allow decomposing into their inner parts, though the exact
820+
details are left to the implementation.
821+
822+
The rationale for this design is that, if you want to read from a
823+
`Vec`, it's easy enough to get a slice to read from instead; on the
824+
other hand, it's rare to want to write into a mutable slice on the
825+
stack, as opposed to an owned vector. So these two readers and writers
826+
cover the vast majority of in-memory readers/writers for Rust.
827+
828+
In addition to these, however, we will have the following `impl`s
829+
directly on slice/vector types:
830+
831+
* `impl Writer for Vec<u8>`
832+
* `impl Writer for &mut [u8]`
833+
* `impl Reader for &[u8]`
834+
835+
These `impls` are convenient and efficient, but do not implement `Seek`.
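
Today's standard library has the corresponding `impl`s on `Vec<u8>`, `&mut [u8]`, and `&[u8]` (for `std::io::Write` and `std::io::Read`), so their behavior can be sketched as follows. `slice_io_demo` is a hypothetical name, and the traits used are today's rather than the ones proposed in this RFC.

```rust
use std::io::{self, Read, Write};

// Returns (the vector written to, the array written to, bytes left in
// the output slice, the bytes read, bytes left in the input slice).
fn slice_io_demo() -> io::Result<(Vec<u8>, [u8; 4], usize, [u8; 2], usize)> {
    // Writing to a Vec<u8> appends to it.
    let mut vec_out: Vec<u8> = Vec::new();
    vec_out.write_all(b"abc")?;

    // Writing to a &mut [u8] fills the slice from the front and
    // advances the slice past the bytes written.
    let mut storage = [0u8; 4];
    let remaining = {
        let mut slot: &mut [u8] = &mut storage;
        slot.write_all(b"xy")?;
        slot.len()
    };

    // Reading from a &[u8] consumes bytes from the front of the slice.
    let mut first = [0u8; 2];
    let rest_len = {
        let mut input: &[u8] = &vec_out;
        input.read_exact(&mut first)?;
        input.len()
    };

    Ok((vec_out, storage, remaining, first, rest_len))
}
```

Because the slices only ever shrink from the front, there is no position to move backwards, which is consistent with these `impl`s not offering `Seek`.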
477836

478837
### The `std::io` facade
479838
[The std::io facade]: #the-stdio-facade
480839

481-
> To be added in a follow-up PR.
840+
The `std::io` module will largely be a facade over `core::io`, but it
841+
will add some functionality that can live only in `std`.
842+
843+
#### `Errors`
844+
[Errors]: #errors
845+
846+
The `IoError` type will be renamed to `std::io::Error`, following our
847+
[non-prefixing convention](https://github.com/rust-lang/rfcs/pull/356).
848+
It will remain largely as it is today, but its fields will be made
849+
private. It may eventually grow a field to track the underlying OS
850+
error code.
851+
852+
The `IoErrorKind` type will become `std::io::ErrorKind`, and
853+
`ShortWrite` will be dropped (it is no longer needed with the new
854+
`Writer` semantics), which should decrease its footprint. The
855+
`OtherIoError` variant will become `Other` now that `enum`s are
856+
namespaced.
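
For comparison, today's `std::io::Error` keeps its fields private and is inspected through accessor methods, including one for the OS error code. The sketch below uses those present-day accessors. It describes the library as it exists now and is not a statement of what this RFC itself specifies. `describe` is a hypothetical helper.

```rust
use std::io;

// Extracts the two pieces of information a caller can query without
// access to the error's fields: its kind and, if any, the OS code.
fn describe(err: &io::Error) -> (io::ErrorKind, Option<i32>) {
    (err.kind(), err.raw_os_error())
}

// Builds an error that did not come from the operating system.
fn custom_error() -> io::Error {
    io::Error::new(io::ErrorKind::Other, "custom failure")
}
```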
857+
858+
#### Channel adapters
859+
[Channel adapters]: #channel-adapters
860+
861+
The `ChanReader` and `ChanWriter` adapters will be kept exactly as they are today.
862+
863+
#### `stdin`, `stdout`, `stderr`
864+
[stdin, stdout, stderr]: #stdin-stdout-stderr
865+
866+
Finally, `std::io` will provide a `stdin` reader and `stdout` and
867+
`stderr` writers. These will largely work as they do today, except
868+
that we will hew more closely to the traditional setup:
869+
870+
* `stderr` will be unbuffered and `stderr_raw` will therefore be dropped.
871+
* `stdout` will be line-buffered for TTY, fully buffered otherwise.
872+
* most TTY functionality in `StdReader` and `StdWriter` will be moved
873+
to `os::unix`, since it's not yet implemented on Windows.
874+
* `stdout_raw` will be removed as well.
482875

483876
### `std::env`
484877
[std::env]: #stdenv
