- Start Date: 2014-12-07
- RFC PR: rust-lang/rfcs#517
- Rust Issue: rust-lang/rust#21070
This RFC proposes a significant redesign of the std::io and std::os modules in preparation for API stabilization. The specific problems addressed by the redesign are given in the Problems section below, and the key ideas of the design are given in Vision for IO.
This RFC was originally posted as a single monolithic file, which made it difficult to discuss different parts separately.
It has now been split into a skeleton that covers (1) the problem statement, (2) the overall vision and organization, and (3) the std::os module.
Other parts of the RFC are marked with (stub) and will be filed as follow-up PRs against this RFC.
- Summary
- Table of contents
- Problems
- Detailed design
- Drawbacks
- Alternatives
- Unresolved questions
The io and os modules are the last large API surfaces of std that need to be stabilized. While the basic functionality offered in these modules is largely traditional, many problems with the APIs have emerged over time. The RFC discusses the most significant problems below.
This section only covers specific problems with the current library; see Vision for IO for a higher-level view.
One of the most pressing -- but also most subtle -- problems with std::io is the lack of atomicity in its Reader and Writer traits.
For example, the Reader trait offers a read_to_end method:
fn read_to_end(&mut self) -> IoResult<Vec<u8>>
Executing this method may involve many calls to the underlying read method. And it is possible that the first several calls succeed, and then a call returns an Err -- which, like TimedOut, could represent a transient problem. Unfortunately, given the above signature, there is no choice but to simply throw this data away.
The Writer trait suffers from a more fundamental problem, since its primary method, write, may actually involve several calls to the underlying system -- and if a failure occurs, there is no indication of how much was written.
Existing blocking APIs all have to deal with this problem, and Rust can and should follow the existing tradition here. See Revising Reader and Writer for the proposed solution.
The std::io module supports "timeouts" on virtually all IO objects via a set_timeout method. In this design, every IO object (file, socket, etc.) has an optional timeout associated with it, and set_timeout mutates the associated timeout. All subsequent blocking operations are implicitly subject to this timeout.
This API choice suffers from two problems, one cosmetic and the other deeper:
-
The "timeout" is actually a deadline and should be named accordingly.
-
The stateful API has poor composability: when passing a mutable reference to an IO object to another function, it's possible that the deadline has been changed by the time that function returns. In other words, users of the API can easily interfere with each other by accident.
See Deadlines for the proposed solution.
The current io and os modules were originally designed when librustuv was providing IO support, and to some extent they reflect the capabilities and conventions of libuv -- which in turn are loosely based on Posix.
As such, the modules are not always ideal from a cross-platform standpoint, both in terms of forcing Windows programming into a Posix mold, and also of offering APIs that are not actually usable on all platforms.
The modules have historically also provided no platform-specific APIs.
Part of the goal of this RFC is to set out a clear and extensible story for both cross-platform and platform-specific APIs in std. See Design principles for the details.
Rust has followed the UTF-8 everywhere approach to its strings. However, at the borders to platform APIs, it is revealed that the world is not, in fact, UTF-8 (or even Unicode) everywhere.
Currently our story for platform APIs is that we either assume they can take or return Unicode strings (suitably encoded) or an uninterpreted byte sequence. Sadly, this approach does not actually cover all platform needs, and is also not highly ergonomic as presently implemented. (Consider os::getenv, which introduces replacement characters (!), versus os::getenv_as_bytes, which yields a Vec<u8>; neither is ideal.)
This topic was covered in some detail in the Path Reform RFC, but this RFC gives a more general account in String handling.
The stdio module provides access to readers/writers for stdin, stdout and stderr, which is essential functionality. However, it also provides a means of changing e.g. "stdout" -- but there is no connection between these two! In particular, set_stdout affects only the writer that println! and friends use, while set_stderr affects panic!.
This module needs to be clarified. See The std::io facade and Functionality moved elsewhere for the detailed design.
There are a few places where io provides high-level abstractions over system services without also providing more direct access to the service as-is. For example:
-
The Writer trait's write method -- a cornerstone of IO -- actually corresponds to an unbounded number of invocations of writes to the underlying IO object. This RFC changes write to follow more standard, lower-level practice; see Revising Reader and Writer.
-
Objects like TcpStream are Clone, which involves a fair amount of supporting infrastructure. This RFC tackles the problems that Clone was trying to solve more directly; see Splitting streams and cancellation.
The motivation for going lower-level is described in Design principles below.
The std::io module is somewhat unusual in that most of the functionality it provides is used through a few key traits (like Reader) and these traits are in turn "lifted" over IoResult:
impl<R: Reader> Reader for IoResult<R> { ... }
This lifting and others make it possible to chain IO operations that might produce errors, without any explicit mention of error handling:
File::open(some_path).read_to_end()
                      ^~~~~~~~~~~ can produce an error
      ^~~~ can produce an error
The result of such a chain is either Ok of the outcome, or Err of the first error.
While this pattern is highly ergonomic, it does not fit particularly well into our evolving error story (interoperation or try blocks), and it is the only module in std to follow this pattern.
Eventually, we would like to write
File::open(some_path)?.read_to_end()
to take advantage of the FromError infrastructure, hook into error handling control flow, and to provide good chaining ergonomics throughout all Rust APIs -- all while keeping this handling a bit more explicit via the ? operator. (See rust-lang#243 for the rough direction).
In the meantime, this RFC proposes to phase out the use of impls for IoResult. This will require use of try! for the time being.
(Note: this may put some additional pressure on at least landing the basic use of ? instead of today's try! before 1.0 final.)
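As a minimal sketch of the explicit style described above, the following uses the ? operator as it exists in today's Rust (it landed well after this RFC was written) together with the buffer-taking read_to_end proposed later in this document. The helper name read_all is hypothetical and exists only for illustration.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical helper: opens `path` and reads it fully.
/// Each `?` marks a point where an error can short-circuit the chain.
fn read_all(path: &Path) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // `File::open` can fail, and so can `read_to_end`; both are visible here.
    File::open(path)?.read_to_end(&mut buf)?;
    Ok(buf)
}
```

Compared with the lifted impls, the two failure points are still written on one line, but each is marked explicitly rather than hidden inside an impl for the result type.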
There's a lot of material here, so the RFC starts with high-level goals, principles, and organization, and then works its way through the various modules involved.
Rust's IO story has undergone significant evolution, starting from a libuv-style pure green-threaded model to a dual green/native model and now to a pure native model. Given that history, it's worthwhile to set out explicitly what is, and is not, in scope for std::io.
For Rust 1.0, the aim is to:
-
Provide a blocking API based directly on the services provided by the native OS for native threads.
These APIs should cover the basics (files, basic networking, basic process management, etc) and suffice to write servers following the classic Apache thread-per-connection model. They should impose essentially zero cost over the underlying OS services; the core APIs should map down to a single syscall unless more are needed for cross-platform compatibility.
-
Provide basic blocking abstractions and building blocks (various stream and buffer types and adapters) based on traditional blocking IO models but adapted to fit well within Rust.
-
Provide hooks for integrating with low-level and/or platform-specific APIs.
-
Ensure reasonable forwards-compatibility with future async IO models.
It is explicitly not a goal at this time to support asynchronous programming models or nonblocking IO, nor is it a goal for the blocking APIs to eventually be used in a nonblocking "mode" or style.
Rather, the hope is that the basic abstractions of files, paths, sockets, and so on will eventually be usable directly within an async IO programming model and/or with nonblocking APIs. This is the case for most existing languages, which offer multiple interoperating IO models.
The long term intent is certainly to support async IO in some form, but doing so will require new research and experimentation.
Now that the scope has been clarified, it's important to lay out some broad principles for the io and os modules. Many of these principles are already being followed to some extent, but this RFC makes them more explicit and applies them more uniformly.
Historically, Rust's std has always been "cross-platform", but as discussed in Posix and libuv bias this hasn't always played out perfectly. The proposed policy is below. With these policies, the APIs should largely feel like part of "Rust" rather than part of any legacy, and they should enable truly portable code.
Except for an explicit opt-in (see Platform-specific opt-in below), all APIs in std should be cross-platform:
-
The APIs should only expose a service or a configuration if it is supported on all platforms, and if the semantics on those platforms is or can be made loosely equivalent. (The latter requires exercising some judgment). Platform-specific functionality can be handled separately (Platform-specific opt-in) and interoperate with normal std abstractions.
This policy rules out functions like chown which have a clear meaning on Unix and no clear interpretation on Windows; the ownership and permissions models are very different.
-
The APIs should follow Rust's conventions, including their naming, which should be platform-neutral.
This policy rules out names like fstat that are the legacy of a particular platform family.
-
The APIs should never directly expose the representation of underlying platform types, even if they happen to coincide on the currently-supported platforms. Cross-platform types in std should be newtyped.
This policy rules out exposing e.g. error numbers directly as an integer type.
The next subsection gives detail on what these APIs should look like in relation to system services.
How should Rust APIs map into system services? This question breaks down along several axes which are in tension with one another:
-
Guarantees. The APIs provided in the mainline io modules should be predominantly safe, aside from the occasional unsafe function. In particular, the representation should be sufficiently hidden that most use cases are safe by construction. Beyond memory safety, though, the APIs should strive to provide a clear multithreaded semantics (using the Send/Sync kinds), and should use Rust's type system to rule out various kinds of bugs when it is reasonably ergonomic to do so (following the usual Rust conventions).
-
Ergonomics. The APIs should present a Rust view of things, making use of the trait system, newtypes, and so on to make system services fit well with the rest of Rust.
-
Abstraction/cost. On the other hand, the abstractions introduced in std must not induce significant costs over the system services -- or at least, there must be a way to safely access the services directly without incurring this penalty. When useful abstractions would impose an extra cost, they must be pay-as-you-go.
Putting the above bullets together, the abstractions must be safe, and they should be as high-level as possible without imposing a tax.
-
Coverage. Finally, the std APIs should over time strive for full coverage of non-niche, cross-platform capabilities.
Rust is a systems language, and as such it should expose seamless, no/low-cost access to system services. In many cases, however, this cannot be done in a cross-platform way, either because a given service is only available on some platforms, or because providing a cross-platform abstraction over it would be costly.
This RFC proposes platform-specific opt-in: submodules of os that are named by platform, and made available via #[cfg] switches. For example, os::unix can provide APIs only available on Unix systems, and os::linux can drill further down into Linux-only APIs. (You could even imagine subdividing by OS versions.) This is "opt-in" in the sense that, like the unsafe keyword, it is very easy to audit for potential platform-specificity: just search for os::anyplatform. Moreover, by separating out subsets like linux, it's clear exactly how specific the platform dependency is.
The APIs in these submodules are intended to have the same flavor as other io APIs and should interoperate seamlessly with cross-platform types, but:
-
They should be named according to the underlying system services when there is a close correspondence.
-
They may reveal the underlying OS type if there is nothing to be gained by hiding it behind an abstraction.
For example, the os::unix module could provide a stat function that takes a standard Path and yields a custom struct. More interestingly, os::linux might include an epoll function that could operate directly on many io types (e.g. various socket types), without any explicit conversion to a file descriptor; that's what "seamless" means.
Each of the platform modules will offer a custom prelude submodule, intended for glob import, that includes all of the extension traits applied to standard IO objects.
The precise design of these modules is in the very early stages and will likely remain #[unstable] for some time.
The io module is currently the biggest in std, with an entire hierarchy nested underneath; it mixes general abstractions/tools with specific IO objects. The os module is currently a bit of a dumping ground for facilities that don't fit into the io category.
This RFC proposes to revamp the organization by flattening out the hierarchy and clarifying the role of each module:
std
  env           environment manipulation
  fs            file system
  io            core io abstractions/adapters
    prelude     the io prelude
  net           networking
  os
    unix        platform-specific APIs
    linux       ..
    windows     ..
  os_str        platform-sensitive string handling
  process       process management
In particular:
-
The contents of os will largely move to env, a new module for inspecting and updating the "environment" (including environment variables, CPU counts, arguments to main, and so on).
-
The io module will include things like Reader and BufferedWriter -- cross-cutting abstractions that are needed throughout IO.
The prelude submodule will export all of the traits and most of the types for IO-related APIs; a single glob import should suffice to set you up for working with IO. (Note: this goes hand-in-hand with removing the bits of io currently in the prelude, as recently proposed.)
-
The root os module is used purely to house the platform submodules discussed above.
-
The os_str module is part of the solution to the Unicode problem; see String handling below.
-
The process module over time will grow to include querying/manipulating already-running processes, not just spawning them.
The Reader and Writer traits are the backbone of IO, representing the ability to (respectively) pull bytes from and push bytes to an IO object. The core operations provided by these traits follow a very long tradition for blocking IO, but they are still surprisingly subtle -- and they need to be revised.
-
Atomicity and data loss. As discussed above, the Reader and Writer traits currently expose methods that involve multiple actual reads or writes, and data is lost when an error occurs after some (but not all) operations have completed.
The proposed strategy for Reader operations is to (1) separate out various deserialization methods into a distinct framework, (2) never have the internal read implementations loop on errors, (3) cut down on the number of non-atomic read operations and (4) adjust the remaining operations to provide more flexibility when possible.
For writers, the main change is to make write only perform a single underlying write (returning the number of bytes written on success), and provide a separate write_all method.
-
Parsing/serialization. The Reader and Writer traits currently provide a large number of default methods for (de)serialization of various integer types to bytes with a given endianness. Unfortunately, these operations pose atomicity problems as well (e.g., a read could fail after reading two of the bytes needed for a u32 value).
Rather than complicate the signatures of these methods, the (de)serialization infrastructure is removed entirely -- in favor of instead eventually introducing a much richer parsing/formatting/(de)serialization framework that works seamlessly with Reader and Writer.
Such a framework is out of scope for this RFC, but the endian-sensitive functionality will be provided elsewhere (likely out of tree).
With those general points out of the way, let's look at the details.
The updated Reader trait (and its extension) is as follows:
trait Read {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, Error>;
    fn read_to_end(&mut self, buf: &mut Vec<u8>) -> Result<(), Error> { ... }
    // takes `&mut self`, since reading advances the underlying object
    fn read_to_string(&mut self, buf: &mut String) -> Result<(), Error> { ... }
}
// extension trait needed for object safety
trait ReadExt: Read {
    fn bytes(&mut self) -> Bytes<Self> { ... }
    ... // more to come later in the RFC
}
impl<R: Read> ReadExt for R {}
Following the trait naming conventions, the trait is renamed to Read reflecting the clear primary method it provides.
The read method should not involve internal looping (even over errors like EINTR). It is intended to faithfully represent a single call to an underlying system API.
The read_to_end and read_to_string methods now take explicit buffers as input. This has multiple benefits:
-
Performance. When it is known that reading will involve some large number of bytes, the buffer can be preallocated in advance.
-
"Atomicity" concerns. For read_to_end, it's possible to use this API to retain data collected so far even when a read fails in the middle. For read_to_string, this is not the case, because UTF-8 validity cannot be ensured in such cases; but if intermediate results are wanted, one can use read_to_end and convert to a String only at the end.
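The atomicity point can be illustrated with a small sketch against today's std::io::Read, whose read_to_end takes a caller-supplied buffer much like the signature proposed here (today it returns a byte count rather than ()). The FailsAfter reader is a hypothetical test double that serves its data and then reports a simulated timeout.

```rust
use std::io::{self, Read};

/// Hypothetical reader: serves `data`, then fails with a simulated timeout
/// instead of signalling end-of-file.
struct FailsAfter<'a> {
    data: &'a [u8],
}

impl<'a> Read for FailsAfter<'a> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.data.is_empty() {
            return Err(io::Error::new(io::ErrorKind::TimedOut, "simulated timeout"));
        }
        // Hand out as much as fits in the caller's buffer.
        let n = self.data.len().min(buf.len());
        buf[..n].copy_from_slice(&self.data[..n]);
        self.data = &self.data[n..];
        Ok(n)
    }
}

/// Reads until the error, returning whatever was collected and whether it failed.
fn read_keeping_partial(data: &[u8]) -> (Vec<u8>, bool) {
    let mut reader = FailsAfter { data };
    let mut buf = Vec::new();
    let failed = reader.read_to_end(&mut buf).is_err();
    // `buf` is owned by the caller, so bytes read before the error survive.
    (buf, failed)
}
```

Because the buffer belongs to the caller, the bytes read before the failure are still available after the error is returned, which the old signature could not offer.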
Convenience methods like these will retry on EINTR. This is partly under the assumption that in practice, EINTR will most often arise when interfacing with other code that changes a signal handler. Due to the global nature of these interactions, such a change can suddenly cause your own code to get an error irrelevant to it, and the code should probably just retry in those cases. In the case where you are using EINTR explicitly, read and write will be available to handle it (and you can always build your own abstractions on top).
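The split between the raw read and the retrying conveniences can be sketched as follows, using today's std::io, where EINTR surfaces as ErrorKind::Interrupted. The InterruptedOnce reader is a hypothetical stand-in for a source whose first call is interrupted by a signal.

```rust
use std::io::{self, Read};

/// Hypothetical reader: the first call fails with `Interrupted`,
/// later calls serve `data` normally.
struct InterruptedOnce<'a> {
    interrupted: bool,
    data: &'a [u8],
}

impl<'a> Read for InterruptedOnce<'a> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if !self.interrupted {
            self.interrupted = true;
            return Err(io::Error::new(io::ErrorKind::Interrupted, "simulated EINTR"));
        }
        // Delegate to the in-memory reader implementation for byte slices.
        self.data.read(buf)
    }
}

/// A single `read` call reports the interruption to the caller.
fn raw_read_kind(data: &[u8]) -> Option<io::ErrorKind> {
    let mut reader = InterruptedOnce { interrupted: false, data };
    let mut scratch = [0u8; 8];
    reader.read(&mut scratch).err().map(|e| e.kind())
}

/// The `read_to_end` convenience retries past the interruption.
fn convenient_read(data: &[u8]) -> io::Result<Vec<u8>> {
    let mut reader = InterruptedOnce { interrupted: false, data };
    let mut buf = Vec::new();
    reader.read_to_end(&mut buf)?;
    Ok(buf)
}
```

Code that wants to react to the interruption calls read directly; code that does not care uses the convenience and never sees it.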
The proposed Read trait is much slimmer than today's Reader. The vast majority of removed methods are parsing/deserialization, which were discussed above.
The remaining methods (read_exact, read_at_least, push, push_at_least) were removed for various reasons:
-
read_exact, read_at_least: these are somewhat more obscure conveniences that are not particularly robust due to lack of atomicity.
-
push, push_at_least: these are special-cases for working with Vec, which this RFC proposes to replace with a more general mechanism described next.
To provide some of this functionality in a more compositional way, extend Vec<T> with an unsafe method:
unsafe fn with_extra(&mut self, n: usize) -> &mut [T];
This method is equivalent to calling reserve(n) and then providing a slice to the memory starting just after len() entries. Using this method, clients of Read can easily recover the push method.
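To make the idea concrete, here is a sketch of recovering a push-style helper on top of read. The with_extra method proposed above is not assumed to exist, so this version uses a safe stand-in: it zero-fills the extra space with resize instead of exposing uninitialized memory, which costs a memset that with_extra would avoid. The function name push mirrors the removed method but is otherwise hypothetical.

```rust
use std::io::{self, Read};

/// Reads up to `n` bytes from `reader` and appends them to `buf`,
/// returning how many bytes were appended.
fn push<R: Read>(reader: &mut R, n: usize, buf: &mut Vec<u8>) -> io::Result<usize> {
    let start = buf.len();
    // Safe stand-in for `with_extra(n)`: grow the vector with zeroed bytes.
    buf.resize(start + n, 0);
    match reader.read(&mut buf[start..]) {
        Ok(read) => {
            // Keep only the bytes that were actually read.
            buf.truncate(start + read);
            Ok(read)
        }
        Err(e) => {
            // Restore the original length so no filler bytes leak out.
            buf.truncate(start);
            Err(e)
        }
    }
}
```

Existing contents of the vector are left untouched in both the success and the error case.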
The Writer trait is cut down to even smaller size:
trait Write {
    fn write(&mut self, buf: &[u8]) -> Result<usize, Error>;
    fn flush(&mut self) -> Result<(), Error>;
    fn write_all(&mut self, buf: &[u8]) -> Result<(), Error> { .. }
    fn write_fmt(&mut self, fmt: &fmt::Arguments) -> Result<(), Error> { .. }
}
The biggest change here is to the semantics of write. Instead of repeatedly writing to the underlying IO object until all of buf is written, it attempts a single write and on success returns the number of bytes written. This follows the long tradition of blocking IO, and is a more fundamental building block than the looping write we currently have. Like read, it will propagate EINTR.
For convenience, write_all recovers the behavior of today's write, looping until either the entire buffer is written or an error occurs. To meaningfully recover from an intermediate error and keep writing, code should work with write directly. Like the Read conveniences, EINTR results in a retry.
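The difference between the two methods shows up as soon as the underlying object accepts less than it is offered. The sketch below uses today's std::io::Write, which follows the semantics proposed here; ChunkedWriter is a hypothetical writer that accepts at most max_per_call bytes per call (assumed to be nonzero).

```rust
use std::io::{self, Write};

/// Hypothetical writer that accepts at most `max_per_call` bytes per `write`.
struct ChunkedWriter {
    written: Vec<u8>,
    max_per_call: usize,
    calls: usize,
}

impl Write for ChunkedWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // A single underlying write: take what fits and report the count.
        let n = buf.len().min(self.max_per_call);
        self.written.extend_from_slice(&buf[..n]);
        self.calls += 1;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Returns (bytes accepted by one `write`, number of calls `write_all` needed).
fn compare(data: &[u8], max_per_call: usize) -> io::Result<(usize, usize)> {
    let mut single = ChunkedWriter { written: Vec::new(), max_per_call, calls: 0 };
    let accepted = single.write(data)?;

    let mut looping = ChunkedWriter { written: Vec::new(), max_per_call, calls: 0 };
    looping.write_all(data)?;
    Ok((accepted, looping.calls))
}
```

With an 11-byte input and a 4-byte limit, a single write accepts 4 bytes, while write_all keeps calling write until everything has been handed over.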
The write_fmt method, like write_all, will loop until its entire input is written or an error occurs.
The other methods include endian conversions (covered by serialization) and a few conveniences like write_str for other basic types. The latter, at least, is already uniformly (and extensibly) covered via the write! macro. The other helpers, as with Read, should migrate into a more general (de)serialization library.
The fundamental problem with Rust's full embrace of UTF-8 strings is that not all strings taken or returned by system APIs are Unicode, let alone UTF-8 encoded.
In the past, std has assumed that all strings are either in some form of Unicode (Windows), or are simply u8 sequences (Unix). Unfortunately, this is wrong, and the situation is more subtle:
-
Unix platforms do indeed work with arbitrary u8 sequences (without interior nulls) and today's platforms usually interpret them as UTF-8 when displayed.
-
Windows, however, works with arbitrary u16 sequences that are roughly interpreted as UTF-16, but may not actually be valid UTF-16 -- an "encoding" often called UCS-2; see http://justsolve.archiveteam.org/wiki/UCS-2 for a bit more detail.
What this means is that all of Rust's platforms go beyond Unicode, but they do so in different and incompatible ways.
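The Windows side of this can be demonstrated on any platform with plain u16 sequences. The sketch below uses String::from_utf16 and String::from_utf16_lossy from today's standard library; decode_wide is a hypothetical helper name.

```rust
/// Returns the strict and the lossy decoding of `units` as UTF-16.
/// The strict decoding is `None` when `units` is not well-formed UTF-16.
fn decode_wide(units: &[u16]) -> (Option<String>, String) {
    (
        String::from_utf16(units).ok(),
        String::from_utf16_lossy(units),
    )
}
```

A sequence such as [0x0068, 0xD800] ends in an unpaired surrogate: it is a perfectly acceptable u16 string for the platform, but strict decoding rejects it and lossy decoding substitutes U+FFFD, so neither route preserves the original data.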
The current solution of providing both str and [u8] versions of APIs is therefore problematic for multiple reasons. For one, the [u8] versions are not actually cross-platform -- even today, they panic on Windows when given non-UTF-8 data, a platform-specific behavior. But they are also incomplete, because on Windows you should be able to work directly with UCS-2 data.
Fortunately, there is a solution that fits well with Rust's UTF-8 strings and offers the possibility of platform-specific APIs.
Observation 1: it is possible to re-encode UCS-2 data in a way that is also compatible with UTF-8. This is the WTF-8 encoding format proposed by Simon Sapin. This encoding has some remarkable properties:
-
Valid UTF-8 data is valid WTF-8 data. When decoded to UCS-2, the result is exactly what would be produced by going straight from UTF-8 to UTF-16. In other words, making up some methods:
my_utf8_data.to_wtf8().to_ucs2().as_u16_slice() == my_utf8_data.to_utf16().as_u16_slice()
-
Valid UTF-16 data re-encoded as WTF-8 produces the corresponding UTF-8 data:
my_utf16_data.to_wtf8().as_bytes() == my_utf16_data.to_utf8().as_bytes()
These two properties mean that, when working with Unicode data, the WTF-8 encoding is highly compatible with both UTF-8 and UTF-16. In particular, the conversion from a Rust string to a WTF-8 string is a no-op, and the conversion in the other direction is just a validation.
Observation 2: all platforms can consume Unicode data (suitably re-encoded), and it's also possible to validate the data they produce as Unicode and extract it.
Observation 3: the non-Unicode spaces on various platforms are deeply incompatible: there is no standard way to port non-Unicode data from one to another. Therefore, the only cross-platform APIs are those that work entirely with Unicode.
The observations above lead to a somewhat radical new treatment of strings, first proposed in the Path Reform RFC. This RFC proposes to introduce new string and string slice types that (opaquely) represent platform-sensitive strings, housed in the std::os_str module.
The OsString type is analogous to String, and OsStr is analogous to str. Their backing implementation is platform-dependent, but they offer a cross-platform API:
pub mod os_str {
    /// Owned OS strings
    struct OsString {
        inner: imp::Buf
    }
    /// Slices into OS strings
    struct OsStr {
        inner: imp::Slice
    }
    // Platform-specific implementation details:
    #[cfg(unix)]
    mod imp {
        type Buf = Vec<u8>;
        type Slice = [u8];
        ...
    }
    #[cfg(windows)]
    mod imp {
        type Buf = Wtf8Buf; // See https://github.com/SimonSapin/rust-wtf8
        type Slice = Wtf8;
        ...
    }
    impl OsString {
        pub fn from_string(String) -> OsString;
        pub fn from_str(&str) -> OsString;
        pub fn as_slice(&self) -> &OsStr;
        pub fn into_string(Self) -> Result<String, OsString>;
        pub fn into_string_lossy(Self) -> String;
        // and ultimately other functionality typically found on vectors,
        // but CRUCIALLY NOT as_bytes
    }
    impl Deref<OsStr> for OsString { ... }
    impl OsStr {
        pub fn from_str(value: &str) -> &OsStr;
        pub fn as_str(&self) -> Option<&str>;
        pub fn to_string_lossy(&self) -> CowString;
        // and ultimately other functionality typically found on slices,
        // but CRUCIALLY NOT as_bytes
    }
    trait IntoOsString {
        fn into_os_str_buf(self) -> OsString;
    }
    impl IntoOsString for OsString { ... }
    impl<'a> IntoOsString for &'a OsStr { ... }
    ...
}
These APIs make OS strings appear roughly as opaque vectors (you cannot see the byte representation directly), and can always be produced starting from Unicode data. They make it possible to collapse functions like getenv and getenv_as_bytes into a single function that produces an OS string, allowing the client to decide how (or whether) to extract Unicode data. It will be possible to do things like concatenate OS strings without ever going through Unicode.
It will also likely be possible to do things like search for Unicode substrings. The exact details of the API are left open and are likely to grow over time.
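As a rough illustration of working with OS strings opaquely, the sketch below uses the types as they eventually landed in today's std::ffi (not the std::os_str path proposed here), where the Option-returning accessor is named to_str rather than as_str. The helper with_suffix is hypothetical.

```rust
use std::ffi::{OsStr, OsString};

/// Concatenates two OS strings without ever converting to `String`.
fn with_suffix(stem: &OsStr, suffix: &OsStr) -> OsString {
    let mut out = stem.to_os_string();
    out.push(suffix);
    out
}

/// The client decides how to extract Unicode: strictly, or with replacement.
fn describe(value: &OsStr) -> (Option<&str>, String) {
    (value.to_str(), value.to_string_lossy().into_owned())
}
```

Only describe ever looks at the contents as Unicode; with_suffix would work just as well on data that has no valid Unicode interpretation.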
In addition to APIs like the above, there will also be platform-specific ways of viewing or constructing OS strings that reveal more about the space of possible values:
pub mod os {
    #[cfg(unix)]
    pub mod unix {
        trait OsStringExt {
            fn from_vec(Vec<u8>) -> Self;
            fn into_vec(Self) -> Vec<u8>;
        }
        impl OsStringExt for os_str::OsString { ... }
        trait OsStrExt {
            fn as_byte_slice(&self) -> &[u8];
            fn from_byte_slice(&[u8]) -> &Self;
        }
        impl OsStrExt for os_str::OsStr { ... }
        ...
    }
    #[cfg(windows)]
    pub mod windows {
        // The following extension traits provide a UCS-2 view of OS strings
        trait OsStringExt {
            fn from_wide_slice(&[u16]) -> Self;
        }
        impl OsStringExt for os_str::OsString { ... }
        trait OsStrExt {
            fn to_wide_vec(&self) -> Vec<u16>;
        }
        impl OsStrExt for os_str::OsStr { ... }
        ...
    }
    ...
}
By placing these APIs under os, using them requires a clear opt in to platform-specific functionality.
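The opt-in can be sketched with the Unix extension trait as it exists in today's standard library, std::os::unix::ffi::OsStringExt (a slightly different path from the draft layout above). The helper os_string_from_bytes is hypothetical and is only compiled on Unix targets.

```rust
use std::ffi::OsString;

/// Builds an OS string from raw bytes. Only available on Unix, where OS
/// strings are arbitrary byte sequences.
#[cfg(unix)]
fn os_string_from_bytes(bytes: Vec<u8>) -> OsString {
    // Importing from `std::os::unix` is the explicit, greppable opt-in.
    use std::os::unix::ffi::OsStringExt;
    OsString::from_vec(bytes)
}
```

Passing bytes such as [b'f', b'o', 0xFF] yields a value that the cross-platform API refuses to view as a str, while to_string_lossy substitutes U+FFFD for the invalid byte.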
Introducing an additional string type is a bit daunting, since many existing APIs take and consume only standard Rust strings. Today's solution demands that strings coming from the OS be assumed or turned into Unicode, and the proposed API continues to allow that (with more explicit and finer-grained control).
In the long run, however, robust applications are likely to work opaquely with OS strings far beyond the boundary to the system to avoid data loss and ensure maximal compatibility. If this situation becomes common, it should be possible to introduce an abstraction over various string types and generalize most functions that work with String/str to instead work generically. This RFC does not propose taking any such steps now -- but it's important that we can do so later if Rust's standard strings turn out to not be sufficient and OS strings become commonplace.
To be added in a follow-up PR.
To be added in a follow-up PR.
Now that we've covered the core principles and techniques used throughout IO, we can go on to explore the modules in detail.
Ideally, the io module will be split into the parts that can live in libcore (most of it) and the parts that are added in the std::io facade. This part of the organization is non-normative, since it requires changes to today's IoError (which currently references String); if these changes cannot be performed, everything here will live in std::io.
The current std::io::util module offers a number of Reader and Writer "adapters". This RFC refactors the design to more closely follow std::iter. Along the way, it generalizes the by_ref adapter:
trait ReadExt: Read {
    // ... eliding the methods already described above
    // Postfix version of `(&mut self)`
    fn by_ref(&mut self) -> &mut Self { ... }
    // Read everything from `self`, then read from `next`
    fn chain<R: Read>(self, next: R) -> Chain<Self, R> { ... }
    // Adapt `self` to yield only the first `limit` bytes
    fn take(self, limit: u64) -> Take<Self> { ... }
    // Whenever reading from `self`, push the bytes read to `out`
    #[unstable] // uncertain semantics of errors "halfway through the operation"
    fn tee<W: Write>(self, out: W) -> Tee<Self, W> { ... }
}
trait WriteExt: Write {
    // Postfix version of `(&mut self)`
    fn by_ref<'a>(&'a mut self) -> &mut Self { ... }
    // Whenever bytes are written to `self`, write them to `other` as well
    #[unstable] // uncertain semantics of errors "halfway through the operation"
    fn broadcast<W: Write>(self, other: W) -> Broadcast<Self, W> { ... }
}
// An adapter converting an `Iterator<u8>` to `Read`.
pub struct IterReader<T> { ... }
As with std::iter, these adapters are object unsafe and hence placed in an extension trait with a blanket impl.
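A short sketch of how these adapters compose, written against today's std::io::Read, where chain, take and by_ref are available directly on the trait rather than on a separate ReadExt. The helper names head_of_chain and split_header are hypothetical.

```rust
use std::io::{self, Read};

/// Reads the first `limit` bytes of `first` followed by `second`.
fn head_of_chain<A: Read, B: Read>(first: A, second: B, limit: u64) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    first.chain(second).take(limit).read_to_end(&mut out)?;
    Ok(out)
}

/// Reads a `len`-byte header through `by_ref`, then the rest of the stream.
fn split_header<R: Read>(reader: &mut R, len: u64) -> io::Result<(Vec<u8>, Vec<u8>)> {
    let mut header = Vec::new();
    // `take` consumes its receiver; `by_ref` lends the reader out instead,
    // so it is still usable once the adapter is dropped.
    reader.by_ref().take(len).read_to_end(&mut header)?;
    let mut body = Vec::new();
    reader.read_to_end(&mut body)?;
    Ok((header, body))
}
```

As with iterators, the adapters take self by value, and by_ref is what allows a borrowed reader to be adapted temporarily.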
The current std::io::util module also includes a number of primitive readers and writers, as well as copy. These are updated as follows:
// A reader that yields no bytes
fn empty() -> Empty; // in theory just returns `impl Read`
impl Read for Empty { ... }
// A reader that yields `byte` repeatedly (generalizes today's ZeroReader)
fn repeat(byte: u8) -> Repeat;
impl Read for Repeat { ... }
// A writer that ignores the bytes written to it (/dev/null)
fn sink() -> Sink;
impl Write for Sink { ... }
// Copies all data from a `Read` to a `Write`, returning the amount of data
// copied.
pub fn copy<R, W>(r: &mut R, w: &mut W) -> Result<u64, Error>
Like write_all, the copy method will discard the amount of data already written on any error and also discard any partially read data on a write error. This method is intended to be a convenience and write should be used directly if this is not desirable.
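These primitives combine naturally with the adapters. The sketch below uses the functions as they exist in today's std::io (repeat, sink, empty and copy); the wrapper names are hypothetical.

```rust
use std::io::{self, Read};

/// Streams `n` copies of `byte` into a sink and reports how many bytes moved.
fn discard_repeated(byte: u8, n: u64) -> io::Result<u64> {
    // `repeat` is endless, so it is bounded with `take` before copying.
    io::copy(&mut io::repeat(byte).take(n), &mut io::sink())
}

/// Copies an in-memory input into a fresh vector.
fn copy_to_vec(mut input: &[u8]) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    io::copy(&mut input, &mut out)?;
    Ok(out)
}

/// Reads everything from the empty reader, which yields no bytes.
fn read_empty() -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    io::empty().read_to_end(&mut out)?;
    Ok(out)
}
```

Note that copy only reports the total on success; on failure the caller learns nothing about how far it got, which is the trade-off described above.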
The seeking infrastructure is largely the same as today's, except that tell is removed and the seek signature is refactored with more precise types:
pub trait Seek {
    // returns the new position after seeking
    fn seek(&mut self, pos: SeekFrom) -> Result<u64, Error>;
}
pub enum SeekFrom {
    Start(u64),
    End(i64),
    Current(i64),
}
The old tell function can be regained via seek(SeekFrom::Current(0)).
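A minimal sketch of that recovery, written against today's std::io::Seek, whose shape matches the trait above. The free function tell is hypothetical; it simply names the idiom.

```rust
use std::io::{self, Seek, SeekFrom};

/// Reports the current position by seeking zero bytes from where we are.
fn tell<S: Seek>(stream: &mut S) -> io::Result<u64> {
    stream.seek(SeekFrom::Current(0))
}
```

Because seek returns the new absolute position, a zero-length relative seek reports the position without moving it.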
The current Buffer trait will be renamed to BufRead for clarity (and to open the door to BufWrite at some later point):
pub trait BufRead: Read {
    fn fill_buf(&mut self) -> Result<&[u8], Error>;
    fn consume(&mut self, amt: usize);
    fn read_until(&mut self, byte: u8, buf: &mut Vec<u8>) -> Result<(), Error> { ... }
    fn read_line(&mut self, buf: &mut String) -> Result<(), Error> { ... }
}
pub trait BufReadExt: BufRead {
    // Split is an iterator over Result<Vec<u8>, Error>
    fn split(&mut self, byte: u8) -> Split<Self> { ... }
    // Lines is an iterator over Result<String, Error>
    fn lines(&mut self) -> Lines<Self> { ... }
    // Chars is an iterator over Result<char, Error>
    fn chars(&mut self) -> Chars<Self> { ... }
}
The read_until and read_line methods are changed to take explicit, mutable
buffers, for similar reasons to read_to_end. (Note that buffer reuse is
particularly common for read_line). These functions include the delimiters
in the strings they produce, both for easy cross-platform compatibility (in
the case of read_line) and for ease in copying data without loss (in
particular, distinguishing whether the last line included a final
delimiter).
The split and lines methods provide iterator-based versions of read_until
and read_line, and do not include the delimiter in their output. This
matches conventions elsewhere (like split on strings) and is usually what
you want when working with iterators.
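The difference between the two styles can be sketched as follows. This uses BufRead as it exists in today's std, where read_line returns the number of bytes read instead of the () shown in the signature above and lines takes self by value. The two function names are hypothetical.

```rust
use std::io::{self, BufRead};

/// Reads lines with `read_line`, reusing one buffer; delimiters are kept.
pub fn lines_with_delimiters<R: BufRead>(mut reader: R) -> io::Result<Vec<String>> {
    let mut out = Vec::new();
    let mut buf = String::new();
    loop {
        buf.clear(); // reuse the same allocation for every line
        // Assumption: today's std signature, which returns the byte count
        // (0 at end of input) rather than `()`.
        if reader.read_line(&mut buf)? == 0 {
            break;
        }
        out.push(buf.clone());
    }
    Ok(out)
}

/// Reads lines with the `lines` iterator; delimiters are stripped.
pub fn lines_without_delimiters<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    reader.lines().collect()
}
```

With the input "one\ntwo", only the first form lets the caller see that the final line had no trailing newline.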
The BufReader, BufWriter and BufStream types stay essentially as they are
today, except that for streams and writers the into_inner method yields the
structure back in the case of a flush error:
// If flushing fails, you get the unflushed data back
fn into_inner(self) -> Result<W, IntoInnerError<Self>>;
pub struct IntoInnerError<W>(W, Error);
impl<W> IntoInnerError<W> {
pub fn error(&self) -> &Error { ... }
pub fn into_inner(self) -> W { ... }
}
impl<W> FromError<IntoInnerError<W>> for Error { ... }
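For illustration only, here is how a caller might handle both outcomes of into_inner, sketched against BufWriter and IntoInnerError as they exist in today's std. The function buffered_bytes is hypothetical. Writing to a Vec<u8> cannot fail, so the error arm is shown purely to demonstrate the shape of the API.

```rust
use std::io::{BufWriter, Write};

/// Buffers two writes, then recovers the underlying `Vec<u8>`.
pub fn buffered_bytes() -> Vec<u8> {
    let mut writer = BufWriter::new(Vec::<u8>::new());
    writer.write_all(b"hello ").unwrap();
    writer.write_all(b"world").unwrap();
    // `into_inner` flushes first. On failure it returns an
    // `IntoInnerError`, which exposes the I/O error via `error()` and
    // hands the whole `BufWriter` back via `into_inner()`.
    match writer.into_inner() {
        Ok(inner) => inner,
        Err(e) => {
            eprintln!("flush failed: {}", e.error());
            // Nothing is lost: the writer and its unflushed data are
            // still available. Here we just return what reached the Vec.
            e.into_inner().get_ref().clone()
        }
    }
}
```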
Many applications want to view in-memory data as either an implementor of
Read or Write. This is often useful when composing streams or creating test
cases. This functionality primarily comes from the following
implementations:
impl<'a> Read for &'a [u8] { ... }
impl<'a> Write for &'a mut [u8] { ... }
impl Write for Vec<u8> { ... }
While efficient, none of these implementations support seeking (via an
implementation of the Seek trait). The implementations of Read and Write
for these types are not quite as efficient when Seek needs to be used, so
the Seek-ability will be opted in to with a new Cursor structure with the
following API:
pub struct Cursor<T> {
pos: u64,
inner: T,
}
impl<T> Cursor<T> {
pub fn new(inner: T) -> Cursor<T>;
pub fn into_inner(self) -> T;
pub fn get_ref(&self) -> &T;
}
// Error indicating that a negative offset was seeked to.
pub struct NegativeOffset;
impl Seek for Cursor<Vec<u8>> { ... }
impl<'a> Seek for Cursor<&'a [u8]> { ... }
impl<'a> Seek for Cursor<&'a mut [u8]> { ... }
impl Read for Cursor<Vec<u8>> { ... }
impl<'a> Read for Cursor<&'a [u8]> { ... }
impl<'a> Read for Cursor<&'a mut [u8]> { ... }
impl BufRead for Cursor<Vec<u8>> { ... }
impl<'a> BufRead for Cursor<&'a [u8]> { ... }
impl<'a> BufRead for Cursor<&'a mut [u8]> { ... }
impl<'a> Write for Cursor<&'a mut [u8]> { ... }
impl Write for Cursor<Vec<u8>> { ... }
A sample implementation can be found in a gist. Using one Cursor structure
emphasizes that the only ability added is an implementation of Seek, while
still allowing all possible I/O operations for various types of buffers.
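A minimal sketch of the intended usage, assuming the Cursor type as it exists in today's std::io. The function names cursor_round_trip and last_bytes are hypothetical.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom, Write};

/// Writes into a `Cursor<Vec<u8>>`, rewinds, overwrites one byte in
/// place, and reads the result back through the `Read` impl.
pub fn cursor_round_trip() -> io::Result<String> {
    let mut cursor = Cursor::new(Vec::<u8>::new());
    cursor.write_all(b"hello world")?;
    cursor.seek(SeekFrom::Start(0))?;
    cursor.write_all(b"J")?;
    cursor.seek(SeekFrom::Start(0))?;
    let mut text = String::new();
    cursor.read_to_string(&mut text)?;
    Ok(text)
}

/// Reads the last `n` bytes of a borrowed slice via `Cursor<&[u8]>`.
/// Seeking to a negative position is reported as an error.
pub fn last_bytes(data: &[u8], n: i64) -> io::Result<Vec<u8>> {
    let mut cursor = Cursor::new(data);
    cursor.seek(SeekFrom::End(-n))?;
    let mut tail = Vec::new();
    cursor.read_to_end(&mut tail)?;
    Ok(tail)
}
```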
It is not currently proposed to unify these implementations via a trait.
For example a Cursor<Rc<[u8]>> is a reasonable instance to have, but it
will not have an implementation listed in the standard library to start
out. It is considered a backwards-compatible addition to unify these
various impl blocks with a trait.
The following types will be removed from the standard library and replaced
as follows:
- MemReader -> Cursor<Vec<u8>>
- MemWriter -> Cursor<Vec<u8>>
- BufReader -> Cursor<&[u8]> or Cursor<&mut [u8]>
- BufWriter -> Cursor<&mut [u8]>
The std::io module will largely be a facade over core::io, but it will add
some functionality that can live only in std.
The IoError type will be renamed to std::io::Error, following our
non-prefixing convention.
It will remain largely as it is today, but its fields will be made
private. It may eventually grow a field to track the underlying OS
error code.
The std::io::IoErrorKind type will become std::io::ErrorKind, and
ShortWrite will be dropped (it is no longer needed with the new Write
semantics), which should decrease its footprint. The OtherIoError variant
will become Other now that enums are namespaced. Other variants may be
added over time, such as Interrupted, as more errors are classified from
the system.
The EndOfFile variant will be removed in favor of returning Ok(0) from read
on end of file (or write on an empty slice for example). This approach
clarifies the meaning of the return value of read, matches Posix APIs, and
makes it easier to use try! in the case that a "real" error should be
bubbled out. (The main downside is that higher-level operations that might
use Result<T, IoError> with some T != usize may need to wrap IoError in a
further enum if they wish to forward unexpected EOF.)
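The resulting read loop can be sketched as follows, using the Read trait from today's std and the ? operator in place of try!. The function count_until_eof is a hypothetical helper. A production loop would typically also retry on ErrorKind::Interrupted, which is omitted here for brevity.

```rust
use std::io::{self, Read};

/// Counts bytes by calling `read` until it reports end of file.
pub fn count_until_eof<R: Read>(mut reader: R) -> io::Result<usize> {
    let mut chunk = [0u8; 4]; // deliberately tiny to force several reads
    let mut total = 0;
    loop {
        // `?` forwards real errors; end of file is not an error.
        match reader.read(&mut chunk)? {
            0 => return Ok(total), // Ok(0) signals end of file
            n => total += n,
        }
    }
}
```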
The ChanReader and ChanWriter adapters will be left as they are today, and
they will remain #[unstable]. The channel adapters currently suffer from a
few problems, some of which are inherent to the design:
- Construction is somewhat unergonomic. First an mpsc channel pair must be
  created and then each half of the reader/writer needs to be created.
- Each call to write involves moving memory onto the heap to be sent, which
  isn't necessarily efficient.
- The design of std::sync::mpsc allows for growing more channels in the
  future, but it's unclear if we'll want to continue to provide a
  reader/writer adapter for each channel we add to std::sync.
These types generally feel as if they're from a different era of Rust
(which they are!) and may take some time to fit into the current standard
library. They can be reconsidered for stabilization after the dust settles
from the I/O redesign as well as the recent std::sync redesign. At this
time, however, this RFC recommends they remain unstable.
To be added in a follow-up PR.
Most of what's available in std::os today will move to std::env, and the
signatures will be updated to follow this RFC's Design principles as
follows.
Arguments:
- args: change to yield an iterator rather than vector if possible; in any
  case, it should produce an OsString.
Environment variables:
- vars (renamed from env): yields a vector of (OsString, OsString) pairs.
- var (renamed from getenv): takes a value bounded by AsOsStr, allowing
  Rust strings and slices to be ergonomically passed in. Yields an
  Option<OsString>.
- var_string: takes a value bounded by AsOsStr, returning
  Result<String, VarError> where VarError represents a non-unicode OsString
  or a "not present" value.
- set_var (renamed from setenv): takes two AsOsStr-bounded values.
- remove_var (renamed from unsetenv): takes an AsOsStr-bounded value.
- join_paths: takes an IntoIterator<T> where T: AsOsStr, yields a
  Result<OsString, JoinPathsError>.
- split_paths: takes an AsOsStr, yields an Iterator<Path>.
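A short sketch of the lookup and path functions, written against today's std::env rather than the exact signatures listed above. In today's std the OsString-returning lookup is called var_os and the String-returning one is called var, and the bounds are expressed with AsRef<OsStr>. The helper names round_trip and lookup are hypothetical.

```rust
use std::env::{self, VarError};
use std::ffi::OsString;
use std::path::PathBuf;

/// Joins directories into a PATH-style value and splits it again.
pub fn round_trip(dirs: &[&str]) -> Result<Vec<PathBuf>, env::JoinPathsError> {
    // The separator is platform-specific, so callers never hard-code it.
    let joined: OsString = env::join_paths(dirs.iter())?;
    Ok(env::split_paths(&joined).collect())
}

/// Looks a variable up both ways: as an `OsString` and as a `String`.
pub fn lookup(name: &str) -> (Option<OsString>, Result<String, VarError>) {
    (env::var_os(name), env::var(name))
}
```

The String-returning form distinguishes a missing variable from one that is present but not valid Unicode, which the Option-returning form cannot do.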
Working directory:
- current_dir (renamed from getcwd): yields a PathBuf.
- set_current_dir (renamed from change_dir): takes an AsPath value.
Important locations:
- home_dir (renamed from homedir): returns the home directory as a PathBuf.
- temp_dir (renamed from tmpdir): returns a temporary directory as a
  PathBuf.
- current_exe (renamed from self_exe_name): returns the full path to the
  current binary as a PathBuf in an io::Result instead of an Option.
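As a sketch of how the location functions fit together, the following uses current_dir, temp_dir and current_exe as they exist in today's std::env. The function locations is a hypothetical helper.

```rust
use std::env;
use std::io;
use std::path::PathBuf;

/// Collects the working directory, the temp directory and the path of
/// the running binary.
pub fn locations() -> io::Result<(PathBuf, PathBuf, PathBuf)> {
    let cwd = env::current_dir()?; // fallible: io::Result<PathBuf>
    let tmp = env::temp_dir(); // infallible: plain PathBuf
    let exe = env::current_exe()?; // io::Result rather than Option
    Ok((cwd, tmp, exe))
}
```

Returning io::Result from current_exe lets the caller see why the lookup failed, where the old Option only said that it did.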
Exit status:
- get_exit_status and set_exit_status stay as they are, but with updated
  docs that reflect that these only affect the return value of
  std::rt::start. These will remain #[unstable] for now and a future RFC
  will determine their stability.
Architecture information:
- num_cpus, page_size: stay as they are, but remain #[unstable]. A future
  RFC will determine their stability and semantics.
Constants:
- Stabilize ARCH, DLL_PREFIX, DLL_EXTENSION, DLL_SUFFIX, EXE_EXTENSION,
  EXE_SUFFIX, FAMILY as they are.
- Rename SYSNAME to OS.
- Remove TMPBUF_SZ.
This brings the constants into line with our naming conventions elsewhere.
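Working directly with the constants might look like the sketch below, which assumes they live in std::env::consts as in today's std. The helper names dll_name, exe_name and describe_target are hypothetical.

```rust
use std::env::consts::{ARCH, DLL_PREFIX, DLL_SUFFIX, EXE_SUFFIX, FAMILY, OS};

/// Builds the platform's file name for a dynamic library called `name`.
pub fn dll_name(name: &str) -> String {
    format!("{}{}{}", DLL_PREFIX, name, DLL_SUFFIX)
}

/// Builds the platform's file name for an executable called `name`.
pub fn exe_name(name: &str) -> String {
    format!("{}{}", name, EXE_SUFFIX)
}

/// Describes the compilation target as "<arch> <os> (<family>)".
pub fn describe_target() -> String {
    format!("{} {} ({})", ARCH, OS, FAMILY)
}
```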
- pipe will move to os::unix. It is currently primarily used for hooking to
  the IO of a child process, which will now be done behind a trait object
  abstraction.
- errno, error_string and last_os_error provide redundant,
  platform-specific functionality and will be removed for now. They may
  reappear later in os::unix and os::windows in a modified form.
- dll_filename: deprecated in favor of working directly with the constants.
- _NSGetArgc, _NSGetArgv: these should never have been public.
- self_exe_path: deprecated in favor of current_exe plus path operations.
- make_absolute: deprecated in favor of explicitly joining with the working
  directory.
- all _as_bytes variants: deprecated in favor of yielding OsString values.
To be added in a follow-up PR.
To be added in a follow-up PR.
Currently std::io::process is used only for spawning new processes. The
re-envisioned std::process will ultimately support inspecting
currently-running processes, although this RFC does not propose any
immediate support for doing so -- it merely future-proofs the module.
The Command type is a builder API for processes, and is largely in good
shape, modulo a few tweaks:
- Replace ToCStr bounds with AsOsStr.
- Replace env_set_all with env_clear.
- Rename cwd to current_dir, take AsPath.
- Rename spawn to run.
- Move uid and gid to an extension trait in os::unix.
- Make detached take a bool (rather than always setting the command to
  detached mode).
The stdin, stdout, stderr methods will undergo a more significant change.
By default, the corresponding options will be considered "unset", the
interpretation of which depends on how the process is launched:
- For run or status, these will inherit from the current process by
  default.
- For output, these will capture to new readers/writers by default.
The StdioContainer type will be renamed to Stdio, and will not be exposed
directly as an enum (to enable growth and change over time). It will
provide a Capture constructor for capturing input or output, an Inherit
constructor (which just means to use the current IO object -- it does not
take an argument), and a Null constructor. The equivalent of today's
InheritFd will be added at a later point.
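The builder can be sketched as follows using std::process as it exists today, which differs in naming from the text above: the launch method is still called spawn rather than run, and the capturing constructor is Stdio::piped() rather than Capture. The program name "tool" and its arguments are placeholders, and the command is only built, never launched.

```rust
use std::ffi::OsStr;
use std::process::{Command, Stdio};

/// Builds, but does not launch, a command with explicit stdio settings.
/// `tool`, `--verbose` and `input.txt` are placeholders for illustration.
pub fn build_command() -> Command {
    let mut cmd = Command::new("tool");
    cmd.arg("--verbose")
        .arg(OsStr::new("input.txt")) // arguments accept OS strings
        .env_clear() // start from an empty environment
        .env("LANG", "C")
        .current_dir(std::env::temp_dir())
        .stdin(Stdio::null()) // the Null constructor
        .stdout(Stdio::piped()) // today's name for Capture
        .stderr(Stdio::inherit()); // the Inherit constructor
    cmd
}
```

Any stream left unconfigured falls back to the launch-dependent default described above: inherited for spawn and status, captured for output.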
We propose renaming Process to Child so that we can add a more general
notion of non-child Process later on (every Child will be able to give you
a Process).
- stdin, stdout and stderr will be retained as public fields, but their
  types will change to newtyped readers and writers to hide the internal
  pipe infrastructure.
- The kill method is dropped, and id and signal will move to os::platform
  extension traits.
- signal_exit, signal_kill, wait, and forget will all stay as they are.
- set_timeout will be changed to use the with_deadline infrastructure.
There are also a few other related changes to the module:
- Rename ProcessOutput to Output.
- Rename ProcessExit to ExitStatus, and hide its representation. Remove
  matches_exit_status, and add a status method yielding an Option<i32>.
- Remove MustDieSignal, PleaseExitSignal.
- Remove EnvMap (which should never have been exposed).
Initially, this module will be empty except for the platform-specific unix
and windows modules. It is expected to grow additional, more specific
platform submodules (like linux, macos) over time.
To be expanded in a follow-up PR.
The prelude submodule will contain most of the traits, types, and modules
discussed in this RFC; it is meant to provide maximal convenience when
working with IO of any kind. The exact contents of the module are left as
an open question.
This RFC is largely about cleanup, normalization, and stabilization of our IO libraries -- work that needs to be done, but that also represents nontrivial churn.
However, the actual implementation work involved is estimated to be
reasonably contained, since all of the functionality is already in place in
some form (including os_str, due to @SimonSapin's WTF-8 implementation).
The main alternative design would be to continue staying with the Posix
tradition in terms of naming and functionality (for which there is
precedent in some other languages). However, Rust is already well-known for
its strong cross-platform compatibility in std, and making the library more
Windows-friendly will only increase its appeal.
More radically different designs (in terms of different design principles or visions) are outside the scope of this RFC.
To be expanded in follow-up PRs.
(Text from @SimonSapin)
Rather than WTF-8, OsStr and OsString on Windows could use
potentially-ill-formed UTF-16 (a.k.a. "wide" strings), with a different
cost trade-off.
Upside:
- No conversion between OsStr / OsString and OS calls.
Downsides:
- More expensive conversions between OsStr / OsString and str / String.
- These conversions have inconsistent performance characteristics between
  platforms. (Need to allocate on Windows, but not on Unix.)
- Some of them return Cow, which has some ergonomic hit.
The API (only parts that differ) could look like:
pub mod os_str {
#[cfg(windows)]
mod imp {
type Buf = Vec<u16>;
type Slice = [u16];
...
}
impl OsStr {
pub fn from_str(&str) -> Cow<OsString, OsStr>;
pub fn to_string(&self) -> Option<CowString>;
pub fn to_string_lossy(&self) -> CowString;
}
#[cfg(windows)]
pub mod windows{
trait OsStringExt {
fn from_wide_slice(&[u16]) -> Self;
fn from_wide_vec(Vec<u16>) -> Self;
fn into_wide_vec(self) -> Vec<u16>;
}
trait OsStrExt {
fn from_wide_slice(&[u16]) -> Self;
fn as_wide_slice(&self) -> &[u16];
}
}
}
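For comparison, the conversions under the WTF-8 design can be sketched with std::ffi as it exists today. Here a str can be viewed as an OsStr without copying on every platform, so only the lossy direction needs a Cow. The four helper names are hypothetical.

```rust
use std::borrow::Cow;
use std::ffi::{OsStr, OsString};

/// Borrows a `&str` as an `OsStr`; no copy is made on any platform.
pub fn as_os(s: &str) -> &OsStr {
    OsStr::new(s)
}

/// Strict conversion back: `None` if the value is not valid Unicode.
pub fn strict(s: &OsStr) -> Option<&str> {
    s.to_str()
}

/// Lossy conversion back: borrows when valid, allocates otherwise.
pub fn lossy(s: &OsStr) -> Cow<'_, str> {
    s.to_string_lossy()
}

/// Owned conversion in both directions; the `Err` case hands the
/// original `OsString` back unchanged.
pub fn owned_round_trip(s: &str) -> Result<String, OsString> {
    OsString::from(s).into_string()
}
```

Under the UTF-16 alternative, the first helper would have to return a Cow on Windows, since it would need to allocate there.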