RFC for read_all #980
returned. After a read call returns having successfully read some
bytes, the total number of bytes read will be updated. If that
total is equal to the size of the buffer, read will return
successfully.
What is the reasoning for introducing a new error kind instead of following the pattern of `read()`'s return value (`io::Result<uint>`)?
- Consistency with write_all.
- io::Result is either Ok(uint) or Error. read_all has only one kind of ok result: "yep, we read the whole thing". If instead we returned Ok(number of bytes read), then every caller would have to check that the number of bytes read matched their expectations. In other words, we are optimizing for the common case of "either this file has the data, or we're doomed". And because the error on EOF is distinct from other errors, callers who care can choose to match on it and handle it separately. But I expect that most callers will not care.
> In other words, we are optimizing for the common case of "either this file has the data, or we're doomed".
Who says that that's the common case? What about: "Fill this buffer as much as possible without me having to check for EINTR"?
I think one could use take + read_to_end to do something like that (of course, it allocates a new buffer instead of re-using an old one). It seems likely to me that the "eof is a temporary condition rather than an error" applies more to network streams, and re-using buffers in network stream situations is less safe: http://www.tedunangst.com/flak/post/heartbleed-in-rust
I guess my only argument for read_all being the common case is that I've seen it twice (once in the code I am now writing, and once in the link above), compared to zero for the other one.
I could put this into the rationale section, if you like.
> of course, it allocates a new buffer instead of re-using an old one
Exactly.
> It seems likely to me that the "eof is a temporary condition rather than an error"
I wasn't talking about EOF. I was talking about EINTR.
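To make the "fill this buffer as much as possible without checking for EINTR" use case concrete, here is a minimal sketch. The helper `read_full` and the toy reader `Flaky` are hypothetical names introduced only for illustration; neither is part of `std` nor of the RFC. The sketch assumes that an interrupted read surfaces as `ErrorKind::Interrupted`.

```rust
use std::io::{self, ErrorKind, Read};

/// Hypothetical helper (not in std): fill `buf` as far as possible,
/// retrying on `ErrorKind::Interrupted`, and report how many bytes were read.
fn read_full<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match reader.read(&mut buf[filled..]) {
            Ok(0) => break, // end of stream: return the short count
            Ok(n) => filled += n,
            Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(filled)
}

/// Toy reader for illustration only: fails once with `Interrupted`,
/// then serves its data at most three bytes at a time.
struct Flaky<'a> {
    data: &'a [u8],
    interrupted: bool,
}

impl<'a> Read for Flaky<'a> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if !self.interrupted {
            self.interrupted = true;
            return Err(io::Error::new(ErrorKind::Interrupted, "EINTR"));
        }
        let n = buf.len().min(self.data.len()).min(3);
        buf[..n].copy_from_slice(&self.data[..n]);
        self.data = &self.data[n..];
        Ok(n)
    }
}
```

With this shape a short read is not an error: the caller gets `Ok(n)` with `n` smaller than the buffer and decides what to do, which is the opposite trade-off from the `io::Result<()>` signature proposed in the RFC.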
> Is it a four byte difference between i32 errno and u64 usize that we're talking here?
No that's probably manageable, I just wanted to point out that the size of `io::Error` is a very real worry and it can be affected by `ErrorKind` depending on how we're implementing it.
> Making a separate error type for this method might be the most straightforward approach.
This is possible, but has drawbacks:
- More API surface area to stabilize
- Nonstandard as the very similar `write_all` API would probably also want this treatment.
- Can hinder understanding by returning a non-normal error type.
- Operating generically over I/O functions may be hindered as this would stick out.
I think write_all can't use a separate error type. The write_all fn should never return WriteZero on a POSIX system (unless a zero-length buffer is passed); instead, it will either succeed or return some other error. So in fact we would need to annotate all of the other io errors with the number of bytes written. I don't know how useful this would be, and it seems like an odd wart to have on io::Error given that it's only useful for write_all and read_all.
By contrast, read_all could just add the number of bytes read to ShortRead. That seems simpler, and I would be happy to adjust this proposal and the implementing Pull Request if that's the consensus.
I don't understand how operating generically would be hindered -- can you explain a bit? I'm not very familiar with the language.
@alexcrichton, the second case (Some bytes were read, and then we saw `Ok(0)`) is the only case for which I would generally care about knowing how many bytes were read. For the I/O error case, the vast majority of the time I don't care about how many bytes were read, I just bail out and return the error. I'm much more likely to care in the `Ok(0)` case, such as if I'm reading fixed-size records from a file or the network, but the last record is allowed to be short.
Since the EOF condition doesn't exist for `write_all`, no changes would be needed there for consistency.
One way to implement this would be as follows:
- Add `ShortRead(usize)` to `Repr`
- Also add `ShortRead(usize)` to `ErrorKind`
- Add `Repr::ShortRead(n) => ErrorKind::ShortRead(n)` to `Error::kind` (and also update `description` and `detail`)
This would not increase the size of `io::Error`. (`io::Error` already has `Custom(Box<Custom>)`, which is the same size). It would increase the size of `Custom`, but that always gets boxed. It would also allow the return value to stay `io::Result<()>`, as desired, and would maintain usability by keeping the check for a short read out of the normal program flow and allowing a short read to be treated as a normal error value (via `try!`, et cetera) unless the user explicitly wants to handle it differently.
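The size argument above can be checked with a small stand-in model. The types below are hypothetical simplifications, not the real private definitions inside `std::io`; they only mirror the variant shapes named in the comment (`Os(i32)`, `Custom(Box<Custom>)`, and the proposed `ShortRead(usize)`).

```rust
use std::mem::size_of;

// Hypothetical stand-in for the boxed payload; the real struct differs.
#[allow(dead_code)]
struct Custom {
    kind: u8,
    error: Box<dyn std::error::Error + Send + Sync>,
}

// Model of the representation without the proposed variant.
#[allow(dead_code)]
enum ReprBefore {
    Os(i32),
    Custom(Box<Custom>),
}

// Model of the representation with `ShortRead(usize)` added.
#[allow(dead_code)]
enum ReprAfter {
    Os(i32),
    Custom(Box<Custom>),
    ShortRead(usize),
}

/// Returns the sizes of the two model enums, in bytes.
fn repr_sizes() -> (usize, usize) {
    (size_of::<ReprBefore>(), size_of::<ReprAfter>())
}
```

Because a `usize` payload is no larger than the `Box` pointer that is already present, the two model enums come out the same size. That matches the claim in the comment, under the assumption that the real `Repr` has roughly this shape.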
> I don't understand how operating generically would be hindered -- can you explain a bit?
This may not be too common, but if you are operating in a context that expects `io::Result<T>` to be returned via something like:
``` rust
fn foo<T, F>(f: F) -> io::Result<T>
    where F: Fn() -> io::Result<T> {
    // ...
}
```
You can't naturally use `read_all` with a different error type as you would be able to otherwise. You can probably get around it with `try!` and `Ok`, but it's somewhat unfortunate to have to do so.
> Since the EOF condition doesn't exist for write_all, no changes would be needed there for consistency.
Technically `Ok(0)` is indeed special as cases like `Write` on `&mut [u8]` will end up generating this once you hit the end of the buffer.
> One way to implement this would be as follows:
Hm yes, that does seem plausible! I would still be a little wary of adding this error, but we do have precedent with `WriteZero`, so it may not be so bad.
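The `Ok(0)` behaviour of `Write` on `&mut [u8]` mentioned above can be observed directly. This is a small sketch against the current standard library; the function name `demo` is made up for the example.

```rust
use std::io::{ErrorKind, Write};

/// Writes into a 4-byte slice until it is full, then reports what the
/// next `write` and `write_all` calls return.
fn demo() -> (usize, usize, ErrorKind) {
    let mut storage = [0u8; 4];
    let mut out: &mut [u8] = &mut storage;
    // Only part of the input fits into the slice.
    let first = out.write(b"abcdef").unwrap();
    // The slice is now full, so nothing more can be written.
    let second = out.write(b"x").unwrap();
    // `write_all` turns the zero-length write into an error.
    let kind = out.write_all(b"x").unwrap_err().kind();
    (first, second, kind)
}
```

So a plain `write` on a full slice succeeds with a count of zero, and it is `write_all` that reports the condition as `ErrorKind::WriteZero`.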
@alexcrichton
> Technically `Ok(0)` is indeed special as cases like `Write` on `&mut [u8]` will end up generating this once you hit the end of the buffer.
Really? I would expect and strongly argue that that should generate an IO error, even for a standard `write` call, and not return `Ok(0)`. In my mind, this is analogous to a pipe or a socket with the remote end closed, where reading yields end of file, while writing results in an error. As perhaps a closer analogy to the fixed buffer case, on Linux, if I have a fixed-size block device (say a small ramdisk), and I read to the end, subsequent reads will return 0 (end of file). If, however, I reach the end of the file and try to write more, I'll receive an error (`ENOSPC`). Similarly, writing to a file that has reached the maximum allowed size results in (`EFBIG`), not a successful write of zero size.
Is there any chance this can be changed?
👍 to this in its current form.
If we wanted io::Error to be a smaller type, ErrorKind::ShortRead
could be unparameterized. But this would reduce the information
available to callers.
I would note that having a parameterized `ErrorKind::ShortRead` wouldn't actually increase the size of `io::Error`, only that of `ErrorKind`.
Thanks for all the comments so far. Is there anything else I should be doing here? Or should I just be patient and let the process work its magic?
Thanks for the PR! We're a little slow dealing with RFCs at the moment, since the focus is on pushing out the beta release next week. This RFC touches on some issues that were discussed pretty extensively during IO reform: In particular, we used to have some convenience methods like this (see Note that it's quite straightforward to build libraries outside of Note also that there's an inherent asymmetry between reading and writing here, in that it's easy to get a slice of an entire existing data structure ( In short, it'd help to have more discussion about the scenarios in which this convenience is a big win, to the point that it should be included in the standard library (that has just been pared down), keeping in mind that |
That's a very good point. The use cases I'm thinking about include the ones in
So this definition of read_all, to me, seems like the only alternative that is safe but does not impose a tax. That said, I didn't have to write byteorder, but I do have to write my application, so from a purely selfish perspective, a read_all returning a Vec would be simpler. But I don't think it is a good idea.
read_at_least seemed to be a complicated optimization for an uncommon use case, so I can definitely see why it was pulled. It looks like read_exact was pulled for safety reasons. But my proposed read_all is not unsafe, so that is no objection. Here's one scenario where I would use read_all: reading git pack files: https://www.kernel.org/pub/software/scm/git/docs/technical/pack-format.txt
If I'm reading the "20-byte base object name", I want to use read_all because (a) a premature eof is an error and (b) I know exactly how many bytes I want to read (and probably have a buffer for it). Looking at the git source code, the approximately equivalent function ("read_in_full") is called a couple dozen times. A few are for read_to_end scenarios, but most are for cases where users simply don't want to have to deal with EINTR.
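As an illustration of the pack-file scenario, here is a minimal sketch using `Read::read_exact`, the name under which this method was eventually implemented (see the end of this thread). The function `read_object_name` is a hypothetical helper, and real pack parsing involves far more than this one field.

```rust
use std::io::{self, Read};

/// Reads a 20-byte object name, treating a premature end of input as an error.
fn read_object_name<R: Read>(reader: &mut R) -> io::Result<[u8; 20]> {
    let mut name = [0u8; 20];
    // Either all 20 bytes arrive, or the `?` propagates the error.
    reader.read_exact(&mut name)?;
    Ok(name)
}
```

The caller never inspects a byte count. If it does care about truncated input specifically, it can match on `ErrorKind::UnexpectedEof`, which is what `read_exact` reports in today's standard library.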
👍 I was just about to write the same RFC.
I'd much prefer this version to look like
That is a completely different functionality. What you are proposing is a different (admittedly better) signature for
No, I'm proposing reading exactly Admittedly, the
Ahh I see, I was confused by the name. One should better call it
write_all.
Or we could leave this out, and let every Rust user write their own
read_all function -- like savages.
This is unnecessary. Cargo makes us a highly developed distributed culture. With minimal investment needed to contribute & make a difference.
> This is unnecessary. Cargo makes us a highly developed distributed culture.
If you mean we don't need a standard library because we have Cargo then I disagree strongly. Putting things off into crates was a strategy used to reach a stable 1.0 library in time, but the standard library should catch up eventually. If you do a thing, you should do it well. If the standard library does reading and writing, then it should cover such basics as filling an entire buffer in the face of interrupts.
@ArtemGr Completely agree
I'm disagreeing with saying we're savages if it's not in libstd. And we don't have to write it ourselves -- it's easy to reuse code using cargo, very easy. Not saying that's a replacement but an augmentation.
Not savages, but babies. = )
It's 1.0 beta and there is space to grow.
I don't understand why you do this, calling people names is beside the point of the RFC and doesn't help. I'll reply with some on-topic questions in the main discussion.
Huuuuuge -1 for 'like savages' here.
> calling people names is beside the point of the RFC and doesn't help
Sorry it didn't help, @bluss. I had to try. = }
So, ignoring the last line of this RFC: as has already been pointed out, there is a discrepancy between reading and writing in std::io as it currently stands. The comments here also reinforce that it's already catching a number of us out. Simply deferring to Cargo for a single function will also create problems: not only the cost of tracking down the correct crate (maybe byteorder in this case?) but also the cost of pulling in all of its dependencies.
The crate can be designed for that use case though: small and without dependencies. Using cargo is super easy IMO. Since crates may even re-export other crates, you can even ask authors to split out features to sub crates (when it's possible).
There are already two ways that I know of to have Rust perform the read loop to fill a buffer. One using Vec and one to a stack-allocated buffer. We employ composition (Take adaptor) here, which is commonly favoured in Rust.
I understand the I/O API will grow, but to formulate the way forward we need to understand what we already have and the problems with it. A complete RFC should certainly address it.
To the code: (playpen link)
``` rust
use std::io::{self, Read};

fn main() {
    const N: usize = 32;
    let data: &[u8] = &[1u8; 1024];

    // Use std::io::copy to fill a stack buffer
    // Documented to retry and continue on interrupt.
    let mut buf = [0u8; N];
    let res = io::copy(&mut data.take(N as u64), &mut &mut buf[..]);
    println!("{:?}", res);
    println!("{:?}", buf);

    // Use .read_to_end to read to a vec.
    // Documented to retry and continue on interrupt.
    let mut vec = Vec::with_capacity(N);
    let res = data.take(N as u64).read_to_end(&mut vec);
    println!("{:?}", res);
    println!("{:?}", vec);
}
```
Didn't know we have
@bluss thanks for following up with an example. But isn't the big downside of using copy/take over the proposed RFC the internal allocation (http://doc.rust-lang.org/1.0.0-beta.3/src/std/io/util.rs.html#31-44), which would not be needed with https://github.com/rust-lang/rust/pull/23369/files#diff-668f8f358d4a93474b396dcb3727399eR202?
@bluss I tried take when I was first writing this code, but it seems that File.take works differently than &[u8].take (?). I took your sample and rewrote it to use File: https://gist.github.com/novalis/160eae78e90900af0f14 When I go to compile it, I get:
I probably just don't know rust.
@novalis You'd have to use .by_ref() with a file. I'm not advocating the specific io::copy API because it's very tricky to call.
@nwin Is it the main point? The first step is to perform the retry loop at all, I guess .read_to_end() is the best/only viable way to do that in Rust. To fail if less is read, I think the name
@markuskobler Yes, io::copy is clearly deficient. The way its API works is crummy, and it uses a big buffer, so it's not for this use case at all.
.by_ref().take(N) works for me. Sure, I have to check how many bytes have been read, but I would have to do that somehow anyway.
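For readers following along, the `.by_ref().take(N)` combination could look roughly like the sketch below. The helper name `read_up_to` is hypothetical. It is written against any `Read` implementor, so the same call should work for a `File`, although the sketch itself does not open one.

```rust
use std::io::{self, Read};

/// Reads at most `n` bytes from `reader` into a fresh `Vec`, leaving the
/// reader usable afterwards because it is only borrowed via `by_ref()`.
fn read_up_to<R: Read>(reader: &mut R, n: u64) -> io::Result<Vec<u8>> {
    let mut out = Vec::with_capacity(n as usize);
    // `take` limits the adaptor to `n` bytes; `read_to_end` runs the read loop.
    reader.by_ref().take(n).read_to_end(&mut out)?;
    Ok(out)
}
```

As noted above, the caller still has to compare `out.len()` with `n` to detect a short read, and the data lands in a newly allocated `Vec` rather than in a caller-supplied buffer.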
@bluss: my main point is the asymmetry between
@nwin 👍
@gsingh93 I'd prefer to merge it here so as to not split the discussion, if @novalis is okay with it. This is the first time I meddle with a Rust RFC, so I'm not sure what the correct procedure would be. @alexcrichton , @aturon , any advice?
@cesarb You can submit a PR to https://github.com/novalis/rfcs/tree/novalis/read-all, and if it gets merged it should show up here I think.
Make this RFC be again about a single method
Ok, I merged your PR.
Explanations about the buffer contents
I added an explanation for why the contents of
And, just for fun, an example of how this would be used in real life:
``` rust
extern crate byteorder;

use byteorder::{BigEndian, ReadBytesExt, WriteBytesExt};
use std::io::{Read, Write, Result};

struct Header {
    tag: [u8; 16],
    number: u64,
}

impl Header {
    pub fn read<R: Read>(reader: &mut R) -> Result<Self> {
        let mut tag = [0; 16];
        try!(reader.read_exact(&mut tag));
        let number = try!(reader.read_u64::<BigEndian>());
        Ok(Header { tag: tag, number: number })
    }

    pub fn write<W: Write>(&self, writer: &mut W) -> Result<()> {
        try!(writer.write_all(&self.tag));
        try!(writer.write_u64::<BigEndian>(self.number));
        Ok(())
    }
}
```
@alexcrichton Can this be put back into its FCP? I'm hoping to get this in the 1.3 beta so I don't want it to get too delayed.
Thanks for the ping @gsingh93, we were actually just in the libs triage meeting and we did indeed decide to put this back in FCP. This won't make the 1.3 window as Anyway, this RFC is now entering its week-long final comment period.
``` rust
fn read_exact(&mut self, mut buf: &mut [u8]) -> Result<()> {
    while !buf.is_empty() {
```
I feel like this `while` should just be a `loop`?
> I feel like this `while` should just be a `loop`?
If it were a `loop`, we would always do one extra call of `self.read` with a zero-length `buf` before exiting the loop. That's wasteful and could confuse poorly-implemented `Read` instances.
(Example: we had 10 bytes left to read, and `self.read(buf)` returned `Ok(10)`. Now `buf` has length zero, and we go again to the top of the loop. Without the test for `!buf.is_empty()`, we would call again `self.read(buf)` with a zero-length `buf`.)
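The point about the `while` condition can be seen in a sketch of the whole loop. Only the first two lines come from the excerpt under review; the rest of the body, the free-function form `read_exact_sketch`, and the `Recording` test reader are assumptions made for illustration, modeled on how `read_exact` behaves in today's standard library.

```rust
use std::io::{self, ErrorKind, Read};

/// Sketch of the loop under discussion, written as a free function.
fn read_exact_sketch<R: Read>(reader: &mut R, mut buf: &mut [u8]) -> io::Result<()> {
    // Checking the buffer first means `read` is never called with an empty slice.
    while !buf.is_empty() {
        match reader.read(buf) {
            Ok(0) => break, // end of input before the buffer was filled
            Ok(n) => {
                // Advance past the bytes just read.
                let rest = buf;
                buf = &mut rest[n..];
            }
            Err(ref e) if e.kind() == ErrorKind::Interrupted => {} // retry
            Err(e) => return Err(e),
        }
    }
    if buf.is_empty() {
        Ok(())
    } else {
        Err(io::Error::new(ErrorKind::UnexpectedEof, "failed to fill whole buffer"))
    }
}

/// Test reader that records the length of every buffer it is handed.
struct Recording<'a> {
    data: &'a [u8],
    lens: Vec<usize>,
}

impl<'a> Read for Recording<'a> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.lens.push(buf.len());
        self.data.read(buf)
    }
}
```

Running the sketch against `Recording` shows that every recorded length is non-zero. With a bare `loop` and the emptiness check moved elsewhere, the reader would see one extra call with a zero-length slice after the buffer had been filled.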
I'm okay with splitting these methods up for easier discussion/approval. However, as I've said before, if we were to only get one, I'd rather have
The libs team, after another round of discussion, has decided to merge this RFC as written. Thanks for the detailed writeup of the design tradeoffs in particular!
This implements the proposed "read_exact" RFC (rust-lang/rfcs#980). Tracking issue: #27585
Code for this: rust-lang/rust#23369
Rendered