Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Reader.read_at_least() #13127

Merged
merged 1 commit into from
May 14, 2014
Merged

Add Reader.read_at_least() #13127

merged 1 commit into from
May 14, 2014

Conversation

lilyball
Copy link
Contributor

Reader.read_at_least() ensures that at least a given number of bytes
have been read. The most common use-case for this is ensuring at least 1
byte has been read. If the reader returns 0 enough times in a row, a new
error kind NoProgress will be returned instead of looping infinitely.

This change is necessary in order to properly support Readers that
repeatedly return 0, either because they're broken, or because they're
attempting to do a non-blocking read on some resource that never becomes
available.

Also add .push() and .push_at_least() methods. push() is like read() but
the results are appended to the passed Vec.

Remove Reader.fill() and Reader.push_exact() as they end up being thin
wrappers around read_at_least() and push_at_least().

[breaking-change]

@@ -320,8 +319,12 @@ pub enum IoErrorKind {
ResourceUnavailable,
IoUnavailable,
InvalidInput,
/// The Reader returned 0 from `read()` too many times.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd clarify this to "0 bytes"

@alexcrichton
Copy link
Member

I'm a little uncomfortable about the litany of methods that are getting added to the Reader trait. I was myself uncomfortable adding the fill method. I'd like to take some time to think about whether we can reduce the surface area here to something more reasonable.

Approaching Reader for the first time, you'll be bombarded with read, read_at_least, fill, etc. I feel like we can do better to solve the constraints in play here without as many methods.

@lilyball
Copy link
Contributor Author

@alexcrichton fill turns out to be trivially expressible using read_at_least(), and in fact I reimplemented it in one line using that. So maybe we can just remove fill(). I think read_at_least() is valuable, though, because it's a single central place to handle 0-length reads correctly. Similarly push_exact() is expressed in terms of push_at_least() and can be removed, and read_exact() also seems rather superfluous as it's already just a combination of slice::with_capacity() and push_exact().

I'm also open to the suggestion of renaming it to read_nonzero() and dropping the len param, but then fill() can't be expressed using it and we would need to keep it around.

@lilyball
Copy link
Contributor Author

@alexcrichton I removed fill() and push_exact(), because they're easy to reproduce with read_at_least()/push_at_least(). I tried removing read_exact() too but the necessary replacement ended up being annoying enough that I think it's useful to keep.

/// Fails if `len` is greater than the length of `buf`.
fn read_at_least(&mut self, buf: &mut [u8], len: uint) -> IoResult<uint> {
assert!(len <= buf.len());
// always read at least once in case len == 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to read at least once?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it seems exceedingly odd to me to call read_at_least(buf, n) and have it read zero times.

@alexcrichton
Copy link
Member

Hm, I'm going to try to put some thinking down into words, this is all just thinking out loud.

Before this change, we have this list of methods for dealing with a partial amount of bytes on a Reader. I'm omitting read because it's obvious that this will stay forever.

  • fill a &mut [u8]
  • push_exact on &mut ~[u8]
  • read_exact returning ~[u8]

After this change, we have this list of methods

  • read_at_least on &mut [u8]
  • push on &mut ~[u8]
  • push_at_least on &mut ~[u8]
  • read_exact returning ~[u8]

This still seems a little sprawling to me, especially when I look at the types that everything operates on. For example, read_at_least pushes onto &mut [u8], but the very similarly named read_exact returns a ~[u8] (same for fill before).

Some thoughts:

  • Do we want to continue to support the "push some bytes on this vector" pattern?
  • Do we want to continue to have methods that returned an owned vector? (read_to_end I believe is an exception to this)

Depending on the answers, we may be able to pare down the API to

fn read(&mut self, buf: &mut [u8]) -> IoResult<uint>;
fn read_at_least(&mut self, buf: &mut [u8], amt: uint) -> IoResult<uint>;
fn read_exact(&mut self, buf: &mut [u8]) -> IoResult<()>;

I like that this only operates on &mut [u8].


A little bit of a ramble, but I'm thinking that these utility functions on a Reader may need to be reconsidered. I think I'm more than comfortable removing read_exact returning ~[u8], but I'm less sure about removing methods that operate on &mut ~[u8].

Thoughts on this? @kballard, @brson

@lilyball
Copy link
Contributor Author

@alexcrichton I would like to remove read_exact(). I left it because the following code:

let bytes = try!(file.read_exact(names_bytes as uint - 1));

became the following:

let bytes = {
    let len = names_bytes as uint - 1;
    let mut bytes = slice::with_capacity(len);
    try!(file.push_at_least(&mut bytes, len, len));
    bytes
};

and that felt just awkward enough that I decided not to remove it at this time. But I only count 5 calls to read_exact() in our source, so maybe we should go ahead and remove it.

As for push()/push_at_least(), I would like to be able to remove them, but I think they're valuable. The alternative implementation is to create a vector with the desired length, pre-filled with zeroes, read into it, and then truncate back to the actual read amount. That's awkward, and slightly less performant due to the need to memset the vector (push() uses unsafe and uninitialized memory, which can be valuable when e.g. read_to_end() is calling it with 64k as the length).

I could see removing push(), which I added mostly to balance out read(), but could be folded into push_at_least(). In fact, I think I'll go ahead and remove that, because anyone who wants it can use push_at_least(buf, 0, len).

@lilyball
Copy link
Contributor Author

Also, looking at your suggested API again, fn read_exact(&mut self, buf: &mut [u8]) -> IoResult<()>; can be removed entirely as that's just read_at_least(buf, buf.len()).map(|_| ()).

@lilyball
Copy link
Contributor Author

Looking at the API again, I think that read_to_end() and read_to_str() are very useful to keep around. Similarly, all of the helpers such as read_byte(), read_le_uint_n(), etc are useful. And I suspect bytes() may be useful too, but offhand I don't know how much it's actually being used.

I'm wondering if there's any utility to splitting Reader into two separate traits, with Reader containing read() and read_at_least(), and ReaderHelpers containing everything else (implemented in terms of Reader). I know that default trait methods have made us move away from this style of traits, but it would serve as a conceptual simplification, between the core Reader methods and all of the helpers that are implemented in terms of it. Although one downside is this would prevent any individual Reader implementation from overriding the implementation of any of the helper methods (e.g. a string-based reader could theoretically override read_to_str() to skip the utf-8 checks).

In any case, I'm open to removing read_exact(), but I won't do so unless one of you think it's a good idea. I will remove push(), but push_at_least() is also useful, primarily because it wraps the unsafety necessary to make reading into a large newly-allocated buffer more efficient. I think everything else can stay.

@DaGenix
Copy link

DaGenix commented Mar 27, 2014

What about doing something like how the gen() method is implemented on Rng? Then, there could be methods like Reader.next() to read a fixed-sized object like a uint and Reader.next_bytes(uint) to read a non-fixed sized object like a Vec.

@lilyball
Copy link
Contributor Author

@DaGenix Reader.next() doesn't make much sense because the only reasonable fixed-sized object to read is a u8 and you don't want to read single bytes at a time if you can help it from most readers.

As for doing something like Rand where each type knows how to generate itself from a Rng, the problem is that the only types that we're trying to read are either buffers, which require a length and don't benefit from a Rand style interface, and various encodings of integers, which cannot use a Rand-like interface unless you define an enum to represent the desired encoding and pass that as an argument, and I think that's a more confusing API than what we have today.

@DaGenix
Copy link

DaGenix commented Mar 29, 2014

I was thinking that .next() could read whatever type is appropriate depending upon the value being assigned to - byte, uint, float, etc. However, I failed to consider big-ending vs. little-endian issues. So, yeah, nevemind this idea.

@lilyball
Copy link
Contributor Author

lilyball commented Apr 1, 2014

@alexcrichton, @brson: Any more thoughts on this PR?

@lilyball
Copy link
Contributor Author

lilyball commented Apr 9, 2014

I just rebased on top of the recent ~[T]->Vec<T> work that went into std::io.

@alexcrichton
Copy link
Member

I'm sorry it took awhile to get back on this, but can you elaborate on what the use case for this is? I can't really think of a case where I want to read N bytes, but if I read some number greater than N I can easily deal with it.

@pongad
Copy link
Contributor

pongad commented Apr 12, 2014

@alexcrichton I think the rationale for this is to provide a convenience method that deals with zero-length reads. I'm not sure if @kballard has other use cases in mind. Though, now that I think about it, for solving zero-length reads, maybe read_any (same as read_at_least(1)) might be more appropriate.
IMO, it is difficult to think about what read_at_least should do if we want to read 2 bytes, but there's only 1 byte left in the Reader. We probably want to return an error since we can't read as many bytes as we want, but then we need to include how many bytes we read in the error, etc. read_any would sidestep this problem since the user won't be able to tell read_any how many bytes to read. It will either just return the number of bytes read (guaranteed to be greater than 0) or error if it cannot read anything or encountered some other error along the way. Go doesn't have a problem with this since ReadAtLeast returns both the bytes read and the error at the same time.

@lilyball
Copy link
Contributor Author

The short answer is we need a read_at_least_one() and having read_at_least() instead allows us to remove fill() (as that's just a special case of read_at_least()). Similarly push_at_least() removes the need for push_exact(). This avoids ballooning the API.

@lilyball
Copy link
Contributor Author

Rebased on top of master. I still think this is something we should do.

@pongad: I don't think that not returning the number of bytes read is a problem. It's hard to think of a use case where those bytes are actually useful. The only real case I can think of where I'd want to be able to use those bytes is if I'm trying to report an error that includes the truncated bytes, and that seems niche enough that it's not worth complicating the API.

@lilyball
Copy link
Contributor Author

lilyball commented May 6, 2014

@alexcrichton et al, any more feedback? I need to rebase it at this point, but I'd like to know that we can move forward with this.

data: self.as_ptr().offset(start as int),
len: (end - start)
})
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid extending Vec's api further?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you prefer the ugly code that slice_capacity() is replacing? To be clear, that's calling set_len() to pretend uninitialized memory is valid, slicing that, then using try_finally() to call set_len() back to the correct value when done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I prefer to not expand Vec's api on a whim. If you want to add this as a private helper in std::io, that's fine.

@alexcrichton
Copy link
Member

I'm ok with merging this pending comments. It will need a re-worded commit message to reflect the breaking change as well.

@lilyball
Copy link
Contributor Author

lilyball commented May 6, 2014

@alexcrichton Thanks. I need to rebase it, and I'll reword the commit message. Based on my reply to your comments, do you still believe I need to change the semantics of read_at_least()/push_at_least() to avoid the read when min == 0, or are you ok with leaving it as-is?

@alexcrichton
Copy link
Member

It looks like this is kinda based off what go is doing, and they don't do the read when min == 0. I'm not too worried about doing the read per se, but I'm a little worried to be documenting explicitly that a read is always performed.

Additionally, perhaps an error should be returned rather than failing in these new methods?

@lilyball
Copy link
Contributor Author

lilyball commented May 6, 2014

Hmm, I didn't check to see what Go does. I'm actually vaguely surprised to see that ReadAtLeast(r, buf, 0) returns (0, err). But that is indeed what it appears to do.

I documented that the read is always performed because I wanted clients to be able to rely on it. If you don't want it documented, then I think it's better to not do the read when min == 0. The reason is I don't want clients observing that it always reads when min == 0, relying on that assumption, then breaking later if behavior changes (and if behavior never changes, then why omit it from the documentation?)

Given this precedent, I'll go ahead and change the behavior to skip the read.

perhaps an error should be returned rather than failing in these new methods?

The failure happens in response to a logic error, i.e. the client passing a min that's too high. Handling this as an error instead requires adding yet another IoErrorKind to indicate the buffer was too short. This is an error that seems like it will only apply to read_at_least() and to no other function. Do you think it's better to have this once-off error value than it is to simply fail on logic error?

@alexcrichton
Copy link
Member

Our current spirit is to not fail as much as possible, and this seems like an easy case to not fail (you're already returning an error), and I figured that the InvalidInput error kind would suffice in this case. You can fill out the detail with some extra information about what just happened.

@lilyball
Copy link
Contributor Author

lilyball commented May 6, 2014

I didn't realize that's what InvalidInput meant. I suppose it is usable for this. I'll make that change.

@lilyball
Copy link
Contributor Author

lilyball commented May 7, 2014

r? @alexcrichton I've made the requested changes, and also squashed it down to one commit.

Reader.read_at_least() ensures that at least a given number of bytes
have been read. The most common use-case for this is ensuring at least 1
byte has been read. If the reader returns 0 enough times in a row, a new
error kind NoProgress will be returned instead of looping infinitely.

This change is necessary in order to properly support Readers that
repeatedly return 0, either because they're broken, or because they're
attempting to do a non-blocking read on some resource that never becomes
available.

Also add .push() and .push_at_least() methods. push() is like read() but
the results are appended to the passed Vec.

Remove Reader.fill() and Reader.push_exact() as they end up being thin
wrappers around read_at_least() and push_at_least().

[breaking-change]
bors added a commit that referenced this pull request May 14, 2014
Reader.read_at_least() ensures that at least a given number of bytes
have been read. The most common use-case for this is ensuring at least 1
byte has been read. If the reader returns 0 enough times in a row, a new
error kind NoProgress will be returned instead of looping infinitely.

This change is necessary in order to properly support Readers that
repeatedly return 0, either because they're broken, or because they're
attempting to do a non-blocking read on some resource that never becomes
available.

Also add .push() and .push_at_least() methods. push() is like read() but
the results are appended to the passed Vec.

Remove Reader.fill() and Reader.push_exact() as they end up being thin
wrappers around read_at_least() and push_at_least().

[breaking-change]
@bors bors merged commit 972f2e5 into rust-lang:master May 14, 2014
@lilyball lilyball deleted the read_at_least branch May 14, 2014 04:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants