Feature Request: peek(io::IO, x) #2638

Closed
kmsquire opened this issue Mar 21, 2013 · 10 comments
Labels
needs decision A decision on this change is needed

Comments

@kmsquire
Member

As discussed here: https://groups.google.com/forum/?fromgroups=#!topic/julia-users/kDKL1dS4L0w

Basically, I'd like to be able to peek at the first few characters of an IO stream, without removing the characters from the stream. In particular, it needs to work with streams that are not files and therefore not seekable.

@JeffBezanson
Member

All we have so far is peek (a single byte) and peekchar, both only for files. The next step is to implement whatever kinds of peeking we want for IOBuffer; other streams can then build on that.

@kmsquire
Member Author

I thought I had seen these when I implemented gzip support, but missed them this time because they aren't exported.

Right now, for IOStream, both mimic the C interface: peek reads a byte but returns it as an Int32, with -1 on failure, and peekchar returns char(-1) on failure. It was trivial to implement the same thing for IOBuffer, although that doesn't address the full request here.

Unfortunately, returning -1 doesn't generalize. The obvious solution is to throw an EOFError when trying to peek beyond the end of the file/buffer, but then the interface is inconsistent with the current (unexported but standard) implementation.
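To make the EOFError option concrete, here is a minimal sketch in current Julia syntax. The name `peek_byte` is hypothetical and is not the unexported function discussed above; it also assumes a seekable IOBuffer, so it does not cover the non-seekable streams this request is really about.

```julia
# Hypothetical helper (not Base API): look at the next byte without consuming it,
# throwing EOFError instead of returning a -1 sentinel.
function peek_byte(io::IOBuffer)
    eof(io) && throw(EOFError())
    pos = position(io)      # remember where we are
    b = read(io, UInt8)     # consume one byte...
    seek(io, pos)           # ...then rewind so the buffer looks untouched
    return b
end

io = IOBuffer("hi")
first_byte = peek_byte(io)   # the byte for 'h'; nothing has been consumed
contents = read(io, String)  # still reads the whole buffer
```

Because the result is a plain UInt8 rather than an Int32, there is no value left over to use as a failure sentinel, which is why the exception is needed.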

Thoughts?

CC: @vtjnash

@JeffBezanson
Member

Ah yes, the fact that they're not exported probably means I wasn't happy with the API. There is also a reason why very few (any?) environments have peek for anything more than a byte. We probably have to throw EOFError to make it work in general.

@kmsquire
Member Author

A Google search revealed the following:

  1. Python lets you peek at a specified number of bytes on buffered streams, although it may not return the number of bytes requested.

     http://docs.python.org/2/library/io.html#buffered-streams

  2. More interestingly, Java has a mark()/reset() mechanism that lets you read up to a specified maximum number of bytes from a buffered input stream, then reset the buffer to the mark:

     http://docs.oracle.com/javase/6/docs/api/java/io/InputStream.html#mark(int)

I think I like the Java version, but I'm not sure about the implications. Thoughts?
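The two approaches are not mutually exclusive. As a rough sketch, a Python-style "peek at up to n bytes" can be layered on top of a Java-style mark/reset pair. This assumes `mark` and `reset` as they exist in current Julia; `peek_bytes` is a hypothetical name, not an existing function.

```julia
# Hypothetical helper (not Base API): return up to n bytes without consuming them.
function peek_bytes(io::IO, n::Integer)
    mark(io)                 # remember the current position
    try
        return read(io, n)   # may return fewer than n bytes near end of stream
    finally
        reset(io)            # rewind to the mark (this also clears the mark)
    end
end

io = IOBuffer("abc")
head = peek_bytes(io, 2)     # first two bytes; stream position is unchanged
rest = peek_bytes(io, 10)    # only three bytes are available
```

One caveat of this sketch: a stream holds a single mark, so calling `peek_bytes` on a stream the caller has already marked would overwrite that mark.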

@kmsquire
Member Author

Actually, either way would be fine. The Python version is nice in that it's simple, and closer in spirit to Julia.

@vtjnash
Member

vtjnash commented Mar 28, 2013

The Java implementation seems much cooler to me, since you don't need to reimplement all of the read methods. In Python, everything is just a string buffer, so it's not really a problem to peek N bytes and then parse. In Julia, it seems better to me to be able to do multi-type reads (like Java) without needing to create a separate read-ahead/peek buffer. The Java-like implementation could even give you file-like seek access on a stream, by expressing locations relative to the mark.
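A small illustration of the multi-type read point, written against `mark`/`reset` as found in current Julia. The byte layout (a one-byte tag followed by a little-endian UInt32 length) is made up for the example.

```julia
# Made-up layout: a one-byte tag, a little-endian UInt32 length, then payload.
io = IOBuffer(UInt8[0x01, 0x02, 0x00, 0x00, 0x00, 0xff])

mark(io)                       # remember the current position
tag = read(io, UInt8)          # typed read straight from the stream
len = ltoh(read(io, UInt32))   # second typed read, no separate peek buffer
reset(io)                      # rewind to the mark; nothing has been consumed
```

The existing `read` methods do all the parsing here; only the rewind is new.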

@kmsquire
Member Author

That's fine--I'm okay with either one. I'll work on it.

@StefanKarpinski
Member

The Java mark interface does seem rather nice. I can see having to specify a limit beforehand as being constraining sometimes, however. If we add an unmark method, then you can have mark without a limit, as long as the programmer is careful to always unmark later. A nice interface might be mark with a do block, i.e.:

mark(io) do
  # can reset io in here
end

In fact, that kind of makes me wonder if it wouldn't make more sense to simply wrap the bare io object with another stream that buffers and allows resetting, although I feel like that might not always work well enough.
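For illustration, the do-block form could be approximated with a small helper like the one below. `with_mark` is a hypothetical name chosen so as not to add methods to `mark` itself; it assumes `mark`, `reset`, `unmark` and `ismarked` behave as they do in current Julia.

```julia
# Hypothetical helper (not Base API): run f with a mark set, and make sure
# the mark is dropped afterwards even if f throws.
function with_mark(f, io::IO)
    mark(io)
    try
        return f(io)
    finally
        ismarked(io) && unmark(io)   # reset() already clears the mark
    end
end

io = IOBuffer("#!julia\nprintln(1)")
is_script = with_mark(io) do s
    shebang = String(read(s, 2)) == "#!"
    reset(s)                         # rewind to where the block started
    shebang
end
```

The `finally` clause is what removes the "programmer must remember to unmark" burden mentioned above.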

@kmsquire
Member Author

A wrapper was what I was leaning toward. Perhaps there could be two versions, one with a fixed buffer size, and one which spills to disk after the buffer fills (for non-file streams).
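A very rough sketch of the wrapper idea, reading one byte at a time. Every name here (`ReplayStream`, `readbyte!`, `mark!`, `reset!`) is hypothetical, the replay buffer is unbounded, and neither the fixed-size nor the spill-to-disk variant is attempted.

```julia
# Hypothetical wrapper: records bytes read after mark! so they can be replayed.
mutable struct ReplayStream{T<:IO}
    io::T
    replay::Vector{UInt8}   # bytes waiting to be re-read after a reset
    record::Vector{UInt8}   # bytes read since the last mark
    marked::Bool
end
ReplayStream(io::IO) = ReplayStream(io, UInt8[], UInt8[], false)

# Read one byte, preferring replayed bytes over the underlying stream.
function readbyte!(s::ReplayStream)
    b = isempty(s.replay) ? read(s.io, UInt8) : popfirst!(s.replay)
    s.marked && push!(s.record, b)
    return b
end

# Start recording from the current position.
mark!(s::ReplayStream) = (empty!(s.record); s.marked = true; s)

# Push everything read since the mark back in front of the stream.
function reset!(s::ReplayStream)
    s.marked || error("stream is not marked")
    prepend!(s.replay, s.record)
    empty!(s.record)
    s.marked = false
    return s
end

s = ReplayStream(IOBuffer("abc"))
mark!(s)
a = readbyte!(s)      # reads 'a' from the underlying stream
reset!(s)
again = readbyte!(s)  # 'a' again, this time served from the replay buffer
```

Since the wrapper never seeks, the same approach would work on a stream that is not seekable, at the cost of buffering everything read after the mark.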

JeffBezanson added a commit that referenced this issue Jun 30, 2014
RFM: mark/reset for IOStream, IOBuffer, & AsyncStream (addresses #2638)
@kmsquire
Member Author

kmsquire commented Jul 4, 2014

This can be closed now that #3656 is merged.
