adding channel.advancePastByte ? #8105

Closed
mppf opened this issue Jan 2, 2018 · 6 comments

mppf commented Jan 2, 2018

I've been working with @benharsh on I/O speed issues in the revcomp benchmark from the benchmarks game.
I'm proposing to add a method to IO.chpl called channel.advancePastByte that reads
until a particular byte is found and leaves the channel cursor just after that byte.
If the byte is not found before the end of the input, it raises an UnexpectedEOFError.

This function enables a succinct expression of the fastest known I/O pattern for revcomp.
It could be used in other contexts as well. For example, we could use it in the implementation
of readln to skip characters until the newline.

See also PR #8103.

/* Reads until ``byte`` is found and then leaves the channel offset
    just after it. If that byte is never found, raises an UnexpectedEOFError. */
proc channel.advancePastByte(byte:c_int) throws

@ben-albrecht

Are there examples of this functionality in IO libraries of other languages?

mppf commented Jan 5, 2018

@benharsh - to what extent was the buffered reader you were emulating part of the Rust standard library?

C++ has such a thing:
http://www.cplusplus.com/reference/istream/istream/ignore/

benharsh commented Jan 5, 2018

Rust has its BufReader as part of its standard IO library:

https://doc.rust-lang.org/std/io/struct.BufReader.html

mppf commented Jan 5, 2018

and https://doc.rust-lang.org/std/io/trait.BufRead.html#method.read_until in particular is the corresponding function.
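For comparison, here is a minimal Rust sketch of the proposed semantics, built on `BufRead::fill_buf` and `BufRead::consume` rather than `read_until` so that skipped bytes are not copied into a caller buffer. The function name `advance_past_byte` is hypothetical (it mirrors the Chapel proposal and is not part of the Rust standard library), and mapping the missing-byte case to `ErrorKind::UnexpectedEof` is an assumption made to match the proposed UnexpectedEOFError.

```rust
use std::io::{self, BufRead, ErrorKind};

/// Hypothetical helper (not a std API): consume input up to and including
/// the first occurrence of `byte`, leaving the reader just after it.
/// Returns an UnexpectedEof error if the input ends before `byte` is seen.
fn advance_past_byte<R: BufRead>(reader: &mut R, byte: u8) -> io::Result<()> {
    loop {
        // Inspect the reader's internal buffer without copying it.
        let (found, used) = {
            let buf = reader.fill_buf()?;
            if buf.is_empty() {
                // An empty buffer from fill_buf means end of input.
                return Err(io::Error::new(ErrorKind::UnexpectedEof, "byte not found"));
            }
            match buf.iter().position(|&b| b == byte) {
                Some(i) => (true, i + 1),   // consume through the match
                None => (false, buf.len()), // consume everything, keep looking
            }
        };
        reader.consume(used);
        if found {
            return Ok(());
        }
    }
}
```

Calling `advance_past_byte(&mut reader, b'\n')` on a reader positioned at `>seq1\nACGT\n` would leave it positioned at `ACGT\n`.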

mppf commented Feb 1, 2018

@benharsh @ben-albrecht - Question: it seems odd to me that the byte argument should have type c_int. In fact, the implementation just truncates it to a byte. That it's a c_int at all is a C-ism I got from memchr. Do you think it should:

  1. Be an int that we always extract the bottom byte from
  2. Be an int that we safeCast
  3. Be a uint(8)

I think I prefer 3 but I'm curious if these other options appeal to somebody else.
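To make the difference between the three options concrete, here is a small Rust sketch of the analogous choices. These are Rust stand-ins, not Chapel code: the function names are hypothetical, `as u8` plays the role of "extract the bottom byte", and `u8::try_from` is only loosely analogous to a checked cast such as safeCast (it returns a failure value here instead of raising a runtime error).

```rust
use std::convert::TryFrom;

/// Option 1 analog: accept a wide integer and silently keep the low byte.
fn low_byte(value: i32) -> u8 {
    value as u8
}

/// Option 2 analog: accept a wide integer but reject values outside 0..=255.
fn checked_byte(value: i32) -> Option<u8> {
    u8::try_from(value).ok()
}

/// Option 3 analog: take a byte-sized argument, so out-of-range values
/// cannot be passed in the first place.
fn find_byte(haystack: &[u8], byte: u8) -> Option<usize> {
    haystack.iter().position(|&b| b == byte)
}
```

With option 1, a caller passing `0x10A` ends up searching for `0x0A` (a newline) without any warning; option 2 catches that at run time, and option 3 moves the check to the type of the argument.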

benharsh commented Feb 1, 2018

I vote for uint(8).

@mppf mppf closed this as completed in #8103 Feb 1, 2018
mppf added a commit that referenced this issue Feb 1, 2018
Add channel.advancePastByte, use it to improve revcomp

PR #8045 added some messy revcomp studies that improve performance by improving the I/O pattern. The performance gain comes down to two things:
 1. Copying large chunks of the input from the channel buffer into the array to be used
 2. Using memchr to identify the relevant chunks of the input

I experimented with a version that used regexp format strings to replace the memchr call, but its performance was unsatisfying.

This PR adds channel.advancePastByte so that the fast I/O pattern in revcomp can be expressed easily. Now the revcomp version does the following:
 * "mark" (indicate to the I/O system not to drop the buffer, because we might return to this position)
 * identify the offset of the newline (end of the sequence description)
 * identify the offset of the > (start of the next sequence)
 * "revert" (go back to where we "marked")
 * read the data again in one go with readBytes
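
The steps above can be sketched in Rust as follows. This is an illustration under stated assumptions, not the Chapel code from the PR: `skip_through` and `read_record` are hypothetical names, `Seek::stream_position` and `Seek::seek` stand in for "mark" and "revert", and `read_exact` stands in for readBytes. Unlike a channel mark, seeking does not guarantee that the already-buffered data is retained, so this shows the control flow rather than the performance characteristics.

```rust
use std::io::{self, BufRead, ErrorKind, Read, Seek, SeekFrom};

/// Hypothetical helper: consume bytes through the first `byte`.
/// Returns the number of bytes consumed, or None if EOF came first.
fn skip_through<R: BufRead>(r: &mut R, byte: u8) -> io::Result<Option<u64>> {
    let mut total = 0u64;
    loop {
        let buf = r.fill_buf()?;
        if buf.is_empty() {
            return Ok(None);
        }
        let (n, found) = match buf.iter().position(|&b| b == byte) {
            Some(i) => (i + 1, true),
            None => (buf.len(), false),
        };
        r.consume(n);
        total += n as u64;
        if found {
            return Ok(Some(total));
        }
    }
}

/// Read one record (description line plus sequence data) in a single read.
/// Returns the record bytes and the length of the description line.
fn read_record<R: BufRead + Seek>(r: &mut R) -> io::Result<(Vec<u8>, usize)> {
    // "mark": remember where this record starts.
    let mark = r.stream_position()?;
    // Offset of the newline that ends the sequence description.
    let header_len = skip_through(r, b'\n')?
        .ok_or_else(|| io::Error::from(ErrorKind::UnexpectedEof))?;
    // Offset of the '>' that starts the next sequence (or EOF if none).
    let body_len = match skip_through(r, b'>')? {
        Some(n) => n - 1, // do not include the next record's '>'
        None => r.stream_position()? - mark - header_len,
    };
    // "revert": go back to the marked position.
    r.seek(SeekFrom::Start(mark))?;
    // Read the whole record again in one go.
    let mut record = vec![0u8; (header_len + body_len) as usize];
    r.read_exact(&mut record)?;
    Ok((record, header_len as usize))
}
```

After `read_record` returns, the reader is positioned at the `>` of the next record, so the function can be called repeatedly until it reports an unexpected end of input.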

I'm seeing a 10% speedup for this version beyond revcomp-buf.chpl, and it is much simpler.

While there, I noticed that qio_channel_advance might not set up the buffer in some cases, so I added code to do that.

- [x] full local testing
- [x] docs for advancePastByte
- [x] update catch statement to specify type
- [x] check for 32-bit issues à la #8116

Closes #8105.

Reviewed by @benharsh - thanks!