Adding channel.advancePastByte? #8105
Comments
Are there examples of this functionality in IO libraries of other languages?
@benharsh - to what extent was the buffered reader you were emulating part of the Rust standard library? C++ has such a thing:
Rust has its … and https://doc.rust-lang.org/std/io/trait.BufRead.html#method.read_until in particular is the corresponding function.
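For comparison with the Rust function linked above, here is a minimal Python sketch of the `read_until` contract, assuming an `io.BufferedReader` (whose `peek` exposes buffered bytes without consuming them). The name mirrors Rust's but the function is a hypothetical illustration. It highlights two differences from the proposed advancePastByte: `read_until` copies the scanned bytes out to the caller, and it treats EOF as a normal return rather than an error.

```python
import io

def read_until(reader: io.BufferedReader, delim: int, out: bytearray) -> int:
    """Hypothetical Python analogue of Rust's BufRead::read_until.

    Appends bytes to `out` up to and including `delim` and returns how many
    bytes were read. Reaching EOF is not an error: whatever was read is kept
    and the count is returned (0 means the stream was already at EOF).
    """
    total = 0
    while True:
        chunk = reader.peek(1)      # view buffered bytes without consuming them
        if not chunk:               # empty peek means EOF
            return total
        i = chunk.find(delim)       # index of delim in this chunk, or -1
        n = len(chunk) if i < 0 else i + 1
        got = reader.read(n)        # consume (and copy out) what we scanned
        out += got
        total += len(got)
        if i >= 0:
            return total

r = io.BufferedReader(io.BytesIO(b"abc\ndef"))
line = bytearray()
print(read_until(r, ord("\n"), line), bytes(line))  # 4 b'abc\n'
```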
@benharsh @ben-albrecht - Question: it seems odd to me that the byte argument should have type
I think I prefer 3, but I'm curious if these other options appeal to somebody else.
I vote for
Add channel.advancePastByte, use it to improve revcomp

PR #8045 added some messy revcomp studies that improve performance by improving the I/O pattern. The performance comes down to two things:
1. Copying large chunks of the input from the channel buffer into the array to be used
2. Using memchr to identify the relevant chunks of the input

I experimented with a version that used regexp format strings in place of the memchr call, but its performance was unsatisfying.

This PR adds channel.advancePastByte so that the fast I/O pattern in revcomp is easy to express. The revcomp version now does the following:
* "mark" (tell the I/O system not to drop the buffer, since we might return to this point)
* identify the offset of the newline (end of the sequence description)
* identify the offset of the > (start of the next sequence)
* "revert" (go back to where we "marked")
* read the data again in one go with readBytes

I'm seeing a 10% speedup for this version over revcomp-buf.chpl, and it is much simpler.

While there, I noticed that qio_channel_advance might not set up the buffer in some cases, so I added code to do that.

- [x] full local testing
- [x] docs for advancePastByte
- [x] update catch statement to specify type
- [x] check for 32-bit issues à la #8116

Closes #8105. Reviewed by @benharsh - thanks!
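The mark / scan / revert / bulk-read steps listed above can be sketched in Python. This is an analogy, not the Chapel revcomp code: Python streams have no mark/revert, so `tell()` and `seek()` on a seekable `io.BufferedReader` stand in for them, `bytes.find` stands in for memchr, and `advance_past_byte` and `read_record` are hypothetical names. It assumes FASTA-style input where the caller has already consumed the leading `>`.

```python
import io

def advance_past_byte(reader: io.BufferedReader, byte: int) -> None:
    """Hypothetical stand-in for channel.advancePastByte: discard input up to
    and including `byte`; raise EOFError if the stream ends first."""
    while True:
        chunk = reader.peek(1)              # buffered bytes; b"" only at EOF
        if not chunk:
            raise EOFError(f"byte {byte} not found")
        i = chunk.find(byte)                # memchr-like scan of the buffer
        if i >= 0:
            reader.read(i + 1)              # consume through the byte itself
            return
        reader.read(len(chunk))             # not here: drop chunk and refill

def read_record(reader: io.BufferedReader) -> tuple[bytes, bytes]:
    """Return (description, sequence) for one record whose '>' was consumed."""
    start = reader.tell()                   # "mark"
    advance_past_byte(reader, ord("\n"))    # end of the description line
    seq_start = reader.tell()
    try:
        advance_past_byte(reader, ord(">")) # start of the next record
        seq_end = reader.tell() - 1         # offset of that '>'
    except EOFError:
        seq_end = reader.tell()             # last record runs to EOF
    reader.seek(start)                      # "revert"
    data = reader.read(seq_end - start)     # one bulk copy of the record
    reader.read(1)                          # consume the next '>' (b"" at EOF)
    desc_len = seq_start - start - 1        # description without its '\n'
    return data[:desc_len], data[desc_len + 1:]

src = io.BufferedReader(io.BytesIO(b">one\nACGT\nGG\n>two\nTTA\n"))
src.read(1)                                 # skip the first '>'
print(read_record(src))  # (b'one', b'ACGT\nGG\n')
print(read_record(src))  # (b'two', b'TTA\n')
```

The scanning passes only move the cursor; the single `read` after the revert does the copying, which mirrors the two performance factors named in the PR (bulk copies plus a memchr-style search). No performance claim is made for the Python version.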
I've been working with @benharsh on some I/O speed issues in the Benchmarks Game revcomp benchmark.
I'm proposing to add a method to IO.chpl called channel.advancePastByte that reads
until a particular byte is found and leaves the channel cursor just after that byte
(and raises an EOF error if the byte is not found).
This function enables succinct expression of the fastest known I/O pattern for revcomp.
It could be used in other contexts as well; for example, the implementation
of readln could use it to skip characters until the newline.
See also PR #8103.
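To make the proposed semantics concrete (scan the buffer, refill as needed, stop just after the byte, raise at EOF), here is a small Python model. `ByteChannel` is a hypothetical class, not Chapel's qio channel, and Python's `EOFError` stands in for Chapel's EOF error. The usage at the end imitates the readln case by skipping the remainder of each line.

```python
import io
from typing import BinaryIO, Optional

class ByteChannel:
    """Hypothetical buffered reader modelling the proposed advancePastByte
    semantics; it is not Chapel's qio channel."""

    def __init__(self, raw: BinaryIO, bufsize: int = 4096):
        self._raw = raw
        self._bufsize = bufsize
        self._buf = b""
        self._pos = 0                       # cursor within self._buf

    def _fill(self) -> bool:
        """Ensure unread bytes are buffered; return False at EOF."""
        if self._pos < len(self._buf):
            return True
        self._buf = self._raw.read(self._bufsize)
        self._pos = 0
        return len(self._buf) > 0

    def read_byte(self) -> Optional[int]:
        """Return the next byte, or None at EOF."""
        if not self._fill():
            return None
        b = self._buf[self._pos]
        self._pos += 1
        return b

    def advance_past_byte(self, byte: int) -> None:
        """Skip input up to and including `byte`, leaving the cursor just
        after it. Raise EOFError if the input ends before `byte` is seen."""
        while self._fill():
            i = self._buf.find(byte, self._pos)   # memchr-style scan
            if i >= 0:
                self._pos = i + 1
                return
            self._pos = len(self._buf)            # byte not here: drop chunk
        raise EOFError(f"byte {byte} not found before end of input")

# readln-style use: take the first character of each line, skip the rest.
ch = ByteChannel(io.BytesIO(b"alpha\nbeta\ngamma\n"), bufsize=4)
firsts = []
while (b := ch.read_byte()) is not None:
    firsts.append(chr(b))
    ch.advance_past_byte(ord("\n"))
print(firsts)  # ['a', 'b', 'g']
```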