Buffered reverse-complement studies #8045

benharsh · 2017-12-15T17:18:53Z

These versions are based on Rust entries 2 and 3, and mimic Rust's BufReader struct. One version reads entire sections at a time, and the other reads line-by-line. Both versions use a buffer to reduce the number of I/O operations, and copy bytes into a given 1D array until a particular terminator character (> or \n) is found.

The Rust versions parallelize differently, but these entries are mainly meant to test an alternate approach to I/O.

The line-by-line version is currently performing significantly worse, likely due to slicing overhead. PR #8022 added a test dedicated to measuring creation time for array views.

The 'entire sections at a time' version is competitive with the top C entry.

benharsh · 2017-12-15T17:19:12Z

@chapel-lang/perf-team : anyone want to review?

mppf

This is OK to go in as a study, but I'm going to think a little bit about better ways to do it.

mppf · 2018-01-02T18:14:55Z

test/studies/shootout/reverse-complement/bharshbarg/revcomp-buf.chpl

+    numLeft = fi.length();
+  }
+
+  pragma "no copy return"


Can you add a comment saying that this returns a view into the buffer starting at low?

mppf · 2018-01-02T18:15:32Z

test/studies/shootout/reverse-complement/bharshbarg/revcomp-buf.chpl

+      if avail.size > 0 {
+        const idx = _memchr(term, avail);
+        if idx >= 0 {
+          data.push_back(avail[..idx]);


For a while I was confused about this line. Could you add a comment here or at the function level, to indicate that it appends a big block to the array once it's available?

mppf · 2018-01-02T18:18:07Z

test/studies/shootout/reverse-complement/bharshbarg/revcomp-buf.chpl

+proc main(args: [] string) {
+  const stdin = openfd(0);
+  var input = new buf(stdin, readSize);
+  var data : [1..0] uint(8);


It'd help to have some comments here. For one thing, you're trying to make data have enough storage to store all of the bytes in the file.

@benharsh

Add channel.advancePastByte, use it to improve revcomp

PR #8045 added some messy rev-comp studies that improve performance by improving the I/O pattern. What the performance comes down to is two things:

1. Copying large chunks of the input from the channel buffer into the array to be used
2. Using memchr to identify the relevant chunks of the input

I experimented with a version that used regexp format strings to replace the memchr call, but that had unsatisfying performance.

This PR adds channel.advancePastByte in order to enable the expression of the fast I/O pattern in revcomp easily. Now the revcomp version does the following:

* "mark" (indicate to the I/O system not to drop the buffer as we might return)
* identify the offset of the newline (end of the sequence description)
* identify the offset of the > (start of the next sequence)
* "revert" (go back to where we "marked")
* read the data again in one go with readBytes

I'm seeing a 10% speedup for this version beyond revcomp-buf.chpl, and it is much simpler.

While there, I noticed that qio_channel_advance might not set up the buffer in some cases, so added code to do that.

- [x] full local testing
- [x] docs for advancePastByte
- [x] update catch statement to specify type
- [x] check for 32-bit issues ala #8116

Closes #8105.

Reviewed by @benharsh - thanks!
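The mark / scan / revert / read sequence from the #8103 description can be illustrated with a short sketch. This is Python, not Chapel, and it is only an analogy under stated assumptions: `advance_past_byte` and `read_section` are hypothetical helpers written for this example, and `tell`/`seek` on a seekable `io.BufferedReader` stand in for the channel's "mark" and "revert" (which, per the description, work by keeping the I/O buffer alive rather than by seeking).

```python
import io


def advance_past_byte(stream, byte):
    """Consume input up to and including the first `byte`.

    Hypothetical stand-in for channel.advancePastByte; `stream` is an
    io.BufferedReader and `byte` a single byte such as b"\n".
    Returns False if end of input is reached without finding it.
    """
    while True:
        chunk = stream.peek()        # look at buffered bytes without consuming
        if not chunk:
            return False             # end of input
        idx = chunk.find(byte)
        if idx >= 0:
            stream.read(idx + 1)     # consume through the match
            return True
        stream.read(len(chunk))      # no match here: consume and keep scanning


def read_section(stream):
    """Return (header, body) for the next '>' record, or None at end of input."""
    mark = stream.tell()                          # "mark"
    if not advance_past_byte(stream, b"\n"):      # end of the description line
        return None
    body_start = stream.tell()
    if advance_past_byte(stream, b">"):           # start of the next sequence
        body_end = stream.tell() - 1              # leave '>' for the next call
    else:
        body_end = stream.tell()                  # last record runs to the end
    stream.seek(mark)                             # "revert"
    header = stream.read(body_start - mark)
    body = stream.read(body_end - body_start)     # whole body in a single read
    return header, body


if __name__ == "__main__":
    data = b">ONE desc\nACGT\nTT\n>TWO\nGG\n"
    reader = io.BufferedReader(io.BytesIO(data), buffer_size=8)
    while (record := read_section(reader)) is not None:
        print(record)
```

The point of the pattern is that the scan only computes offsets; the payload is then copied once, in a single large read, instead of piece by piece during the scan.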

benharsh added 5 commits December 12, 2017 09:08

Add version of revcmp that buffers and skips over lines

d4291b9

Cleaner version

56c9a18

Copy bytes into our destination buffer line-by-line instead of section-by-section, though we're still using a small buffer to read the bytes from stdin

418be65

Improve performance slightly by slicing less

8a4b3ae

Update authors/sources

da402e8

mppf self-assigned this Dec 21, 2017

mppf approved these changes Jan 2, 2018

Better comments

e879dbf

benharsh merged commit c441def into chapel-lang:master Jan 2, 2018

mppf mentioned this pull request Jan 2, 2018

Add channel.advancePastByte, use it to improve revcomp #8103

Merged

4 tasks

benharsh deleted the revcomp-buf branch March 16, 2018 20:04
