Read full #56

Merged: merged 1 commit into biogo:master on Jan 4, 2017

Conversation

@brentp (Contributor) commented Dec 31, 2016

This gives:

name    old time/op    new time/op    delta
Read-4    28.4ms ± 1%    26.6ms ± 1%   -6.04%  (p=0.000 n=15+12)

name    old alloc/op   new alloc/op   delta
Read-4    6.90MB ± 0%    0.21MB ± 0%  -96.99%  (p=0.000 n=15+13)

name    old allocs/op  new allocs/op  delta
Read-4       414 ± 0%       316 ± 0%  -23.67%  (p=0.000 n=15+15)

and with -conc 3:

name    old time/op    new time/op    delta
Read-4    15.5ms ± 1%    14.6ms ± 2%   -5.79%  (p=0.000 n=14+14)

name    old alloc/op   new alloc/op   delta
Read-4    10.8MB ± 0%     4.1MB ± 0%  -61.87%  (p=0.000 n=15+15)

name    old allocs/op  new allocs/op  delta
Read-4       481 ± 0%       383 ± 0%  -20.43%  (p=0.000 n=15+14)

In vcfanno, which does streaming intersection of many files, this gives a ~6-8% improvement in run time.

@kortschak (Member)

I don't like the use of io.ReadFull here. It is intended to signify that the full slice should be filled. The dropping of io.ErrUnexpectedEOF is a code smell that follows from its use. If you want to do it this way, write a simplified copy of io.ReadAtLeast that just fills the slice and returns n. However, I'd like to know what is causing the iterated read operation to be faster than the io.Copy call. This would require profiling.
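
For illustration, the pattern being objected to looks like this (a sketch, not the actual PR code; r and buf stand for the reader and the block buffer):

n, err := io.ReadFull(r, buf)
if err == io.ErrUnexpectedEOF {
	// The block was shorter than buf. That is expected here, but
	// silently clearing this error also hides genuine truncation.
	err = nil
}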

@brentp (Contributor, Author) commented Jan 1, 2017

I can change io.ReadFull to loop over Read() until EOF if you prefer, though I don't see why that's preferable. Would a comment above that line indicating the intent of ReadFull be sufficient?

Re profiling. Here is readFrom before:

(pprof) list readFrom
Total: 331.92MB
ROUTINE ======================== github.com/biogo/hts/bgzf.(*block).readFrom in /home/brentp/go/src/github.com/biogo/hts/bgzf/cache.go
         0   323.96MB (flat, cum) 97.60% of Total
         .          .    127:
         .          .    128:func (b *block) readFrom(r io.ReadCloser) error {
         .          .    129:	o := b.owner
         .          .    130:	b.owner = nil
         .          .    131:	buf := bytes.NewBuffer(b.data[:0])
         .   323.96MB    132:	_, err := io.Copy(buf, r)
         .          .    133:	if err != nil {
         .          .    134:		return err
         .          .    135:	}
         .          .    136:	b.buf = bytes.NewReader(buf.Bytes())
         .          .    137:	b.owner = o

and after that change:

(pprof) list readFrom
Total: 2.68MB

I'm not sure why io.Copy is allocating so much; I guess it could be resizing the slice (not the underlying array) along with the buffer used by io.Copy. Using io.CopyBuffer with a buffer from a sync.Pool did not improve things as much as this, though.
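
For reference, the io.CopyBuffer attempt mentioned above might have looked roughly like this (a sketch; copyBufPool and copyWithPool are hypothetical names, the 32 KiB size is an assumption, and io and sync are assumed imported):

var copyBufPool = sync.Pool{
	New: func() interface{} { return make([]byte, 32*1024) },
}

func copyWithPool(dst io.Writer, src io.Reader) (int64, error) {
	buf := copyBufPool.Get().([]byte)
	defer copyBufPool.Put(buf)
	// io.CopyBuffer still delegates to dst's ReadFrom when dst
	// implements io.ReaderFrom (as *bytes.Buffer does), in which
	// case buf is never used - one possible reason this helped less.
	return io.CopyBuffer(dst, src, buf)
}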

@brentp (Contributor, Author) commented Jan 1, 2017

On a 3GB VCF, here is the memory profile before:

(pprof) top10
52.19GB of 52.40GB total (99.59%)
Dropped 14 nodes (cum <= 0.26GB)
      flat  flat%   sum%        cum   cum%
   52.14GB 99.49% 99.49%    52.14GB 99.49%  bytes.makeSlice
    0.05GB 0.098% 99.59%    52.40GB   100%  github.com/biogo/hts/bgzf.(*block).readFrom
         0     0% 99.59%    52.35GB 99.90%  bytes.(*Buffer).ReadFrom
         0     0% 99.59%    52.40GB   100%  github.com/biogo/hts/bgzf.(*decompressor).nextBlockAt.func1
         0     0% 99.59%    52.35GB 99.90%  io.Copy
         0     0% 99.59%    52.35GB 99.90%  io.copyBuffer
         0     0% 99.59%    52.40GB   100%  runtime.goexit

and after:

(pprof) top10
210.52MB of 210.52MB total (  100%)
Showing top 10 nodes out of 19 (cum >= 3MB)
      flat  flat%   sum%        cum   cum%
  193.02MB 91.69% 91.69%   193.02MB 91.69%  compress/flate.(*huffmanDecoder).init
   14.50MB  6.89% 98.57%   207.52MB 98.57%  github.com/biogo/hts/bgzf.(*block).readFrom
       3MB  1.43%   100%        3MB  1.43%  compress/gzip.(*Reader).readHeader
         0     0%   100%   193.02MB 91.69%  compress/flate.(*decompressor).Read
         0     0%   100%   193.02MB 91.69%  compress/flate.(*decompressor).nextBlock
         0     0%   100%   193.02MB 91.69%  compress/flate.(*decompressor).readHuffman
         0     0%   100%   193.02MB 91.69%  compress/gzip.(*Reader).Read
         0     0%   100%        3MB  1.43%  compress/gzip.(*Reader).Reset
         0     0%   100%        3MB  1.43%  github.com/biogo/hts/bgzf.(*Reader).Read
         0     0%   100%        3MB  1.43%  github.com/biogo/hts/bgzf.(*Reader).nextBlock

@kortschak (Member)

Can you look at the profiles in io.Copy? Mainly, who is calling makeSlice so much?

@brentp (Contributor, Author) commented Jan 2, 2017

bytes.Buffer implements ReadFrom, so that is used, and it causes the extra allocations. See the graphs below. Also, those and the profiles above are for -alloc_space; when using -alloc_objects I see a 7% decrease in allocations, which matches the performance improvement.
Here is the profile of ReadFrom:

(pprof) list ReadFrom
Total: 52.27GB
ROUTINE ======================== bytes.(*Buffer).ReadFrom in /home/brentp/go/go1.8beta1/go/src/bytes/buffer.go
         0    52.21GB (flat, cum) 99.88% of Total
         .          .    168:			// not enough space at end
         .          .    169:			newBuf := b.buf
         .          .    170:			if b.off+free < MinRead {
         .          .    171:				// not enough space using beginning of buffer;
         .          .    172:				// double buffer capacity
         .    52.01GB    173:				newBuf = makeSlice(2*cap(b.buf) + MinRead)
         .          .    174:			}
         .          .    175:			copy(newBuf, b.buf[b.off:])
         .          .    176:			b.buf = newBuf[:len(b.buf)-b.off]
         .          .    177:			b.off = 0
         .          .    178:		}
         .   208.02MB    179:		m, e := r.Read(b.buf[len(b.buf):cap(b.buf)])
         .          .    180:		b.buf = b.buf[0 : len(b.buf)+m]
         .          .    181:		n += int64(m)
         .          .    182:		if e == io.EOF {
         .          .    183:			break
         .          .    184:		}

before.pdf
after.pdf
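
The dispatch that routes io.Copy into bytes.(*Buffer).ReadFrom can be seen in miniature (a self-contained sketch, not code from this PR):

package main

import (
	"bytes"
	"io"
	"strings"
)

func main() {
	var buf bytes.Buffer
	// *bytes.Buffer implements io.ReaderFrom, so io.Copy dispatches to
	// buf.ReadFrom(src); the makeSlice growth in the listing above
	// happens inside that method, not in io.Copy's own loop.
	io.Copy(&buf, strings.NewReader("example data"))
	_ = buf.Len()
}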

@brentp (Contributor, Author) commented Jan 2, 2017

This suggests something to try with anonymous structs, but that seems even uglier than ReadFull.

@kortschak (Member) commented Jan 2, 2017 via email

@brentp (Contributor, Author) commented Jan 2, 2017

I added a helper function that does the loop.

}
if err == io.EOF {
	err = nil
} else if err == nil && n != MaxBlockSize {

@kortschak (Member)

I don't understand this. n can only satisfy 0 <= n && n <= MaxBlockSize, so this branch can never have a nil error according to the loop condition.

What you want to do is

switch {
case err == io.EOF:
    return n, nil
case n == MaxBlockSize && err == nil:
    // This is paranoid, but some readers will return
    // quickly when passed a zero-length byte slice.
    var dummy [1]byte
    _, err = r.Read(dummy[:])
    if err == nil {
        return n, io.ErrShortBuffer
    }
    if err == io.EOF {
        err = nil
    }
}
return n, err

Add to the comment that readToEOF may consume more bytes from the reader than are returned in buf if the length of the available data is greater than MaxBlockSize, or words to that effect.
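
Assembled, the helper might read as follows (a sketch combining the fill loop with the suggested switch; the readToEOF name and MaxBlockSize semantics come from this discussion, but the exact merged code may differ):

// readToEOF fills buf from r until EOF or buf is full, returning the
// number of bytes read. readToEOF may consume more bytes from the
// reader than are returned in buf if the length of the available data
// is greater than MaxBlockSize.
func readToEOF(r io.Reader, buf []byte) (n int, err error) {
	for err == nil && n < len(buf) {
		var m int
		m, err = r.Read(buf[n:])
		n += m
	}
	switch {
	case err == io.EOF:
		return n, nil
	case n == len(buf) && err == nil:
		// This is paranoid, but some readers will return
		// quickly when passed a zero-length byte slice.
		var dummy [1]byte
		_, err = r.Read(dummy[:])
		if err == nil {
			return n, io.ErrShortBuffer
		}
		if err == io.EOF {
			err = nil
		}
	}
	return n, err
}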

@brentp (Contributor, Author) commented Jan 2, 2017

Done.

This is somehow faster. I guess it's because calls with io.Reader arguments can't be inlined, and there were a lot of those, now replaced by the single function. On my 3GB test file with -conc 1:

benchmark           old ns/op        new ns/op        delta
BenchmarkRead-4     124945818850     107597436845     -13.88%

benchmark           old allocs     new allocs     delta
BenchmarkRead-4     5728424        5324950        -7.04%

benchmark           old bytes       new bytes     delta
BenchmarkRead-4     56388514360     232078760     -99.59%

and -conc 3

benchmark           old ns/op       new ns/op       delta
BenchmarkRead-4     71698951874     62600825624     -12.69%

benchmark           old allocs     new allocs     delta
BenchmarkRead-4     6135310        5728673        -6.63%

benchmark           old bytes       new bytes       delta
BenchmarkRead-4     86095419064     29939133992     -65.23%

case err == io.EOF:
	return n, nil
case n == MaxBlockSize && err == nil:
	// some readers may return quickly when passed a zero-length byte slice.

@kortschak (Member)

The complete text/formatting here would be preferable - and even if not, then with a leading capital and kept short over two lines. It is paranoid - the io.Reader docs say that the following call to Read should return 0, nil, but I know some that don't.

@kortschak (Member) left a review:

LGTM after comments addressed.


buf := make([]byte, 16384)
b.ResetTimer()
//b.ReportAllocs()

@kortschak (Member)

Please remove this.

if *file == "" {
	b.Skip("no bgzf file specified")
}
fh, err := os.Open(*file)

@kortschak (Member)

f is good enough. I don't think we use the term file handle anywhere.

	}
	bgz.Close()
}
b.ReportAllocs()

@kortschak (Member)

Please remove this; you can get allocation reporting during a benchmark with -test.benchmem.


for i := 0; i < b.N; i++ {
	fh.Seek(0, os.SEEK_SET)
	bgz, err := NewReader(fh, *conc)

@kortschak (Member)

s/bgz/bg/g please.

Commit message: "the use of bytes.Buffer was causing extra allocations because it went through ReadFrom. This reads directly into the data slice."
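
With that helper, the readFrom shown earlier presumably drops the bytes.Buffer and reads straight into the block's slice, along these lines (a sketch, assuming b.data has capacity MaxBlockSize; the merged code may differ):

func (b *block) readFrom(r io.ReadCloser) error {
	o := b.owner
	b.owner = nil
	n, err := readToEOF(r, b.data[:MaxBlockSize])
	if err != nil {
		return err
	}
	b.buf = bytes.NewReader(b.data[:n])
	b.owner = o
	return nil
}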

@brentp (Contributor, Author) commented Jan 4, 2017

Addressed and squashed.

@kortschak merged commit a1b22bf into biogo:master on Jan 4, 2017
@kortschak (Member)

Thank you.
