Read full #56

Merged: merged 1 commit into biogo:master on Jan 4, 2017

Conversation

@brentp (Contributor) commented Dec 31, 2016

This gives:

name    old time/op    new time/op    delta
Read-4    28.4ms ± 1%    26.6ms ± 1%   -6.04%  (p=0.000 n=15+12)

name    old alloc/op   new alloc/op   delta
Read-4    6.90MB ± 0%    0.21MB ± 0%  -96.99%  (p=0.000 n=15+13)

name    old allocs/op  new allocs/op  delta
Read-4       414 ± 0%       316 ± 0%  -23.67%  (p=0.000 n=15+15)

and with -conc 3:

name    old time/op    new time/op    delta
Read-4    15.5ms ± 1%    14.6ms ± 2%   -5.79%  (p=0.000 n=14+14)

name    old alloc/op   new alloc/op   delta
Read-4    10.8MB ± 0%     4.1MB ± 0%  -61.87%  (p=0.000 n=15+15)

name    old allocs/op  new allocs/op  delta
Read-4       481 ± 0%       383 ± 0%  -20.43%  (p=0.000 n=15+14)

In vcfanno, which does streaming intersection of many files, this gives a ~6-8% improvement in run time.

@kortschak (Member)

I don't like the use of io.ReadFull here. It is intended to signify that the full slice should be filled. The dropping of io.ErrUnexpectedEOF is a code smell that follows from its use. If you want to do it this way, write a simplified copy of io.ReadAtLeast that just fills the slice and returns n. However, I'd like to know what is causing the iterated read operation to be faster than the io.Copy call. This would require profiling.
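
For illustration, the pattern being objected to looks like this (a sketch, not the actual PR code; r and buf stand for the reader and the block buffer):

n, err := io.ReadFull(r, buf)
if err == io.ErrUnexpectedEOF {
	// The block was shorter than buf. That is expected here, but
	// silently clearing this error also hides genuine truncation.
	err = nil
}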

@brentp (Contributor, Author) commented Jan 1, 2017

I can change io.ReadFull to loop over Read() until EOF if you prefer, though I don't see why that's preferable. Would a comment above that line indicating the intent of ReadFull be sufficient?

Re profiling. Here is readFrom before:

(pprof) list readFrom
Total: 331.92MB
ROUTINE ======================== github.com/biogo/hts/bgzf.(*block).readFrom in /home/brentp/go/src/github.com/biogo/hts/bgzf/cache.go
         0   323.96MB (flat, cum) 97.60% of Total
         .          .    127:
         .          .    128:func (b *block) readFrom(r io.ReadCloser) error {
         .          .    129:	o := b.owner
         .          .    130:	b.owner = nil
         .          .    131:	buf := bytes.NewBuffer(b.data[:0])
         .   323.96MB    132:	_, err := io.Copy(buf, r)
         .          .    133:	if err != nil {
         .          .    134:		return err
         .          .    135:	}
         .          .    136:	b.buf = bytes.NewReader(buf.Bytes())
         .          .    137:	b.owner = o

and after that change:

(pprof) list readFrom
Total: 2.68MB

I'm not sure why io.Copy is allocating so much; I guess it could be resizing the slice (not the underlying array) along with the buffer used by io.Copy. Using io.CopyBuffer with a buffer from a sync.Pool did not improve things as much as this, though.
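
For reference, the io.CopyBuffer attempt mentioned above might have looked roughly like this (a sketch; copyBufPool and copyWithPool are hypothetical names, the 32 KiB size is an assumption, and io and sync are assumed imported):

var copyBufPool = sync.Pool{
	New: func() interface{} { return make([]byte, 32*1024) },
}

func copyWithPool(dst io.Writer, src io.Reader) (int64, error) {
	buf := copyBufPool.Get().([]byte)
	defer copyBufPool.Put(buf)
	// io.CopyBuffer still delegates to dst's ReadFrom when dst
	// implements io.ReaderFrom (as *bytes.Buffer does), in which
	// case buf is never used - one possible reason this helped less.
	return io.CopyBuffer(dst, src, buf)
}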

@brentp (Contributor, Author) commented Jan 1, 2017

On a 3GB VCF, here is the memory profile before:

(pprof) top10
52.19GB of 52.40GB total (99.59%)
Dropped 14 nodes (cum <= 0.26GB)
      flat  flat%   sum%        cum   cum%
   52.14GB 99.49% 99.49%    52.14GB 99.49%  bytes.makeSlice
    0.05GB 0.098% 99.59%    52.40GB   100%  github.com/biogo/hts/bgzf.(*block).readFrom
         0     0% 99.59%    52.35GB 99.90%  bytes.(*Buffer).ReadFrom
         0     0% 99.59%    52.40GB   100%  github.com/biogo/hts/bgzf.(*decompressor).nextBlockAt.func1
         0     0% 99.59%    52.35GB 99.90%  io.Copy
         0     0% 99.59%    52.35GB 99.90%  io.copyBuffer
         0     0% 99.59%    52.40GB   100%  runtime.goexit

and after:

(pprof) top10
210.52MB of 210.52MB total (  100%)
Showing top 10 nodes out of 19 (cum >= 3MB)
      flat  flat%   sum%        cum   cum%
  193.02MB 91.69% 91.69%   193.02MB 91.69%  compress/flate.(*huffmanDecoder).init
   14.50MB  6.89% 98.57%   207.52MB 98.57%  github.com/biogo/hts/bgzf.(*block).readFrom
       3MB  1.43%   100%        3MB  1.43%  compress/gzip.(*Reader).readHeader
         0     0%   100%   193.02MB 91.69%  compress/flate.(*decompressor).Read
         0     0%   100%   193.02MB 91.69%  compress/flate.(*decompressor).nextBlock
         0     0%   100%   193.02MB 91.69%  compress/flate.(*decompressor).readHuffman
         0     0%   100%   193.02MB 91.69%  compress/gzip.(*Reader).Read
         0     0%   100%        3MB  1.43%  compress/gzip.(*Reader).Reset
         0     0%   100%        3MB  1.43%  github.com/biogo/hts/bgzf.(*Reader).Read
         0     0%   100%        3MB  1.43%  github.com/biogo/hts/bgzf.(*Reader).nextBlock

@kortschak (Member)

Can you look at the profiles in io.Copy? Mainly, who is calling makeSlice so much?

@brentp (Contributor, Author) commented Jan 2, 2017

bytes.Buffer implements ReadFrom, so that is used, and it causes the extra allocations. See the graphs below. Also, those and the profiles above are for -alloc_space; when using -alloc_objects I see a 7% decrease in allocations, which matches the performance improvement.
Here is the profile of ReadFrom:

(pprof) list ReadFrom
Total: 52.27GB
ROUTINE ======================== bytes.(*Buffer).ReadFrom in /home/brentp/go/go1.8beta1/go/src/bytes/buffer.go
         0    52.21GB (flat, cum) 99.88% of Total
         .          .    168:			// not enough space at end
         .          .    169:			newBuf := b.buf
         .          .    170:			if b.off+free < MinRead {
         .          .    171:				// not enough space using beginning of buffer;
         .          .    172:				// double buffer capacity
         .    52.01GB    173:				newBuf = makeSlice(2*cap(b.buf) + MinRead)
         .          .    174:			}
         .          .    175:			copy(newBuf, b.buf[b.off:])
         .          .    176:			b.buf = newBuf[:len(b.buf)-b.off]
         .          .    177:			b.off = 0
         .          .    178:		}
         .   208.02MB    179:		m, e := r.Read(b.buf[len(b.buf):cap(b.buf)])
         .          .    180:		b.buf = b.buf[0 : len(b.buf)+m]
         .          .    181:		n += int64(m)
         .          .    182:		if e == io.EOF {
         .          .    183:			break
         .          .    184:		}

before.pdf
after.pdf
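
The dispatch that routes io.Copy into bytes.(*Buffer).ReadFrom can be seen in miniature (a self-contained sketch, not code from this PR):

package main

import (
	"bytes"
	"io"
	"strings"
)

func main() {
	var buf bytes.Buffer
	// *bytes.Buffer implements io.ReaderFrom, so io.Copy dispatches to
	// buf.ReadFrom(src); the makeSlice growth in the listing above
	// happens inside that method, not in io.Copy's own loop.
	io.Copy(&buf, strings.NewReader("example data"))
	_ = buf.Len()
}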

@brentp (Contributor, Author) commented Jan 2, 2017

This suggests something to try with anonymous structs, but that seems even uglier than ReadFull.

@kortschak (Member) commented Jan 2, 2017 via email

@brentp (Contributor, Author) commented Jan 2, 2017

I added a helper function that does the loop.

}
if err == io.EOF {
	err = nil
} else if err == nil && n != MaxBlockSize {

@kortschak (Member)

I don't understand this. n can only satisfy 0 <= n && n <= MaxBlockSize, so this branch can never have a nil error according to the loop condition.

What you want to do is

switch {
case err == io.EOF:
    return n, nil
case n == MaxBlockSize && err == nil:
    // This is paranoid, but some readers will return
    // quickly when passed a zero-length byte slice.
    var dummy [1]byte
    _, err = r.Read(dummy[:])
    if err == nil {
        return n, io.ErrShortBuffer
    }
    if err == io.EOF {
        err = nil
    }
}
return n, err

Add to the comment that readToEOF may consume more bytes from the reader than are returned in buf if the length of the available data is greater than MaxBlockSize, or words to that effect.
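
Assembled, the helper might read as follows (a sketch combining the fill loop with the suggested switch; the readToEOF name and MaxBlockSize semantics come from this discussion, but the exact merged code may differ):

// readToEOF fills buf from r until EOF or buf is full, returning the
// number of bytes read. readToEOF may consume more bytes from the
// reader than are returned in buf if the length of the available data
// is greater than MaxBlockSize.
func readToEOF(r io.Reader, buf []byte) (n int, err error) {
	for err == nil && n < len(buf) {
		var m int
		m, err = r.Read(buf[n:])
		n += m
	}
	switch {
	case err == io.EOF:
		return n, nil
	case n == len(buf) && err == nil:
		// This is paranoid, but some readers will return
		// quickly when passed a zero-length byte slice.
		var dummy [1]byte
		_, err = r.Read(dummy[:])
		if err == nil {
			return n, io.ErrShortBuffer
		}
		if err == io.EOF {
			err = nil
		}
	}
	return n, err
}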

@brentp (Contributor, Author) commented Jan 2, 2017

Done.

This is somehow faster. I guess it's because calls with io.Reader arguments can't be inlined, and there were a lot of those, now replaced by the single function. On my 3GB test file with -conc 1:

benchmark           old ns/op        new ns/op        delta
BenchmarkRead-4     124945818850     107597436845     -13.88%

benchmark           old allocs     new allocs     delta
BenchmarkRead-4     5728424        5324950        -7.04%

benchmark           old bytes       new bytes     delta
BenchmarkRead-4     56388514360     232078760     -99.59%

and -conc 3

benchmark           old ns/op       new ns/op       delta
BenchmarkRead-4     71698951874     62600825624     -12.69%

benchmark           old allocs     new allocs     delta
BenchmarkRead-4     6135310        5728673        -6.63%

benchmark           old bytes       new bytes       delta
BenchmarkRead-4     86095419064     29939133992     -65.23%

case err == io.EOF:
	return n, nil
case n == MaxBlockSize && err == nil:
	// some readers may return quickly when passed a zero-length byte slice.

@kortschak (Member)

The complete text/formatting here would be preferable - and even if not, then with a leading capital and kept short over two lines. It is paranoid - the io.Reader docs say that the following call to Read should return 0, nil, but I know some that don't.

@kortschak (Member) left a review:

LGTM after comments addressed.


buf := make([]byte, 16384)
b.ResetTimer()
//b.ReportAllocs()

@kortschak (Member)

Please remove this.

if *file == "" {
	b.Skip("no bgzf file specified")
}
fh, err := os.Open(*file)

@kortschak (Member)

f is good enough. I don't think we use the term file handle anywhere.

	}
	bgz.Close()
}
b.ReportAllocs()

@kortschak (Member)

Please remove this; you can get allocation reporting during a benchmark with -test.benchmem.


for i := 0; i < b.N; i++ {
	fh.Seek(0, os.SEEK_SET)
	bgz, err := NewReader(fh, *conc)

@kortschak (Member)

s/bgz/bg/g please.

Commit message: "the use of bytes.Buffer was causing extra allocations because it went through ReadFrom. This reads directly into the data slice."
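
With that helper, the readFrom shown earlier presumably drops the bytes.Buffer and reads straight into the block's slice, along these lines (a sketch, assuming b.data has capacity MaxBlockSize; the merged code may differ):

func (b *block) readFrom(r io.ReadCloser) error {
	o := b.owner
	b.owner = nil
	n, err := readToEOF(r, b.data[:MaxBlockSize])
	if err != nil {
		return err
	}
	b.buf = bytes.NewReader(b.data[:n])
	b.owner = o
	return nil
}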

@brentp (Contributor, Author) commented Jan 4, 2017

Addressed and squashed.

@kortschak merged commit a1b22bf into biogo:master on Jan 4, 2017
@kortschak (Member)

Thank you.
