Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make it possible to set growth size #96

Closed
recht opened this issue Sep 22, 2017 · 5 comments
Closed

Make it possible to set growth size #96

recht opened this issue Sep 22, 2017 · 5 comments
Milestone

Comments

@recht
Copy link

recht commented Sep 22, 2017

Right now, there's a fixed growth size of 4k. If you're allocating a lot of small objects in a segment that's too small, it's really slow as it is right now. Sometimes it's really hard to predict the final size up front, so making it possible to grow in bigger chunks would be great.

Same goes for new segments in a MultiSegment.

@zombiezen
Copy link
Contributor

SingleSegment is definitely going to be slower since it involves a copy. The growth algorithm could definitely be smarter, though.

However, I wouldn't expect a MultiSegment to be slow (just potentially involving allocations). Can you help me understand what's the problematic behavior there?

@recht
Copy link
Author

recht commented Sep 22, 2017

As I understand it, both types allocate in 4k chunks. If I'm creating mind of small objects then either a lot of copying will be needed or a lot of small chunks - and at that point the append operation inside Allocate becomes expensive.

@zombiezen zombiezen added this to the 2.17 milestone Sep 22, 2017
@zombiezen
Copy link
Contributor

MultiSegment never grows a segment, it just adds more segments with minimum size 4K. Perhaps it should be creating segments with increasingly larger sizes.

SingleSegment probably should be using Go's built-in append, which has better characteristics around picking new capacities.

It would be good to have a benchmark demonstrating the problem, if you can provide a small reproducible case that demonstrates the issue. As a workaround, you can implement the Arena interface to experiment with different allocation strategies.

@recht
Copy link
Author

recht commented Sep 26, 2017

Given an idl like this:

struct Document {
  data @0 :List(Field);

  struct Field {
    stringValue @0 :Text;
  }

}

And a benchmark like this:

package datastore

import (
	"testing"

	"zombiezen.com/go/capnproto2"
)

func BenchmarkSnapshotMulti(b *testing.B) {
	_, seg, _ := capnp.NewMessage(capnp.MultiSegment(make([][]byte, 0)))
	doc, _ := NewRootDocument(seg)

	d, _ := doc.NewData(int32(b.N))
	for i := 0; i < b.N; i++ {
		field := d.At(i)
		field.SetStringValue("some value some value some value some value")
	}
}

func BenchmarkSnapshotSingle(b *testing.B) {
	var buf []byte
	_, seg, _ := capnp.NewMessage(capnp.SingleSegment(buf))
	doc, _ := NewRootDocument(seg)

	d, _ := doc.NewData(int32(b.N))
	for i := 0; i < b.N; i++ {
		field := d.At(i)
		field.SetStringValue("some value some value some value some value")
	}
}

func BenchmarkSnapshotGrow(b *testing.B) {
	_, seg, _ := capnp.NewMessage(MultiExponentialSegment([][]byte{make([]byte, 0, 4096)}))
	doc, _ := NewRootDocument(seg)

	d, _ := doc.NewData(int32(b.N))
	for i := 0; i < b.N; i++ {
		field := d.At(i)
		field.SetStringValue("some value some value some value some value")
	}
}

You get this:

BenchmarkSnapshotMulti-4    	  100000	     21842 ns/op
BenchmarkSnapshotSingle-4   	   50000	     16597 ns/op
BenchmarkSnapshotGrow-4     	  500000	       332 ns/op
PASS
BenchmarkSnapshotMulti-4    	  100000	     22109 ns/op
BenchmarkSnapshotSingle-4   	   50000	     17975 ns/op
BenchmarkSnapshotGrow-4     	  500000	       391 ns/op
PASS

In other words, both the single segment and the multi segment implementations are pretty slow. Interestingly, the multisegment is actually slower than single segment for some reason.

The Grow case is using a custom Arena implementation like you suggested. Basically it will grow each segment to be double size of the previous segment:

const _maxGrowth = 32 * 1024 * 1024

type multiExponentialSegmentArena [][]byte

// MultiExponentialSegment returns a new arena that allocates new segments when
// they are full.  b can be used to populate the buffer for reading or
// to reserve memory of a specific size.
func MultiExponentialSegment(b [][]byte) capnp.Arena {
	msa := new(multiExponentialSegmentArena)
	*msa = b
	return msa
}

func (msa *multiExponentialSegmentArena) NumSegments() int64 {
	return int64(len(*msa))
}

func (msa *multiExponentialSegmentArena) Data(id capnp.SegmentID) ([]byte, error) {
	if int64(id) >= int64(len(*msa)) {
		return nil, fmt.Errorf("oob")
	}
	return (*msa)[id], nil
}

func (msa *multiExponentialSegmentArena) Allocate(sz capnp.Size, segs map[capnp.SegmentID]*capnp.Segment) (capnp.SegmentID, []byte, error) {
	var last []byte
	for i, data := range *msa {
		id := capnp.SegmentID(i)
		if s := segs[id]; s != nil {
			data = s.Data()
		}
		if sz <= capnp.Size(cap(data)-len(data)) {
			return id, data, nil
		}
		last = data
	}
	if int(sz) < cap(last) {
		if cap(last) >= _maxGrowth {
			sz = _maxGrowth
		} else {
			sz = capnp.Size(cap(last)) * 2
		}
	}
	// padToWord
	n := capnp.Size(7)
	sz = (sz + n) &^ n

	buf := make([]byte, 0, int(sz))
	id := capnp.SegmentID(len(*msa))
	*msa = append(*msa, buf)
	return id, buf, nil
}

@zombiezen
Copy link
Contributor

I rewrote the benchmark slightly (since I want the amount of work to be fixed):

func BenchmarkSnapshotSingle(b *testing.B) {
	const (
		fieldValue = "1234567" // carefully chosen to be word-padded

		rootMessageOverhead = 8 * 3 // root pointer, Document struct, composite list tag
		perFieldOverhead    = 8 * 2 // Field struct, fieldValue + "\0"
		numElements         = 64 * 1024
		totalSize           = rootMessageOverhead + perFieldOverhead*numElements
	)
	b.SetBytes(totalSize)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		_, seg, _ := capnp.NewMessage(capnp.SingleSegment(nil))
		doc, _ := air.NewRootAllocBenchmark(seg)
		d, _ := doc.NewFields(numElements)
		for j := 0; j < numElements; j++ {
			d.At(j).SetStringValue(fieldValue)
		}
	}
}

// etc

I also added another implementation for single slice that mimics the growslice algorithm. On my machine I get:

BenchmarkSnapshotSingle-4                	      50	  23919596 ns/op	  43.84 MB/s	102773133 B/op	     134 allocs/op
BenchmarkSnapshotMulti-4                 	      10	 155608007 ns/op	   6.74 MB/s	 1639160 B/op	     543 allocs/op
BenchmarkSnapshotGrow-4                  	     100	  15469078 ns/op	  67.79 MB/s	 1593952 B/op	      12 allocs/op
BenchmarkSnapshotSingleSegmentAppend-4   	     100	  10680264 ns/op	  98.18 MB/s	 4346192 B/op	      10 allocs/op
PASS
ok  	zombiezen.com/go/capnproto2	5.597s

I'll clean this up and submit it into master. Thanks for writing up the benchmark!

zombiezen added a commit that referenced this issue Oct 2, 2017
Not making algorithm changes yet to establish a baseline.

Updates #96
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants