Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Go speed testing #165

Merged
merged 11 commits into from
May 12, 2015
Merged

Go speed testing #165

merged 11 commits into from
May 12, 2015

Conversation

rw
Copy link
Collaborator

@rw rw commented Apr 2, 2015

No description provided.

bmharper and others added 7 commits April 1, 2015 16:47
Identifies alloc-heavy codepaths.
Change the signature for 'string' getters and settings to use byte
slices instead of strings.
Builder has a new CreateByteString function that writes a
null-terimnated byte slice to the buffer. This results in zero
allocations for writing strings.
Add the function `Reset` to the Builder, which facilitates reuse of the
underlying byte slice.
@cbandy cbandy mentioned this pull request Apr 5, 2015
@ghost
Copy link

ghost commented Apr 8, 2015

So, do you want to change this so it generates accessors for both strings and byte vectors? Or do you think Go programmers will be happy with just the the byte vectors?

Also, if you can squash related commits, that's helpful in code review.

@rw rw mentioned this pull request Apr 8, 2015
@rw
Copy link
Collaborator Author

rw commented Apr 19, 2015

The problem with generating multiple accessors is that we can't guarantee they won't collide with other schema-defined fields. For example, a field called 'Name' would get 'Name' and 'NameBytes' as accessors. If there is a 'NameBytes' field (say, a [uchar]), then we'd get a collision.

Personally, I'd rather let the user make the choice whether to allocate a string, at the cost of some syntax noise. That means returning byte slices in all cases.

Anyone else want to share an opinion?

@ghost
Copy link

ghost commented Apr 20, 2015

I'm fine either way, to me it really depends on how people typically use Go.

As far as collisions go, we already have this problem with "Type" and "Length" suffixes (depending on language), which is currently handled in an ad-hoc way in the parser by giving an error for any fields defined that would clash (see CheckClash() in idl_parser.cpp. We could add "Bytes" to that, though maybe "ByteVector" is even better, since it is less likely to clash.

@rw
Copy link
Collaborator Author

rw commented May 9, 2015

  • Removed remaining allocs during building, as reporting during the gold data benchmarks.
  • Added collision check in CheckClash for _byte_vector and ByteVector for the BASE_TYPE_STRING type.

@rw
Copy link
Collaborator Author

rw commented May 9, 2015

Reduced number of backing buffer growth events, bringing BuildGold speed to within a factor of 2.5x ParseGold.

@rw
Copy link
Collaborator Author

rw commented May 10, 2015

I'm hearing that users want to be able to provide their own backing buffer, to have more control over memory usage, and potentially reduce allocs. Something like

NewBuilderFromSlice(...)

or

bldr:= &Builder{Bytes: myslice}
bldr.Reset() // ensures correct initial state of the byte slice

is what is being asked for.

This approach isn't as simple as it seems, though, because a Go Builder actually has three GC'ed objects: the Bytes slice that holds the flatbuf data, the vtables slice that holds already-seen vtables, and vtable which holds the current (if applicable) vtable.

(Note that to set any of these values directly would require making them publicly visible. I'd rather not make vtables and vtable publicly visible, since they are just for the internal state of the Builder.)

So, if users want this, we have a few paths.

  1. Change the Builder constructor to take a Config-like object, then look inside of it to pull out any pre-existing buffers the user gave us. Each field would be optional. You'd use it like this:
myCfg := BuilderConfig{
        InitialBytesBuffer: mySlice1,
        InitialVTableBuffer: mySlice2,
        InitialVTablesBuffer: mySlice3,
}
bldr := NewBuilder(&myCfg)

(Simple use cases will just pass nil for the config.)

  1. Make each alloc'ed field publicly visible, then call Reset on an already-existing Builder, so that users could pass in anything they want. It would look like this:
bldr := &Builder{Bytes: mySlice, VTables: mySlice2}
bldr.Reset() // Sets up bookkeeping

I'm not a fan of this because, like I said, we'd have to make the internal state fields public so that users could set them.

Feedback?

@dgnorton
Copy link

@rw how hard would it be for the builder to take just one buffer and segment it as needed internally so that implementation details are hidden from client code? If any one of the segments (bytes, vtable, vtables, etc.) needed to grow, allocate a whole new buffer, copy each of the segments into the new buffer, and continue.

@rw
Copy link
Collaborator Author

rw commented May 10, 2015

@dgnorton That's an interesting idea. As a caller, that's simpler (but less flexible) than using a Config. One issue I see is that it makes internal bookkeeping code more complicated. Another issue is that the Bytes slice (the largest of these three objects) will lose its exact power-of-2 growth behavior.

@gwvo Any thoughts?

@ghost
Copy link

ghost commented May 11, 2015

In C++ we provide an allocator call-back, is there something similar that can be done in Go?

@rw
Copy link
Collaborator Author

rw commented May 11, 2015

An allocator callback in Go would be something like this:

bldr := NewBuilder(myAllocator)

func myAllocator(existingSlice []byte, desiredCapacity int) []byte {
  // Look in a sync.Pool or somewhere else for a suitable slice.

  // Or make a new one on the heap:
  if existingSlice == nil {
    return make([]byte, desiredCapacity)
  }

  existingSlice = existingSlice[:cap(existingSlice)]
  extension := make([]byte, desiredCapacity - cap(existingSlice))
  extended := append(existingSlice, extension...)
  return extended
}

(This was edited.)

This has the added advantage that we could use it for growing the byte buffer, too.

@gwvo Is this the kind of thing you mean?

@dgnorton Would you use this?

@rw
Copy link
Collaborator Author

rw commented May 11, 2015

(Keep in mind that at this point, you could just have a pool of Builder objects and reuse them.)

@ghost
Copy link

ghost commented May 12, 2015

Yes.. though reusing FlatBufferBuilder objects seems preferable in most situations.

@dgnorton
Copy link

@rw I'm not sure a pool of builders would work. E.g., in our case...

  • get buf from a pool of []byte
  • read flat data from socket into buf
  • get builder from a pool of flatbuffer builders
  • traverse buf normalizing data into new flatbuffer using the builder
  • pass newly constructed buf down the data pipe and return builder to pool

We have a problem at that last step because the builder still owns the memory that we passed down the pipe. We could pass the builder down the pipe but that seems awkward.

Maybe I'm thinking about it wrong and there's a better way?

@rw
Copy link
Collaborator Author

rw commented May 12, 2015

I'd like to handle a public interface for allocations in a separate feature branch. Let's get these speed improvements shipped.

A workaround (for those who want it) is to manually manage the publicly-accessible Bytes buffer, combined with judicious use of the new Reset function.

rw added a commit that referenced this pull request May 12, 2015
@rw rw merged commit 4d213c2 into google:master May 12, 2015
kakikubo pushed a commit to kakikubo/flatbuffers that referenced this pull request Apr 19, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants