Consider formats which support random value access #76
Hi there,
first of all, thank you for the great benchmark. It is a very nice resource.
I wanted to point out that formats which support random value access, like Cap'nProto and FlatBuffers, are measured a bit unfairly. The deserialise methods of the test make a copy of every value encoded in the buffer even though it is not necessary. The buffers are serialised in a way that makes them accessible on demand, without the need for actual unmarshalling.
In this gist you can find a rewritten test for FlatBuffers:
https://gist.github.com/mzaks/a0dd3d68f8958da72068022749469531
which passes and returns the following result for decoding:
average:5.92301542549922ms deviation:0.6135154254992203ms
Formats which invest in supporting random value access do so at the cost of size and encoding speed.
This is why I find it unfair to show the shortcomings of the encoding while going around the benefits of the decoding. |
Here are two use cases people might have:
1. Read all the data in a serialized value.
2. Read only some of the data in a serialized value.
I think #1 is the more common use case and that's what this benchmark
attempts to measure.
A long time ago, the benchmark also tried to measure #2. The timeline, as
I remember it:
- There's a library that supports the Protobuf wire format and only lazily
parses the message as fields are accessed.
- This library was added to the benchmark and had a hilariously low
"deserialization time" measurement.
- To improve the situation, we started keeping track of two measurements:
- "full deserialization time" -- read every value (which forces the
lazy deserializer to do all the work)
- "partial deserialization time" -- only read a few fields
This was around for a while (maybe a year?) but there was some discussion
on the mailing list and we decided to drop the "partial" measurement. The
thinking was that the benefits weren't worth the extra maintenance burden and
the extra real estate on the results page for a use case that was somewhat
specialized.
|
I think that it would be quite difficult to fairly measure even just these 2 variations across formats and libraries. So I think that while a test for "partial binding" (or whatever it'd be called) can be valuable, it may make the most sense as a separate project, focused on such usage. |
Well, I guess this is the common misconception which people have towards the random value access feature. It is not a lazy parsing feature, where the data is transformed only on demand. It is rather a read-access-friendly representation of the data.
This line: MediaContent.getRootAsMediaContent(bb, mc);
turns MediaContent mc = new MediaContent(); into a read-only accessor of the received data. As the buffer has its own local references stored in a virtual table, there is no actual parsing needed. The getters of MediaContent just do some addition and subtraction of relative offsets to get the absolute buffer offset, but this is similar to what a runtime does when we want to access a property of an object. OK, there is also the conversion of the value at the offset to a type by using the java.nio.ByteBuffer class, but this can hardly be called parsing. Compare that to other formats, where the values are stored as text which actually has to be parsed, or a format where we need to perform a linear traversal to find the value; in that case it is rather better to transform everything in one go. It is as if we were comparing a balanced binary search tree to a linked list.
So what I find a bit unfair in the current implementation of the test is the fact that serializers.flatbuffers.media.MediaContent is converted in the deserialize method to data.media.MediaContent, and then JavaBuiltIn.mediaTransformer is used to check the values. In the suggestion I posted as a gist, the deserialize method returns serializers.flatbuffers.media.MediaContent, and I implemented a static final class Transformer extends MediaTransformer<MediaContent> which knows how to forward and reverse data given data.media.MediaContent and serializers.flatbuffers.media.MediaContent.
As I don't have a complete understanding of the test I might be missing something, but I guess that by having an appropriate Transformer we do not even change the semantics of the test. It still reads all the values from the buffer; it just does not convert them into unnecessary intermediate Java objects which have to be allocated and then garbage collected.
The only use case where using serializers.flatbuffers.media.MediaContent directly would not be an option is if we would like to mutate the values. As I mentioned before, it is a read-only accessor. But as far as I can tell this is not a use case in this test.
So I don't plead for making a test for partial access of data, I just want to avoid intermediate Java objects.
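A minimal sketch of this access pattern, assuming the generated FlatBuffers classes from the benchmark's media schema (the media().uri() getter chain is illustrative):

```java
import java.nio.ByteBuffer;
import serializers.flatbuffers.media.MediaContent;

public class RandomAccessSketch {
    static String readOneField(byte[] payload) {
        ByteBuffer bb = ByteBuffer.wrap(payload);
        // Plain accessor object; nothing is parsed at construction time.
        MediaContent mc = new MediaContent();
        // Re-points the accessor at the root table of the buffer.
        MediaContent.getRootAsMediaContent(bb, mc);
        // The getter resolves a vtable offset and reads straight from the
        // ByteBuffer; nothing is unmarshalled or copied up front.
        return mc.media().uri();
    }
}
```
|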
Oh, I think I understand your concern now.
The way the benchmark is set up, every serializer's test creates those
additional Java objects. For example, even the Java built-in serialization
test, which uses data.media.MediaContent as its native format, makes a copy:
https://github.com/eishay/jvm-serializers/blob/master/tpc/src/serializers/JavaBuiltIn.java
The purpose of this is to measure the overhead of reading every field so
serializers with expensive accessors are penalized appropriately.
Unfortunately, as you observed, this also includes the overhead of
allocating, writing, and GC'ing the additional objects, which isn't ideal.
However, I think it's still sort of "fair" in that all serializers pay this
same price.
Like you said, we could remove the overhead by writing custom code for each
set of serializer classes:
- A function to read every field and blackhole
<http://javadox.com/org.openjdk.jmh/jmh-core/1.6.3/org/openjdk/jmh/infra/Blackhole.html>
it (see the sketch after this list).
- A function to compare against a data.media.MediaContent value.
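For illustration, such a read-everything function might look like the following sketch (the POJO getters here are hypothetical stand-ins for the benchmark's data.media classes; Blackhole is the JMH class linked above):

```java
import org.openjdk.jmh.infra.Blackhole;

public class ReadAllFieldsSketch {
    // Reads every field and sinks it into a Blackhole so the JIT cannot
    // eliminate the accessor calls. The MediaContent/Image getters are
    // hypothetical, not the repository's actual field names.
    static void readAllFields(data.media.MediaContent mc, Blackhole bh) {
        bh.consume(mc.getMedia().getUri());
        bh.consume(mc.getMedia().getTitle());
        for (data.media.Image img : mc.getImages()) {
            bh.consume(img.getUri());
            bh.consume(img.getWidth());
        }
    }
}
```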
The reason we didn't do that was the additional maintenance burden. We
would not only have to write those additional functions for each set of
serializer classes, we'd have to be vigilant to ensure the code was correct
(e.g. make sure it wasn't skipping a field).
With our current setup, each set of serializer classes only needs a
"Transformer" implementation. If the transformer implementation is
incorrect, we'll probably detect that because the round-trip comparison
will fail.
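For example, the safety net could be expressed with this hypothetical, simplified version of the contract:

```java
// Simplified, hypothetical version of the Transformer contract: P is the
// shared POJO (data.media.MediaContent in the benchmark), S is the
// serializer-specific representation.
interface Transformer<P, S> {
    S forward(P pojo);     // shared POJO -> serializer-specific form
    P reverse(S specific); // serializer-specific form -> shared POJO
}

class RoundTripCheck {
    // If a transformer skips or corrupts a field, the round trip fails.
    static <P, S> void check(Transformer<P, S> t, P original) {
        P back = t.reverse(t.forward(original));
        if (!original.equals(back)) {
            throw new AssertionError("transformer dropped or corrupted a field");
        }
    }
}
```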
(Really sorry if I'm still misunderstanding something! I don't have time
to look into the code in detail right now.)
|
Some of the tests do have their custom transformers, like for example the winner of the benchmark 😉
https://github.com/eishay/jvm-serializers/blob/master/tpc/src/serializers/colfer/Colfer.java#L64
So then we are back at the "unfair" point 🙂. |
@mzaks be that as it may, I do not see the point in trying to spend huge amounts of effort to make the test more favorable for certain kinds of decoders. The use case being tested is the common "decode the whole message" use case, and for that, things work. If you want to create a separate set of tests, or perhaps a new project, that can be valuable. But I see a low effort/value ratio for modifying the existing tests. |
Ah, dammit. Took a closer look at the code and I think I (finally)
understand what you're saying, Maxim.
Whoever wrote the original FlatBuffers test accidentally made the
serializer do the transformation work as well. None of the tests should be
doing this. Sorry for taking so long to realize this; I think I narrowed in
too quickly on your "random access serializer" comment and thought you were
talking about a different issue.
It looks like your diff just moves the transformation code into a
Transformer, where it belongs. That seems fine with me. Do you think you
can clean that code up and submit a pull request?
Also, a minor suggestion: if your original message came with a diff then I
might have more quickly realized what you were talking about. Because it
was a full file, I was lazy about actually figuring out what the diff was.
(Still my fault for jumping to the wrong conclusion.)
Tatu: would appreciate if you could double-check my evaluation to make sure
I'm not misunderstanding yet again :-P
|
Will do. I also saw that there is a new version of FlatBuffers (1.7.0); I will also update it in the PR. |
I wrote the original FlatBuffers submission, and while it's possible that I did not write an optimal implementation, I have issues with @mzaks' gist.
I did not write any transformer because there was no POJO class to use for such a purpose.
This implementation creates two cached instances:
https://gist.github.com/mzaks/a0dd3d68f8958da72068022749469531#file-flatbuffers-java-L40
and https://gist.github.com/mzaks/a0dd3d68f8958da72068022749469531#file-flatbuffers-java-L112
which seems not aligned with the other implementations.
While it's cool that FlatBuffers can access fields by lookup instead of parsing, it's more likely that it won't be much faster when it needs to create garbage like everyone else. By reusing those instances it avoids doing the work everyone else is doing (there are other serializers which can deserialize into an instance).
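The reuse pattern in question, sketched with the generated FlatBuffers API (the two-argument getRootAsMediaContent overload re-points an existing accessor at a new buffer):

```java
import java.nio.ByteBuffer;
import serializers.flatbuffers.media.MediaContent;

public class ReuseSketch {
    // Kept across iterations and re-pointed at each new buffer, so there is
    // no per-deserialization allocation (the pattern being objected to).
    private final MediaContent cached = new MediaContent();

    MediaContent deserializeReusing(ByteBuffer bb) {
        return MediaContent.getRootAsMediaContent(bb, cached);
    }

    // Allocates a fresh accessor per call, paying the allocation/GC cost
    // that the other serializers pay.
    MediaContent deserializeFresh(ByteBuffer bb) {
        return MediaContent.getRootAsMediaContent(bb, new MediaContent());
    }
}
```
|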
zapov, thanks for pointing out the reuse. I agree that our benchmarks
shouldn't reuse message objects.
(Though some real-world users may reuse objects for performance, I think we
decided that this benchmark should measure the more common/simple use case.)
But I think the other issue Maxim brought up is legitimate -- the current
FlatBuffers benchmark is doing extra work. I think we should switch
FBSerializer to return FlatBuffer MediaContent objects and write an
FBTransformer to convert between that and the POJO.
(I tried making the change myself, but ran into an NPE and gave up since
I'm not familiar with FlatBuffers :-P)
|
I don't remember the actual reason why I wrote it like that instead of using a transformer. I vaguely remember something about not being able to convert it (but of course it's possible that I just didn't know how to do it). Since this bench tests a single object and this optimization is not part of the library, I don't think stuff like that is "fair" to other codecs. On an unrelated note, as far as improvements to the bench go, I think it would be worthwhile to try to get numbers without the copy in forward/reverse: https://github.com/eishay/jvm-serializers/blob/master/tpc/src/serializers/JavaBuiltIn.java#L93 If we made that change I would drop the current dsl-json implementations and just put the databinding one in, as that's the most common use case.
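A hypothetical illustration of what dropping that copy would mean: forward and reverse simply pass the instance through instead of deep-copying it.

```java
import data.media.MediaContent;

// Hypothetical pass-through transformer: no copy(...) call, the same
// instance flows through forward and reverse unchanged.
final class PassThroughTransformer {
    MediaContent forward(MediaContent a) {
        return a;
    }

    MediaContent reverse(MediaContent a) {
        return a;
    }
}
```
|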
Here is a cleaned-up gist where I removed all the "smart" techniques. I would like to discuss whether those techniques are "fair" or not, though. With a binary serialisation strategy we always have two parts: the binary format itself and the implementation (API) that reads and writes it.
The FlatBuffers binary format allows the user to access values on demand, and as a matter of fact, the generated API encourages the user to reuse the table accessor classes. Those accessor classes are just a convenience for storing two things: a reference to the byte buffer and the offset of the table inside it.
It would be possible to avoid them altogether and just provide pure static methods which receive a reference to the byte buffer and an offset, and return the value or a reference to a table. This way we could avoid class instantiation and garbage collection altogether. Again, this is a case of the binary representation supporting something that the implementation does not provide. Is restricting a driver to use only 3 gears on their 5-gear vehicle, because other drivers drive vehicles with only 3 gears, fair? Hard to say 😉.
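A hypothetical illustration of that idea: pure static reads against the buffer, with no accessor instance at all (the offset handling is simplified; real generated code resolves field offsets through the vtable):

```java
import java.nio.ByteBuffer;

final class StaticReadSketch {
    private StaticReadSketch() {}

    // Returns the value at an absolute offset; no object wraps the buffer.
    static int readInt(ByteBuffer bb, int absoluteOffset) {
        return bb.getInt(absoluteOffset);
    }

    // Follows a stored relative offset to another table and returns its
    // absolute position, instead of materializing an accessor object.
    static int tablePosition(ByteBuffer bb, int fieldOffset) {
        return fieldOffset + bb.getInt(fieldOffset);
    }
}
```
|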
If you are returning an instance to the outside, I don't think you should be holding on to that instance and using it for the next request. Meaning: I think it's fine to keep a cache of FlatBuffers instances which you deserialize into, but I don't think it's fine to keep a cache of the data.media objects. |
Yes absolutely. The "caches" are meant only for sequential deserialisation in the same "layer". If the instance goes to a different layer of the application, it should not be reused. Same applies for multi threaded scenario. In the test as far as I can tell the decoding is done sequentially, so there is no need to create new instances on every iteration. Let's consider a real world scenario. Our endpoint get requests with a payload, where we need to read the values and spin of processes, which will result in a response. In this case the payload is processed sequentially in one layer. Only the values will leave this layer. And this is where FlatBuffers shines - extracting of the values, specially if we need only a portion of the data. |
We try to measure the performance that users will typically encounter. For
example, I would guess that most users will not bother maintaining a string
dedupe table, so our standard flatbuffers benchmark shouldn't include that
optimization. (But maybe my guess is wrong here.)
For some serializers, we run an additional benchmark where we apply
additional optimizations (e.g. "flatbuffers-opt"). We used to have more of
these, but have generally tried to reduce the number of benchmarks because
the results page is already overwhelming.
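For reference, a string dedupe table of the kind mentioned above could be a small wrapper over the real FlatBufferBuilder.createString API (the wrapper class itself is hypothetical):

```java
import com.google.flatbuffers.FlatBufferBuilder;
import java.util.HashMap;
import java.util.Map;

final class DedupingStringWriter {
    private final Map<String, Integer> offsets = new HashMap<>();

    // Writes each distinct string once and reuses its offset afterwards,
    // trading a hash lookup per string for a smaller buffer.
    int createString(FlatBufferBuilder builder, String s) {
        Integer cached = offsets.get(s);
        if (cached != null) {
            return cached;
        }
        int offset = builder.createString(s);
        offsets.put(s, offset);
        return offset;
    }
}
```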
|