proposal for performance improvements #217
Comments
Hi, interesting stuff!

About switching: I've kept the data-driven encoder/decoder in the gpb module mostly to be able to cross-check encoding/decoding results against generated modules, but I have generally not spent any effort on performance or other bells and whistles here. This has gone into the code generator instead.

Regarding maps, I just opened an issue over in hexpm/hex_core#134 about what the oldest OTP version to support is. (If it is still 17, it would impose some limitations on how one can phrase the maps expressions.)

Regarding the API of the (assumedly intended) gpb module, I think an approach could be to work on maps internally, but the API would still need to accept defs also as a list (bwd compat), and in that case convert the defs to maps as a first step. As the documented definitions format is a list, I guess it would make sense to also expose, as an API function, the function that turns the definition list into a map. (An alternative is of course to define a new version of the definitions format, but I think that would be more work, since then the code generator would need to be adapted as well.)

Regarding iolists, do you have any performance figures for only this part of the proposed change? Currently, the code relies on the Erlang optimization that binaries are initially write-appendable under the hood. This is at least the case for the generated code, maybe also for the gpb module (I don't remember), but again, are you referring to the data-driven encoder/decoder in the gpb module? I tried earlier with iolists instead, but didn't find it made much of a speedup, if I remember correctly. Unfortunately, I don't think I have any results to share anymore; it was quite some time ago.
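The conversion step described above could look roughly like this. This is a minimal sketch, not gpb's actual API: defs_to_map is a hypothetical name, and it assumes each definition in the list is a {Key, Value} 2-tuple such as {{msg, MsgName}, Fields}.

```erlang
%% Hypothetical helper (names illustrative, not gpb's actual API):
%% turn the documented definitions list into a map for fast lookup,
%% while still accepting an already-converted map (bwd compat).
defs_to_map(Defs) when is_list(Defs) ->
    maps:from_list(Defs);   %% e.g. key {msg, MsgName} -> Fields
defs_to_map(Defs) when is_map(Defs) ->
    Defs.                   %% already a map: nothing to do
```

Exposing such a function would let callers convert once up front and then pass the map on every encode/decode call.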
Yeah, I was benching and optimizing.
As mentioned in the first comment, switching to iolists brought an extra 2x improvement (on the particular data structure I was benching).
Yes, a binary is write-appendable, but it still needs to be reallocated if it doesn't have enough space (source). Given a large enough input data structure, I'd expect frequent reallocations. With iolists, these reallocations can be avoided; only at the end is the whole iolist converted to a binary in a single efficient pass.
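As an illustration of the point about reallocations (a sketch, not gpb's actual encoder): the fragments are collected in an iolist, which only allocates cons cells while building, and the one-time copy happens in the final conversion:

```erlang
%% Sketch: build many small binary fragments, collect them in an
%% iolist (cheap cons cells, no reallocation of a growing binary),
%% and flatten once at the end.
encode_all(Ints) ->
    Parts = [<<I:32>> || I <- Ints],  %% list of 4-byte fragments
    iolist_to_binary(Parts).          %% single pass to one binary
```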
I'll double-check and report back.
Sorry, this statement is misleading. The speedup was 2x after the maps optimization. But disregarding that, and speaking in absolute numbers: with the test input I'm using, encoding on master takes 5 ms on average; if I switch to iolists, it takes 4 ms on average.
Yes, that's correct: it is a parallel implementation; there is no run-time dependency from the generated code on gpb.
Good point about reallocations, I didn't think about that. And a 20% improvement on encoding (for this particular input) is indeed something :) So this seems like it would be a worthwhile improvement. There could probably be a break-even somewhere: if a binary in an iolist is small, it could pay off to use integers instead, in case they are below 256. I'm thinking about memory usage for binaries vs integers, as described in the efficiency guide. For example, let's say we have a field that is of type
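The break-even idea above could be sketched like this (emit is a hypothetical helper; the premise, per the efficiency guide, is that a heap binary carries a header of a few words, while a byte-sized integer costs only its cons cell in the iolist):

```erlang
%% Sketch of the break-even: a single byte can go into the iolist as
%% a plain integer (0..255 are valid iolist elements) instead of a
%% 1-byte binary with its own heap header; bigger chunks stay binaries.
emit(<<B>>) ->
    B;     %% one byte: emit the integer itself
emit(Bin) when is_binary(Bin) ->
    Bin.   %% larger fragment: keep the binary
```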
How stable would you say the
Stable in what sense? I'd say both implementations are fairly well tested. Most work has gone into the generated code, both because it is a somewhat more complex problem and because there are more options for it. I forgot to mention that for the generated code, there is also a nif option to generate code that uses Google's C++ protobuf library via NIFs to encode and decode, and a bit more performance can be squeezed out with the bypass_wrappers option. But there are some caveats if you plan to use or switch between overlapping sets of proto definitions; see this section of the README.nif-cc. And the build process becomes yet a bit more complex, of course.
OK, thanks! Let me know if you're interested in accepting these two perf improvements for the gpb module.
Yes, definitely, I think they'd be nice improvements.
Thanks for both PRs. I will take a look.
Recently I analyzed the encoding performance of this library for my clients, who need to encode somewhat larger messages fairly frequently. After some analysis and experimentation, I was able to improve the average encoding speed from 5.20ms to 1.07ms. Here is a quick bench comparison of the optimized encoder with enif_protobuf and Elixir protobuf:
Originally, gpb was the slowest of the bunch, about 7x slower than enif_protobuf.

I've done two changes, in a hacky way. I'd like to discuss the possibility of properly contributing these changes upstream. The changes are:

1. Switching the MsgDefs representation from list to map. The case I was benching encodes a deeply nested structure, while the size of MsgDefs is about 1000 elements. In this case encoding requires frequent sequential scans of the list. Converting the representation to a map reduced the encoding time by about 2.5x.

2. Using an iolist while building the encoded binary. The encoder performs a lot of binary concatenations internally, which requires frequent binary expansions. Switching to an iolist further reduced the encoding time by about 2x. This can also simplify the implementation a bit, because some recursions can be changed from tail to body, and can even be replaced with lists:map or a comprehension.

Would you be open to accepting these changes upstream? As mentioned, these are currently done in a hacky way, just as a proof of concept. I need to redo them properly, and in the case of the 1st change also adapt the decoder and the generated code.
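The effect of the first change can be sketched as the difference between these two lookups (names hypothetical, not gpb's internals; assumes message definitions are keyed by {msg, MsgName}):

```erlang
%% With a ~1000-element defs list, each submessage encode scans the
%% list linearly; with a map the lookup is effectively constant time.
find_msg_def(MsgName, Defs) when is_list(Defs) ->
    {_, Fields} = lists:keyfind({msg, MsgName}, 1, Defs),
    Fields;
find_msg_def(MsgName, Defs) when is_map(Defs) ->
    maps:get({msg, MsgName}, Defs).
```

For a deeply nested message, this lookup runs once per submessage encoded, which is why the representation change dominates the measured improvement.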