-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add binary protocol for streams described in #1058 #1062
Comments
You can also take a look at protobuf, with proper use you can have quite Using JSON for streams is comparatively inefficient. Using binary streams I'll document how the protocol/schema more when I've got a prototype that — |
I would like to start working on this one. I wonder when you are going to release description of protocol/schema? |
Thanks for getting involved @andrey-polyakov! I haven't formally documented the schema yet. I've done a first pass at generating reasonable JSON for the 3 new streams in 1.5 and there already exists a JSON conversion for the original If you run the I also added sample JSON output to the docs here: https://github.com/Netflix/Hystrix/wiki/Metrics-and-Monitoring#sample-streams. |
I've read the document about metrics, and checked Exchange format back-end (Jackson/Json) ought to be replaced with CBOR flavour of Jackson at the first place then we can figure the potential of CBOR. Right? Now my second question, if we break down the task on steps, from where I should start doing actual work? |
At the moment, the 4 HTTP streams are all generated by a servlet which turns Java objects into JSON and them publishes them onto the wire as a String. My understanding is that the term SSE only refers to text, so sending binary is no longer "SSE". Therefore, I think we're free to implement this however we choose. Given the lack of a standard, however, we should probably also be fairly explicit about how we're choosing to encode the data, and how to decode it on the other side. I also think that there are a few implementation options. I think the most similar to today is replacing the JSON data writes to the wire with CBOR data writes to the wire. I'd also like to explore the performance of that option against writing to a memory-mapped file. Then a different server could be dedicated to taking that data and putting it on the wire. This is #1063. But for now, it's a great first step to understand the design space of using a binary protocol from the same process that is generating metrics. So here's my proposal for work that should get done.
Any steps you can think of that I'm missing? I'm happy to pitch in on any of these, and also happy for you to dive in and take any of these that are most interesting to you. I'm pretty busy with travel the next 2 weeks, so it's likely that I can help review code, but probably not be writing any. Thanks for the offer to contribute - I'm excited about getting this built. |
I will definetely take a closer look in the next two days. Fortunately, we have a stat in BC on Mon. That is fine as I am happy to contribute. I guess only cordination and reviewing will be required. Thanks for the prompt reply. |
Can you elaborate a little what documenting a schema means please? Then I'll just start from the first thing first) Thanks. |
As for now, I was able to replace data format back-end with Jackson-dataformat-CBOR and add methods to write CBOR instead. I modified some test cases to verify correctness (not all of them yet). Specifically, I converted JSON to CBOR using new back-end API. Currently I started working on recovering HystrixRequestEvents back from CBOR. Cheers. |
@andrey-polyakov For documenting the schema, I meant describing what you expect data to look like going over the wire. As an example, I can imagine encoding command keys directly in binary by treating them as strings. I can also imagine treating them as enums, and providing 2 types of messages:
If you could describe how you think messages should look and how clients should behave in English, that should allow for a design discussion before we get to the implementation. |
@mattrjacobs I pushed some changes to my branch here. Specifically, I was able to serializeHystrixRequestEvents and deserialize it to a new class. I created that new calss because it is hard to instantiate the original class. As for schema, my impression after a few trials is that better to be minimalistic keeping everyting except string values binary. Let us keep it as close as possible to JSON versions but throw away field names. I guess it is easier and logical to keep all the values, their order and the way they are nested. Given said that it is about writing binary values in a certain order. Also, I am thinking about a unique format ID being writted at the very begining of any document. To give a more detailed format description I need to get something working, I guess. You can check how raw data looks like in those unit tests I wrote. Let me know what you think. Thanks. |
It is possible to pack boolean fields together to use less memory. In one byte there may be sitting six boolean values though I am not sure if it is worth it. |
@mattrjacobs Weekend is the time when I work on open-source and it is approaching so that would be very nice if you gave me some feedback. Thanks. |
Sorry for the delay. From what I can tell, this looks great. Do you have any measurements on how the wire size compares to the JSON version? |
That's fine. Thank you! I will continue in the following two days and release some figures. I am going to do what I described in the plan then. Deserializers and unit tests should keep me busy for a while. What do you think about packing booleans together as bits? |
Here is an example of CBOR beating JSON with 98 and 249 bytes respectively:
which is equivalent of |
That seems very promising. Nice work! As far as boolean packing, I think efficient encoding should be the prime concern, so I so no reason not to, unless it makes the implementation unduly hard. |
Yep, numbers are appealing though CBOR engine is not stable. I've used some work arounds to make things work and this is why some field names are still there. The engine does not always work as expected. |
@mattrjacobs Here is update on my work in the sandbox I got: This is what I've done:
I put it on hold for a while and I am wondering where to go from this point? Optimizing the format to CBOR is a huuuge task thus I just need to confirm that it is a feasible goal. Cheers! |
Yeah, I'm still of the opinion that getting a fully-optimized CBOR serialization/deserialization implementation is worthwhile. I'll play around with your branch and give you some more detailed feedback. Can you provide any details on which pieces of the engine were not working as expected? |
Actually, there were a couple of things described in this ticket. UPDATE: resolved |
It would be nice to know exactly what performance problem or bottleneck is seen. I ask because I would think making the current HystrixMetricsStreamServlet nonblocking a higher priority (right now it is NOT using HTTP servlet 3.0 async support). It uses blockingqueues and polls. Using true async support (or I suppose using the rx-netty stream) should reduce latency as well as drastically reduce resources (threads). I would try that before messing with serialization formats particularly since I would imagine most consumers are Javascript (node, browser) and thus JSON would be very fast (and deserialization speed can shockingly be slower than network hops on some of these platforms). |
Opened #1123 for async servlets. Internally at Netflix, it's important to use binary because we have large non-Node systems responsible for processing these streams for analysis. We believe a binary protocol will help efficiency on both sides of the stream. Additionally, since we pay for bytes on the network, minimizing payload size is also important to us. |
@mattrjacobs Since CBOR supposed to work in key/value format one still have to provide the key name which takes space. It is also quite tedious to work with. May be Protocol Buffers is a better option? |
Using JSON for streams is comparatively inefficient. Using binary streams would make consumption/transmission less expensive.
I'll document how the protocol/schema more when I've got a prototype that works, but my initial guess is that it'll look fairly similar to a CBOR representation of the existing schema. More to come on this as I get more experience
The text was updated successfully, but these errors were encountered: