-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Improve serialization for TaskResourceInfo #16700
base: main
Are you sure you want to change the base?
[WIP] Improve serialization for TaskResourceInfo #16700
Conversation
Signed-off-by: Chenyang Ji <cyji@amazon.com>
❌ Gradle check result for 8fc3f92: ABORTED Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
@ansjcy you can also run performance benchmarks on your PR as well to see how it compares against nightly baseline runs. |
@ansjcy Why are we string-ifying and stashing this object into the ThreadContext at all? If we need to send data between nodes, we should be using the transport protocol and be serializing this thing using the Update: this appears to be where the idea to use the thread context came from: #13172 (comment) @rishabhmaurya What do you think here? I can see how using the thread context can be easier, but serializing this object to/from a string any any hot path code can be expensive (and doing a binary format into a base64 string feels hacky). Also the "response headers" concept seems like it is meant to be propagated ultimately as an HTTP header and that's not the case here. |
@andrross thanks for opening #16635. Features like these should be well benchmarked and profiled before merging them in.
Ideally, we should not expose any header on 9200 port exposed to the end clients and should be only limited to 9300 transport. @ansjcy could you please confirm? I'm still of the opinion that using thread context and attaching header to transport actions is probably fine here if we sample the requests especially when they are of hot code path like search and overhead is high like more than 1% of overall cpu time. |
#13172 (comment) this comments kind of contradicts what we are seeing in #16635 |
@rishabhmaurya To be fair, I'm not surprised this overhead wouldn't be noticeable in most normal workloads. I was deliberately profiling a query that was as simple as possible looking for other overhead. The macro benchmarks have enough variation that the addition of small overhead is hard to spot.
The round tripping through a string is what ends up being costly here. I'm guessing the concept of these headers comes originally from HTTP where they were just string values? Now that we're stashing complex objects it would be better just to keep them as binary, since they end up being written and read as binary via StreamInput/StreamOutput anyway. |
Ah! I missed your point earlier. Ideally, in any telemetry solution, we inject trace or some form of id and key, value pair baggage to it. If i understand this logic correct, below is the what we are injecting -
@andrross I think we might be able to get rid of some of these fields like I agree that we can probably read and write them as binary via StreamInput/StreamOutput. I think reason why we are converting it to string here is because the ThreadContext doesn't expose such API where we can directly pass byte values - OpenSearch/server/src/main/java/org/opensearch/common/util/concurrent/ThreadContext.java Line 787 in 80ca32f
Maybe we can explore exposing such API, it definitely looks a little complicated at first look. |
Our transport protocol defines these "headers" at a very low layer and they are typed as a simple map of string key-value pairs (or a string key to a set of strings in the case of response headers). If we have use cases to write data that doesn't conform to those types then it is "complex" in that it doesn't seem to match the original intent of the headers data structure. Round-tripping non-string data through a string here is a shortcut to avoid the (admittedly very complex!) work of changing the transport protocol to natively accept different data types. I think the options to consider here are:
I honestly don't love option 2 because it will add complexity (particularly to handle backward compatibility) and seems like only a modest improvement over the status quo. And we probably need more evidence than my one profiling exercise to say that it is worth it to do option 3. @rishabhmaurya what do you think? |
I imagine a protobuf-based implementation doesn't suffer from this problem? So there's 4 which is aggressively replace the binary protocol with that? I am not saying not to do 3, it's worth it as a short term solution, too. |
@dblock The binary serialization isn't really the issue here. We've built this whole ThreadContext abstraction layer on top of the binary protocol, and that is where the complex work comes in to make changes. Whether it is protobuf or StreamInput/StreamOutput at the bottom doesn't really change a whole lot here. |
@andrross we should explore option 3 to allow binary values to be injected directly. @ansjcy it would be nice if we can evaluate it if not implement it right away. I'm also concerned about its future usage, if we add more measurement and their computation logic into hot code paths, how do ensure that they don't run into unwanted cpu time increase. it is not an easy problem to solve, so looking for ideas on adding safety nets here. |
Thansk for all the comments! Going through them I agree
should be something we can explore here. Since as .andrross mentioned, the 2nd approach (this PR) can cause certain backward compatibility problems. Another approach to totally avoid adding complexity to the hot search path is, instead of piggybacking the data with each shard search response, we can consider the "alternative approach" discussed here #12399: we can write an asynchronous post-processor job as part of the query insights data consumption pipeline. This post-processor would periodically gather data from data nodes and correlate it with queries to calculate the final resource usage accurately. What do you think about this approach? :) @rishabh6788 @andrross @dblock I actually had a POC to explore this early this year: #12473 |
@ansjcy Does this introduce a new bottleneck for very large clusters where the single cluster manager node has to aggregate data from all data nodes (assuming the post-processor runs on the cluster manager)? |
Description
Use binary serialization to avoid the JSON parsing overhead when piggybacking task resource usage info from data nodes to coordinator node.
Related Issues
Resolves #16635
Tests
I ran the big5 benchmark tests on a cluster with 3 master nodes (c5.xlarge) and 2 data nodes (r5.4xlarge) and did CPU profiling for term queries like mentioned in #16635. The parsing overhead is less than 1% in my tests.
Also validated the functionalities of query insights is not impacted.
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.