Improve rpc_soak and channel_soak test to cover streaming #11687
…ing soak_num_threads Flag
…, channel creation logic, and refactor thread body for performSoakTest
…d simplify thread result aggregation
…edException. Update the ThreadResults data type.
return new SoakIterationResult(TimeUnit.NANOSECONDS.toMillis(elapsedNs), status);
}

private SoakIterationResult performOneSoakIterationPingPong(
I'm not sure how much benefit there will really be to running the different RPC types (client streaming, server streaming, etc.) in a loop.
The code paths and behaviors exercised are going to be very similar to the unary based soak tests we already have.
Here is a straw man idea for what I think might be useful here:
- using a long-lived stream
- start the stream once (per thread). On each soak iteration, send one message and receive one message.
- If/when the stream fails, indicate it in the log, but otherwise restart the stream and continue on the new one.
This would provide us a new dimension of test coverage that we don't currently have much of (long lived RPCs).
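To make the straw man concrete, here is a minimal sketch of the per-thread loop. It does not use the gRPC API: `PingPongStream`, `FlakyStream`, and `LongLivedStreamSoak` are hypothetical names introduced only for illustration, and a real version would presumably wrap a bidirectional streaming call instead.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for one long-lived bidirectional stream (not a gRPC type). */
interface PingPongStream {
  /** Sends one message and waits for one reply; throws once the stream has failed. */
  String sendAndReceive(String request) throws Exception;
}

/** Fake stream for illustration: fails on its N-th call, succeeds otherwise. */
final class FlakyStream implements PingPongStream {
  private final int failOnCall;
  private int calls;

  FlakyStream(int failOnCall) {
    this.failOnCall = failOnCall;
  }

  @Override
  public String sendAndReceive(String request) throws Exception {
    if (++calls == failOnCall) {
      throw new IllegalStateException("simulated stream failure");
    }
    return "pong:" + request;
  }
}

final class LongLivedStreamSoak {
  /** Counters collected by one soak thread. */
  static final class Result {
    int successes;
    int failures;
    int streamsStarted;
  }

  /** Runs the soak iterations on one stream, restarting the stream only after a failure. */
  static Result run(Supplier<PingPongStream> streamFactory, int iterations) {
    Result result = new Result();
    PingPongStream stream = null;
    for (int i = 0; i < iterations; i++) {
      if (stream == null) {
        // Start the stream lazily: once up front, then again after each failure.
        stream = streamFactory.get();
        result.streamsStarted++;
      }
      try {
        stream.sendAndReceive("ping-" + i);
        result.successes++;
      } catch (Exception e) {
        // Record the failure in the log, drop the broken stream, and keep going.
        result.failures++;
        System.err.println("iteration " + i + " failed, restarting stream: " + e);
        stream = null;
      }
    }
    return result;
  }
}
```

A failed iteration is counted and logged but does not end the run, so a single thread can keep exercising long-lived streams for the whole soak duration.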
Alex, thanks for the suggestion! I have discussed with Feng. He said, “For completeness, we should cover all these variations of RPCs, but I don’t see how sending a message out and back can construct a long-lived stream. Usually, long-lived streams last for hours, and covering long-lived streams is not part of the plan.”
So, seems he definitely wants to ensure we cover all RPC types. My thought is we can handle the long-lived RPCs in a separate set of tests, not as part of the current soak tests. We can definitely consider this in more detail and plan it out for a future PR. Let me know your thoughts, and happy to discuss further!
“For completeness, we should cover all these variations of RPCs, but I don’t see how sending a message out and back can construct a long-lived stream. Usually, long-lived streams last for hours, and covering long-lived streams is not part of the plan.”
My idea here is not to send a message out and back once. Instead, it's to keep sending messages out and back on the same stream for as long as possible. By setting soak_iterations and soak_min_time_ms_between_rpcs, you can make these clients do a fixed QPS for a fixed time (e.g. 10 QPS for 1 hour).
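As a rough illustration of that arithmetic, the sketch below derives the two settings from a target rate and duration. `SoakPacing` and its methods are hypothetical helpers, not part of the interop client, and the numbers assume each iteration finishes within the spacing interval so that the minimum spacing is what actually limits the rate.

```java
final class SoakPacing {
  /** Minimum spacing between iterations, in ms, for a target per-thread QPS. */
  static long minTimeMsBetweenRpcs(int targetQps) {
    // Integer division: exact only when targetQps divides 1000 evenly.
    return 1000L / targetQps;
  }

  /** Number of iterations needed to sustain the target QPS for the given duration. */
  static long iterations(int targetQps, long durationSeconds) {
    return targetQps * durationSeconds;
  }

  public static void main(String[] args) {
    // 10 QPS for 1 hour: 100 ms between RPCs, 36000 iterations.
    System.out.println(minTimeMsBetweenRpcs(10));
    System.out.println(iterations(10, 3600));
  }
}
```

If an iteration takes longer than the spacing interval, the achieved rate will be lower than the target, so these values are best read as an upper bound.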
“So, seems he definitely wants to ensure we cover all RPC types.”
Our integration test matrix is already huge, and these tests are expensive to maintain generally speaking. I'm not excited about adding this for the sake of completeness, unless there's a very strong reason that I'm missing. It seems like these tests will overlap a lot with the existing unary based tests, so it doesn't seem like there would be much bang for the buck with these additional tests.
Circling back on this per offline chats.
If we just want to run various RPC types in multiple threads in a loop, then I think the StressTestClient.java is already geared towards that.
That can be configured to run any of these tests:
grpc-java/interop-testing/src/main/java/io/grpc/testing/integration/StressTestClient.java
Line 525 in f1109e4
private void runTestCase(Tester tester, TestCases testCase) throws Exception {
We actually already have this stress test client running in our integration test dashboards for Go and Java (see internal bug b/298484219 for context), but only for empty_unary RPCs. We could extend it to run other types of RPCs (some of those test cases involving cancellation etc. may actually be interesting).
Also note there are currently some shortcomings of the stress test compared to the interop soak test:
1. Stress test has no error tolerance (note how it will abort a thread upon a single RPC failure).
2. Stress test has no knob to control QPS, i.e. each thread performs RPCs in an uncontrolled closed loop.
3. Unlike the interop soak test, there is no way to gather statistics about latency, errors, etc. (the interop soak test logs all results in a parseable format that can be analyzed offline for these things).
I think 1) is the highest priority thing to fix.
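For what error tolerance could look like, here is a minimal sketch of a thread loop that counts failures and only gives up past a threshold. This is not the StressTestClient code: `TolerantStressLoop`, `Stats`, and the `maxFailures` parameter are hypothetical names used for illustration.

```java
import java.util.concurrent.Callable;

final class TolerantStressLoop {
  /** Per-thread counters, so failures are recorded instead of ending the thread. */
  static final class Stats {
    int attempts;
    int failures;
    boolean aborted;
  }

  /**
   * Runs the RPC up to {@code iterations} times. A failed RPC is counted and the loop
   * continues; the thread stops early only once failures exceed {@code maxFailures}.
   */
  static Stats run(Callable<Void> rpc, int iterations, int maxFailures) {
    Stats stats = new Stats();
    for (int i = 0; i < iterations; i++) {
      stats.attempts++;
      try {
        rpc.call();
      } catch (Exception e) {
        stats.failures++;
        System.err.println("RPC " + i + " failed: " + e);
        if (stats.failures > maxFailures) {
          stats.aborted = true;
          break;
        }
      }
    }
    return stats;
  }
}
```

With `maxFailures` set to 0 this degenerates to the current abort-on-first-failure behavior, so the threshold could be introduced without changing the default.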
I have talked with Feng and he has agreed with this approach. I will go ahead and work on it. Thanks!