-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[test-only] Simulate dialogue clients against a DeterministicScheduler #348
Conversation
dialogue-core/src/main/java/com/palantir/dialogue/core/RetryingChannel.java
Outdated
Show resolved
Hide resolved
dialogue-core/src/main/java/com/palantir/dialogue/core/ConcurrencyLimitedChannel.java
Outdated
Show resolved
Hide resolved
import java.util.Optional; | ||
import java.util.function.Supplier; | ||
|
||
final class GenericRefreshingChannel implements Channel { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
split this change to a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pulled out here: #352
@@ -0,0 +1 @@ | |||
success=58.3% client_mean=PT0.7228S server_cpu=PT4M49.12S received=400/400 codes={200=233, 500=167} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -42,22 +42,24 @@ | |||
*/ | |||
final class ConcurrencyLimitedChannel implements LimitedChannel { | |||
private static final Void NO_CONTEXT = null; | |||
private static final Supplier<Long> SYSTEM_NANOTIME = System::nanoTime; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LongSupplier or a custom interface
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out the netflix builder requires a Supplier<Long>
, so I've used the same caffeine Ticker interface we use everywhere else.
// Do nothing | ||
numScheduled++; | ||
} | ||
if (numScheduled > 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (numScheduled > 1) { | |
if (log.isDebugEnabled()) { |
zero and one are also helpful data points imo.
} | ||
|
||
private Stream<ScheduledRequest> infiniteRequests(Duration interval) { | ||
return Stream.iterate(0, current -> current + 1).map(number -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: IntStream. 2m requests isn't a ton, may be worth using longs here?
|
||
private static Request constructRequest(int number) { | ||
return Request.builder() | ||
.putHeaderParams("X-B3-TraceId", "req-" + number) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
X-B3-TraceId
overloading is a bit funky, perhaps we should either use the existing tracing channels with random IDs, or use a simulation-specific header?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point - I'll probably want to visualize traces for these at some point, have switched to a different constant.
import com.palantir.dialogue.Response; | ||
import java.util.concurrent.TimeUnit; | ||
|
||
final class HistogramChannel implements Channel { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use our standard instrumented channel with an AbstractTaggedMetricRegistry implementation that overrides timerSupplier
and uses a custom reservoir supplier? Would be nice to reinforce that our metrics are good enough, but may not be worthwhile here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will happily do that - the only reason I didn't initially was there are a lot of methods on TaggedMetricRegistry and it was gonna inflate the line count of this PR quite severely!
return future.get(); | ||
} | ||
|
||
return Futures.immediateFailedFuture(new SafeRuntimeException("limited channel says no :(")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think queued channel should do this #351
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I was using this as a substitute for the queued channel, but it's not being used right now, so will delete and bring it back when needed.
} | ||
|
||
/** This is an alternative to the {@link com.palantir.dialogue.core.QueuedChannel}. */ | ||
static Channel dontTolerateLimits(LimitedChannel limitedChannel) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't reviewed this as closely as changes to prod code, but the output is insightful and the implementation looks reasonable.
e8462aa
to
19ee73a
Compare
Thanks carter, I'm gonna update my other PRs with this code and hopefully have some pretty graphs to show! |
Before this PR
I want to conclusively be able to demonstrate all the bits of our RPC stack is expected to behave under various kinds of load, but at the moment we just have to eyeball code and try to construct the expected behaviour in our heads.
After this PR
==COMMIT_MSG==
A new
:simulation
project runs dialogue calls in a DeterministicScheduler and records the results.==COMMIT_MSG==
The tests run instantly and are fully deterministic and reproducible :D
Example chart:
Sorry this is such a big PR - I'd recommend just cloning it and running SimulationTest to get a sense of what happens, you can also fiddle with the numbers.
Possible downsides?