Pin until error strategy attempts to find an active channel #445

Merged 2 commits on Feb 27, 2020
8 changes: 8 additions & 0 deletions changelog/@unreleased/pr-445.v2.yml
@@ -0,0 +1,8 @@
+type: improvement
+improvement:
+  description: Pin until error strategy attempts to find an active channel rather
+    than bailing on the first limited channel. This matches round robin more closely
+    when many channels are blacklisted. Clients are no longer forced to use N retries
+    to find the next available channel.
+  links:
+  - https://github.com/palantir/dialogue/pull/445
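The behaviour described in the changelog entry can be illustrated with a minimal sketch under a simplified model. Everything named here (`PinUntilErrorSketch`, `SketchChannel`, `executeOnce`, `executeRotating`) is hypothetical and not part of Dialogue, and the way the pin advances (a compare-and-set to the next index) is an assumption, since `incrementHostIfNecessary` is not shown in this PR. The sketch uses a loop rather than the recursion in the actual change and ignores concurrent callers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical, simplified model of the pin-until-error rotation; not Dialogue's classes. */
final class PinUntilErrorSketch {

    /** Stand-in for a limited channel: an empty Optional means "limited, request not sent". */
    interface SketchChannel {
        Optional<String> maybeExecute(String request);
    }

    private final List<SketchChannel> nodes;
    private final AtomicInteger currentHost = new AtomicInteger(0);

    PinUntilErrorSketch(List<SketchChannel> nodes) {
        this.nodes = nodes;
    }

    /** Earlier behaviour: try only the pinned channel; on rejection, advance the pin and give up. */
    Optional<String> executeOnce(String request) {
        int index = currentHost.get();
        Optional<String> result = nodes.get(index).maybeExecute(request);
        if (!result.isPresent()) {
            // Assumption: rejection moves the pin to the next node (wrapping around).
            currentHost.compareAndSet(index, (index + 1) % nodes.size());
        }
        return result;
    }

    /** Behaviour after this change: try up to nodes.size() channels before rejecting. */
    Optional<String> executeRotating(String request) {
        for (int attempt = 0; attempt < nodes.size(); attempt++) {
            Optional<String> result = executeOnce(request);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }

    int pinnedIndex() {
        return currentHost.get();
    }

    public static void main(String[] args) {
        SketchChannel limited = request -> Optional.empty();
        SketchChannel healthy = request -> Optional.of("204 for " + request);

        PinUntilErrorSketch before = new PinUntilErrorSketch(Arrays.asList(limited, limited, healthy));
        System.out.println("single attempt accepted: " + before.executeOnce("ping").isPresent());

        PinUntilErrorSketch after = new PinUntilErrorSketch(Arrays.asList(limited, limited, healthy));
        System.out.println("rotating attempt accepted: " + after.executeRotating("ping").isPresent());
        System.out.println("pinned index afterwards: " + after.pinnedIndex());
    }
}
```

With two limited channels ahead of a healthy one, a caller of `executeOnce` would need three separate calls to reach the healthy node, whereas a single `executeRotating` call reaches it. Bounding the loop by the node count keeps a fully limited node list from spinning forever.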
@@ -93,14 +93,23 @@ static LimitedChannel pinUntilErrorWithoutReshuffle(

     @Override
     public Optional<ListenableFuture<Response>> maybeExecute(Endpoint endpoint, Request request) {
+        return executeInternal(endpoint, request, 1);
+    }
+
+    private Optional<ListenableFuture<Response>> executeInternal(Endpoint endpoint, Request request, int depth) {
         int currentIndex = currentHost.get();
         LimitedChannel channel = nodeList.get(currentIndex);

         Optional<ListenableFuture<Response>> maybeFuture = channel.maybeExecute(endpoint, request);
         if (!maybeFuture.isPresent()) {
             OptionalInt next = incrementHostIfNecessary(currentIndex);
             instrumentation.currentChannelRejected(currentIndex, channel, next);
-            return Optional.empty(); // if the caller retries immediately, we'll get the next host
+            // Try enough times to rotate through all nodes (assuming no concurrent clients) before returning
+            // a completely rejected request.
+            if (depth < nodeList.size()) {
+                return executeInternal(endpoint, request, depth + 1);
+            }
+            return Optional.empty();
         }

         ListenableFuture<Response> future = maybeFuture.get();
@@ -160,6 +160,13 @@ public void out_of_order_responses_dont_cause_us_to_switch_channel() throws Exce
                 .isEqualTo(101);
     }

+    @Test
+    public void finds_first_non_limited_channel() {
+        when(channel1.maybeExecute(any(), any())).thenReturn(Optional.empty());
+        setResponse(channel2, 204);
+        assertThat(pinUntilError.maybeExecute(null, null)).isPresent();
+    }
+
     private static int getCode(PinUntilErrorChannel channel) {
         try {
             ListenableFuture<Response> future = channel.maybeExecute(null, null).get();
12 changes: 6 additions & 6 deletions simulation/src/test/resources/report.md
@@ -6,35 +6,35 @@
all_nodes_500[CONCURRENCY_LIMITER_ROUND_ROBIN].txt: success=50.0% client_mean=PT0.6S server_cpu=PT20M client_received=2000/2000 server_resps=2000 codes={200=1000, 500=1000}
all_nodes_500[UNLIMITED_ROUND_ROBIN].txt: success=50.0% client_mean=PT0.6S server_cpu=PT20M client_received=2000/2000 server_resps=2000 codes={200=1000, 500=1000}
black_hole[CONCURRENCY_LIMITER_BLACKLIST_ROUND_ROBIN].txt: success=89.9% client_mean=PT0.600286226S server_cpu=PT17M58.2S client_received=1797/2000 server_resps=1797 codes={200=1797}
-black_hole[CONCURRENCY_LIMITER_PIN_UNTIL_ERROR].txt: success=88.7% client_mean=PT0.601752452S server_cpu=PT17M43.8S client_received=1773/2000 server_resps=1773 codes={200=1773}
+black_hole[CONCURRENCY_LIMITER_PIN_UNTIL_ERROR].txt: success=88.7% client_mean=PT0.600505405S server_cpu=PT17M43.8S client_received=1773/2000 server_resps=1773 codes={200=1773}
black_hole[CONCURRENCY_LIMITER_ROUND_ROBIN].txt: success=89.9% client_mean=PT0.600286226S server_cpu=PT17M58.2S client_received=1797/2000 server_resps=1797 codes={200=1797}
black_hole[UNLIMITED_ROUND_ROBIN].txt: success=65.0% client_mean=PT0.6S server_cpu=PT12M59.4S client_received=1299/2000 server_resps=1299 codes={200=1299}
drastic_slowdown[CONCURRENCY_LIMITER_BLACKLIST_ROUND_ROBIN].txt: success=100.0% client_mean=PT2.069939083S server_cpu=PT2H17M59.756333311S client_received=4000/4000 server_resps=4000 codes={200=4000}
-drastic_slowdown[CONCURRENCY_LIMITER_PIN_UNTIL_ERROR].txt: success=100.0% client_mean=PT2.053746438S server_cpu=PT2H16M53.160125525S client_received=4000/4000 server_resps=4000 codes={200=4000}
+drastic_slowdown[CONCURRENCY_LIMITER_PIN_UNTIL_ERROR].txt: success=100.0% client_mean=PT2.053277999S server_cpu=PT2H16M53.111999959S client_received=4000/4000 server_resps=4000 codes={200=4000}
drastic_slowdown[CONCURRENCY_LIMITER_ROUND_ROBIN].txt: success=100.0% client_mean=PT2.069939083S server_cpu=PT2H17M59.756333311S client_received=4000/4000 server_resps=4000 codes={200=4000}
drastic_slowdown[UNLIMITED_ROUND_ROBIN].txt: success=100.0% client_mean=PT8.353421749S server_cpu=PT9H16M53.686999978S client_received=4000/4000 server_resps=4000 codes={200=4000}
fast_500s_then_revert[CONCURRENCY_LIMITER_BLACKLIST_ROUND_ROBIN].txt: success=98.1% client_mean=PT0.073222888S server_cpu=PT4M34.585833294S client_received=3750/3750 server_resps=3750 codes={200=3679, 500=71}
fast_500s_then_revert[CONCURRENCY_LIMITER_PIN_UNTIL_ERROR].txt: success=99.7% client_mean=PT0.080628355S server_cpu=PT5M2.35633333S client_received=3750/3750 server_resps=3750 codes={200=3739, 500=11}
fast_500s_then_revert[CONCURRENCY_LIMITER_ROUND_ROBIN].txt: success=76.7% client_mean=PT0.055463644S server_cpu=PT3M27.988666346S client_received=3750/3750 server_resps=3750 codes={200=2876, 500=874}
fast_500s_then_revert[UNLIMITED_ROUND_ROBIN].txt: success=76.7% client_mean=PT0.055463644S server_cpu=PT3M27.988666346S client_received=3750/3750 server_resps=3750 codes={200=2876, 500=874}
live_reloading[CONCURRENCY_LIMITER_BLACKLIST_ROUND_ROBIN].txt: success=59.3% client_mean=PT2.717022131S server_cpu=PT1H21M18.049029538S client_received=2500/2500 server_resps=1865 codes={200=1483, 500=382, Failed to make a request=635}
-live_reloading[CONCURRENCY_LIMITER_PIN_UNTIL_ERROR].txt: success=54.2% client_mean=PT2.907371536S server_cpu=PT1H42M40.098775697S client_received=2500/2500 server_resps=2177 codes={200=1355, 500=822, Failed to make a request=323}
+live_reloading[CONCURRENCY_LIMITER_PIN_UNTIL_ERROR].txt: success=54.2% client_mean=PT2.947366793S server_cpu=PT1H45M11.292359783S client_received=2500/2500 server_resps=2229 codes={200=1355, 500=874, Failed to make a request=271}
live_reloading[CONCURRENCY_LIMITER_ROUND_ROBIN].txt: success=52.0% client_mean=PT2.915893225S server_cpu=PT1H43M13.659598627S client_received=2500/2500 server_resps=2197 codes={200=1301, 500=896, Failed to make a request=303}
live_reloading[UNLIMITED_ROUND_ROBIN].txt: success=58.4% client_mean=PT2.8396S server_cpu=PT1H58M19S client_received=2500/2500 server_resps=2500 codes={200=1461, 500=1039}
one_endpoint_dies_on_each_server[CONCURRENCY_LIMITER_BLACKLIST_ROUND_ROBIN].txt: success=46.4% client_mean=PT1.302954516S server_cpu=PT15M22.396603327S client_received=2500/2500 server_resps=1536 codes={200=1161, 500=375, Failed to make a request=964}
-one_endpoint_dies_on_each_server[CONCURRENCY_LIMITER_PIN_UNTIL_ERROR].txt: success=63.8% client_mean=PT0.601140264S server_cpu=PT25M client_received=2500/2500 server_resps=2500 codes={200=1595, 500=905}
+one_endpoint_dies_on_each_server[CONCURRENCY_LIMITER_PIN_UNTIL_ERROR].txt: success=63.8% client_mean=PT0.6S server_cpu=PT25M client_received=2500/2500 server_resps=2500 codes={200=1594, 500=906}
one_endpoint_dies_on_each_server[CONCURRENCY_LIMITER_ROUND_ROBIN].txt: success=65.5% client_mean=PT0.6S server_cpu=PT25M client_received=2500/2500 server_resps=2500 codes={200=1638, 500=862}
one_endpoint_dies_on_each_server[UNLIMITED_ROUND_ROBIN].txt: success=65.5% client_mean=PT0.6S server_cpu=PT25M client_received=2500/2500 server_resps=2500 codes={200=1638, 500=862}
simplest_possible_case[CONCURRENCY_LIMITER_BLACKLIST_ROUND_ROBIN].txt: success=100.0% client_mean=PT0.799939398S server_cpu=PT2H55M59.2000635S client_received=13200/13200 server_resps=13200 codes={200=13200}
simplest_possible_case[CONCURRENCY_LIMITER_PIN_UNTIL_ERROR].txt: success=100.0% client_mean=PT0.998702894S server_cpu=PT3H39M42.87820128S client_received=13200/13200 server_resps=13200 codes={200=13200}
simplest_possible_case[CONCURRENCY_LIMITER_ROUND_ROBIN].txt: success=100.0% client_mean=PT0.799939398S server_cpu=PT2H55M59.2000635S client_received=13200/13200 server_resps=13200 codes={200=13200}
simplest_possible_case[UNLIMITED_ROUND_ROBIN].txt: success=100.0% client_mean=PT0.799939398S server_cpu=PT2H55M59.2000635S client_received=13200/13200 server_resps=13200 codes={200=13200}
slow_503s_then_revert[CONCURRENCY_LIMITER_BLACKLIST_ROUND_ROBIN].txt: success=100.0% client_mean=PT0.30283067S server_cpu=PT14M44.486985327S client_received=3000/3000 server_resps=3175 codes={200=3000}
-slow_503s_then_revert[CONCURRENCY_LIMITER_PIN_UNTIL_ERROR].txt: success=100.0% client_mean=PT0.346630773S server_cpu=PT16M52.121612924S client_received=3000/3000 server_resps=3197 codes={200=3000}
+slow_503s_then_revert[CONCURRENCY_LIMITER_PIN_UNTIL_ERROR].txt: success=100.0% client_mean=PT0.346102289S server_cpu=PT16M51.944355932S client_received=3000/3000 server_resps=3197 codes={200=3000}
slow_503s_then_revert[CONCURRENCY_LIMITER_ROUND_ROBIN].txt: success=100.0% client_mean=PT0.745660042S server_cpu=PT36M23.70242803S client_received=3000/3000 server_resps=3411 codes={200=3000}
slow_503s_then_revert[UNLIMITED_ROUND_ROBIN].txt: success=100.0% client_mean=PT1.430720789S server_cpu=PT1H9M50.430528642S client_received=3000/3000 server_resps=3802 codes={200=3000}
slowdown_and_error_thresholds[CONCURRENCY_LIMITER_BLACKLIST_ROUND_ROBIN].txt: success=3.4% client_mean=PT2.255528943S server_cpu=PT1H43M47.583860348S client_received=10000/10000 server_resps=1676 codes={200=344, 500=1332, Failed to make a request=8324}
-slowdown_and_error_thresholds[CONCURRENCY_LIMITER_PIN_UNTIL_ERROR].txt: success=1.2% client_mean=PT2.928958527S server_cpu=PT4H31M58.68391554S client_received=10000/10000 server_resps=4264 codes={200=120, 500=4144, Failed to make a request=5736}
+slowdown_and_error_thresholds[CONCURRENCY_LIMITER_PIN_UNTIL_ERROR].txt: success=1.3% client_mean=PT2.973573019S server_cpu=PT4H37M50.614960566S client_received=10000/10000 server_resps=4345 codes={200=125, 500=4220, Failed to make a request=5655}
slowdown_and_error_thresholds[CONCURRENCY_LIMITER_ROUND_ROBIN].txt: success=1.2% client_mean=PT2.987017569S server_cpu=PT4H40M48.239132959S client_received=10000/10000 server_resps=4417 codes={200=120, 500=4297, Failed to make a request=5583}
slowdown_and_error_thresholds[UNLIMITED_ROUND_ROBIN].txt: success=1.2% client_mean=PT3.974129199S server_cpu=PT11H2M21.291999888S client_received=10000/10000 server_resps=10000 codes={200=120, 500=9880}
uncommon_flakes[CONCURRENCY_LIMITER_BLACKLIST_ROUND_ROBIN].txt: success=93.5% client_mean=PT0.203113054S server_cpu=PT7.376462491S client_received=9774/10000 server_resps=9441 codes={200=9348, 500=93, Failed to make a request=333}
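To compare the before and after lines in this report, the fields can be read programmatically. The sketch below is only an illustration: `ReportLineSketch` and its methods are hypothetical helpers, not part of the simulation code. It relies on the `client_mean` values being ISO-8601 durations, which `java.time.Duration.parse` accepts. It also assumes that the `success` figure is the count of 200 responses divided by the total number of requests. That reading is inferred from the numbers shown here (for example 1773/2000 ≈ 88.7%) and is not stated in this PR.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for reading one line of the simulation report. */
final class ReportLineSketch {
    private static final Pattern MEAN = Pattern.compile("client_mean=(\\S+)");
    private static final Pattern RECEIVED = Pattern.compile("client_received=(\\d+)/(\\d+)");
    private static final Pattern OK_COUNT = Pattern.compile("[{ ]200=(\\d+)");

    /** Parses the ISO-8601 client_mean field, e.g. PT0.600505405S. */
    static Duration clientMean(String line) {
        return Duration.parse(firstGroup(MEAN, line, 1));
    }

    /** Assumed definition: 200 responses as a percentage of all requests. */
    static double successPercent(String line) {
        long ok = Long.parseLong(firstGroup(OK_COUNT, line, 1));
        long total = Long.parseLong(firstGroup(RECEIVED, line, 2));
        return 100.0 * ok / total;
    }

    private static String firstGroup(Pattern pattern, String line, int group) {
        Matcher matcher = pattern.matcher(line);
        if (!matcher.find()) {
            throw new IllegalArgumentException("field not found in: " + line);
        }
        return matcher.group(group);
    }

    public static void main(String[] args) {
        String before = "success=88.7% client_mean=PT0.601752452S server_cpu=PT17M43.8S "
                + "client_received=1773/2000 server_resps=1773 codes={200=1773}";
        String after = "success=88.7% client_mean=PT0.600505405S server_cpu=PT17M43.8S "
                + "client_received=1773/2000 server_resps=1773 codes={200=1773}";

        Duration improvement = clientMean(before).minus(clientMean(after));
        System.out.println("client_mean improvement (ns): " + improvement.toNanos());
        System.out.println("success percent: " + successPercent(after));
    }
}
```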
@@ -1 +1 @@
-success=88.7% client_mean=PT0.601752452S server_cpu=PT17M43.8S client_received=1773/2000 server_resps=1773 codes={200=1773}
+success=88.7% client_mean=PT0.600505405S server_cpu=PT17M43.8S client_received=1773/2000 server_resps=1773 codes={200=1773}
@@ -1 +1 @@
-success=100.0% client_mean=PT2.053746438S server_cpu=PT2H16M53.160125525S client_received=4000/4000 server_resps=4000 codes={200=4000}
+success=100.0% client_mean=PT2.053277999S server_cpu=PT2H16M53.111999959S client_received=4000/4000 server_resps=4000 codes={200=4000}
@@ -1 +1 @@
-success=54.2% client_mean=PT2.907371536S server_cpu=PT1H42M40.098775697S client_received=2500/2500 server_resps=2177 codes={200=1355, 500=822, Failed to make a request=323}
+success=54.2% client_mean=PT2.947366793S server_cpu=PT1H45M11.292359783S client_received=2500/2500 server_resps=2229 codes={200=1355, 500=874, Failed to make a request=271}
@@ -1 +1 @@
-success=63.8% client_mean=PT0.601140264S server_cpu=PT25M client_received=2500/2500 server_resps=2500 codes={200=1595, 500=905}
+success=63.8% client_mean=PT0.6S server_cpu=PT25M client_received=2500/2500 server_resps=2500 codes={200=1594, 500=906}
@@ -1 +1 @@
-success=100.0% client_mean=PT0.346630773S server_cpu=PT16M52.121612924S client_received=3000/3000 server_resps=3197 codes={200=3000}
+success=100.0% client_mean=PT0.346102289S server_cpu=PT16M51.944355932S client_received=3000/3000 server_resps=3197 codes={200=3000}
@@ -1 +1 @@
-success=1.2% client_mean=PT2.928958527S server_cpu=PT4H31M58.68391554S client_received=10000/10000 server_resps=4264 codes={200=120, 500=4144, Failed to make a request=5736}
+success=1.3% client_mean=PT2.973573019S server_cpu=PT4H37M50.614960566S client_received=10000/10000 server_resps=4345 codes={200=125, 500=4220, Failed to make a request=5655}