Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Client respect Node-Selection-Strategy response header #688

Merged
merged 28 commits into from
May 1, 2020
Merged
Show file tree
Hide file tree
Changes from 24 commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
1cb5e2e
allow servers to define client node selection strategy
Apr 28, 2020
f0581e8
tests and null away
Apr 28, 2020
234dc9d
ignore leading unknown strategies
Apr 28, 2020
39bbaa6
update simulations
Apr 28, 2020
77cc246
pinned channel is reused across updates
Apr 29, 2020
b944fdc
NSS selector uses score to determine strategy
Apr 29, 2020
7a8f236
cleanup
Apr 29, 2020
0b3315c
upgrade guava
Apr 29, 2020
651f397
improve strategy parsing
Apr 29, 2020
7094336
Update dialogue-core/src/main/java/com/palantir/dialogue/core/NodeSel…
ferozco Apr 29, 2020
85a54eb
avoid labdas
Apr 29, 2020
c214436
Merge branch 'fo/server-side-node-selection' of github.com:palantir/d…
Apr 29, 2020
a181a9f
package private
Apr 29, 2020
51ed3cd
avoid exception for control flow
Apr 29, 2020
e480663
instrument
Apr 29, 2020
8e88954
simplicity
Apr 29, 2020
f955114
Merge remote-tracking branch 'origin/develop' into fo/server-side-nod…
Apr 29, 2020
ae4bf3b
cleanup
Apr 29, 2020
62c5ede
Add generated changelog entries
Apr 29, 2020
d9fbd21
README for server side node selection (#697)
iamdanfox Apr 30, 2020
32862c8
Merge remote-tracking branch 'origin/develop' into fo/server-side-nod…
iamdanfox Apr 30, 2020
dfe9ca1
Merge branch 'fo/server-side-node-selection' of ssh://github.com/pala…
iamdanfox Apr 30, 2020
d718ce3
Merge remote-tracking branch 'origin/develop' into fo/server-side-nod…
Apr 30, 2020
7a6ab2b
Merge branch 'fo/server-side-node-selection' of ssh://github.com/pala…
iamdanfox Apr 30, 2020
5435448
comment
Apr 30, 2020
1684849
chooser
Apr 30, 2020
ff4f587
metrics
Apr 30, 2020
41bacf7
Merge branch 'fo/server-side-node-selection' of github.com:palantir/d…
Apr 30, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 17 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -113,11 +113,27 @@ _This API is influenced by gRPC's [Java library](https://github.com/grpc/grpc-ja

## Behaviour

### Concurrency Limits
### Concurrency limits
Each host has an [AIMD](https://en.wikipedia.org/wiki/Additive_increase/multiplicative_decrease) concurrency limit. This protects
servers by stopping requests getting out the door on the client-side. Permits are multiplicatively decreased after
receiving any 5xx, 429 or 308 response. Otherwise, they are additively increased.

### Node selection strategies
When configured with multiple uris, Dialogue has several strategies for choosing which upstream to route requests to.
The default strategy is `PIN_UNTIL_ERROR`, although users can choose alternatives such as `ROUND_ROBIN` when building a ClientConfiguration
object. Note that the choice of an appropriate strategy usually depends on the _upstream_ server's behaviour, i.e. if its
performance relies heavily on warm caches, or if successive requests must land on the same node to successfully complete
a transaction. To solve this problem without needing code changes in all clients, servers can recommend a
NodeSelectionStrategy (see below).

### Server-recommended node selection strategies
Servers can inform clients of their recommended strategies by including the
`Node-Selection-Strategy` response header. Values are separated by commas and are ordered by preference. See [available strategies](dialogue-core/src/main/java/com/palantir/dialogue/core/DialogueNodeSelectionStrategy.java).
```
Node-Selection-Strategy: BALANCED,PIN_UNTIL_ERROR
```
When the header is present, it takes precedence over user-selected strategies. Servers are free to omit this value.

### NodeSelectionStrategy.ROUND_ROBIN
Used to balance requests across many servers better than the
default PIN_UNTIL_ERROR. The actual algorithm has evolved from naive Round Robin, then to Random Selection and now
Expand Down
6 changes: 6 additions & 0 deletions changelog/@unreleased/pr-688.v2.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
type: feature
feature:
description: |
Clients respect the optional `Node-Selection-Strategy` response header, which takes precedence over user-selected strategies.
links:
- https://github.com/palantir/dialogue/pull/688
Original file line number Diff line number Diff line change
Expand Up @@ -124,6 +124,7 @@ private void updateUrisInner(Collection<String> uris, boolean firstTime) {
.addAll(limitedChannelByUri.keySet())
.addAll(newUris)
.build();

newUris.forEach(uri -> {
Config configWithUris = withUris(cf, allUris); // necessary for attribute metrics to the right hostIndex
LimitedChannel singleUriChannel = createPerUriChannel(configWithUris, uri);
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
/*
* (c) Copyright 2020 Palantir Technologies Inc. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package com.palantir.dialogue.core;

import com.google.common.base.Splitter;
import com.google.common.collect.ImmutableList;
import com.palantir.conjure.java.client.config.NodeSelectionStrategy;
import com.palantir.logsafe.SafeArg;
import com.palantir.logsafe.exceptions.SafeIllegalStateException;
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

enum DialogueNodeSelectionStrategy {
PIN_UNTIL_ERROR,
PIN_UNTIL_ERROR_WITHOUT_RESHUFFLE,
BALANCED,
UNKNOWN;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can probably ditch the UNKNOWN variant here, as this library is the source of truth for what strategies exist?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you could have an old client communicating with a new server and hence receive an unknown strategy

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also can we put a little comment saying why this is copy&pasted version of the regular remoting-api NodeSelectionStrategy? (iirc this was so that we could try out implementation details without having to promote them to a full user-selectable option?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

old client communicating with a new server and hence receive an unknown strategy

in this case though we just ignore it right? i'm not sure what the unknown variant really buys us.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It becomes important if you decide to have more complicated logic (what was previously implemented) for determining the node selection strategy to use. If you really don't like it, i'm happy to remove it


private static final Logger log = LoggerFactory.getLogger(DialogueNodeSelectionStrategy.class);
private static final Splitter SPLITTER = Splitter.on(",").trimResults().omitEmptyStrings();

static List<DialogueNodeSelectionStrategy> fromHeader(String header) {
return SPLITTER.splitToStream(header)
.map(DialogueNodeSelectionStrategy::safeValueOf)
.collect(ImmutableList.toImmutableList());
ferozco marked this conversation as resolved.
Show resolved Hide resolved
}

private static DialogueNodeSelectionStrategy safeValueOf(String value) {
String normalizedValue = value.toUpperCase();
if (PIN_UNTIL_ERROR.name().equals(normalizedValue)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meganit: We can avoid toUpperCase in favor of equalsIgnoreCase

return PIN_UNTIL_ERROR;
} else if (PIN_UNTIL_ERROR_WITHOUT_RESHUFFLE.name().equals(normalizedValue)) {
return PIN_UNTIL_ERROR_WITHOUT_RESHUFFLE;
} else if (BALANCED.name().equals(normalizedValue)) {
return BALANCED;
}

log.info("Received unknown selection strategy", SafeArg.of("strategy", value));
return UNKNOWN;
}

static DialogueNodeSelectionStrategy of(NodeSelectionStrategy strategy) {
switch (strategy) {
case PIN_UNTIL_ERROR:
return PIN_UNTIL_ERROR;
case PIN_UNTIL_ERROR_WITHOUT_RESHUFFLE:
return PIN_UNTIL_ERROR_WITHOUT_RESHUFFLE;
case ROUND_ROBIN:
return BALANCED;
}
throw new SafeIllegalStateException("Unknown node selection strategy", SafeArg.of("strategy", strategy));
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -17,48 +17,62 @@
package com.palantir.dialogue.core;

import com.github.benmanes.caffeine.cache.Ticker;
import com.google.common.annotations.VisibleForTesting;
import com.google.common.collect.ImmutableList;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.ListenableFuture;
import com.palantir.conjure.java.client.config.NodeSelectionStrategy;
import com.palantir.dialogue.Endpoint;
import com.palantir.dialogue.Request;
import com.palantir.dialogue.Response;
import com.palantir.logsafe.SafeArg;
import com.palantir.logsafe.exceptions.SafeRuntimeException;
import com.palantir.tritium.metrics.registry.TaggedMetricRegistry;
import java.util.List;
import java.util.Optional;
import java.util.Random;
import java.util.concurrent.atomic.AtomicReference;
import org.checkerframework.checker.nullness.qual.Nullable;
import org.immutables.value.Value;

@SuppressWarnings("NullAway")
ferozco marked this conversation as resolved.
Show resolved Hide resolved
final class NodeSelectionStrategyChannel implements LimitedChannel {
private final AtomicReference<LimitedChannel> nodeSelectionStrategy;
private static final String NODE_SELECTION_HEADER = "Node-Selection-Strategy";

private final FutureCallback<Response> callback = new NodeSelectionCallback();

private final AtomicReference<NodeSelectionChannel> nodeSelectionStrategy;
private final NodeSelectionStrategySelector strategySelector;

private final NodeSelectionStrategy strategy;
private final String channelName;
private final Random random;
private final Ticker tick;
private final TaggedMetricRegistry metrics;
private final LimitedChannel delegate;

private NodeSelectionStrategyChannel(
NodeSelectionStrategy strategy,
@VisibleForTesting
NodeSelectionStrategyChannel(
NodeSelectionStrategySelector strategySelector,
DialogueNodeSelectionStrategy initialStrategy,
String channelName,
Random random,
Ticker tick,
TaggedMetricRegistry metrics) {
this.strategy = strategy;
this.strategySelector = strategySelector;
this.channelName = channelName;
this.random = random;
this.tick = tick;
this.metrics = metrics;
this.nodeSelectionStrategy = new AtomicReference<>();
this.delegate = new SupplierChannel(nodeSelectionStrategy::get);
this.nodeSelectionStrategy = new AtomicReference<>(NodeSelectionChannel.builder()
.strategy(initialStrategy)
.channel(new ZeroUriNodeSelectionChannel(channelName))
.build());
this.delegate = new SupplierChannel(() -> nodeSelectionStrategy.get().channel());
}

static NodeSelectionStrategyChannel create(Config cf) {
return new NodeSelectionStrategyChannel(
cf.clientConf().nodeSelectionStrategy(),
NodeSelectionStrategyChannel::getFirstKnownStrategy,
DialogueNodeSelectionStrategy.of(cf.clientConf().nodeSelectionStrategy()),
cf.channelName(),
cf.random(),
cf.ticker(),
Expand All @@ -67,30 +81,49 @@ static NodeSelectionStrategyChannel create(Config cf) {

@Override
public Optional<ListenableFuture<Response>> maybeExecute(Endpoint endpoint, Request request) {
return delegate.maybeExecute(endpoint, request);
return delegate.maybeExecute(endpoint, request).map(this::wrapWithCallback);
}

private ListenableFuture<Response> wrapWithCallback(ListenableFuture<Response> response) {
return DialogueFutures.addDirectCallback(response, callback);
}

void updateChannels(ImmutableList<LimitedChannel> updatedChannels) {
nodeSelectionStrategy.getAndUpdate(channel -> getUpdatedNodeSelectionStrategy(
channel, updatedChannels, strategy, metrics, random, tick, channelName));
nodeSelectionStrategy.getAndUpdate(prevChannel -> getUpdatedSelectedChannel(
prevChannel.channel(), updatedChannels, prevChannel.strategy(), metrics, random, tick, channelName));
}

private void updateRequestedStrategies(List<DialogueNodeSelectionStrategy> strategies) {
Optional<DialogueNodeSelectionStrategy> maybeStrategy = strategySelector.updateAndGet(strategies);
if (maybeStrategy.isPresent()) {
DialogueNodeSelectionStrategy strategy = maybeStrategy.get();
// Quick check to avoid expensive CAS
if (strategy.equals(nodeSelectionStrategy.get().strategy())) {
return;
}
nodeSelectionStrategy.getAndUpdate(prevChannel -> getUpdatedSelectedChannel(
prevChannel.channel(), prevChannel.hostChannels(), strategy, metrics, random, tick, channelName));
}
}

private static LimitedChannel getUpdatedNodeSelectionStrategy(
private static NodeSelectionChannel getUpdatedSelectedChannel(
@Nullable LimitedChannel previousNodeSelectionStrategy,
ImmutableList<LimitedChannel> channels,
NodeSelectionStrategy updatedStrategy,
DialogueNodeSelectionStrategy updatedStrategy,
TaggedMetricRegistry metrics,
Random random,
Ticker tick,
String channelName) {

NodeSelectionChannel.Builder channelBuilder =
NodeSelectionChannel.builder().strategy(updatedStrategy).hostChannels(channels);
if (channels.isEmpty()) {
return new ZeroUriNodeSelectionChannel(channelName);
return channelBuilder
.channel(new ZeroUriNodeSelectionChannel(channelName))
.build();
}

if (channels.size() == 1) {
// no fancy node selection heuristic can save us if our one node goes down
return channels.get(0);
return channelBuilder.channel(channels.get(0)).build();
}

switch (updatedStrategy) {
Expand All @@ -101,21 +134,68 @@ private static LimitedChannel getUpdatedNodeSelectionStrategy(
if (previousNodeSelectionStrategy instanceof PinUntilErrorNodeSelectionStrategyChannel) {
PinUntilErrorNodeSelectionStrategyChannel previousPinUntilError =
(PinUntilErrorNodeSelectionStrategyChannel) previousNodeSelectionStrategy;
ferozco marked this conversation as resolved.
Show resolved Hide resolved
return PinUntilErrorNodeSelectionStrategyChannel.of(
Optional.of(previousPinUntilError.getCurrentChannel()),
updatedStrategy,
channels,
pinuntilerrorMetrics,
random,
channelName);
return channelBuilder
.channel(PinUntilErrorNodeSelectionStrategyChannel.of(
Optional.of(previousPinUntilError.getCurrentChannel()),
updatedStrategy,
channels,
pinuntilerrorMetrics,
random,
channelName))
.build();
}
return PinUntilErrorNodeSelectionStrategyChannel.of(
Optional.empty(), updatedStrategy, channels, pinuntilerrorMetrics, random, channelName);
case ROUND_ROBIN:
return channelBuilder
.channel(PinUntilErrorNodeSelectionStrategyChannel.of(
Optional.empty(), updatedStrategy, channels, pinuntilerrorMetrics, random, channelName))
.build();
case BALANCED:
// When people ask for 'ROUND_ROBIN', they usually just want something to load balance better.
// We used to have a naive RoundRobinChannel, then tried RandomSelection and now use this heuristic:
return new BalancedNodeSelectionStrategyChannel(channels, random, tick, metrics, channelName);
return channelBuilder
.channel(new BalancedNodeSelectionStrategyChannel(channels, random, tick, metrics, channelName))
.build();
case UNKNOWN:
}
throw new SafeRuntimeException("Unknown NodeSelectionStrategy", SafeArg.of("unknown", updatedStrategy));
ferozco marked this conversation as resolved.
Show resolved Hide resolved
}

@VisibleForTesting
static Optional<DialogueNodeSelectionStrategy> getFirstKnownStrategy(
List<DialogueNodeSelectionStrategy> strategies) {
for (DialogueNodeSelectionStrategy strategy : strategies) {
if (!strategy.equals(DialogueNodeSelectionStrategy.UNKNOWN)) {
return Optional.of(strategy);
}
}
return Optional.empty();
}

@Value.Immutable
interface NodeSelectionChannel {
DialogueNodeSelectionStrategy strategy();

LimitedChannel channel();

ImmutableList<LimitedChannel> hostChannels();
Copy link
Contributor

@iamdanfox iamdanfox Apr 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are hostChannels saved here rather than just using an instance field? ah nvm it's live reloading again isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya we need to keep track of the host channels so that we can recreate the underlying channel in response to a server's node-selection-strategy response


class Builder extends ImmutableNodeSelectionChannel.Builder {}

static Builder builder() {
return new Builder();
}
}

private final class NodeSelectionCallback implements FutureCallback<Response> {
@Override
public void onSuccess(Response result) {
result.getFirstHeader(NODE_SELECTION_HEADER).ifPresent(this::consumeStrategy);
}

@Override
public void onFailure(Throwable _unused) {}

private void consumeStrategy(String strategy) {
updateRequestedStrategies(DialogueNodeSelectionStrategy.fromHeader(strategy));
}
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
/*
* (c) Copyright 2020 Palantir Technologies Inc. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package com.palantir.dialogue.core;

import java.util.List;
import java.util.Optional;

interface NodeSelectionStrategySelector {
ferozco marked this conversation as resolved.
Show resolved Hide resolved
Optional<DialogueNodeSelectionStrategy> updateAndGet(List<DialogueNodeSelectionStrategy> updatedStrategies);
}
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,6 @@
import com.google.common.collect.ImmutableList;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.ListenableFuture;
import com.palantir.conjure.java.client.config.NodeSelectionStrategy;
import com.palantir.dialogue.Endpoint;
import com.palantir.dialogue.Request;
import com.palantir.dialogue.Response;
Expand Down Expand Up @@ -86,7 +85,7 @@ final class PinUntilErrorNodeSelectionStrategyChannel implements LimitedChannel

static PinUntilErrorNodeSelectionStrategyChannel of(
Optional<LimitedChannel> initialChannel,
NodeSelectionStrategy strategy,
DialogueNodeSelectionStrategy strategy,
List<LimitedChannel> channels,
DialoguePinuntilerrorMetrics metrics,
Random random,
Expand All @@ -111,15 +110,12 @@ static PinUntilErrorNodeSelectionStrategyChannel of(
.map(limitedChannel -> Math.max(0, initialShuffle.indexOf(limitedChannel)))
.orElse(0);

switch (strategy) {
case PIN_UNTIL_ERROR:
NodeList shuffling =
ReshufflingNodeList.of(initialShuffle, random, System::nanoTime, metrics, channelName);
return new PinUntilErrorNodeSelectionStrategyChannel(shuffling, initialPin, metrics, channelName);
case PIN_UNTIL_ERROR_WITHOUT_RESHUFFLE:
NodeList constant = new ConstantNodeList(initialShuffle);
return new PinUntilErrorNodeSelectionStrategyChannel(constant, initialPin, metrics, channelName);
case ROUND_ROBIN:
if (strategy == DialogueNodeSelectionStrategy.PIN_UNTIL_ERROR) {
NodeList shuffling = ReshufflingNodeList.of(initialShuffle, random, System::nanoTime, metrics, channelName);
return new PinUntilErrorNodeSelectionStrategyChannel(shuffling, initialPin, metrics, channelName);
} else if (strategy == DialogueNodeSelectionStrategy.PIN_UNTIL_ERROR_WITHOUT_RESHUFFLE) {
NodeList constant = new ConstantNodeList(initialShuffle);
return new PinUntilErrorNodeSelectionStrategyChannel(constant, initialPin, metrics, channelName);
}

throw new SafeIllegalArgumentException("Unsupported NodeSelectionStrategy", SafeArg.of("strategy", strategy));
Expand Down
Loading