-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement additional QoS metadata #1231
Conversation
Generate changelog in
|
errors/src/main/java/com/palantir/conjure/java/api/errors/QosReason.java
Outdated
Show resolved
Hide resolved
return new QosReason( | ||
Preconditions.checkNotNull(reason, "reason"), | ||
Optional.ofNullable(retryHint), | ||
Optional.ofNullable(dueTo)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems the presence of retryHint
and dueTo
imply some semantic meaning for the caller such that both must be set, and in fact QosReasons#parseFromResponse requires both to be set in order to parse either - but it's not clear from this API that you must set both. Should make that a requirement in the builder?
EDIT: ah whoops, parseFromResponse only requires one to be set, but I guess my question is the same - are there scenarios where one would be set, but not the other?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I have a table in the design document where the combinations are described with internal examples
|
||
private static final String CLIENT_REASON = "client-qos-response"; | ||
private static final QosReason DEFAULT_CLIENT_REASON = QosReason.of(CLIENT_REASON); | ||
private static final String DUE_TO_HEADER = "Qos-Due-To"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you want these to be all lowercase?
EDIT: disregard, just realized these are the header names, not the values
private static final String RETRY_HINT_HEADER = "Qos-Retry-Hint"; | ||
|
||
private static final String DUE_TO_CUSTOM_HEADER_VALUE = "custom"; | ||
private static final String RETRY_HINT_DO_NOT_RETRY_HEADER_VALUE = "do-not-retry"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style nit: i'm just curious why these are not encoded as e.g. a toString
on the enum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the separation of concerns with enum definitions in QosReason.java
and the encoding pieces in the utility helper. Either way would be reasonable, but now that we have enum switch expressions, we can do a typesafe mapping with compile-time checks that we've associated a string with each enum value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
[skip ci]
For more information, see palantir/conjure-java-runtime-api#1231
/** | ||
* Conveys the servers opinion on whether a QoS failure should be retried, or propagate | ||
* back to the caller and result in a failure. There is no guarantee that these values | ||
* will be respected by all clients, and should be considered best-effort. | ||
*/ | ||
public enum RetryHint { | ||
/** | ||
* Clients should not attempt to retry this failure, | ||
* providing the failure as context back to the initial caller. | ||
*/ | ||
DO_NOT_RETRY; | ||
} | ||
|
||
/** | ||
* Describes the cause of a QoS failure when known to be non-default. By default, we assume that | ||
* a 503 Unavailable is the result of a node-wide limit being reached, and that a 429 is specific | ||
* to an individual endpoint on a node. These assumptions do not hold true in all cases, so | ||
* {@link DueTo} informs relays this intent. | ||
*/ | ||
public enum DueTo { | ||
/** | ||
* A cause that the RPC system isn't directly aware of, for example a user or user-agent specific limit, or | ||
* based on a specific resource that's being accessed, as opposed to the target node as a whole, or endpoint. | ||
* QosReasons with this cause shouldn't impact things like the dialogue concurrency limiter. | ||
*/ | ||
CUSTOM; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use conjure-style enums here, with an UNKNOWN
variant that can pass through new unknown values. This way, future additions will be passed through the system even before all services have taken the latest version of cjr-api. Additionally, it allows folks to potentially define more descriptive DueTo reasons (though it may require some care to make that safe, as it may impact dialogue and friends)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think i'm struggling to understand the difference between UNKNOWN
and CUSTOM
. Is the concern that a client may receive a DueTo
value from a newer version of c-j-r that they don't know how to deserialize?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think i'm struggling to understand the difference between
UNKNOWN
andCUSTOM
.
I'm not entirely sure that it makes sense to declare both, in practice I think they could be considered equivalent by most consumers. I do prefer having well-defined values like CUSTOM
for the common case, rather than using arbitrary string wrappers.
An open question I have is whether it would make sense to allow applications/libraries to define their own specific DueTo
values using this mechanism -- such values would have to be treated the same as CUSTOM
by Dialogue, but could provide better observability info.
Is the concern that a client may receive a DueTo value from a newer version of c-j-r that they don't know how to deserialize?
Yep, the current implementation will ignore the value as though it doesn't exist.
public Optional<DueTo> dueTo() { | ||
return dueTo; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should become a Set<DueTo> dueTo()
to support multiple reasons some day, keeps the door open.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, a downside of allowing multiple DueTo
values is that adding it to metric tags would become unwieldy.
👍 lgtm |
Released 2.54.0 |
For more information, see palantir/conjure-java-runtime-api#1231
This adds two new fields to QosException, which may produce following response headers:
Qos-Due-To: custom
: Describes this QoS failure as the result of an aspect of the request that doesn't necessarily apply to all requests, and shouldn't impact routing behavior. For example, per-user rate limits are specific to a single user, not all users.Qos-Retry-Hint: do-not-retry
: Hints that callers should not retry the request which resulted in this failure.Before this PR
No way to provide additional context with QoS exceptions.
After this PR
==COMMIT_MSG==
Implement additional QoS metadata
==COMMIT_MSG==
This PR provides two pieces:
RetryHint
andDueTo
QosReasons
utility which handles recording this metadata to a response, and recording from a response. This allows us to make changes/additions over time without updating projects that use these utilities.Possible downsides?
This adds complexity to our QoS exceptions, which are already somewhat complicated to reason about. We need additional control over how QoS responses impact the system as a whole, so this should be a reasonable trade-off.