-
Notifications
You must be signed in to change notification settings - Fork 181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance clarifications for ServiceDiscoverer
contract
#3002
Conversation
Motivation: Some assumptions that are currently in place are not clarified in documentation for new implementations. Modifications: - Clarify that the returned publisher should never complete and must support re-subscribes to handle failures and cancellations. - Clarify motivation for always starting with the "state of the world" for interpreting `ServiceDiscovererEvent`(s). - Add recommendation to implement a `ServiceDiscoverer` using a single thread. Result: More details for expected behavior of a `ServiceDiscoverer` implementation.
Motivation: Currently, both implementations of the `LoadBalancer` assume 2 possibilities for the first events collection after re-subscribe: "state of the world" or "delta" and tries to infer which one it is based on what event statuses are present in the collection. However, it doesn't provide any guarantees because even if SD impl keeps returning deltas after a re-subscribe, a delta can contain only `AVAILABLE` events, but it will be incorrectly interpreted as "state of the world". Because of Reactive Streams rule clarified in apple#3002, `LoadBalancer` implementations should always expect "state of the world" for the first events after re-subscribe. Modifications: - Remove inference of the first collection of events after resubscribe and always expect a new subscriber to start from "state of the world". - Update `resubscribeToEventsWhenAllHostsAreUnhealthy()` test to reflect behavior change. Result: Both `LoadBalancer` implementations follow `ServiceDiscoverer` contract and expect the first collection of events after re-subscribe to represent a "state of the world".
Motivation: Clients have a configurable `serviceDiscovererRetryStrategy` to guarantee a steady stream of events to the `LoadBalancer` that never fails. It's necessary at the client level to avoid hanging requests indefinitely and let requests observe failures from ServiceDiscoverer. Also, for `PartitionedHttpClient` it's necessary to guarantee that `GroupedPublisher` never fails. Retry is effectively a re-subscribe. According to `ServiceDiscoverer` contract (clarified in apple#3002), each `Subscriber` receives a "state of the world" as the first collection of events. The problem is that the state may change significantly between retries, as a result unavailable addresses can remain inside the `LoadBalancer` forever. Example: T1. SD delivers [a,b] T1. LB receives [a,b] T1. SD delivers error T2. SD info changed ("a" got revoked) T3. Client retries SD T3. SD delivers [b] T3. LB receives [b] (but still holds "a") When we retry `ServiceDiscoverer` errors, we should keep pushing deltas downstream or purge events that are not present in the new "state of the world". We previously had this protection but it was mistakenly removed in apple#1949 as part of a broader refactoring around `ServiceDiscoverer` <-> `LoadBalancer` contract. Modifications: - Add `RetryingServiceDiscoverer` that handles retries and keeps the state between retries. - Use it in `DefaultSingleAddressHttpClientBuilder` and `DefaultPartitionedHttpClientBuilder`. - Use `CastedServiceDiscoverer` to allow modifications for `ServiceDiscovererEvent` after we started to use a wildcard type in apple#2379. - Pass consistent `targetResource` identifier to both `RetryingServiceDiscoverer` and `LoadBalancerFactory` to allow state correlation when inspecting heap dump. Result: Client keeps pushing deltas to `LoadBalancer` after retrying `ServiceDiscoverer` errors, keeping its state consistent with `ServiceDiscoverer`.
servicetalk-client-api/src/main/java/io/servicetalk/client/api/ServiceDiscoverer.java
Show resolved
Hide resolved
* Because typically a {@link ServiceDiscoverer} implementation runs in the background and doesn't require many compute | ||
* resources, it's recommended (but not required) to run it and deliver updates on a single thread either for all | ||
* discoveries or at least for all {@link Subscriber Subscribers} to the same {@link Publisher}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw, it's not clear what this buys the implementors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to put it as a requirement, it's just a nice-to-have recommendation bcz most ppl don't think about it when they implement.
For RRLB it will help with this case: #3005 (comment)
But we are moving away from RRLB anyway.
In general, it makes debugging easier bcz you can track all events from a single thread and logs will represent the exact serial behavior. If they use multiple threads, logs may mess up order of events when LB cancels and resubscribes.
Do you have a recommendation how to rephrase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My best suggestion is basically what you said in the reply. 🙂 I added as a suggestion but I'm also not strongly opinionated about it. If you prefer to keep it short and not try to sell it then that's perfectly fine with me.
servicetalk-client-api/src/main/java/io/servicetalk/client/api/ServiceDiscoverer.java
Outdated
Show resolved
Hide resolved
…#3004) Motivation: Currently, both implementations of the `LoadBalancer` assume 2 possibilities for the first events collection after re-subscribe: "state of the world" or "delta" and try to infer which one it is based on what event statuses are present in the collection. However, it doesn't provide any guarantees because even if SD impl keeps returning deltas after a re-subscribe, a delta can contain only `AVAILABLE` events, but it will be incorrectly interpreted as "state of the world". Because of Reactive Streams rule clarified in #3002, `LoadBalancer` implementations should always expect "state of the world" for the first events after re-subscribe. Modifications: - Remove inference of the first collection of events after re-subscribe and always expect a new subscriber to start from "state of the world". - Update `resubscribeToEventsWhenAllHostsAreUnhealthy()` test to reflect behavior change. Result: Both `LoadBalancer` implementations follow `ServiceDiscoverer` contract and expect the first collection of events after re-subscribe to represent a "state of the world".
…3006) Motivation: Clients have a configurable `serviceDiscovererRetryStrategy` to guarantee a steady stream of events to the `LoadBalancer` that never fails. It's necessary at the client level to avoid hanging requests indefinitely and let requests observe failures from ServiceDiscoverer. Also, for `PartitionedHttpClient` it's necessary to guarantee that `GroupedPublisher` never fails. Retry is effectively a re-subscribe. According to `ServiceDiscoverer` contract (clarified in #3002), each `Subscriber` receives a "state of the world" as the first collection of events. The problem is that the state may change significantly between retries. As a result, unavailable addresses can remain inside the `LoadBalancer` forever. Example: T1. SD delivers [a,b] T1. LB receives [a,b] T1. SD delivers error T2. SD info changed ("a" got revoked) T3. Client retries SD T3. SD delivers [b] T3. LB receives [b] (but still holds "a") When we retry `ServiceDiscoverer` errors, we should keep pushing deltas downstream or purge events that are not present in the new "state of the world". We previously had this protection but it was mistakenly removed in #1949 as part of a broader refactoring around `ServiceDiscoverer` <-> `LoadBalancer` contract. Modifications: - Add `RetryingServiceDiscoverer` that handles retries and keeps the state between retries. - Use it in `DefaultSingleAddressHttpClientBuilder` and `DefaultPartitionedHttpClientBuilder`. - Use `CastedServiceDiscoverer` to allow modifications for `ServiceDiscovererEvent` after we started to use a wildcard type in #2379. - Pass consistent `targetResource` identifier to both `RetryingServiceDiscoverer` and `LoadBalancerFactory` to allow state correlation when inspecting heap dump. Result: Client keeps pushing deltas to `LoadBalancer` after retrying `ServiceDiscoverer` errors, keeping its state consistent with `ServiceDiscoverer`.
Motivation:
Some assumptions that are currently in place are not clarified in documentation for new implementations.
Modifications:
ServiceDiscovererEvent
(s).ServiceDiscoverer
using a single thread.Result:
More details for expected behavior of a
ServiceDiscoverer
implementation.