SDKs to support retry and reconnection to sidecar #6609

Open
artursouza opened this issue Jun 29, 2023 · 4 comments · Fixed by dapr/java-sdk#889

Comments

@artursouza
Member

artursouza commented Jun 29, 2023

In what area(s)?

/area runtime

/area operator

/area placement

/area docs

/area test-and-release

Describe the feature

  1. SDKs should automatically reestablish the connection to the sidecar endpoint after the sidecar restarts (if the connection is sticky).
  2. SDKs should automatically retry network-based exceptions up to N times. N should be configured via the environment variable DAPR_NETWORK_MAX_RETRIES, with a default value of 3 (zero disables retries).
  3. Each SDK may expose a parameter (equivalent to DAPR_NETWORK_MAX_RETRIES) that overrides the configured value programmatically when instantiating a client object.
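
As a rough illustration of items 2 and 3, here is a minimal Python sketch, not taken from any Dapr SDK. The helper names `resolve_max_retries` and `call_with_retries` are hypothetical, `ConnectionError` and `TimeoutError` stand in for whatever each SDK would classify as network-based exceptions, and "up to N times" is read as N retries after the initial attempt, which is an assumption.

```python
import os

ENV_MAX_RETRIES = "DAPR_NETWORK_MAX_RETRIES"  # name proposed in this issue
DEFAULT_MAX_RETRIES = 3                        # default proposed in this issue

# Assumption: stand-ins for "network-based exceptions"; a real SDK would pick its own list.
NETWORK_ERRORS = (ConnectionError, TimeoutError)


def resolve_max_retries(override=None, environ=os.environ):
    """Programmatic override wins, then the environment variable, then the default."""
    if override is not None:
        return max(0, int(override))
    raw = environ.get(ENV_MAX_RETRIES)
    if raw is None or raw.strip() == "":
        return DEFAULT_MAX_RETRIES
    return max(0, int(raw))


def call_with_retries(operation, max_retries=None, environ=os.environ):
    """Run operation(); on a network error, retry up to the resolved limit.

    A limit of zero disables retries: the operation is attempted exactly once.
    """
    retries = resolve_max_retries(max_retries, environ)
    attempt = 0
    while True:
        try:
            return operation()
        except NETWORK_ERRORS:
            if attempt >= retries:
                raise  # retries exhausted (or disabled): surface the error
            attempt += 1
```

A client constructor could pass its own `max_retries` argument through to `resolve_max_retries`, so that the environment variable only applies when the caller does not set one.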

Proposal to follow after discussion here. In scope for 1.12.

Release Note

RELEASE NOTE:

@artursouza
Member Author

artursouza commented Jul 14, 2023

I have implemented a tentative reference implementation in the Java SDK. Users would have two environment variables to configure: DAPR_API_TIMEOUT_SECONDS and DAPR_API_MAX_RETRIES. I am avoiding the term "network" so as not to limit the list of retriable exceptions. The draft implementation treats the following gRPC status codes as retriable: ABORTED, CANCELLED, DEADLINE_EXCEEDED, UNAVAILABLE. The proposal would also need to cover protocol-specific errors for HTTP. I used gRPC only because the Java SDK has deprecated its HTTP implementation and only one implementation will be offered in the near future.

dapr/java-sdk#889

I tested this using ToxiProxy with a jitter of 30 seconds, with the SDK configured for a timeout of 10 seconds and 3 max retries. There is a significant difference in how many of the calls to the state store succeed with the policy applied versus as-is.

I also noticed that without the timeout config, the client will hang "forever". So these two settings go hand in hand, IMO, and should apply to both the gRPC and HTTP protocols.
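
To make the interaction between the two settings concrete, here is a small Python sketch, not the Java SDK code from dapr/java-sdk#889. `ApiError`, `read_settings`, and `invoke` are hypothetical names, status codes are modeled as plain strings rather than real gRPC types, and the behavior when a variable is unset (no timeout, no retries) is an assumption, since the defaults are not stated above.

```python
import os

# gRPC status codes listed as retriable in the draft implementation.
RETRIABLE_CODES = frozenset({"ABORTED", "CANCELLED", "DEADLINE_EXCEEDED", "UNAVAILABLE"})


class ApiError(Exception):
    """Hypothetical error type carrying a gRPC status code name."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def read_settings(environ=os.environ):
    """Return (timeout_seconds, max_retries) from the environment.

    Assumption: an unset timeout means no deadline; unset retries means none.
    """
    timeout = environ.get("DAPR_API_TIMEOUT_SECONDS")
    retries = environ.get("DAPR_API_MAX_RETRIES")
    return (float(timeout) if timeout else None, int(retries) if retries else 0)


def invoke(operation, environ=os.environ):
    """Call operation(timeout) and retry only on retriable status codes.

    The operation is expected to enforce the timeout itself (for example as a
    per-call deadline) and raise ApiError("DEADLINE_EXCEEDED") when it expires.
    """
    timeout, max_retries = read_settings(environ)
    for attempt in range(max_retries + 1):
        try:
            return operation(timeout)
        except ApiError as err:
            is_last_attempt = attempt == max_retries
            if err.code not in RETRIABLE_CODES or is_last_attempt:
                raise
```

Because the timeout is applied per attempt, a stalled call ends with DEADLINE_EXCEEDED and becomes eligible for retry. Without a timeout, a stalled call never fails, so the retry limit never comes into play.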

@artursouza
Member Author

I also came across documentation for retries in the .NET gRPC client: https://learn.microsoft.com/en-us/aspnet/core/grpc/retries?view=aspnetcore-7.0

@artursouza
Member Author

Work continues in 1.13. So far, the Java SDK is the only one that supports resiliency in 1.12.

@cicoyle
Contributor

cicoyle commented Jul 25, 2024

SDKs that support this feature:

  • Java
  • Python
  • .NET
  • Go
  • JavaScript
  • Rust

We should strive to support this feature in all SDKs before closing this issue.

@cicoyle cicoyle modified the milestones: v1.14, v1.15 Jul 25, 2024
@yaron2 yaron2 modified the milestones: v1.15, v1.16 Sep 16, 2024
4 participants