-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add ErrNoSubConnAvailable reason to pick ctx timeout #7143
Closed
Closed
Changes from all commits
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the right way to do this.
ErrNoSubConnAvailable
is not a meaningful error to a user.Instead, we should probably change line 120's error to something like
fmt.Sprintf("received context error while waiting for new LB policy update: %v", ctx.Err())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ctx.Err() could either indicate an issue with the implementation of a certain load balancer, or it might be caused by the absence of a corresponding server. However, balancer.ErrNoSubConnAvailable might give the client a clearer understanding of the reason for the error. Often, when clients encounter timeouts, they may assume there are issues with the network connection.
I believe that it would be more user-friendly for the client if the specific reason for the timeout were included along with the ctx.Err().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ctx.Err means the RPC was timed out or canceled - we only know which by calling it.
If the RPC queues while it's queued waiting for an updated picker from the LB policy, that means the LB policy should still be attempting connections. If connections have failed, then it should return those connection errors instead.
What LB policy have you configured? Are you making your own? You may be misunderstanding how they are supposed to behave.
ErrNoSubConnAvailable
is not a reason for a timeout. The reason for the timeout is that the LB policy isn't producing a picker that works for the RPC. That only should happen because it's still attempting a connection for that address. Once that address succeeds, a new picker will be produced that will work for that RPC. Or if it fails, a new picker will be produced that returns an appropriate error for the RPC. I think the proposal in my previous comment should help users understand where in the system the RPC timed out, and I don't think there's much more we can do.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your response, I hope we can have further discussions.
In my current scenario, during the gray release process, the methods corresponding to the new and old versions of the service are different. In the load balancer, I retrieve the service with the new method by using the fullMethodName. When there are no new version services online, I return an ErrNoSubConnAvailable error, which clearly indicates that there is indeed no effective connection available for the method the client is trying to access. However, the client only receives a timeout error and is not aware of the actual reason for the timeout.
I could indeed use a custom error to convey this scenario, but I feel that the semantics of the ErrNoSubConnAvailable error match my current situation well, and since it is a built-in error of the balancer, the client could also use this error to identify the type of error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. You really should be returning an error in this case. Also, things generally don't work well if you have multiple versions of a service deployed. All clients should use the old version of the service until you know all servers have been upgraded, and only then should you update your clients to use the new version. Otherwise you will run into trouble with things like this.
ErrNoSubConnAvailable
is not intended for an end user to see. It is just part of the API between the LB policy and the ClientConn.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, I got it. However, I think it is still necessary to clarify to the client the reason for the ctx.Error, such as a timeout in updating the client-side load balancer. This is user-friendly for those who are new to gRPC.
New PR for this. #7206