-
Notifications
You must be signed in to change notification settings - Fork 4.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
resolver: update ReportError() docstring #7732
resolver: update ReportError() docstring #7732
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #7732 +/- ##
==========================================
- Coverage 81.96% 80.65% -1.32%
==========================================
Files 362 365 +3
Lines 28119 28635 +516
==========================================
+ Hits 23048 23095 +47
- Misses 3863 4345 +482
+ Partials 1208 1195 -13
|
resolver/resolver.go
Outdated
// error. The Resolver itself will handle retrying the resolution, possibly | ||
// using an exponential backoff mechanism depending on its implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the comment should say "Resolver should" instead of "Resolver will" because the resolver is to be implemented by users and not grpc itself. The comment should also NOT mention what kind of backoff to use (exponential) unless it's specified in grpc specifications. The comment from the above method can be reused here:
The resolver should try to resolve the target again. The resolver should use a backoff timer to prevent overloading the server with requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
resolver/resolver.go
Outdated
// ReportError notifies the ClientConn that the Resolver encountered an | ||
// error. The ClientConn will notify the load balancer and begin calling | ||
// ResolveNow on the Resolver with exponential backoff. | ||
// error. The Resolver should handle retrying the resolution, using a | ||
// backoff timer to prevent overloading the server with requests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should limit this docstring to what ReportError
is all about.
ReportError
is a way for the name resolver implementations to report an error to gRPC.- What gRPC does with that (i.e. forwarding it to the LB policy) is maybe worth mentioning here.
- But what the LB policies do with that (i.e. most probably call
ResolveNow
) is not worth mentioning here I feel. - And how the resolver implementations should handle calls to
ResolveNow
(i.e. use backoff etc) is definitely not worth mentioning here. They would be more appropriate in the docstring forResolveNow
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, just keep it to only this ReportError notifies the ClientConn that the Resolver encountered an error.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about it again, I think it is worth mentioning that the error is forwarded to the LB policy by the ClientConn. But we can stop there. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about something like:
ReportError notifies the ClientConn that the Resolver encountered an
error. The ClientConn then forwards this error to the load balancing policy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
If you are implementing your own resolver, its recommended to implement your resolver which retries with some back off
-
As soon as error occurs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How should resolvers handle an error from ReportError?
Resolver implementations are the ones who are reporting errors by calling ReportError
on the ClientConn. Once this error is forwarded to the LB policies, they may call ResolveNow
. There are two type of name resolvers that we have encountered so far:
- Polling based: DNS name resolver is an example of this. For such resolvers, it is recommended to have either a minimum polling interval and/or backoff before retrying the name resolution, when
ResolveNow
is called. You can look at our DNS resolver implementation for more details. - Push based: xDS name resolver is an example of this. In these types of name resolvers, the resolution is pushed from a server. So,
ResolveNow
for these resolvers is a no-op.
When should resolvers report an error with ReportError?
As soon as the error occurs, unless there is good reason for the resolver to believe otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks to both of you for your help!
I'm wondering why ResolveNow
exists when resolvers can not rely on it and always have to work without it.
When should resolvers report an error with ReportError?
As soon as the error occurs, unless there is good reason for the resolver to believe otherwise.
Is it an error if a name is unknown at the resolver, resolves to 0 addresses?
Or should the resolver return an empty set of addresses?
The latter seems like the correct behavior to me, because when I report an error the round_robin load-balancer continues to try to connect to the previously reported address.
If I report an empty set of addresses it stops. This is the desired behavior when the last reported instance for the name has become unavailable.
The "UpdateState" call returns a "bad resolver state" error for an empty Address slice though, which makes it seem like reporting an empty set of addresses should not be done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering why ResolveNow exists when resolvers can not rely on it and always have to work without it.
I don't quite follow your question. ResolveNow
exists as a way to poke the name resolver to try and re-resolve the name. This is a hint, and the name resolver can chose to ignore it when it knows that attempting to re-resolve again will not produce a different result. But if re-resolving can produce a different result, it is worth re-resolving. But the resolver implementation needs to ensure that it does not overwhelm an external name resolver server because of too many calls to ResolveNow
.
There are cases where the name resolver produces a non-empty list of addresses and the LB policy starts connecting to these addresses, but all of them fail. In this case, the LB policy will ask grpc to try and re-resolve the name.
Or should the resolver return an empty set of addresses?
Returning an empty address list is the way to go. We made a cross-language decision where we said we will "always trust the control plane". So, if the name resolver gives grpc an empty address list, the LB policy will put the client channel in TransientFailure
and RPCs will fail. But if the name resolver returns an error, the error is forwarded to the LB policy and you could have custom name resolvers and custom LB policies that are designed to handle different error conditions in different ways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again, that answered my open questions
Addresses: #7729
RELEASE NOTES: None