Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

resolver: update ReportError() docstring #7732

Merged
merged 5 commits into from
Oct 17, 2024

Conversation

purnesh42H
Copy link
Contributor

@purnesh42H purnesh42H commented Oct 11, 2024

Addresses: #7729

RELEASE NOTES: None

@purnesh42H purnesh42H added the Area: Documentation Includes examples and docs. label Oct 11, 2024
@purnesh42H purnesh42H added this to the 1.68 Release milestone Oct 11, 2024
@purnesh42H purnesh42H added Type: Documentation Documentation or examples and removed Area: Documentation Includes examples and docs. labels Oct 11, 2024
Copy link

codecov bot commented Oct 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.65%. Comparing base (b850ea5) to head (a190b06).
Report is 9 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7732      +/-   ##
==========================================
- Coverage   81.96%   80.65%   -1.32%     
==========================================
  Files         362      365       +3     
  Lines       28119    28635     +516     
==========================================
+ Hits        23048    23095      +47     
- Misses       3863     4345     +482     
+ Partials     1208     1195      -13     
Files with missing lines Coverage Δ
resolver/resolver.go 100.00% <ø> (ø)

... and 21 files with indirect coverage changes

Comment on lines 240 to 241
// error. The Resolver itself will handle retrying the resolution, possibly
// using an exponential backoff mechanism depending on its implementation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the comment should say "Resolver should" instead of "Resolver will" because the resolver is to be implemented by users and not grpc itself. The comment should also NOT mention what kind of backoff to use (exponential) unless it's specified in grpc specifications. The comment from the above method can be reused here:

The resolver should try to resolve the target again. The resolver should use a backoff timer to prevent overloading the server with requests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@purnesh42H purnesh42H requested a review from arjan-bal October 14, 2024 08:56
@purnesh42H purnesh42H assigned arjan-bal and unassigned purnesh42H Oct 14, 2024
Comment on lines 239 to 241
// ReportError notifies the ClientConn that the Resolver encountered an
// error. The ClientConn will notify the load balancer and begin calling
// ResolveNow on the Resolver with exponential backoff.
// error. The Resolver should handle retrying the resolution, using a
// backoff timer to prevent overloading the server with requests.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should limit this docstring to what ReportError is all about.

  • ReportError is a way for the name resolver implementations to report an error to gRPC.
  • What gRPC does with that (i.e. forwarding it to the LB policy) is maybe worth mentioning here.
  • But what the LB policies do with that (i.e. most probably call ResolveNow) is not worth mentioning here I feel.
  • And how the resolver implementations should handle calls to ResolveNow (i.e. use backoff etc) is definitely not worth mentioning here. They would be more appropriate in the docstring for ResolveNow.

Copy link
Contributor Author

@purnesh42H purnesh42H Oct 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, just keep it to only this ReportError notifies the ClientConn that the Resolver encountered an error. ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it again, I think it is worth mentioning that the error is forwarded to the LB policy by the ClientConn. But we can stop there. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about something like:

ReportError notifies the ClientConn that the Resolver encountered an
error. The ClientConn then forwards this error to the load balancing policy.

Copy link
Contributor Author

@purnesh42H purnesh42H Oct 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. If you are implementing your own resolver, its recommended to implement your resolver which retries with some back off

  2. As soon as error occurs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should resolvers handle an error from ReportError?

Resolver implementations are the ones who are reporting errors by calling ReportError on the ClientConn. Once this error is forwarded to the LB policies, they may call ResolveNow. There are two type of name resolvers that we have encountered so far:

  • Polling based: DNS name resolver is an example of this. For such resolvers, it is recommended to have either a minimum polling interval and/or backoff before retrying the name resolution, when ResolveNow is called. You can look at our DNS resolver implementation for more details.
  • Push based: xDS name resolver is an example of this. In these types of name resolvers, the resolution is pushed from a server. So, ResolveNow for these resolvers is a no-op.

When should resolvers report an error with ReportError?

As soon as the error occurs, unless there is good reason for the resolver to believe otherwise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks to both of you for your help!

I'm wondering why ResolveNow exists when resolvers can not rely on it and always have to work without it.

When should resolvers report an error with ReportError?

As soon as the error occurs, unless there is good reason for the resolver to believe otherwise.

Is it an error if a name is unknown at the resolver, resolves to 0 addresses?
Or should the resolver return an empty set of addresses?
The latter seems like the correct behavior to me, because when I report an error the round_robin load-balancer continues to try to connect to the previously reported address.
If I report an empty set of addresses it stops. This is the desired behavior when the last reported instance for the name has become unavailable.
The "UpdateState" call returns a "bad resolver state" error for an empty Address slice though, which makes it seem like reporting an empty set of addresses should not be done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering why ResolveNow exists when resolvers can not rely on it and always have to work without it.

I don't quite follow your question. ResolveNow exists as a way to poke the name resolver to try and re-resolve the name. This is a hint, and the name resolver can chose to ignore it when it knows that attempting to re-resolve again will not produce a different result. But if re-resolving can produce a different result, it is worth re-resolving. But the resolver implementation needs to ensure that it does not overwhelm an external name resolver server because of too many calls to ResolveNow.

There are cases where the name resolver produces a non-empty list of addresses and the LB policy starts connecting to these addresses, but all of them fail. In this case, the LB policy will ask grpc to try and re-resolve the name.

Or should the resolver return an empty set of addresses?

Returning an empty address list is the way to go. We made a cross-language decision where we said we will "always trust the control plane". So, if the name resolver gives grpc an empty address list, the LB policy will put the client channel in TransientFailure and RPCs will fail. But if the name resolver returns an error, the error is forwarded to the LB policy and you could have custom name resolvers and custom LB policies that are designed to handle different error conditions in different ways.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again, that answered my open questions

@easwars easwars assigned purnesh42H and unassigned arjan-bal Oct 14, 2024
@purnesh42H purnesh42H requested a review from easwars October 15, 2024 17:56
@purnesh42H purnesh42H assigned easwars and unassigned purnesh42H Oct 15, 2024
@easwars easwars assigned purnesh42H and unassigned easwars Oct 15, 2024
@purnesh42H purnesh42H modified the milestones: 1.68 Release, 1.69 Release Oct 16, 2024
@purnesh42H purnesh42H assigned easwars and unassigned purnesh42H Oct 16, 2024
@easwars easwars removed their assignment Oct 16, 2024
@purnesh42H purnesh42H merged commit 56df169 into grpc:master Oct 17, 2024
15 checks passed
misvivek pushed a commit to misvivek/grpc-go that referenced this pull request Nov 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Documentation Documentation or examples
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants