resolver: update ReportError() docstring #7732

purnesh42H · 2024-10-11T06:27:43Z

Addresses: #7729

RELEASE NOTES: None

codecov · 2024-10-11T06:32:21Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.65%. Comparing base (b850ea5) to head (a190b06).
Report is 9 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #7732      +/-   ##
==========================================
- Coverage   81.96%   80.65%   -1.32%     
==========================================
  Files         362      365       +3     
  Lines       28119    28635     +516     
==========================================
+ Hits        23048    23095      +47     
- Misses       3863     4345     +482     
+ Partials     1208     1195      -13

Files with missing lines	Coverage Δ
resolver/resolver.go	`100.00% <ø> (ø)`

... and 21 files with indirect coverage changes

arjan-bal · 2024-10-14T06:55:03Z

resolver/resolver.go

+	// error. The Resolver itself will handle retrying the resolution, possibly
+	// using an exponential backoff mechanism depending on its implementation.


I think the comment should say "Resolver should" instead of "Resolver will" because the resolver is to be implemented by users and not grpc itself. The comment should also NOT mention what kind of backoff to use (exponential) unless it's specified in grpc specifications. The comment from the above method can be reused here:

The resolver should try to resolve the target again. The resolver should use a backoff timer to prevent overloading the server with requests.

easwars · 2024-10-14T21:56:30Z

resolver/resolver.go

 	// ReportError notifies the ClientConn that the Resolver encountered an
-	// error.  The ClientConn will notify the load balancer and begin calling
-	// ResolveNow on the Resolver with exponential backoff.
+	// error. The Resolver should handle retrying the resolution, using a
+	// backoff timer to prevent overloading the server with requests.


I think we should limit this docstring to what ReportError is all about.

ReportError is a way for the name resolver implementations to report an error to gRPC.

What gRPC does with that (i.e. forwarding it to the LB policy) is maybe worth mentioning here.

But what the LB policies do with that (i.e. most probably call ResolveNow) is not worth mentioning here I feel.

And how the resolver implementations should handle calls to ResolveNow (i.e. use backoff etc) is definitely not worth mentioning here. They would be more appropriate in the docstring for ResolveNow.

so, just keep it to only this ReportError notifies the ClientConn that the Resolver encountered an error. ?

Thinking about it again, I think it is worth mentioning that the error is forwarded to the LB policy by the ClientConn. But we can stop there. Thanks.

How about something like:

ReportError notifies the ClientConn that the Resolver encountered an error. The ClientConn then forwards this error to the load balancing policy.

If you are implementing your own resolver, its recommended to implement your resolver which retries with some back off

As soon as error occurs

How should resolvers handle an error from ReportError?

Resolver implementations are the ones who are reporting errors by calling ReportError on the ClientConn. Once this error is forwarded to the LB policies, they may call ResolveNow. There are two type of name resolvers that we have encountered so far:

Polling based: DNS name resolver is an example of this. For such resolvers, it is recommended to have either a minimum polling interval and/or backoff before retrying the name resolution, when ResolveNow is called. You can look at our DNS resolver implementation for more details.

Push based: xDS name resolver is an example of this. In these types of name resolvers, the resolution is pushed from a server. So, ResolveNow for these resolvers is a no-op.

When should resolvers report an error with ReportError?

As soon as the error occurs, unless there is good reason for the resolver to believe otherwise.

Thanks to both of you for your help!

I'm wondering why ResolveNow exists when resolvers can not rely on it and always have to work without it.

When should resolvers report an error with ReportError?

As soon as the error occurs, unless there is good reason for the resolver to believe otherwise.

Is it an error if a name is unknown at the resolver, resolves to 0 addresses?
Or should the resolver return an empty set of addresses?
The latter seems like the correct behavior to me, because when I report an error the round_robin load-balancer continues to try to connect to the previously reported address.
If I report an empty set of addresses it stops. This is the desired behavior when the last reported instance for the name has become unavailable.
The "UpdateState" call returns a "bad resolver state" error for an empty Address slice though, which makes it seem like reporting an empty set of addresses should not be done.

I'm wondering why ResolveNow exists when resolvers can not rely on it and always have to work without it.

I don't quite follow your question. ResolveNow exists as a way to poke the name resolver to try and re-resolve the name. This is a hint, and the name resolver can chose to ignore it when it knows that attempting to re-resolve again will not produce a different result. But if re-resolving can produce a different result, it is worth re-resolving. But the resolver implementation needs to ensure that it does not overwhelm an external name resolver server because of too many calls to ResolveNow.

There are cases where the name resolver produces a non-empty list of addresses and the LB policy starts connecting to these addresses, but all of them fail. In this case, the LB policy will ask grpc to try and re-resolve the name.

Or should the resolver return an empty set of addresses?

Returning an empty address list is the way to go. We made a cross-language decision where we said we will "always trust the control plane". So, if the name resolver gives grpc an empty address list, the LB policy will put the client channel in TransientFailure and RPCs will fail. But if the name resolver returns an error, the error is forwarded to the LB policy and you could have custom name resolvers and custom LB policies that are designed to handle different error conditions in different ways.

Thanks again, that answered my open questions

resolver: update ReportError() docstring

6d55519

purnesh42H added the Area: Documentation Includes examples and docs. label Oct 11, 2024

purnesh42H added this to the 1.68 Release milestone Oct 11, 2024

purnesh42H requested review from easwars and arjan-bal October 11, 2024 06:27

purnesh42H added Type: Documentation Documentation or examples and removed Area: Documentation Includes examples and docs. labels Oct 11, 2024

arjan-bal reviewed Oct 14, 2024

View reviewed changes

arjan-bal assigned purnesh42H Oct 14, 2024

Changed as per comments

71759e8

purnesh42H requested a review from arjan-bal October 14, 2024 08:56

purnesh42H assigned arjan-bal and unassigned purnesh42H Oct 14, 2024

easwars reviewed Oct 14, 2024

View reviewed changes

easwars assigned purnesh42H and unassigned arjan-bal Oct 14, 2024

purnesh42H requested a review from easwars October 15, 2024 17:56

Removed comment about retrying resolution

5f7dc3c

purnesh42H assigned easwars and unassigned purnesh42H Oct 15, 2024

easwars assigned purnesh42H and unassigned easwars Oct 15, 2024

purnesh42H modified the milestones: 1.68 Release, 1.69 Release Oct 16, 2024

add erro forwarded to the LB policy by the ClientConn

3c181be

purnesh42H assigned easwars and unassigned purnesh42H Oct 16, 2024

easwars approved these changes Oct 16, 2024

View reviewed changes

easwars assigned purnesh42H Oct 16, 2024

easwars removed their assignment Oct 16, 2024

language edit

a190b06

purnesh42H merged commit 56df169 into grpc:master Oct 17, 2024
15 checks passed

misvivek pushed a commit to misvivek/grpc-go that referenced this pull request Nov 7, 2024

resolver: update ReportError() docstring (grpc#7732)

b25527d

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

resolver: update ReportError() docstring #7732

resolver: update ReportError() docstring #7732

purnesh42H commented Oct 11, 2024 •

edited

Loading

codecov bot commented Oct 11, 2024 •

edited

Loading

arjan-bal Oct 14, 2024

purnesh42H Oct 14, 2024

easwars Oct 14, 2024

purnesh42H Oct 15, 2024 •

edited

Loading

easwars Oct 15, 2024

purnesh42H Oct 16, 2024

easwars Oct 16, 2024

purnesh42H Oct 21, 2024 •

edited

Loading

easwars Oct 21, 2024

fho Oct 21, 2024

easwars Oct 21, 2024

fho Oct 22, 2024

		// error. The Resolver itself will handle retrying the resolution, possibly
		// using an exponential backoff mechanism depending on its implementation.

resolver: update ReportError() docstring #7732

resolver: update ReportError() docstring #7732

Conversation

purnesh42H commented Oct 11, 2024 • edited Loading

codecov bot commented Oct 11, 2024 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

purnesh42H Oct 15, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

purnesh42H Oct 21, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

purnesh42H commented Oct 11, 2024 •

edited

Loading

codecov bot commented Oct 11, 2024 •

edited

Loading

purnesh42H Oct 15, 2024 •

edited

Loading

purnesh42H Oct 21, 2024 •

edited

Loading