xdsclient: new Transport interface and LRS stream implementation #7717
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7717      +/-   ##
==========================================
- Coverage   82.05%   81.64%   -0.42%
==========================================
  Files         362      364       +2
  Lines       28111    28312     +201
==========================================
+ Hits        23067    23114      +47
- Misses       3845     4014     +169
+ Partials     1199     1184      -15
3118c9e
to
8e32c21
Compare
xds/internal/xdsclient/transport/grpctransport/grpctransport_ext_test.go
LGTM with some minor nits.
xds/internal/xdsclient/transport/grpctransport/grpctransport.go
	grpctest.RunSubTests(t, s{})
}

// Tests that the grpctransport.Builder creates a new grpc.ClientConn every time
I don't really get the point of this test. You're hooking into new client to make sure it gets called, and then getting rid of it and then you make a transport and then immediately close it. To me it verifies only two things:
- The function internal.GRPCNewClient is called from transport creation.
- You successfully overwrote internal.GRPCNewClient in the first Dial, and the second one did not hit your overwritten function.
Modified the test to not reset the new client hook after the first call. This changes the logic of the test so that it verifies that every time Build is called, the new client hook is called.
Also, I have to call grpc.NewClient from the overridden function, because I need to return a *grpc.ClientConn. If I simply set customDialerCalled to true and return nil for the first return value, that would lead to a panic, since the code calls cc.Connect once grpc.NewClient returns a nil error.
Oh I guess it would nil panic if not successfully connected, so it's testing that too. So it's testing Build/NewClient come 1:1?
No, why would it panic if it is not successfully connected? In fact, I don't even have a real management server running as part of this test. What I meant was that if I returned nil for the first return value in the overridden client hook, the code would panic because it calls cc.Connect on it.
> So it's testing Build/NewClient come 1:1?
Sort of. But it does not test that NewClient actually ends up establishing a connection, because that would mean that we are testing grpc.NewClient instead of this Build function.
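The hook-counting approach discussed in this thread can be sketched without any gRPC dependency. Everything below is a hypothetical stand-in, not the PR's code: conn plays the role of *grpc.ClientConn, newClient plays the role of internal.GRPCNewClient, and build plays the role of the builder's Build method.

```go
package main

import "fmt"

// conn is a hypothetical stand-in for *grpc.ClientConn.
type conn struct {
	target    string
	connected bool
}

// Connect mimics cc.Connect(). It writes through the receiver, so calling it
// on a nil *conn panics, which is why the overridden hook must not return nil.
func (c *conn) Connect() { c.connected = true }

// newClient is a hypothetical hook, playing the role of internal.GRPCNewClient.
var newClient = func(target string) (*conn, error) {
	return &conn{target: target}, nil
}

// build plays the role of Build: it calls the hook exactly once per
// invocation and then calls Connect on the result.
func build(target string) (*conn, error) {
	cc, err := newClient(target)
	if err != nil {
		return nil, fmt.Errorf("creating client for %q: %v", target, err)
	}
	cc.Connect()
	return cc, nil
}

// countNewClientCalls overrides the hook, calls build n times, and returns
// the number of times the hook was invoked.
func countNewClientCalls(n int) int {
	orig := newClient
	defer func() { newClient = orig }()

	calls := 0
	newClient = func(target string) (*conn, error) {
		calls++
		return orig(target) // Delegate so that build receives a non-nil conn.
	}
	for i := 0; i < n; i++ {
		if _, err := build("server.example"); err != nil {
			panic(err)
		}
	}
	return calls
}

func main() {
	fmt.Println(countNewClientCalls(3)) // 3
}
```

As in the discussion, this only checks that build and the hook are called 1:1. It says nothing about whether a connection is ever established.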
xds/internal/xdsclient/transport/grpctransport/grpctransport_ext_test.go
	rInterval := resp.GetLoadReportingInterval()
	if err := rInterval.CheckValid(); err != nil {
		return nil, 0, fmt.Errorf("lrs: invalid load_reporting_interval: %v", err)
	}
	interval := rInterval.AsDuration()
I always wonder what to do here when I get a variable from a methodA(), and then typecast it to data I'll use eventually which is semantically the same thing.
I don't understand your comment. Do you want me to get rid of the local interval and inline rInterval.AsDuration() in the return statement?
No, I just don't know what the best practices in Go are here, since I see this come up too: data := getData(), then data2 := data.(Data), and then data2 is used for the rest of the function. I don't know what to call data and data2.
Ah I see. I don't think there is any specific guidance around this in the style guide, or at least, I haven't seen it before.
Personally, if I'm doing
data := getData()
data2, ok := data.(Data)
I would make the variable name for data as small as possible, since its scope is only a couple of lines. And I would make the variable name for data2 more meaningful, since it would probably have a bigger scope. I had left the names as they were in the previous code, but have now changed the second one to be more descriptive. I couldn't use int for interval since that is a predeclared type name, and I didn't want to use i for interval since that is usually reserved for indices.
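A minimal sketch of the naming convention described above, using hypothetical names (lookup and reportingInterval are not from the PR): the short-lived intermediate value gets a one-letter name, and the converted value that lives for the rest of the function gets the descriptive one.

```go
package main

import (
	"fmt"
	"time"
)

// lookup is a hypothetical function that returns a value of unknown type.
func lookup() any { return 5 * time.Second }

// reportingInterval converts the raw value into a time.Duration.
func reportingInterval() (time.Duration, error) {
	v := lookup() // Short name: only used on the next few lines.
	interval, ok := v.(time.Duration)
	if !ok {
		return 0, fmt.Errorf("unexpected type %T for interval", v)
	}
	// The descriptive name is the one used for the rest of the function.
	return interval, nil
}

func main() {
	interval, err := reportingInterval()
	fmt.Println(interval, err) // 5s <nil>
}
```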

// recvFirstLoadStatsResponse receives the first LoadStatsResponse from the LRS
// server. Returns the following:
// - a list of cluster names requested by the server or an empty slice if the
Nit: I feel like empty slice is distinct from nil, which is currently being returned. len(nil slice) returns 0 but I think empty slice is []type{}.
Looks like the method on the load.Store to which the clusters returned from here are passed handles empty slices correctly by checking for len() == 0 instead of checking for nil.
See:
func (s *Store) Stats(clusterNames []string) []*Data {
But I think it also makes sense for me to return an empty slice here when the server requests load from all clusters, instead of returning a nil slice, because that is semantically different from the nil slice returned for error conditions.
Yeah I just meant that in the case where you want all clusters, it states it was returning an empty slice but it was returning nil instead (which happened to also be what was being returned in error cases).
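The distinction raised in this thread can be seen in a small, self-contained example. The function clustersToReport is hypothetical and only mimics the convention agreed on above: a non-nil empty slice means "all clusters", while nil is returned alongside an error.

```go
package main

import "fmt"

// clustersToReport is a hypothetical helper. It returns a non-nil empty
// slice when load for all clusters is requested, and a nil slice only
// together with an error.
func clustersToReport(sendAll bool, requested []string) ([]string, error) {
	if sendAll {
		return []string{}, nil
	}
	if len(requested) == 0 {
		return nil, fmt.Errorf("no clusters requested")
	}
	return requested, nil
}

func main() {
	var nilSlice []string
	emptySlice := []string{}

	// Both have length 0, but only one of them compares equal to nil.
	fmt.Println(len(nilSlice), nilSlice == nil)     // 0 true
	fmt.Println(len(emptySlice), emptySlice == nil) // 0 false

	all, err := clustersToReport(true, nil)
	fmt.Println(all != nil, len(all), err) // true 0 <nil>
}
```

A caller that checks len(clusters) == 0, as the Stats method mentioned above does, treats both cases the same way. A caller that checks clusters == nil does not.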
	if lrs.refCount != 0 {
		lrs.refCount++
		return lrs.lrsStore, cleanup
Let me know if I understand this correctly:
- Multiple grpc clients can report load on the same load store through the xds client
- Only the first ReportLoad() call creates the stream for LRS, and all reported stats go through it irrespective of how many grpc clients are reporting
- Each grpc client, when it is done reporting, calls cleanup, which decrements the refCount
- Only the last grpc client, when it's done reporting, calls cleanup and the lrs stream is destroyed
> Multiple grpc clients can report load on the same load store through the xds client
A single xds client is shared across grpc channels (with the same target URI) and grpc servers. Load reporting is a client-side feature, so let's forget about servers for now. Load is currently reported by the clusterimpl LB policy, which is a per-cluster LB policy. Load reports for all clusters within a single grpc client go through the same xDS client. They all share the same load store, which supports recording loads for multiple clusters.
> Only the first ReportLoad() call creates the stream for LRS, and all reported stats go through it irrespective of how many grpc clients are reporting
More specifically, the first call to ReportLoad that causes the ref count to become 1.
> Each grpc client, when it is done reporting, calls cleanup, which decrements the refCount
Again, this is not per grpc client. This is per clusterimpl policy (or whichever entity is responsible for reporting load).
> Only the last grpc client, when it's done reporting, calls cleanup and the lrs stream is destroyed
The call to cleanup that causes the ref count to go to 0 will result in the underlying stream being cleaned up.
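The ref-counting behavior described in this exchange can be sketched as follows. This is a simplified illustration under stated assumptions, not the PR's implementation: the reporter type and its fields are hypothetical, the "stream" is simulated with a boolean, and the use of sync.Once to make cleanup idempotent is an addition of this sketch.

```go
package main

import (
	"fmt"
	"sync"
)

// reporter is a hypothetical, simplified stand-in for the LRS stream type.
type reporter struct {
	mu       sync.Mutex
	refCount int
	running  bool // Whether the simulated stream is active.
	starts   int  // Number of times the simulated stream was started.
}

// ReportLoad starts the simulated stream when the ref count goes from 0 to 1
// and returns a cleanup func that stops it when the count returns to 0.
func (r *reporter) ReportLoad() (cleanup func()) {
	r.mu.Lock()
	defer r.mu.Unlock()

	r.refCount++
	if r.refCount == 1 {
		r.running = true
		r.starts++
	}

	var once sync.Once // Makes repeated cleanup calls harmless.
	return func() {
		once.Do(func() {
			r.mu.Lock()
			defer r.mu.Unlock()
			r.refCount--
			if r.refCount == 0 {
				r.running = false
			}
		})
	}
}

func main() {
	r := &reporter{}
	c1 := r.ReportLoad() // Ref count 0 -> 1: stream starts.
	c2 := r.ReportLoad() // Ref count 1 -> 2: stream is reused.

	c1()
	fmt.Println(r.running, r.starts) // true 1

	c2() // Ref count 1 -> 0: stream stops.
	fmt.Println(r.running, r.starts) // false 1
}
```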
func (lrs *StreamImpl) runner(ctx context.Context) {
	defer close(lrs.doneCh)

	// This feature indicates that the client supports the
Does supports_send_all_clusters mean the client should report load statistics for all clusters it's aware of, even if they weren't explicitly requested?
This is a client feature, i.e. something that the client supports. See: https://www.envoyproxy.io/docs/envoy/latest/api/client_features.
// - any error encountered
//
// If the server requests for endpoint-level load reporting, an error is
// returned, since this is not yet supported.
What does it mean that endpoint-level load reporting is not supported? Does it mean LoadStat doesn't support that yet? Is that something we will have to support in the future?
Got rid of this, since the other languages don't support this and we don't have any plans of supporting this at this point as a cross-language feature.
bba916f
to
434d43b
Compare
#a71-xds-fallback
#xdsclient-refactor
The existing structure of the xDS client is as follows:
- authority type for each authority configuration in the bootstrap (ignoring authority sharing)
- authority has a Transport which contains a grpc.ClientConn to the xDS management server
- Transport type provides the following functionality
  - DiscoveryRequest to be sent
The new structure for the xDS client will be as follows:
- authority type for each authority configuration in the bootstrap (even if the authority configurations are the same or have the same server configuration)
- xdsChannels, one for each server configuration specified in the bootstrap
- authority will acquire references to one or more xdsChannel instances
- xdsChannel will contain the following
  - Transport to the xDS management server. This will be an interface allowing for non-grpc transports to be used.
This PR introduces the following functionality:
- Transport interface and provides a gRPC transport implementation.
The current LRS implementation can be found in https://github.com/grpc/grpc-go/blob/master/xds/internal/xdsclient/transport/loadreport.go, and this PR's implementation is heavily based on it.
Subsequent PRs will add more functionality.
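To illustrate why making the transport an interface allows non-grpc transports, here is a rough sketch. The interface shape and every name in it (Transport, Stream, NewStream, loopback, roundTrip) are assumptions made for this example and are not taken from the PR. The in-memory loopback transport simply echoes back whatever is sent.

```go
package main

import (
	"errors"
	"fmt"
)

// Transport is a hypothetical, simplified transport abstraction.
type Transport interface {
	NewStream(method string) (Stream, error)
	Close() error
}

// Stream is a hypothetical bidirectional stream of opaque messages.
type Stream interface {
	Send(msg any) error
	Recv() (any, error)
}

// loopback is an in-memory Transport whose streams echo sent messages.
type loopback struct{ closed bool }

func (l *loopback) NewStream(method string) (Stream, error) {
	if l.closed {
		return nil, errors.New("transport is closed")
	}
	return &loopbackStream{method: method}, nil
}

func (l *loopback) Close() error {
	l.closed = true
	return nil
}

// loopbackStream queues sent messages and returns them in FIFO order.
type loopbackStream struct {
	method string
	queue  []any
}

func (s *loopbackStream) Send(msg any) error {
	s.queue = append(s.queue, msg)
	return nil
}

func (s *loopbackStream) Recv() (any, error) {
	if len(s.queue) == 0 {
		return nil, errors.New("no message available")
	}
	msg := s.queue[0]
	s.queue = s.queue[1:]
	return msg, nil
}

// roundTrip depends only on the interfaces, so it works with any Transport
// implementation, whether gRPC-backed or not.
func roundTrip(tr Transport, method string, msg any) (any, error) {
	s, err := tr.NewStream(method)
	if err != nil {
		return nil, err
	}
	if err := s.Send(msg); err != nil {
		return nil, err
	}
	return s.Recv()
}

func main() {
	var tr Transport = &loopback{}
	defer tr.Close()
	resp, err := roundTrip(tr, "/hypothetical.Service/Stream", "ping")
	fmt.Println(resp, err) // ping <nil>
}
```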
Addresses #6902
RELEASE NOTES: none