xds: generic lrs client for load reporting #8250

Merged
merged 24 commits on May 5, 2025

Conversation

purnesh42H
Contributor

@purnesh42H purnesh42H commented Apr 15, 2025

This change adds a generic LRS client for reporting load to an LRS server.

The PR copies the existing

  • xds/internal/xdsclient/load/store.go,
  • xds/internal/xdsclient/transport/lrs/lrs_stream.go,
  • xds/internal/xdsclient/load/store_test.go
  • xds/internal/xdsclient/tests/loadreport_test.go

from the internal xdsclient code and then modifies them to use the generic client types and interfaces. Each "copy" commit is followed by the "modify" commit for that file. Reviewers can start by reviewing the "modify" commits.

PS: loadreport_test.go currently fails to compile, so it is commented out. It depends on some of the functions added in #8183.

RELEASE NOTES: None

@purnesh42H purnesh42H added the Type: Feature (New features or improvements in behavior) and Area: xDS (Includes everything xDS related, including LB policies used with xDS.) labels Apr 15, 2025
@purnesh42H purnesh42H added this to the 1.73 Release milestone Apr 15, 2025

codecov bot commented Apr 15, 2025

Codecov Report

Attention: Patch coverage is 77.00422% with 109 lines in your changes missing coverage. Please review.

Project coverage is 82.28%. Comparing base (82e25c7) to head (502c7ec).
Report is 8 commits behind head on master.

Files with missing lines                        Patch %   Lines
xds/internal/clients/lrsclient/lrs_stream.go    63.35%    52 Missing and 18 partials ⚠️
xds/internal/clients/lrsclient/lrsclient.go     72.63%    19 Missing and 7 partials ⚠️
xds/internal/clients/lrsclient/load_store.go    92.93%    10 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8250      +/-   ##
==========================================
+ Coverage   82.15%   82.28%   +0.13%     
==========================================
  Files         412      419       +7     
  Lines       40562    41847    +1285     
==========================================
+ Hits        33322    34433    +1111     
- Misses       5875     5962      +87     
- Partials     1365     1452      +87     
Files with missing lines                        Coverage Δ
xds/internal/clients/lrsclient/logging.go       100.00% <100.00%> (ø)
xds/internal/clients/lrsclient/load_store.go    93.08% <92.93%> (+93.08%) ⬆️
xds/internal/clients/lrsclient/lrsclient.go     72.63% <72.63%> (+72.63%) ⬆️
xds/internal/clients/lrsclient/lrs_stream.go    63.35% <63.35%> (ø)

... and 37 files with indirect coverage changes


@purnesh42H purnesh42H force-pushed the generic-xds-client-lrs-client-e2e branch from 78ab34a to ce9ba3d Compare April 19, 2025 18:36
@purnesh42H purnesh42H requested a review from dfawley April 21, 2025 05:39
@purnesh42H
Contributor Author

purnesh42H commented Apr 21, 2025

@dfawley assigning this to you for review since #8183 is close now. loadreport_test.go is commented out because it depends on testing helpers introduced in #8183, but I have tested it in my fork. Will add Easwar once he finishes #8183.

@purnesh42H purnesh42H force-pushed the generic-xds-client-lrs-client-e2e branch 2 times, most recently from e140792 to 2e2674f Compare April 21, 2025 18:22
Member

@dfawley dfawley left a comment

Looks fine overall. Just a few comments inline.

-func (ls *LoadStore) ReporterForCluster(clusterName, serviceName string) PerClusterReporter {
-	panic("unimplemented")
+func (ls *LoadStore) ReporterForCluster(clusterName, serviceName string) *PerClusterReporter {
+	if ls == nil {
Member

I think it's fine to panic if a nil LoadStore is used. Why not? It seems like a pretty severe programming error.

Contributor Author

@purnesh42H purnesh42H Apr 23, 2025

Removed nil check. @easwars any reason why this check is there in existing code?

Contributor

Probably for tests to use a nil load store. If that is not required anymore and tests are happy, we should be good to remove the nil check.

Contributor

Can we get rid of the nil check? Does any test fail?

Contributor Author

No. Removed. I had removed the others, but it looks like this one was left behind.
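
For context on what dropping the nil check implies: in Go, calling a method on a nil pointer receiver is legal, and the panic only happens once the method dereferences the receiver. The sketch below uses a hypothetical `reporter` type (not the PR's `LoadStore` or `PerClusterReporter`) to show that behavior.

```go
package main

import "fmt"

// reporter is a hypothetical stand-in for a type with pointer-receiver
// methods; it is not the type from the PR.
type reporter struct {
	calls int
}

// record dereferences the receiver, so calling it on a nil *reporter
// triggers a nil pointer dereference panic.
func (r *reporter) record() {
	r.calls++
}

// callOnNil calls record on a nil *reporter and reports whether that
// call panicked.
func callOnNil() (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	var r *reporter
	r.record()
	return false
}

func main() {
	fmt.Println("panicked:", callOnNil())
}
```

Without an explicit nil check, misuse surfaces immediately as a panic at the call site instead of being silently ignored, which matches the reasoning in the review comment above.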

}

// CallStarted records a call started in the LoadStore.
func (p *PerClusterReporter) CallStarted(locality string) {
-	panic("unimplemented")
+	if p == nil {
Member

As above. And below.

Contributor Author

Done

}

func (rcd *rpcCountData) decrInProgress() {
atomic.AddUint64(rcd.inProgress, negativeOneUInt64) // atomic.Add(x, -1)
Member

Nit: the const doesn't seem to buy us anything, since we're already needing to comment what this means. IMO delete the constant and inline it.

Contributor Author

Made it inline
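
As a minimal sketch of what an inlined decrement can look like: the `sync/atomic` documentation describes decrementing an unsigned value with `AddUint64(&x, ^uint64(0))`. The helper name `decr` below is hypothetical, and this is not necessarily the exact expression the PR ended up with.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// decr atomically decrements *n by one and returns the new value.
// ^uint64(0) has all bits set, so adding it wraps around modulo 2^64,
// which is equivalent to subtracting one.
func decr(n *uint64) uint64 {
	return atomic.AddUint64(n, ^uint64(0))
}

func main() {
	inProgress := uint64(3)
	fmt.Println(decr(&inProgress)) // prints 2
}
```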

Comment on lines 457 to 460
s = rld.sum
rld.sum = 0
c = rld.count
rld.count = 0
Member

You could do something like this, which might(?) be more quickly understood:

Suggested change
-s = rld.sum
-rld.sum = 0
-c = rld.count
-rld.count = 0
+s, rld.sum = rld.sum, 0
+c, rld.count = rld.count, 0

Or,

Suggested change
-s = rld.sum
-rld.sum = 0
-c = rld.count
-rld.count = 0
+s, c = rld.sum, rld.count
+rld.sum, rld.count = 0, 0

Contributor Author

Did the first one

c = rld.count
rld.count = 0
rld.mu.Unlock()
return
Member

Please don't use bare returns that return values. It can be hard to understand what's going on, mainly for longer functions.

Suggested change
-return
+return s, c

Or pair with the second option above:

func (rld *rpcLoadData) loadAndClear() (float64, int64) {
	rld.mu.Lock()
	defer rld.mu.Unlock()

	s, c := rld.sum, rld.count
	rld.sum, rld.count = 0, 0
	return s, c
}

Contributor Author

Yeah, added them to the return statement.
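
A runnable sketch combining the two outcomes of this thread: the first tuple-assignment suggestion plus an explicit `return s, c`. The struct layout and the `add` helper are assumptions for illustration; only the names `sum`, `count`, `mu`, and `loadAndClear` appear in the review.

```go
package main

import (
	"fmt"
	"sync"
)

// rpcLoadData is an assumed shape based on the fields mentioned in the review.
type rpcLoadData struct {
	mu    sync.Mutex
	sum   float64
	count int64
}

// add is a hypothetical helper that records one load value.
func (rld *rpcLoadData) add(v float64) {
	rld.mu.Lock()
	defer rld.mu.Unlock()
	rld.sum += v
	rld.count++
}

// loadAndClear returns the accumulated sum and count and resets both.
// In a tuple assignment the right-hand side is evaluated before anything
// is assigned, so s receives the old sum before rld.sum is zeroed.
func (rld *rpcLoadData) loadAndClear() (float64, int64) {
	rld.mu.Lock()
	defer rld.mu.Unlock()

	var s float64
	var c int64
	s, rld.sum = rld.sum, 0
	c, rld.count = rld.count, 0
	return s, c
}

func main() {
	d := &rpcLoadData{}
	d.add(1.5)
	d.add(2.5)
	fmt.Println(d.loadAndClear()) // prints 4 2
	fmt.Println(d.loadAndClear()) // prints 0 0
}
```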

return c, err
}

/*
Member

Why is this commented out?

Contributor Author

I mentioned this in the PR description. The tests had compilation errors because they depend on things added in PR #8183. Now that it's merged, I have rebased onto the latest changes and uncommented it.

@dfawley dfawley assigned purnesh42H and unassigned dfawley Apr 21, 2025
@dfawley
Member

dfawley commented Apr 21, 2025

I mostly skimmed the changes - @easwars may also want to take a quick pass.

The commits here aren't quite as easy to review as the last change, since they go file-by-file. It would have been easier if one commit copied all the files, so that we could just skip that one commit when reviewing.

@easwars easwars self-assigned this Apr 22, 2025
@purnesh42H purnesh42H force-pushed the generic-xds-client-lrs-client-e2e branch from 9b88d68 to a263158 Compare April 25, 2025 06:15
@purnesh42H purnesh42H requested a review from easwars April 25, 2025 06:17
@purnesh42H purnesh42H assigned easwars and unassigned purnesh42H Apr 25, 2025
@easwars easwars assigned purnesh42H and unassigned easwars Apr 29, 2025
@purnesh42H purnesh42H assigned easwars and unassigned purnesh42H Apr 30, 2025
@purnesh42H purnesh42H force-pushed the generic-xds-client-lrs-client-e2e branch 6 times, most recently from edabfbc to 739c6b3 Compare May 1, 2025 19:40
@purnesh42H purnesh42H force-pushed the generic-xds-client-lrs-client-e2e branch from 739c6b3 to 882bab4 Compare May 1, 2025 19:41
@@ -150,9 +155,24 @@ func (lrs *streamImpl) sendLoads(ctx context.Context, stream clients.Stream, clu
case <-tick.C:
case <-ctx.Done():
return
case <-lrs.finalSendRequest:
Contributor

What if we don't use these two new fields, finalSendRequest and finalSendDone, and instead attempt to send the last load report anyway when ctx is done? If we do that, we won't even have to accept a context from the user in Stop().

Contributor Author

Discussed offline. We can go ahead with the current approach involving finalSendRequest and finalSendDone, because sending the load report after the top-level context ctx is done is not reliable: the stream context is a child of the top-level context, so the cancellation of ctx propagates to the stream context as well.
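
The propagation described here is standard `context` behavior and can be seen in isolation. The sketch below is not the PR's code; `parent` and `child` are illustrative stand-ins for the top-level context and the stream context.

```go
package main

import (
	"context"
	"fmt"
)

// childDoneAfterParentCancel derives a child context from a parent,
// cancels only the parent, and reports whether the child ends up
// canceled as well.
func childDoneAfterParentCancel() bool {
	parent, cancelParent := context.WithCancel(context.Background())
	child, cancelChild := context.WithCancel(parent)
	defer cancelChild()

	cancelParent()
	<-child.Done() // unblocks because cancellation propagates to children
	return child.Err() == context.Canceled
}

func main() {
	fmt.Println("child canceled:", childDoneAfterParentCancel())
}
```

Any send attempted on a stream bound to the child context after this point would be racing against an already canceled context, which is why a final send triggered by `ctx.Done()` cannot be relied on.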

@@ -43,38 +68,372 @@ type LoadStore struct {
// attempt to flush any unreported load data to the LRS server. It will either
// wait for this attempt to complete, or for the provided context to be done
// before canceling the LRS stream.
-func (ls *LoadStore) Stop(ctx context.Context) error {
-	panic("unimplemented")
+func (ls *LoadStore) Stop(ctx context.Context) {
Contributor

I made a comment which could possibly simplify the implementation.

Here though, I still think we should remove the last two paragraphs in this docstring. There is no mention anywhere else of reference counting, sharing of LRS streams, etc., and that is purely an implementation detail that a user of this API does not need to know about.

@easwars easwars assigned purnesh42H and unassigned easwars May 1, 2025
@purnesh42H purnesh42H force-pushed the generic-xds-client-lrs-client-e2e branch from dca14fc to 08149d8 Compare May 2, 2025 07:51
@@ -43,38 +68,372 @@ type LoadStore struct {
// attempt to flush any unreported load data to the LRS server. It will either
// wait for this attempt to complete, or for the provided context to be done
// before canceling the LRS stream.
-func (ls *LoadStore) Stop(ctx context.Context) error {
-	panic("unimplemented")
+func (ls *LoadStore) Stop(ctx context.Context) {
Contributor

@easwars easwars May 2, 2025

Nit: remove the "ctx" from the docstring. Just say "provided context"

@purnesh42H purnesh42H merged commit 75d25ee into grpc:master May 5, 2025
30 of 33 checks passed
purnesh42H added a commit to purnesh42H/grpc-go that referenced this pull request May 8, 2025
vinothkumarr227 pushed a commit to vinothkumarr227/grpc-go that referenced this pull request May 26, 2025
Labels
Area: xDS Includes everything xDS related, including LB policies used with xDS. Type: Feature New features or improvements in behavior
3 participants