Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client: Introduce ServiceClient #7489

Merged
merged 18 commits into from
Dec 21, 2023

Conversation

CabinfeverB
Copy link
Member

@CabinfeverB CabinfeverB commented Dec 4, 2023

What problem does this PR solve?

Issue Number: ref #7431 ref #7576

What is changed and how does it work?

This PR introduce the interface ServiceClient which can get the gRPC client of a specific gRPC server in a PD cluster. It can send request to the follower including follower-handle or forward.
And also a balancer is introduced.
Some tools have been added to client to make it possible to run integration tests in the client package. The reason for adding it to the client package is that ServiceClient should be invisible to the caller, so it cannot be tested well in tests/integrations.

Check List

Tests

  • Unit test
  • Integration test

Code changes

Side effects

  • Possible performance regression
  • Increased code complexity
  • Breaking backward compatibility

Related changes

Release note

None.

Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
Copy link
Contributor

ti-chi-bot bot commented Dec 4, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • JmPotato
  • rleungx

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Dec 4, 2023
@CabinfeverB
Copy link
Member Author

/cc @niubell

Copy link
Contributor

ti-chi-bot bot commented Dec 4, 2023

@CabinfeverB: GitHub didn't allow me to request PR reviews from the following users: niubell.

Note that only tikv members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @niubell

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
Copy link

codecov bot commented Dec 4, 2023

Codecov Report

Merging #7489 (970cc22) into master (f1870c1) will decrease coverage by 0.01%.
The diff coverage is 73.22%.

❗ Current head 970cc22 differs from pull request most recent head cd364e2. Consider uploading reports for the commit cd364e2 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7489      +/-   ##
==========================================
- Coverage   74.75%   74.75%   -0.01%     
==========================================
  Files         457      459       +2     
  Lines       50415    50541     +126     
==========================================
+ Hits        37686    37780      +94     
- Misses       9394     9412      +18     
- Partials     3335     3349      +14     
Flag Coverage Δ
unittests 74.75% <73.22%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

client/grpcutil/grpcutil.go Outdated Show resolved Hide resolved
Comment on lines 116 to 118
Available() bool
// CheckAvailable checks the status about network connection or other availability of the current service client
CheckAvailable()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CheckAvailable is used to try to mark the client as available if it is in unavailable state. It's not quite the right name. Any suggestion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, why not merge these two?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me have a try

// CheckNetworkAvailable checks if the network connection for the current service client is available
CheckNetworkAvailable(context.Context)
// RespToErr checks if client need to retry based on the PD server error response.
RespToErr(*pdpb.Error, error) bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need a method to mark the follower client as unavailable

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just NeedRetry/IsRetryable or something similar?

Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
client/grpcutil/grpcutil.go Outdated Show resolved Hide resolved
client/pd_service_discovery.go Outdated Show resolved Hide resolved
client/testutil/check_env_linux.go Outdated Show resolved Hide resolved
Comment on lines 318 to 319
i := 0
defer c.mu.Unlock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exchange these two?

return nil
}
for ; i < c.totalNode; i++ {
if c.now.Available() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not available, will it be excluded in the next round?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I don't remove unavailable nodes from the queue. The simple approach is taken now and can be optimized in the future if needed

// mustLeader: whether must send to leader.
BuildGRPCContext(ctx context.Context, mustLeader bool) context.Context
// IsLeader returns whether the target PD server is leader.
IsLeader() bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the leader changes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If leader changes, it will create new ServiceClient and store it in leader

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about targetIsLeader or something else?

Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
}

// BuildGRPCContext implements ServiceClient.
func (c *pdServiceClient) BuildGRPCContext(ctx context.Context, toLeader bool) context.Context {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference between toLeader and isLeader?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function of the ServiceClient is to interact with a PD server, which may be a follower or leader. When a client sends a request to a server, mutLeader is true if the request is forwarded to the leader. Otherwise, it is sent to the server to process the request, and mutLeader is false.


// pdServiceBalancer is a load balancer for PD service clients.
// It is used to balance the request to all servers and manage the connections to multiple PD service nodes.
type pdServiceBalancer struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about supporting multiple policies? And make the round-robin a common library?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused here because it seems that there is forwarding logic in the internal ServiceClient, and now we have introduced an additional layer of the load balancer. Will these two things conflict with each other?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused here because it seems that there is forwarding logic in the internal ServiceClient, and now we have introduced an additional layer of the load balancer. Will these two things conflict with each other?

There will be no conflict. In the past, followers were randomly selected for forwarding, but now it is in turn

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about supporting multiple policies? And make the round-robin a common library?

I've thought about it. Since it is observed that there is now only one Round-Robin balancer in /pkg/balancer, I haven't implemented multiple policies yet

@CabinfeverB CabinfeverB requested a review from JmPotato December 18, 2023 08:17
@CabinfeverB
Copy link
Member Author

@JmPotato PTAL~

// BuildGRPCContext builds a context object with a gRPC context.
// ctx: the original context object.
// mustLeader: whether must send to leader.
BuildGRPCContext(ctx context.Context, mustLeader bool) context.Context
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the name, it's hard to tell what this build is for. Suggest using a more specified name like BuildGRPCTargetContext.

Comment on lines 122 to 123
var _ ServiceClient = (*pdServiceClient)(nil)
var _ ServiceClient = (*pdServiceAPIClient)(nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var _ ServiceClient = (*pdServiceClient)(nil)
var _ ServiceClient = (*pdServiceAPIClient)(nil)
var (
_ ServiceClient = (*pdServiceClient)(nil)
_ ServiceClient = (*pdServiceAPIClient)(nil)
)

networkFailure atomic.Bool
}

func newPDServiceClient(addr, leaderAddr string, conn *grpc.ClientConn, isLeader bool) *pdServiceClient {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe using the interface as the returned value is better.

Suggested change
func newPDServiceClient(addr, leaderAddr string, conn *grpc.ClientConn, isLeader bool) *pdServiceClient {
func newPDServiceClient(addr, leaderAddr string, conn *grpc.ClientConn, isLeader bool) ServiceClient {

}

// NetworkAvailable implements ServiceClient.
func (c *pdServiceClient) Available() bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the judgment condition is whether there is a network error, would Reachable be a better name?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pdServiceAPIClient checks the network and availability


// pdServiceBalancer is a load balancer for PD service clients.
// It is used to balance the request to all servers and manage the connections to multiple PD service nodes.
type pdServiceBalancer struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused here because it seems that there is forwarding logic in the internal ServiceClient, and now we have introduced an additional layer of the load balancer. Will these two things conflict with each other?

Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
Copy link
Member

@rleungx rleungx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest LGTM.

client/go.mod Outdated
@@ -19,6 +20,8 @@ require (
google.golang.org/grpc v1.54.0
)

require google.golang.org/genproto/googleapis/rpc v0.0.0-20231106174013-bbf56f31fb17 // indirect
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need format

// mustLeader: whether must send to leader.
BuildGRPCContext(ctx context.Context, mustLeader bool) context.Context
// IsLeader returns whether the target PD server is leader.
IsLeader() bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about targetIsLeader or something else?

@ti-chi-bot ti-chi-bot bot added the status/LGT1 Indicates that a PR has LGTM 1. label Dec 20, 2023
Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the name check_env_dummy.go is better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also renamed the file in /pkg/utils/tempurl. PTAL

Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
@ti-chi-bot ti-chi-bot bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Dec 21, 2023
@CabinfeverB
Copy link
Member Author

/merge

Copy link
Contributor

ti-chi-bot bot commented Dec 21, 2023

@CabinfeverB: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

Copy link
Contributor

ti-chi-bot bot commented Dec 21, 2023

This pull request has been accepted and is ready to merge.

Commit hash: 970cc22

@ti-chi-bot ti-chi-bot bot added the status/can-merge Indicates a PR has been approved by a committer. label Dec 21, 2023
Copy link
Contributor

ti-chi-bot bot commented Dec 21, 2023

@CabinfeverB: Your PR was out of date, I have automatically updated it for you.

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot ti-chi-bot bot merged commit 74ef91d into tikv:master Dec 21, 2023
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants