Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pickfirst: New pick first policy for dualstack #7498

Merged
merged 64 commits into from
Oct 10, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
64 commits
Select commit Hold shift + click to select a range
a1f262c
Use one subconn per address in pickfirst
arjan-bal Jul 8, 2024
9043079
address review comments
arjan-bal Jul 11, 2024
2541d51
ensure subConnList is never nil
arjan-bal Jul 11, 2024
f961973
Defer subConnList creation till conection start
arjan-bal Jul 12, 2024
cb5acc6
Move empty subConnList handling into refreshList function
arjan-bal Jul 12, 2024
22e0bd4
Fix race conditions
arjan-bal Jul 12, 2024
258145a
Replace mutex with atomic for subConnList state
arjan-bal Jul 12, 2024
eeacec3
Use a go routine to manage phase 1 connection attempts
arjan-bal Jul 15, 2024
632f4df
support both pickfirst together
arjan-bal Aug 7, 2024
69c6c7b
Use Java style implementation
arjan-bal Aug 8, 2024
67b3e93
Revert comment change
arjan-bal Aug 8, 2024
cceb75d
Fix revive warnings
arjan-bal Aug 8, 2024
0d43e4b
Synchronize using callback serializer
arjan-bal Aug 8, 2024
b2a092a
Prepare for running all tests with new pf
arjan-bal Aug 8, 2024
56bfb59
Fix test failures
arjan-bal Aug 9, 2024
17c63d2
Enable test using new pf
arjan-bal Aug 9, 2024
f1daad1
Fix tests
arjan-bal Aug 9, 2024
747bd8a
Add unit tests
arjan-bal Aug 9, 2024
84194db
Fix vet
arjan-bal Aug 9, 2024
586b091
Calculate coverage using new pickfirst also
arjan-bal Aug 9, 2024
31e8a10
Fix lint error
arjan-bal Aug 9, 2024
fcb0120
simplify serializer usage
arjan-bal Aug 9, 2024
91b63ce
Remove re-resolution requsts since they are made by addrConn
arjan-bal Aug 9, 2024
036fea3
Verify server addr in subconn in tests
arjan-bal Aug 10, 2024
9a2d6a9
test balancer state transitions
arjan-bal Aug 10, 2024
2a36f7f
Fix address to endpoint conversion logic
arjan-bal Aug 12, 2024
a8a243a
Address review comments
arjan-bal Aug 14, 2024
abb263b
Remove stale comment
arjan-bal Aug 14, 2024
67b9b6b
Fix tests
arjan-bal Aug 14, 2024
0792002
Fix vet
arjan-bal Aug 14, 2024
3561462
start connecting in exitIdle only when balancer is in idle
arjan-bal Aug 14, 2024
d82da59
Address review comments
arjan-bal Aug 16, 2024
2ed4aff
Add comments to address review
arjan-bal Aug 19, 2024
fe8637f
Handle subconn transitions from CONNECTING to IDLE
arjan-bal Aug 20, 2024
02267d3
address review comments
arjan-bal Aug 22, 2024
a3049fe
Address review comments
arjan-bal Aug 23, 2024
11924a9
fix vet
arjan-bal Aug 23, 2024
fd4817a
User existing test utils
arjan-bal Aug 23, 2024
e266639
make tests more readable
arjan-bal Aug 24, 2024
c44c80a
address review comments
arjan-bal Aug 30, 2024
4a221d1
no more table driven tests
arjan-bal Aug 30, 2024
08e8cb9
Keep b.state and state rerported to clientconn in sync
arjan-bal Sep 2, 2024
2b2c2a3
Address review comments
arjan-bal Sep 5, 2024
34da793
Address review comments
arjan-bal Sep 6, 2024
6a7720e
Update picker less frequently
arjan-bal Sep 9, 2024
1914445
Fix typos
arjan-bal Sep 10, 2024
ef1af59
address review comments
arjan-bal Sep 17, 2024
45f3368
Merge remote-tracking branch 'source/master' into pickfirst_one_addre…
arjan-bal Sep 17, 2024
efa343b
Fix test breakage due to #7613
arjan-bal Sep 17, 2024
60e14d0
Replace the callback serializer with a mutex
arjan-bal Sep 18, 2024
88fbabe
Convert requestConnectionLocked to iteration from recursion
arjan-bal Sep 18, 2024
30256a5
Fix lint
arjan-bal Sep 18, 2024
6daf9cc
Report TF on resolver error
arjan-bal Sep 19, 2024
1f03ef9
Comment wrapping, s/subcon/SubCon/g and flatten the endpoint list ear…
arjan-bal Sep 25, 2024
61c4816
Don't close SubConns in resolverErrorLocked and fix comments
arjan-bal Sep 25, 2024
2e9f873
Move common tests to pickfirst_test.go
arjan-bal Sep 25, 2024
c4b4aa4
Set first pass when exiting idle
arjan-bal Oct 7, 2024
f0e479e
Ensure exitIdle doesn't increment the address list multiple times
arjan-bal Oct 7, 2024
58e2ff7
Move handling of CONNECITN to IDLE subchannels to outer block
arjan-bal Oct 7, 2024
3103efe
Fix test log statement
arjan-bal Oct 8, 2024
a40d619
Merge branch 'master' of github.com:grpc/grpc-go into pickfirst_one_a…
arjan-bal Oct 9, 2024
e98bb80
Organize packages based on review comments
arjan-bal Oct 9, 2024
a5b55e4
use onceFunc in idlePicker and reduce duplication when entering IDLE
arjan-bal Oct 9, 2024
eb5a09a
Fix comments
arjan-bal Oct 9, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .github/workflows/coverage.yml
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,9 @@ jobs:
- name: Run coverage
run: go test -coverprofile=coverage.out -coverpkg=./... ./...

- name: Run coverage with new pickfirst
run: GRPC_EXPERIMENTAL_ENABLE_NEW_PICK_FIRST=true go test -coverprofile=coverage_new_pickfirst.out -coverpkg=./... ./...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does anything use this coverage_new_pickfirst.out file? How? By wildcard?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I verified that it gets used by checking that the coverage went up after adding this step.

Codecov docs just say that multiple reports are merged "automatically", but don't specify the logic:

It is possible to explicitly specify the coverage files, we don't do that for the existing coverage.out filename.


- name: Upload coverage to Codecov
uses: codecov/codecov-action@v4
with:
Expand Down
5 changes: 5 additions & 0 deletions .github/workflows/testing.yml
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,11 @@ jobs:
- type: tests
goversion: '1.21'

- type: tests
goversion: '1.22'
testflags: -race
grpcenv: 'GRPC_EXPERIMENTAL_ENABLE_NEW_PICK_FIRST=true'

steps:
# Setup the environment.
- name: Setup GOARCH
Expand Down
6 changes: 6 additions & 0 deletions balancer/pickfirst/pickfirst.go
Original file line number Diff line number Diff line change
Expand Up @@ -29,13 +29,19 @@ import (
"google.golang.org/grpc/balancer/pickfirst/internal"
"google.golang.org/grpc/connectivity"
"google.golang.org/grpc/grpclog"
"google.golang.org/grpc/internal/envconfig"
internalgrpclog "google.golang.org/grpc/internal/grpclog"
"google.golang.org/grpc/internal/pretty"
"google.golang.org/grpc/resolver"
"google.golang.org/grpc/serviceconfig"

_ "google.golang.org/grpc/balancer/pickfirst/pickfirstleaf" // For automatically registering the new pickfirst if required.
)

func init() {
if envconfig.NewPickFirstEnabled {
return
}
balancer.Register(pickfirstBuilder{})
}

Expand Down
132 changes: 132 additions & 0 deletions balancer/pickfirst/pickfirst_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,132 @@
/*
*
* Copyright 2024 gRPC authors.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
*/

package pickfirst

import (
"context"
"errors"
"fmt"
"testing"
"time"

"google.golang.org/grpc/balancer"
"google.golang.org/grpc/connectivity"
"google.golang.org/grpc/internal/grpctest"
"google.golang.org/grpc/internal/testutils"
"google.golang.org/grpc/resolver"
)

const (
// Default timeout for tests in this package.
defaultTestTimeout = 10 * time.Second
// Default short timeout, to be used when waiting for events which are not
// expected to happen.
defaultTestShortTimeout = 100 * time.Millisecond
)

type s struct {
grpctest.Tester
}

func Test(t *testing.T) {
grpctest.RunSubTests(t, s{})
}

// TestPickFirstLeaf_InitialResolverError sends a resolver error to the balancer
// before a valid resolver update. It verifies that the clientconn state is
// updated to TRANSIENT_FAILURE.
func (s) TestPickFirstLeaf_InitialResolverError(t *testing.T) {
ctx, cancel := context.WithTimeout(context.Background(), defaultTestTimeout)
defer cancel()
cc := testutils.NewBalancerClientConn(t)
bal := pickfirstBuilder{}.Build(cc, balancer.BuildOptions{})
defer bal.Close()
bal.ResolverError(errors.New("resolution failed: test error"))

if err := cc.WaitForConnectivityState(ctx, connectivity.TransientFailure); err != nil {
t.Fatalf("cc.WaitForConnectivityState(%v) returned error: %v", connectivity.TransientFailure, err)
}

// After sending a valid update, the LB policy should report CONNECTING.
ccState := balancer.ClientConnState{
ResolverState: resolver.State{
Endpoints: []resolver.Endpoint{
{Addresses: []resolver.Address{{Addr: "1.1.1.1:1"}}},
{Addresses: []resolver.Address{{Addr: "2.2.2.2:2"}}},
},
},
}
if err := bal.UpdateClientConnState(ccState); err != nil {
t.Fatalf("UpdateClientConnState(%v) returned error: %v", ccState, err)
}

if err := cc.WaitForConnectivityState(ctx, connectivity.Connecting); err != nil {
t.Fatalf("cc.WaitForConnectivityState(%v) returned error: %v", connectivity.Connecting, err)
}
}

// TestPickFirstLeaf_ResolverErrorinTF sends a resolver error to the balancer
// before when it's attempting to connect to a SubConn TRANSIENT_FAILURE. It
// verifies that the picker is updated and the SubConn is not closed.
func (s) TestPickFirstLeaf_ResolverErrorinTF(t *testing.T) {
ctx, cancel := context.WithTimeout(context.Background(), defaultTestTimeout)
defer cancel()
cc := testutils.NewBalancerClientConn(t)
bal := pickfirstBuilder{}.Build(cc, balancer.BuildOptions{})
defer bal.Close()

// After sending a valid update, the LB policy should report CONNECTING.
ccState := balancer.ClientConnState{
ResolverState: resolver.State{
Endpoints: []resolver.Endpoint{
{Addresses: []resolver.Address{{Addr: "1.1.1.1:1"}}},
},
},
}

if err := bal.UpdateClientConnState(ccState); err != nil {
t.Fatalf("UpdateClientConnState(%v) returned error: %v", ccState, err)
}

sc1 := <-cc.NewSubConnCh
if err := cc.WaitForConnectivityState(ctx, connectivity.Connecting); err != nil {
t.Fatalf("cc.WaitForConnectivityState(%v) returned error: %v", connectivity.Connecting, err)
}

scErr := fmt.Errorf("test error: connection refused")
sc1.UpdateState(balancer.SubConnState{
ConnectivityState: connectivity.TransientFailure,
ConnectionError: scErr,
})

if err := cc.WaitForPickerWithErr(ctx, scErr); err != nil {
t.Fatalf("cc.WaitForPickerWithErr(%v) returned error: %v", scErr, err)
}

bal.ResolverError(errors.New("resolution failed: test error"))
if err := cc.WaitForErrPicker(ctx); err != nil {
t.Fatalf("cc.WaitForPickerWithErr() returned error: %v", err)
}

select {
case <-time.After(defaultTestShortTimeout):
case sc := <-cc.ShutdownSubConnCh:
t.Fatalf("Unexpected SubConn shutdown: %v", sc)
}
}
Loading