idle: move idleness manager to separate package and ~13s of tests into it #6566

dfawley · 2023-08-21T19:53:56Z

Also: pull out some shared testing code so that it is easier to use from packages other than test/.

Before:

$ time go test -count=1 ./...
<snip>
real	0m47.221s
user	1m31.381s
sys	0m30.061s

After:

$ time go test -count=1 ./...
<snip>
real	0m38.994s
user	1m30.279s
sys	0m29.964s

For reference:

$ go test -count=1 ./internal/idle
ok  	google.golang.org/grpc/internal/idle	12.739s

RELEASE NOTES: none

easwars · 2023-08-21T23:22:05Z

internal/idle/idle.go

 )

 // For overriding in unit tests.
 var timeAfterFunc = func(d time.Duration, f func()) *time.Timer {
 	return time.AfterFunc(d, f)
 }

-// idlenessEnforcer is the functionality provided by grpc.ClientConn to enter
+// IdlenessEnforcer is the functionality provided by grpc.ClientConn to enter


Should all of these names not include the Idleness prefix?

I'm ok with names being idle.Enforcer, idle.Manager, and noopManager, managerImpl, idle.ManagerOptions and idle.NewManager().

If you feel that is not descriptive enough, we can rename the package idleness and then these names would be idleness.Enforcer and so on.

Wdyt?

Sure, I'm fine with all these naming changes. I'll stick with idle as the package name.
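For readers following the rename, here is a minimal sketch of how the agreed names could fit together. Only the names Enforcer, Manager, ManagerOptions, NewManager, noopManager and the Enforcer/Timeout fields come from this thread; the method sets, the Enabled method, and the zero-timeout check are assumptions made for illustration, not the package's actual code. Inside package idle, callers would write idle.NewManager and so on.

```go
package main

import (
	"fmt"
	"time"
)

// Enforcer is what the channel provides to the manager. The two methods
// here are assumed for illustration.
type Enforcer interface {
	EnterIdleMode() error
	ExitIdleMode() error
}

// ManagerOptions bundles the constructor arguments (field names from the PR;
// the Logger field is omitted to keep the sketch small).
type ManagerOptions struct {
	Enforcer Enforcer
	Timeout  time.Duration
}

// Manager is the exported interface. Enabled is a hypothetical method used
// only to tell the two implementations apart in this sketch.
type Manager interface {
	Enabled() bool
}

// noopManager is returned when idleness is disabled.
type noopManager struct{}

func (noopManager) Enabled() bool { return false }

// manager is the real implementation (state elided).
type manager struct {
	enforcer Enforcer
	timeout  time.Duration
}

func (*manager) Enabled() bool { return true }

// NewManager assumes a zero timeout means "idleness disabled".
func NewManager(opts ManagerOptions) Manager {
	if opts.Timeout == 0 {
		return noopManager{}
	}
	return &manager{enforcer: opts.Enforcer, timeout: opts.Timeout}
}

func main() {
	fmt.Println(NewManager(ManagerOptions{}).Enabled())
	fmt.Println(NewManager(ManagerOptions{Timeout: 30 * time.Minute}).Enabled())
}
```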

easwars · 2023-08-21T23:23:55Z

internal/idle/idle_test.go

@@ -245,29 +257,29 @@ func (s) TestIdlenessManager_Enabled_ExitIdleOnRPC(t *testing.T) {
 	overrideNewTimer(t)

 	enforcer := newTestIdlenessEnforcer()
-	mgr := newIdlenessManager(enforcer, time.Duration(defaultTestIdleTimeout))
-	defer mgr.close()
+	mgr := NewIdlenessManager(IdlenessManagerOptions{Enforcer: enforcer, Timeout: time.Duration(defaultTestIdleTimeout), Logger: grpclog.Component("test")})


That is interesting. I have usually passed a nil logger in tests. Does this result in logs being any different in the tests?

But wouldn't a nil logger panic if used?
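To illustrate the concern, here is a small sketch of what happens in Go when a method is called on a nil interface value. The logger interface, options struct, and logIdle function are hypothetical stand-ins, not the grpclog types; whether the real manager guards against a nil logger is not stated in this thread.

```go
package main

import "fmt"

// logger is a hypothetical minimal logging interface.
type logger interface {
	Infof(format string, args ...any)
}

// options mimics an options struct with an optional logger field.
type options struct {
	Logger logger
}

type stdoutLogger struct{}

func (stdoutLogger) Infof(format string, args ...any) {
	fmt.Printf(format+"\n", args...)
}

// logIdle logs through o.Logger without a nil check and reports whether the
// call panicked.
func logIdle(o options) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	o.Logger.Infof("entering idle mode")
	return false
}

func main() {
	fmt.Println("nil logger panicked:", logIdle(options{}))
	fmt.Println("real logger panicked:", logIdle(options{Logger: stdoutLogger{}}))
}
```

Passing a real component logger in tests, as the diff above does, avoids relying on every call site being nil-safe.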

easwars · 2023-08-21T23:27:48Z

internal/testutils/state.go

+	}
+}
+
+// AwaitNotState waits for sc to leave stateWant or fatal errors if it doesn't


s/stateWant/stateDoNotWant

easwars · 2023-08-21T23:28:39Z

internal/testutils/state.go

+	t.Helper()
+	for state := sc.GetState(); state != stateWant; state = sc.GetState() {
+		if !sc.WaitForStateChange(ctx, state) {
+			t.Fatalf("timed out waiting for state change.  got %v; want %v", state, stateWant)


Here and in the next t.Fatalf(), s/timed/Timed/
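For context, the two helpers under review follow the same polling loop. Below is a rough, self-contained sketch of that loop. The state type, the stateChanger interface, and fakeConn are hypothetical stand-ins for the gRPC connectivity types, and the helpers return an error instead of calling t.Fatalf so that the sketch can run outside of go test.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

type state int

const (
	idle state = iota
	ready
)

// stateChanger is the assumed subset of the channel API the helpers need.
type stateChanger interface {
	GetState() state
	WaitForStateChange(ctx context.Context, s state) bool
}

// awaitState waits for sc to enter want, or returns an error when ctx expires.
func awaitState(ctx context.Context, sc stateChanger, want state) error {
	for s := sc.GetState(); s != want; s = sc.GetState() {
		if !sc.WaitForStateChange(ctx, s) {
			return fmt.Errorf("timed out waiting for state change; got %v, want %v", s, want)
		}
	}
	return nil
}

// awaitNotState waits for sc to leave doNotWant, or returns an error when ctx expires.
func awaitNotState(ctx context.Context, sc stateChanger, doNotWant state) error {
	for s := sc.GetState(); s == doNotWant; s = sc.GetState() {
		if !sc.WaitForStateChange(ctx, s) {
			return fmt.Errorf("timed out waiting for state change; still in %v", s)
		}
	}
	return nil
}

// fakeConn is a toy stateChanger; changed is closed and replaced on every change.
type fakeConn struct {
	mu      sync.Mutex
	cur     state
	changed chan struct{}
}

func (f *fakeConn) GetState() state {
	f.mu.Lock()
	defer f.mu.Unlock()
	return f.cur
}

func (f *fakeConn) setState(s state) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if s == f.cur {
		return
	}
	f.cur = s
	close(f.changed)
	f.changed = make(chan struct{})
}

func (f *fakeConn) WaitForStateChange(ctx context.Context, s state) bool {
	f.mu.Lock()
	if f.cur != s {
		f.mu.Unlock()
		return true
	}
	ch := f.changed
	f.mu.Unlock()
	select {
	case <-ch:
		return true
	case <-ctx.Done():
		return false
	}
}

// demo waits for the fake to become ready, then waits (in vain) for it to leave ready.
func demo() (enterErr, leaveErr error) {
	conn := &fakeConn{cur: idle, changed: make(chan struct{})}
	go func() {
		time.Sleep(10 * time.Millisecond)
		conn.setState(ready)
	}()
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	enterErr = awaitState(ctx, conn, ready)

	short, cancelShort := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancelShort()
	leaveErr = awaitNotState(short, conn, ready)
	return enterErr, leaveErr
}

func main() {
	enterErr, leaveErr := demo()
	fmt.Println("enter:", enterErr)
	fmt.Println("leave:", leaveErr)
}
```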

easwars · 2023-08-21T23:38:46Z

internal/testutils/state.go

+	}
+}
+
+// AwaitState waits for sc to enter stateWant or fatal errors if it doesn't


We seem to be in direct violation of this guideline: go/go-style/decisions#assert.

AFAICT this is not a generic "assertion library" like assert.IsNotNil()/etc. This is something that aids in writing tests specific to gRPC by handling some of the fiddly-ness of our library.

easwars · 2023-08-21T23:40:11Z

internal/testutils/wrappers.go

+
+// ErrCloseWrapper wraps closer with a function that does not return an error,
+// but calls t.Error if it does, instead.
+func ErrCloseWrapper(t *testing.T, closer func() error) func() {


Same here. If the closer doesn't return an informative error, we don't have enough contextual information here to push an informative error message to the test output.
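As a sketch of the wrapper being discussed: the behavior below follows the doc comment in the diff (call the closer, report any error through t.Error). The errorReporter interface, the recorder type, and the errCleanup value are assumptions added so that the example runs outside of go test; the real function takes a *testing.T.

```go
package main

import (
	"errors"
	"fmt"
)

// errorReporter is the subset of *testing.T that the wrapper needs.
type errorReporter interface {
	Helper()
	Error(args ...any)
}

// errCloseWrapper adapts a closer that returns an error into a func() that
// reports the error instead of returning it.
func errCloseWrapper(t errorReporter, closer func() error) func() {
	return func() {
		t.Helper()
		if err := closer(); err != nil {
			t.Error(err)
		}
	}
}

// recorder collects reported errors in place of a real *testing.T.
type recorder struct{ errs []string }

func (r *recorder) Helper()           {}
func (r *recorder) Error(args ...any) { r.errs = append(r.errs, fmt.Sprint(args...)) }

// errCleanup is a hypothetical cleanup failure.
var errCleanup = errors.New("timed out waiting for cleanup")

func main() {
	r := &recorder{}
	cleanup := errCloseWrapper(r, func() error { return errCleanup })
	cleanup()
	fmt.Println(r.errs)
}
```

Note that the only context the test output receives is the closer's own error text, which is the limitation the comment above points out.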

The reason we need this is the real problem.

We have a "ForTesting" that returns a cleanup function that can error. AND the error it returns is a timeout. But the function doesn't accept a context.

Also, the cleanup doesn't seem important anyway. It is more like an assertion that all the channelz state was fully cleaned up, which: 1. if important, there should be a dedicated test/tests for, and 2. is basically made moot by the leak checker.

Actually....this whole function is unnecessary now AFAICT. Deleted everywhere.

But there are a few tests that relied upon the ID resetting. So I exported the ID generator (it's in internal) in order to reset it for those few tests. Ideally the tests wouldn't be sensitive to this fact, but I think we're already too far outside the scope of this PR now.

dfawley · 2023-08-22T20:44:55Z

Sorry, my push failed because I made a commit on my laptop that I hadn't pulled onto my workstation. This is ready to review now.

dfawley · 2023-08-22T20:56:47Z

Wait, now that I did that, my tests are failing. Sorry!

easwars · 2023-08-23T01:12:12Z

clientconn.go

@@ -266,7 +267,7 @@ func DialContext(ctx context.Context, target string, opts ...DialOption) (conn *
 	// Configure idleness support with configured idle timeout or default idle
 	// timeout duration. Idleness can be explicitly disabled by the user, by
 	// setting the dial option to 0.
-	cc.idlenessMgr = newIdlenessManager(cc, cc.dopts.idleTimeout)
+	cc.idlenessMgr = idle.NewManager(idle.ManagerOptions{Enforcer: (*idler)(cc), Timeout: cc.dopts.idleTimeout, Logger: logger})


Just curious why you find this single line literal struct initialization more readable than the more common multiline one.

IDK. Personally, I find the parameters passed in here really obvious and uninteresting, and I'd rather have one operation per line (for denser code, which is easier to scroll through) than be able to see all the details of the values more spaciously. If I needed to do tricky math or call a function to set them, I'd probably make it multi-line.

Same reason I dislike most var blocks, particularly in functions, e.g.

var (
	i int
	p peer.Peer
)

Those variables aren't even related; why group them when it takes even more lines than declaring them separately?

easwars · 2023-08-23T01:18:11Z

internal/idle/idle.go

-func (i *idlenessManagerImpl) exitIdleMode() error {
-	i.idleMu.Lock()
-	defer i.idleMu.Unlock()
+func (m *manager) ExitIdleMode() error {


We don't need to export this method, right?

Good call; fixed.

easwars · 2023-08-23T01:20:02Z

internal/idle/idle_test.go

+	ExitIdleCh  chan struct{}
+	EnterIdleCh chan struct{}


Do these channels have to be exported?

Whoops. Changed back.

dfawley · 2023-08-23T18:25:20Z

The changes to testing methodology of channelz turned up a latent race in the deletion code:

sync.runtime_SemacquireMutex(0x7f85605b88e0?, 0xc0?, 0x4167b4?)
	/opt/hostedtoolcache/go/1.20.7/x64/src/runtime/sema.go:77 +0x26
sync.(*Mutex).lockSlow(0xc0002ffe28)
	/opt/hostedtoolcache/go/1.20.7/x64/src/sync/mutex.go:171 +0x165
sync.(*Mutex).Lock(...)
	/opt/hostedtoolcache/go/1.20.7/x64/src/sync/mutex.go:90
google.golang.org/grpc/internal/channelz.(*channelTrace).clear(0xc0002ffe00)
	/home/runner/work/grpc-go/grpc-go/internal/channelz/types.go:659 +0x45
google.golang.org/grpc/internal/channelz.(*channel).deleteSelfIfReady(0xc0009f1d40)
	/home/runner/work/grpc-go/grpc-go/internal/channelz/types.go:288 +0x1d3
google.golang.org/grpc/internal/channelz.(*channel).deleteChild(0xc0009f1d40, 0xc0001d3470?)
	/home/runner/work/grpc-go/grpc-go/internal/channelz/types.go:224 +0x58
google.golang.org/grpc/internal/channelz.(*subChannel).deleteSelfFromTree(0xc000df6480)
	/home/runner/work/grpc-go/grpc-go/internal/channelz/types.go:357 +0x15d
google.golang.org/grpc/internal/channelz.(*subChannel).deleteSelfIfReady(0xc000df6480)
	/home/runner/work/grpc-go/grpc-go/internal/channelz/types.go:389 +0x25
google.golang.org/grpc/internal/channelz.(*channelMap).decrTraceRefCount(0xc00010a960, 0x40)
	/home/runner/work/grpc-go/grpc-go/internal/channelz/funcs.go:405 +0x152
google.golang.org/grpc/internal/channelz.(*channelTrace).clear(0xc0002ffe00)
	/home/runner/work/grpc-go/grpc-go/internal/channelz/types.go:663 +0x8a
google.golang.org/grpc/internal/channelz.(*channel).deleteSelfIfReady(0xc0009f1d40)
	/home/runner/work/grpc-go/grpc-go/internal/channelz/types.go:288 +0x1d3
google.golang.org/grpc/internal/channelz.(*channel).triggerDelete(0xae8e00?)
	/home/runner/work/grpc-go/grpc-go/internal/channelz/types.go:229 +0x1d
google.golang.org/grpc/internal/channelz.(*channelMap).removeEntry(0xc00010a960, 0x3f)
	/home/runner/work/grpc-go/grpc-go/internal/channelz/funcs.go:396 +0x139
google.golang.org/grpc/internal/channelz.RemoveEntry(0xc000eef770)
	/home/runner/work/grpc-go/grpc-go/internal/channelz/funcs.go:282 +0x3e
google.golang.org/grpc.(*ClientConn).Close(0xc00047f400)
	/home/runner/work/grpc-go/grpc-go/clientconn.go:1292 +0x272

Essentially, what's happening here is:

1. Close the channel and delete the channelz ID for it
2. Remove the channelz entity from the global map
3. Tell the channel to delete itself and its child references
4. The channel clears its trace log
5. The trace log has references to the subchannel under the channel
6. Decrementing the final reference to the subchannel instructs the channel to remove the subchannel from itself
7. That implies that the channel should attempt to delete itself
8. That means that it needs to clear out the trace log, which leads to taking the channelTrace mutex a second time

To fix this deadlock, I did what was done elsewhere in this same code: set a boolean to indicate the trace is already being cleared (closeCalled elsewhere). That should fix things for now.
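The shape of that fix can be sketched as follows. This is a simplified illustration under stated assumptions, not the channelz code: the trace type, its fields, and the onRelease callback are hypothetical, and an atomic.Bool (Go 1.19+) is used for the guard so that the sketch is also safe if clear is called from another goroutine.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// trace is a hypothetical stand-in for a trace log guarded by a mutex.
type trace struct {
	clearing atomic.Bool // set once clear has started
	mu       sync.Mutex
	events   []string
	// onRelease runs for each event while mu is held. Like the reference
	// count decrement in the stack trace above, it may call back into clear.
	onRelease func(event string)
}

// clear drops all events. It reports false, without touching mu, when a
// clear is already in progress or has already run.
func (t *trace) clear() bool {
	if !t.clearing.CompareAndSwap(false, true) {
		return false
	}
	t.mu.Lock()
	defer t.mu.Unlock()
	for _, e := range t.events {
		if t.onRelease != nil {
			t.onRelease(e)
		}
	}
	t.events = nil
	return true
}

func main() {
	t := &trace{events: []string{"subchannel created"}}
	reentrant := 0
	t.onRelease = func(string) {
		reentrant++
		t.clear() // without the guard this would block on t.mu forever
	}
	fmt.Println(t.clear(), reentrant, len(t.events)) // prints "true 1 0"
}
```

Go's sync.Mutex is not re-entrant, so a guard like this only papers over the cycle in the call graph; it does not remove the back-references themselves.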

Medium-term, I'm pretty worried about this code. It's not very understandable and all the back-references are problematic, because they could have similar types of races, especially if multiple things are being deleted at the same time. I don't believe that removing a child from a parent should lead to the parent attempting to delete itself, and I also don't think removing a reference to an entity should ever make that entity delete itself from its parent, but changing these things leads to failures. A bigger rewrite is probably called for here.

easwars · 2023-08-23T19:44:35Z

LGTM for the last commit.

idle: move idleness manager to separate package and 12s of tests into it

7aac96b

dfawley added the Type: Testing label Aug 21, 2023

dfawley added this to the 1.58 Release milestone Aug 21, 2023

dfawley requested a review from easwars August 21, 2023 19:53

dfawley assigned easwars Aug 21, 2023

Package comment

d103b4e

dfawley mentioned this pull request Aug 21, 2023

Add a ClientConn level test for idleness race #6560

Closed

easwars reviewed Aug 21, 2023

easwars assigned dfawley and unassigned easwars Aug 21, 2023

dfawley added 2 commits August 22, 2023 13:44

remove callback from NewChannelzStorageForTesting

f4c72d9

remove NewChannelzStorageForTesting entirely

efdc363

dfawley assigned easwars and unassigned dfawley Aug 22, 2023

dfawley assigned dfawley and unassigned easwars Aug 22, 2023

dfawley added 2 commits August 22, 2023 14:00

remove debug code

50204df

call Stop on the grpc.Server behind the handler server on shutdown

d8306c2

dfawley assigned easwars and unassigned dfawley Aug 22, 2023

easwars reviewed Aug 23, 2023

easwars assigned dfawley and unassigned easwars Aug 23, 2023

unexport things again

2361114

dfawley assigned easwars and unassigned dfawley Aug 23, 2023

easwars approved these changes Aug 23, 2023

easwars assigned dfawley Aug 23, 2023

easwars removed their assignment Aug 23, 2023

dfawley closed this Aug 23, 2023

dfawley reopened this Aug 23, 2023

prevent re-entrant calls to channelTrace.clear()

e25e730

dfawley merged commit 81b9df2 into grpc:master Aug 23, 2023
1 check passed

dfawley deleted the idle branch August 23, 2023 19:50

github-actions bot locked as resolved and limited conversation to collaborators Feb 20, 2024
