Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

idle: move idleness manager to separate package and ~13s of tests into it #6566

Merged
merged 8 commits into from
Aug 23, 2023

Conversation

dfawley
Copy link
Member

@dfawley dfawley commented Aug 21, 2023

Also: pull out some shared testing code to make it easier to use in other packages than test/.

Before:

$ time go test -count=1 ./...
<snip>
real	0m47.221s
user	1m31.381s
sys	0m30.061s

After:

$ time go test -count=1 ./...
<snip>
real	0m38.994s
user	1m30.279s
sys	0m29.964s

For reference:

$ go test -count=1 ./internal/idle
ok  	google.golang.org/grpc/internal/idle	12.739s

RELEASE NOTES: none

@dfawley dfawley added this to the 1.58 Release milestone Aug 21, 2023
@dfawley dfawley requested a review from easwars August 21, 2023 19:53
)

// For overriding in unit tests.
var timeAfterFunc = func(d time.Duration, f func()) *time.Timer {
return time.AfterFunc(d, f)
}

// idlenessEnforcer is the functionality provided by grpc.ClientConn to enter
// IdlenessEnforcer is the functionality provided by grpc.ClientConn to enter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should all of these names not include the Idleness prefix?

I'm ok with names being idle.Enforcer, idle.Manager, and noopManager, managerImpl, idle.ManagerOptions and idle.NewManager().

If you feel that is not descriptive enough, we can rename the package idleness and then these names would be idleness.Enforcer and so on.

Wdyt?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'm fine with all these naming changes. I'll stick with idle as the package name.

@@ -245,29 +257,29 @@ func (s) TestIdlenessManager_Enabled_ExitIdleOnRPC(t *testing.T) {
overrideNewTimer(t)

enforcer := newTestIdlenessEnforcer()
mgr := newIdlenessManager(enforcer, time.Duration(defaultTestIdleTimeout))
defer mgr.close()
mgr := NewIdlenessManager(IdlenessManagerOptions{Enforcer: enforcer, Timeout: time.Duration(defaultTestIdleTimeout), Logger: grpclog.Component("test")})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is interesting. I have usually passed a nil logger in tests. Does this result in logs being any different in the tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But wouldn't a nil logger panic if used?

}
}

// AwaitNotState waits for sc to leave stateWant or fatal errors if it doesn't
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/stateWant/stateDoNotWant

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

t.Helper()
for state := sc.GetState(); state != stateWant; state = sc.GetState() {
if !sc.WaitForStateChange(ctx, state) {
t.Fatalf("timed out waiting for state change. got %v; want %v", state, stateWant)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and in the next t.Fatalf(), s/timed/Timed/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

}
}

// AwaitState waits for sc to enter stateWant or fatal errors if it doesn't
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We seem to be in direct violation of this guideline: go/go-style/decisions#assert.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT this is not a generic "assertion library" like assert.IsNotNil()/etc. This is something that aids in writing tests specific to gRPC by handling some of the fiddly-ness of our library.


// ErrCloseWrapper wraps closer with a function that does not return an error,
// but calls t.Error if it does, instead.
func ErrCloseWrapper(t *testing.T, closer func() error) func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. If the closer doesn't return an informative error, we don't have enough contextual information here to push an informative error message to the test output.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason we need this is the real problem.

We have a "ForTesting" that returns a cleanup function that can error. AND the error it returns is a timeout. But the function doesn't accept a context.

Also, the cleanup doesn't seem important anyway. It is more like an assertion that all the channelz state was fully cleaned up, which: 1. if important, there should be a dedicated test/tests for, and 2. is basically made moot by the leak checker.

Actually....this whole function is unnecessary now AFAICT. Deleted everywhere.

But there are a few tests that relied upon the ID resetting. So I exported the ID generator (it's in internal) in order to reset it for those few tests. Ideally the tests wouldn't be sensitive to this fact, but I think we're already too far outside the scope of this PR now.

@easwars easwars assigned dfawley and unassigned easwars Aug 21, 2023
@dfawley dfawley assigned easwars and unassigned dfawley Aug 22, 2023
@dfawley
Copy link
Member Author

dfawley commented Aug 22, 2023

Sorry, my push failed because I made a commit on my laptop that I hadn't pulled onto my workstation. This is ready to review now.

@dfawley dfawley assigned dfawley and unassigned easwars Aug 22, 2023
@dfawley
Copy link
Member Author

dfawley commented Aug 22, 2023

Wait now that I did that, my tests are failing, sorry!

@dfawley dfawley assigned easwars and unassigned dfawley Aug 22, 2023
@@ -266,7 +267,7 @@ func DialContext(ctx context.Context, target string, opts ...DialOption) (conn *
// Configure idleness support with configured idle timeout or default idle
// timeout duration. Idleness can be explicitly disabled by the user, by
// setting the dial option to 0.
cc.idlenessMgr = newIdlenessManager(cc, cc.dopts.idleTimeout)
cc.idlenessMgr = idle.NewManager(idle.ManagerOptions{Enforcer: (*idler)(cc), Timeout: cc.dopts.idleTimeout, Logger: logger})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious why you find this single line literal struct initialization more readable than the more common multiline one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDK personally I find the parameters passed in here are really obvious and uninteresting, and I'd rather have one operation per line more (for denser code, which is easier to scroll through) than I'd like to be able to more spaciously see all the details of the values. If I needed to do tricky math or call a function to set them, I'd probably make it multi-line.

Same reason I dislike most var blocks, particularly in functions, e.g.

var (
	i int
	p peer.Peer
)

Those variables aren't even related; why group them when it takes even more lines than declaring them separately?

func (i *idlenessManagerImpl) exitIdleMode() error {
i.idleMu.Lock()
defer i.idleMu.Unlock()
func (m *manager) ExitIdleMode() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to export this method, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call; fixed.

Comment on lines 48 to 49
ExitIdleCh chan struct{}
EnterIdleCh chan struct{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these channels have to be exported?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops. Changed back.

@easwars easwars assigned dfawley and unassigned easwars Aug 23, 2023
@dfawley dfawley assigned easwars and unassigned dfawley Aug 23, 2023
@easwars easwars removed their assignment Aug 23, 2023
@dfawley dfawley closed this Aug 23, 2023
@dfawley dfawley reopened this Aug 23, 2023
@dfawley
Copy link
Member Author

dfawley commented Aug 23, 2023

The changes to testing methodology of channelz turned up a latent race in the deletion code:

sync.runtime_SemacquireMutex(0x7f85605b88e0?, 0xc0?, 0x4167b4?)
	/opt/hostedtoolcache/go/1.20.7/x64/src/runtime/sema.go:77 +0x26
sync.(*Mutex).lockSlow(0xc0002ffe28)
	/opt/hostedtoolcache/go/1.20.7/x64/src/sync/mutex.go:171 +0x165
sync.(*Mutex).Lock(...)
	/opt/hostedtoolcache/go/1.20.7/x64/src/sync/mutex.go:90
google.golang.org/grpc/internal/channelz.(*channelTrace).clear(0xc0002ffe00)
	/home/runner/work/grpc-go/grpc-go/internal/channelz/types.go:659 +0x45
google.golang.org/grpc/internal/channelz.(*channel).deleteSelfIfReady(0xc0009f1d40)
	/home/runner/work/grpc-go/grpc-go/internal/channelz/types.go:288 +0x1d3
google.golang.org/grpc/internal/channelz.(*channel).deleteChild(0xc0009f1d40, 0xc0001d3470?)
	/home/runner/work/grpc-go/grpc-go/internal/channelz/types.go:224 +0x58
google.golang.org/grpc/internal/channelz.(*subChannel).deleteSelfFromTree(0xc000df6480)
	/home/runner/work/grpc-go/grpc-go/internal/channelz/types.go:357 +0x15d
google.golang.org/grpc/internal/channelz.(*subChannel).deleteSelfIfReady(0xc000df6480)
	/home/runner/work/grpc-go/grpc-go/internal/channelz/types.go:389 +0x25
google.golang.org/grpc/internal/channelz.(*channelMap).decrTraceRefCount(0xc00010a960, 0x40)
	/home/runner/work/grpc-go/grpc-go/internal/channelz/funcs.go:405 +0x152
google.golang.org/grpc/internal/channelz.(*channelTrace).clear(0xc0002ffe00)
	/home/runner/work/grpc-go/grpc-go/internal/channelz/types.go:663 +0x8a
google.golang.org/grpc/internal/channelz.(*channel).deleteSelfIfReady(0xc0009f1d40)
	/home/runner/work/grpc-go/grpc-go/internal/channelz/types.go:288 +0x1d3
google.golang.org/grpc/internal/channelz.(*channel).triggerDelete(0xae8e00?)
	/home/runner/work/grpc-go/grpc-go/internal/channelz/types.go:229 +0x1d
google.golang.org/grpc/internal/channelz.(*channelMap).removeEntry(0xc00010a960, 0x3f)
	/home/runner/work/grpc-go/grpc-go/internal/channelz/funcs.go:396 +0x139
google.golang.org/grpc/internal/channelz.RemoveEntry(0xc000eef770)
	/home/runner/work/grpc-go/grpc-go/internal/channelz/funcs.go:282 +0x3e
google.golang.org/grpc.(*ClientConn).Close(0xc00047f400)
	/home/runner/work/grpc-go/grpc-go/clientconn.go:1292 +0x272

Essentially, what's happening here is:

  • Close the channel and delete the channelz ID for it
  • Remove the channelz entity from the global map
  • Tell the channel to delete itself and its child references
  • The channel clears its trace log
  • The trace log has references to the subchannel under the channel
  • Decrementing the final reference to the subchannel instructs the channel to remove the subchannel from itself
  • That implies that the channel should attempt to delete itself
  • That means that it needs to clear out the trace log, which leads to taking the channelTrace mutex a second time

To fix this deadlock, I did what was done elsewhere in this same code: set a boolean to indicate the trace is already being cleared (closeCalled elsewhere). That should fix things for now.

Medium-term, I'm pretty worried about this code. It's not very understandable and all the back-references are problematic, because they could have similar types of races, especially if multiple things are being deleted at the same time. I don't believe that removing a child from a parent should lead to the parent attempting to delete itself, and I also don't think removing a reference to an entity should ever make that entity delete itself from its parent, but changing these things leads to failures. A bigger rewrite is probably called for here.

@easwars
Copy link
Contributor

easwars commented Aug 23, 2023

LGTM for the last commit.

@dfawley dfawley merged commit 81b9df2 into grpc:master Aug 23, 2023
10 checks passed
@dfawley dfawley deleted the idle branch August 23, 2023 19:50
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 20, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants