Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proxy: use buffered channels and only let one subrequest write a result #242

Merged
merged 1 commit into from
Nov 1, 2021

Conversation

ecordell
Copy link
Contributor

@ecordell ecordell commented Nov 1, 2021

Fixes #235

Once we’re in a “hedged” case, we have two goroutines running, and this select reading from both response channels for answers:

select {
case <-responseReady:
  duration = timeSource.Since(originalStart)
case <-hedgedResponseReady:
  duration = timeSource.Since(hedgedStart)
}

Once we get an answer from one of the two, the hedger unblocks and cancels the context that is shared by both subrequest goroutines.

This is what a subrequest looks like for reference:

func(ctx context.Context, responseReady chan<- struct{}) {
  rev, err = hp.delegate.Revision(ctx)
  responseReady <- struct{}{}
}

So if there are two of these running, the context is cancelled once the fastest one returns and the hedger func exits, and there’s nothing to read from the “slow” channel, so the subrequest goroutine hangs on send.

There are two parts to the fix:

  • using buffered "ready" channels for hedger will keep the subrequest goroutines from blocking on send
  • since both subrequests capture the parent return values and attempt to write to them, there's a potential data race. they're protected with sync.Once to avoid this problem

The tests were updated to detect the leak (and confirm it no longer happens)

@ecordell ecordell requested a review from jakedt November 1, 2021 19:14
@github-actions github-actions bot added the area/datastore Affects the storage system label Nov 1, 2021
@@ -213,7 +213,7 @@ func (cds *crdbDatastore) Revision(ctx context.Context) (datastore.Revision, err
}

func (cds *crdbDatastore) SyncRevision(ctx context.Context) (datastore.Revision, error) {
ctx, span := tracer.Start(ctx, "SyncRevision")
ctx, span := tracer.Start(datastore.SeparateContextWithTracing(ctx), "SyncRevision")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seemed like an oversight - any reason we wouldn't want a separate context here?

@github-actions github-actions bot added the area/tooling Affects the dev or user toolchain (e.g. tests, ci, build tools) label Nov 1, 2021
@ecordell ecordell marked this pull request as ready for review November 1, 2021 19:44
@ecordell ecordell marked this pull request as draft November 1, 2021 19:56
@ecordell ecordell marked this pull request as ready for review November 1, 2021 20:04
@jzelinskie jzelinskie added kind/bug Something is broken or regressed priority/1 high This is the top priority labels Nov 1, 2021
internal/datastore/proxy/hedging.go Outdated Show resolved Hide resolved
@jakedt jakedt merged commit 36d0385 into authzed:main Nov 1, 2021
@github-actions github-actions bot locked and limited conversation to collaborators Nov 1, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area/datastore Affects the storage system area/tooling Affects the dev or user toolchain (e.g. tests, ci, build tools) kind/bug Something is broken or regressed priority/1 high This is the top priority
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Goroutine leak possibly in hedging middleware
3 participants