Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

azidentity: sync.Cond.Wait is not being called in syncer #21151

Closed
bcho opened this issue Jul 12, 2023 · 1 comment · Fixed by #21162
Closed

azidentity: sync.Cond.Wait is not being called in syncer #21151

bcho opened this issue Jul 12, 2023 · 1 comment · Fixed by #21162
Assignees

Comments

@bcho
Copy link
Member

bcho commented Jul 12, 2023

Bug Report

  • import path of package in question: github.com/Azure/azure-sdk-for-go/sdk/azidentity
  • SDK version: latest
  • output of go version: go version go1.20.5 linux/amd64
  • What happened?

In sync.go implementation, we are using a sync.Cond to coordinate multiple goroutines access to the token value:

s.cond.L.Lock()
defer s.cond.L.Unlock()
for {
at, err = s.silent(ctx, opts)
if err == nil {
// got a token
break
}
if !s.authing {
// this goroutine will request a token
s.authing, auth = true, true
break
}
// another goroutine is acquiring a token; wait for it to finish, then try silent auth again
s.cond.Wait()
}
if auth {
s.authing = false
at, err = s.reqToken(ctx, opts)
s.cond.Broadcast()
}

But giving the sync.Cond is using sync.Mutex as lock, I don't think the s.cond.Wait will be called at all:

// another goroutine is acquiring a token; wait for it to finish, then try silent auth again
s.cond.Wait()

In cache miss case, the first goroutine enters the for-loop with obtaining s.authing and auth status, then break the loop. However, this goroutine will still hold the s.cond.L lock, as a result, other goroutines will be blocked in the s.cond.L.Lock() line, so no further calls will reach s.cond.Wait. We can confirm this behavior by checking the code path coverage from unit test.

  • What did you expect or want to happen?

I think it depends. From syncer's unit test:

if silentAuths != goroutines {
t.Errorf("expected %d silent auth attempts, got %d", goroutines, silentAuths)
}

It's expecting there will be N calls to the silent auth when there are N goroutines concurrently accessing the token. If this is the case, I think we can simplify the code to use a sync.Mutex only?

If that's not the case, I think we need to move the s.cond.L.Unlock right outside the for loop so other goroutines can enter the for loop to wait, like this:

        s.cond.L.Lock()
-       defer s.cond.L.Unlock()
	for {
		at, err = s.silent(ctx, opts)
		if err == nil {
			// got a token
			break
		}
		if !s.authing {
			// this goroutine will request a token
			s.authing, auth = true, true
			break
		}
		// another goroutine is acquiring a token; wait for it to finish, then try silent auth again
		s.cond.Wait()
	}
+	s.cond.L.Unlock()

	if auth {
		at, err = s.reqToken(ctx, opts)
+
+		s.cond.L.Lock()
+		s.authing = false
+		s.cond.Broadcast()
+		s.cond.L.Unlock()
	}

But this will increase calls to silent auth though.

  • How can we reproduce it?

Check with code path coverage from unit test.

  • Anything we should know about your environment.

  • Follow up question:

We also want to confirm if the access control here is for limiting all access to the token, even the request option is different. We are using this library in a first party environment, therefore, we will need to request tokens for different tenants or different scopes at the same time.

@github-actions github-actions bot added the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Jul 12, 2023
@jhendrixMSFT jhendrixMSFT added Azure.Identity and removed needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. labels Jul 12, 2023
@chlowell chlowell linked a pull request Jul 13, 2023 that will close this issue
@chlowell
Copy link
Member

Thanks for opening this! You're right, the condition is unused and therefore devolves to a mutex, so we may as well have just the mutex and simpler code.

But this will increase calls to silent auth though.

We'll still try silent auth once for every GetToken call, which is what we want because that's how we search the token cache.

We also want to confirm if the access control here is for limiting all access to the token, even the request option is different.

Currently we allow only one goroutine to authenticate with a credential instance at a time to avoid hitting a rate limit with redundant token requests. This enables sharing a credential instance--and thereby its token cache--with an arbitrary number of clients/goroutines.

We are using this library in a first party environment, therefore, we will need to request tokens for different tenants or different scopes at the same time.

I suppose it would be possible to synchronize goroutines more precisely by considering the parameters of the requested token. However, that would be much more complex than the current implementation and I imagine not much better for most applications. If the performance characteristics of the current implementation aren't acceptable for your application, I'd like to understand the bottlenecks through concrete perf data before jumping to a solution.

@github-actions github-actions bot locked and limited conversation to collaborators Oct 12, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants