azidentity: sync.Cond.Wait is not being called in syncer #21151

bcho · 2023-07-12T21:18:14Z

Bug Report

import path of package in question: github.com/Azure/azure-sdk-for-go/sdk/azidentity
SDK version: latest
output of go version: go version go1.20.5 linux/amd64

What happened?

In sync.go implementation, we are using a sync.Cond to coordinate multiple goroutines access to the token value:

azure-sdk-for-go/sdk/azidentity/syncer.go

Lines 59 to 79 in 246ae74

 s.cond.L.Lock() 

 defer s.cond.L.Unlock() 

 for { 

 at, err = s.silent(ctx, opts) 

 if err == nil { 

 // got a token 

 break 

 } 

 if !s.authing { 

 // this goroutine will request a token 

 s.authing, auth = true, true 

 break 

 } 

 // another goroutine is acquiring a token; wait for it to finish, then try silent auth again 

 s.cond.Wait() 

 } 

 if auth { 

 s.authing = false 

 at, err = s.reqToken(ctx, opts) 

 s.cond.Broadcast() 

 }

But giving the sync.Cond is using sync.Mutex as lock, I don't think the s.cond.Wait will be called at all:

azure-sdk-for-go/sdk/azidentity/syncer.go

Lines 72 to 73 in 99ae59d

 // another goroutine is acquiring a token; wait for it to finish, then try silent auth again 

 s.cond.Wait()

In cache miss case, the first goroutine enters the for-loop with obtaining s.authing and auth status, then break the loop. However, this goroutine will still hold the s.cond.L lock, as a result, other goroutines will be blocked in the s.cond.L.Lock() line, so no further calls will reach s.cond.Wait. We can confirm this behavior by checking the code path coverage from unit test.

What did you expect or want to happen?

I think it depends. From syncer's unit test:

azure-sdk-for-go/sdk/azidentity/syncer_test.go

Lines 101 to 103 in 246ae74

 if silentAuths != goroutines { 

 t.Errorf("expected %d silent auth attempts, got %d", goroutines, silentAuths) 

 }

It's expecting there will be N calls to the silent auth when there are N goroutines concurrently accessing the token. If this is the case, I think we can simplify the code to use a sync.Mutex only?

If that's not the case, I think we need to move the s.cond.L.Unlock right outside the for loop so other goroutines can enter the for loop to wait, like this:

        s.cond.L.Lock()
-       defer s.cond.L.Unlock()
	for {
		at, err = s.silent(ctx, opts)
		if err == nil {
			// got a token
			break
		}
		if !s.authing {
			// this goroutine will request a token
			s.authing, auth = true, true
			break
		}
		// another goroutine is acquiring a token; wait for it to finish, then try silent auth again
		s.cond.Wait()
	}
+	s.cond.L.Unlock()

	if auth {
		at, err = s.reqToken(ctx, opts)
+
+		s.cond.L.Lock()
+		s.authing = false
+		s.cond.Broadcast()
+		s.cond.L.Unlock()
	}

But this will increase calls to silent auth though.

How can we reproduce it?

Check with code path coverage from unit test.

Anything we should know about your environment.
Follow up question:

We also want to confirm if the access control here is for limiting all access to the token, even the request option is different. We are using this library in a first party environment, therefore, we will need to request tokens for different tenants or different scopes at the same time.

The text was updated successfully, but these errors were encountered:

chlowell · 2023-07-13T22:42:06Z

Thanks for opening this! You're right, the condition is unused and therefore devolves to a mutex, so we may as well have just the mutex and simpler code.

But this will increase calls to silent auth though.

We'll still try silent auth once for every GetToken call, which is what we want because that's how we search the token cache.

We also want to confirm if the access control here is for limiting all access to the token, even the request option is different.

Currently we allow only one goroutine to authenticate with a credential instance at a time to avoid hitting a rate limit with redundant token requests. This enables sharing a credential instance--and thereby its token cache--with an arbitrary number of clients/goroutines.

We are using this library in a first party environment, therefore, we will need to request tokens for different tenants or different scopes at the same time.

I suppose it would be possible to synchronize goroutines more precisely by considering the parameters of the requested token. However, that would be much more complex than the current implementation and I imagine not much better for most applications. If the performance characteristics of the current implementation aren't acceptable for your application, I'd like to understand the bottlenecks through concrete perf data before jumping to a solution.

github-actions bot added the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Jul 12, 2023

jhendrixMSFT added Azure.Identity and removed needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. labels Jul 12, 2023

jhendrixMSFT assigned chlowell Jul 12, 2023

chlowell linked a pull request Jul 13, 2023 that will close this issue

Simplify GetToken synchronization #21162

Merged

chlowell closed this as completed in #21162 Jul 13, 2023

github-actions bot locked and limited conversation to collaborators Oct 12, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

azidentity: sync.Cond.Wait is not being called in syncer #21151

azidentity: sync.Cond.Wait is not being called in syncer #21151

bcho commented Jul 12, 2023

chlowell commented Jul 13, 2023

azidentity: sync.Cond.Wait is not being called in syncer #21151

azidentity: sync.Cond.Wait is not being called in syncer #21151

Comments

bcho commented Jul 12, 2023

Bug Report

chlowell commented Jul 13, 2023