azure: Don't panic when token refresh fails #391

Merged
merged 1 commit into from
Jul 3, 2024

@jshearer jshearer commented Jun 27, 2024

Instead of crashing the whole broker when we fail to refresh an azblob.TokenCredential, we should retry forever and log noisily.

  • Get this running in my local Flow instance and figure out how to cause this particular type of error and validate that it retries and backs off correctly

@jshearer
Contributor Author

So, I validated that this change does what it claims to: if there's an error fetching or refreshing an access token for our service account, instead of panicking we log noisily and retry with exponential backoff, capped at one attempt every 5 minutes. That being said... is this actually what we want to do in all cases? As I was trying to find ways to cause this error, I came up with four possible causes:

  • We provide invalid AZURE_CLIENT_ID/AZURE_CLIENT_SECRET. This feels like we should panic so it's immediately obvious we messed up a config.
  • An azure-ad:// URI contains a tenant ID that doesn't exist. This feels like something that we shouldn't keep retrying, as it's likely a typo and that tenant will never exist. But how do we indicate a failure when NewTokenCredential takes a lambda that isn't allowed to return an error? Maybe the answer is to validate whether the tenant exists earlier in the sequence, in a place where we can return an error.
  • An azure-ad:// URI contains a tenant ID that hasn't authorized our service account. This feels fine to retry forever and log noisily.
  • A network or internal error of some kind happens. This also feels fine to retry.

Think it's worth fleshing out validation to cover the first two cases here?


Confirmed that if I provide invalid Azure credentials, it fails in a noisy loop and backs off as designed:

time="2024-06-27T21:56:33-04:00" level=error msg="Error refreshing credential, will retry: ClientSecretCredential authentication failed. FromClientSecret(): http call(https://login.microsoftonline.com/xxx/oauth2/v2.0/token)(POST) error: reply status code was 401: {\"error\": \"invalid_client\",\n  \"error_description\": \"AADSTS7000215: Invalid client secret provided. Ensure the secret being sent in the request is the client secret value, not the client secret ID, for a secret added to app 'xxx'. To troubleshoot, visit https://aka.ms/azsdk/go/identity/troubleshoot#client-secret" backoff_duration=1s tenant=xxx
time="2024-06-27T21:56:35-04:00" level=error msg="Error refreshing credential, will retry: ClientSecretCredential authentication failed. FromClientSecret(): http call(https://login.microsoftonline.com/xxx/oauth2/v2.0/token)(POST) error: reply status code was 401: {\"error\": \"invalid_client\",\n  \"error_description\": \"AADSTS7000215: Invalid client secret provided. Ensure the secret being sent in the request is the client secret value, not the client secret ID, for a secret added to app 'xxx'. To troubleshoot, visit https://aka.ms/azsdk/go/identity/troubleshoot#client-secret" backoff_duration=2s tenant=xxx
time="2024-06-27T21:56:37-04:00" level=error msg="Error refreshing credential, will retry: ClientSecretCredential authentication failed. FromClientSecret(): http call(https://login.microsoftonline.com/xxx/oauth2/v2.0/token)(POST) error: reply status code was 401: {\"error\": \"invalid_client\",\n  \"error_description\": \"AADSTS7000215: Invalid client secret provided. Ensure the secret being sent in the request is the client secret value, not the client secret ID, for a secret added to app 'xxx'. To troubleshoot, visit https://aka.ms/azsdk/go/identity/troubleshoot#client-secret" backoff_duration=4s tenant=xxx
time="2024-06-27T21:56:41-04:00" level=error msg="Error refreshing credential, will retry: ClientSecretCredential authentication failed. FromClientSecret(): http call(https://login.microsoftonline.com/xxx/oauth2/v2.0/token)(POST) error: reply status code was 401: {\"error\": \"invalid_client\",\n  \"error_description\": \"AADSTS7000215: Invalid client secret provided. Ensure the secret being sent in the request is the client secret value, not the client secret ID, for a secret added to app 'xxx'. To troubleshoot, visit https://aka.ms/azsdk/go/identity/troubleshoot#client-secret" backoff_duration=8s tenant=xxx
time="2024-06-27T21:56:49-04:00" level=error msg="Error refreshing credential, will retry: ClientSecretCredential authentication failed. FromClientSecret(): http call(https://login.microsoftonline.com/xxx/oauth2/v2.0/token)(POST) error: reply status code was 401: {\"error\": \"invalid_client\",\n  \"error_description\": \"AADSTS7000215: Invalid client secret provided. Ensure the secret being sent in the request is the client secret value, not the client secret ID, for a secret added to app 'xxx'. To troubleshoot, visit https://aka.ms/azsdk/go/identity/troubleshoot#client-secret" backoff_duration=16s tenant=xxx
time="2024-06-27T21:57:05-04:00" level=error msg="Error refreshing credential, will retry: ClientSecretCredential authentication failed. FromClientSecret(): http call(https://login.microsoftonline.com/xxx/oauth2/v2.0/token)(POST) error: reply status code was 401: {\"error\": \"invalid_client\",\n  \"error_description\": \"AADSTS7000215: Invalid client secret provided. Ensure the secret being sent in the request is the client secret value, not the client secret ID, for a secret added to app 'xxx'. To troubleshoot, visit https://aka.ms/azsdk/go/identity/troubleshoot#client-secret" backoff_duration=32s tenant=xxx
time="2024-06-27T21:57:37-04:00" level=error msg="Error refreshing credential, will retry: ClientSecretCredential authentication failed. FromClientSecret(): http call(https://login.microsoftonline.com/xxx/oauth2/v2.0/token)(POST) error: reply status code was 401: {\"error\": \"invalid_client\",\n  \"error_description\": \"AADSTS7000215: Invalid client secret provided. Ensure the secret being sent in the request is the client secret value, not the client secret ID, for a secret added to app 'xxx'. To troubleshoot, visit https://aka.ms/azsdk/go/identity/troubleshoot#client-secret" backoff_duration=1m4s tenant=xxx

If you give it a bad tenant ID, it also backs off and retries. I wasn't able to reproduce the original error that prompted this change (it looked like a transient network or service error). Note that the backoff grows to 5m and then stays there.

time="2024-06-27T22:15:02-04:00" level=error msg="Error refreshing credential, will retry: ClientSecretCredential: Tenant 'xxx' not found. Check to make sure you have the correct tenant ID and are signing into the correct cloud. Check with your subscription administrator, this may happen if there are no active subscriptions for the tenant." backoff_duration=1s tenant=xxx
time="2024-06-27T22:15:03-04:00" level=error msg="Error refreshing credential, will retry: ClientSecretCredential: Tenant 'xxx' not found. Check to make sure you have the correct tenant ID and are signing into the correct cloud. Check with your subscription administrator, this may happen if there are no active subscriptions for the tenant." backoff_duration=2s tenant=xxx
time="2024-06-27T22:15:05-04:00" level=error msg="Error refreshing credential, will retry: ClientSecretCredential: Tenant 'xxx' not found. Check to make sure you have the correct tenant ID and are signing into the correct cloud. Check with your subscription administrator, this may happen if there are no active subscriptions for the tenant." backoff_duration=4s tenant=xxx
time="2024-06-27T22:15:10-04:00" level=error msg="Error refreshing credential, will retry: ClientSecretCredential: Tenant 'xxx' not found. Check to make sure you have the correct tenant ID and are signing into the correct cloud. Check with your subscription administrator, this may happen if there are no active subscriptions for the tenant." backoff_duration=8s tenant=xxx
time="2024-06-27T22:15:18-04:00" level=error msg="Error refreshing credential, will retry: ClientSecretCredential: Tenant 'xxx' not found. Check to make sure you have the correct tenant ID and are signing into the correct cloud. Check with your subscription administrator, this may happen if there are no active subscriptions for the tenant." backoff_duration=16s tenant=xxx
time="2024-06-27T22:15:34-04:00" level=error msg="Error refreshing credential, will retry: ClientSecretCredential: Tenant 'xxx' not found. Check to make sure you have the correct tenant ID and are signing into the correct cloud. Check with your subscription administrator, this may happen if there are no active subscriptions for the tenant." backoff_duration=32s tenant=xxx
time="2024-06-27T22:16:07-04:00" level=error msg="Error refreshing credential, will retry: ClientSecretCredential: Tenant 'xxx' not found. Check to make sure you have the correct tenant ID and are signing into the correct cloud. Check with your subscription administrator, this may happen if there are no active subscriptions for the tenant." backoff_duration=1m4s tenant=xxx
time="2024-06-27T22:17:11-04:00" level=error msg="Error refreshing credential, will retry: ClientSecretCredential: Tenant 'xxx' not found. Check to make sure you have the correct tenant ID and are signing into the correct cloud. Check with your subscription administrator, this may happen if there are no active subscriptions for the tenant." backoff_duration=2m8s tenant=xxx
time="2024-06-27T22:19:20-04:00" level=error msg="Error refreshing credential, will retry: ClientSecretCredential: Tenant 'xxx' not found. Check to make sure you have the correct tenant ID and are signing into the correct cloud. Check with your subscription administrator, this may happen if there are no active subscriptions for the tenant." backoff_duration=4m16s tenant=xxx
time="2024-06-27T22:23:36-04:00" level=error msg="Error refreshing credential, will retry: ClientSecretCredential: Tenant 'xxx' not found. Check to make sure you have the correct tenant ID and are signing into the correct cloud. Check with your subscription administrator, this may happen if there are no active subscriptions for the tenant." backoff_duration=5m0s tenant=xxx
time="2024-06-27T22:28:37-04:00" level=error msg="Error refreshing credential, will retry: ClientSecretCredential: Tenant 'xxx' not found. Check to make sure you have the correct tenant ID and are signing into the correct cloud. Check with your subscription administrator, this may happen if there are no active subscriptions for the tenant." backoff_duration=5m0s tenant=xxx

@jshearer jshearer marked this pull request as ready for review June 28, 2024 02:37
@jshearer jshearer requested a review from jgraettinger June 28, 2024 03:12
@jgraettinger jgraettinger left a comment

LGTM

@jshearer jshearer merged commit f7be722 into master Jul 3, 2024
1 check passed