Skip to content

Conversation

@alexguo-db
Copy link
Contributor

Motivation

In scenarios like PowerBI dataset refresh, if a query runs longer than the OAuth token's expiration time (typically 1 hour for AAD tokens), the connection fails. PowerBI only refreshes access tokens if they have less than 20 minutes of expiration time and never updates tokens after a connection is opened.

This PR implements token refresh functionality in the Databricks ADBC driver using the Databricks token exchange API. When an OAuth token is about to expire within a configurable time limit, the driver automatically exchanges it for a new token with a longer expiration time.

Key Components

  1. JWT Token Decoder: Parses JWT tokens to extract expiration time
  2. Token Exchange Client: Handles API calls to the Databricks token exchange endpoint
  3. Token Exchange Handler: HTTP handler that intercepts requests and refreshes tokens when needed

Changes

  • Added new connection string parameter adbc.databricks.token_renew_limit to control when token renewal happens
  • Implemented JWT token decoding to extract token expiration time
  • Created token exchange client to handle API calls to Databricks token exchange endpoint
  • Added HTTP handler to intercept requests and refresh tokens when needed
  • Updated connection handling to create and configure the token exchange components

Testing

  • Unit tests for JWT token decoding, token exchange client, and token exchange handler
  • End-to-end tests that verify token refresh functionality with real tokens
dotnet test --filter "FullyQualifiedName~JwtTokenDecoderTests"
dotnet test --filter "FullyQualifiedName~TokenExchangeClientTests"
dotnet test --filter "FullyQualifiedName~TokenExchangeDelegatingHandlerTests"

@github-actions github-actions bot added this to the ADBC Libraries 20 milestone Jul 18, 2025
Copy link
Contributor

@CurtHagenlocher CurtHagenlocher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The change looks fine overall, but I had a lot of trouble following the logic in DatabricksConnection.CreateHttpHandler and I wonder whether there might be some small changes that would make it easier to understand.

Feel free to push back if you prefer to check in as-is.

baseHandler = new OAuthDelegatingHandler(baseHandler, tokenProvider);
}

// Add token exchange handler if token renewal is enabled and the auth type is OAuth access token
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mutually exclusive with the if at line 340, yes? The former is for client_credentials while this is for a user token? I'm wondering whether there might be some way of making this more obvious in the code, because _httpClient is being initialized in both but differently and it looked a bit tricky to reason about. Perhaps this second if could instead be an else if?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will make that change

Properties.TryGetValue(DatabricksParameters.OAuthScope, out string? scope);

HttpClient OauthHttpClient = new HttpClient(HiveServer2TlsImpl.NewHttpClientHandler(TlsOptions, _proxyConfigurator));
if (_httpClient == null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this method be called correctly more than once, and if so, then under what circumstances? If it can't be, then there's no need for the null check here. It could be replaced e.g. by a Debug.Assert(_httpClient == null). This would make the method a little easier to reason about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it should only be called once per connection.
https://github.com/apache/arrow-adbc/blob/main/csharp/src/Drivers/Databricks/DatabricksDatabase.cs#L43
Specifically, it's called during OpenAsync, which is called right after initializing the DatabricksConnection

I added the change you suggested

HttpClient OauthHttpClient = new HttpClient(HiveServer2TlsImpl.NewHttpClientHandler(TlsOptions, _proxyConfigurator));
if (_httpClient == null)
{
_httpClient = new HttpClient(HiveServer2TlsImpl.NewHttpClientHandler(TlsOptions, _proxyConfigurator));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this was already creating a handler before, but should the credential-related handlers be adding the tracing-related functionality and/or the retry functionality? Right now, tracing is only added to the primary (data) handler and not to either of the credential handlers and retry is being added only to the token exchange path. The code is complex enough that it's not clear whether this is deliberate or accidental.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me, I added the tracing and retry functionality to both auth HTTP clients

@alexguo-db
Copy link
Contributor Author

Please give a chance for @jadewang-db to take a look before merging, thanks

/// <summary>
/// Utility class for decoding JWT tokens and extracting claims.
/// </summary>
internal static class JwtTokenDecoder
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no builtin jwt token decoder in dotnet, what about System.IdentityModel.Tokens.Jwt

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

System.IdentityModel.Tokens.Jwt is not built-in, we need to add it as an external dependency. I remember last time when we added K4os.Compression.LZ4 dependency it took a long time to get security approval for builds, so not sure if we should add another dependency

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, defer to @CurtHagenlocher on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have hand-rolled this as well.

{
// Add retry handler for 503 responses
baseHandler = new RetryHttpHandler(baseHandler, TemporarilyUnavailableRetryTimeout);
baseAuthHandler = new RetryHttpHandler(baseAuthHandler, TemporarilyUnavailableRetryTimeout);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently retryhttphandler only handle 503, is it enough? should we retry on timed out?

Copy link
Contributor Author

@alexguo-db alexguo-db Jul 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yes we should retry on other codes, 502, 504, 408. I will create a different retry delegating handler for Auth operations, since it seems different enough from the retry logic of thrift operations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also we can have a follow up PR to make current retry handler richer, I don't see they are conflict

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, these are the differences:
Auth retry HTTP handler:

  • needs to retry on 502, 503, 504, 408
  • needs to use exponential backoff for retries

Thrift retry HTTP handler:

  • only retries on 503
  • relies on the returned Retry-After header to determine how long to wait

I think we can unify in a future PR so that for both auth and Thrift requests:

  • retries on 502, 503, 504, 408
  • if Retry-After header is set, use that to determine how long to wait
  • otherwise, use exponential backoff

@CurtHagenlocher CurtHagenlocher merged commit e5c25e1 into apache:main Jul 22, 2025
7 checks passed
CurtHagenlocher pushed a commit that referenced this pull request Jul 22, 2025
…er status codes (#3186)

Follow up of
#3177 (comment)

## Proposed Changes
- Added support for additional HTTP status codes: 408 (Request Timeout),
502 (Bad Gateway), and 504 (Gateway Timeout), in addition to the
existing 503 (Service Unavailable)
- Implemented exponential backoff with jitter when no Retry-After header
is present

## Testing
Unit tests
`dotnet test --filter "FullyQualifiedName~RetryHttpHandlerTest"`
```
[xUnit.net 00:00:00.00] xUnit.net VSTest Adapter v3.1.1+bf6400fd51 (64-bit .NET 8.0.7)
[xUnit.net 00:00:00.06]   Discovering: Apache.Arrow.Adbc.Tests.Drivers.Databricks
[xUnit.net 00:00:00.15]   Discovered:  Apache.Arrow.Adbc.Tests.Drivers.Databricks
[xUnit.net 00:00:00.16]   Starting:    Apache.Arrow.Adbc.Tests.Drivers.Databricks
[xUnit.net 00:00:25.28]   Finished:    Apache.Arrow.Adbc.Tests.Drivers.Databricks
  Apache.Arrow.Adbc.Tests.Drivers.Databricks test net8.0 succeeded (26.2s)

Test summary: total: 14, failed: 0, succeeded: 14, skipped: 0, duration: 26.2s
Build succeeded in 30.3s
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants