Skip to content

Conversation

@jadewang-db
Copy link
Contributor

@jadewang-db jadewang-db commented May 23, 2025

Problem

The Databricks driver's CloudFetch functionality was not properly handling expired cloud file URLs, which could lead to failed downloads and errors during query execution. The system needed a way to track, cache, and refresh presigned URLs before they expire.

Solution

  • Improve CloudFetchResultFetcher class that:
    • Manages a cache of cloud file URLs with their expiration times
    • Proactively refreshes URLs that are about to expire
    • Provides thread-safe access to URL information
  • Added an IClock interface and implementations to facilitate testing with controlled time
  • Extended the IDownloadResult interface to support URL refreshing and expiration checking
  • Updated namespace from Apache.Arrow.Adbc.Drivers.Apache.Databricks.CloudFetch to Apache.Arrow.Adbc.Drivers.Databricks.CloudFetch for better organization

@jadewang-db jadewang-db force-pushed the handle-expired-cloudfile branch from 6968fc5 to 667239c Compare May 23, 2025 21:19
Use real CloudFetchUrlManager in tests instead of mock

Handle presigned URL expiration in CloudFetch

fix lint

Address review comments: remove unnecessary URL expiration check and GetFreshClient method

fix test

Refactor: Move URL management from CloudFetchUrlManager into CloudFetchResultFetcher

Update tests to use CloudFetchResultFetcher for URL management

Remove CloudFetchUrlManagerTest.cs as it's replaced by CloudFetchResultFetcherTest.cs

Remove duplicate clock definitions from CloudFetchUrlManager

Make CloudFetchResultFetcher non-sealed to allow extension for testing

Fix DateTime.ToUnixTimeMilliseconds conversion in test

Remove CloudFetchUrlManager class as its functionality is now in CloudFetchResultFetcher

lint fix

address comments

Update CloudFetchResultFetcher.cs
@jadewang-db jadewang-db force-pushed the handle-expired-cloudfile branch from 66d1898 to a6ba579 Compare May 28, 2025 16:57
@jadewang-db jadewang-db marked this pull request as ready for review May 28, 2025 17:02
@github-actions github-actions bot added this to the ADBC Libraries 19 milestone May 28, 2025
HttpCompletionOption.ResponseHeadersRead,
cancellationToken).ConfigureAwait(false);

// Check if the response indicates an expired URL (typically 403 or 401)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to make this a generic if-fail then retry attempt?

Copy link
Contributor

@CurtHagenlocher CurtHagenlocher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Comments are mostly nitpicking.

@CurtHagenlocher CurtHagenlocher merged commit 317c9c9 into apache:main May 28, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants