-
Notifications
You must be signed in to change notification settings - Fork 538
refactor!: introduce storage options accessor #5728
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor!: introduce storage options accessor #5728
Conversation
Code ReviewP0: Compilation Error in TestsThe tests in New signatures: pub fn from_provider(provider: Arc<dyn StorageOptionsProvider>) -> Self
pub fn from_provider_with_initial(
provider: Arc<dyn StorageOptionsProvider>,
initial_options: HashMap<String, String>,
) -> SelfTest calls still have extra Duration argument: // These calls have 2 or 3 args, but should have 1 or 2
DynamicStorageOptionsCredentialProvider::from_provider(
mock.clone(),
Duration::from_secs(300), // <- Should be removed
);
DynamicStorageOptionsCredentialProvider::from_provider_with_initial(
mock.clone(),
Duration::from_secs(300), // <- Should be removed
initial_options,
);Similarly, Please verify this builds and tests pass before merging. P1: Breaking API Change Without Migration PathThe PR removes Consider adding a migration note to the release notes or emitting a warning if users try to use the old parameter name. Minor: Test CoverageThe new |
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
- Fix s3_test.rs to use storage_options_accessor instead of storage_options - Fix formatting in python/src/file.rs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
| /// | ||
| /// The returned accessor will always return the provided options. | ||
| /// This is useful when credentials don't expire or are managed externally. | ||
| pub fn static_options(options: HashMap<String, String>) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does with_static_options give more uniformity re: naming?
Renamed StorageOptionsAccessor::static_options() to with_static_options() for naming consistency with with_provider() and with_initial_and_provider(). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Same as lance-format#5547, but that one is too hard to rebase at this moment. For easier integration with distributed engines, we want a way to get `dataset.latest_storage_options()` through the initial static storage options + the dynamic storage options provider. However, current implementation pushes this logic down to AWS. This PR lifts the logic up into a component `StorageOptionsAccessor`, so that we can invoke it at dataset level, and also just wrap it as a CredentialsProvider for AWS. We also make `ObjectStoreParams` now hold a `StorageOptionsAccessor` instead of just the `storage_options` map. For user API, we continue to let them input storage options and storage options provider, `StorageOptionsAccessor` is used only internally. We also remove a few features added that turned out to be unnecessary in the whole credentials vending code path: 1. ignore_namespace_storage_options: this just never turns out to be possible to be set to true, I was overthinking in the beginning, so remove related code. 2. s3_credentials_refresh_offset_seconds: we overloaded this feature to set credentials refresh lead time for the storage options provider-based AWS credentials provider. But this is now not fitting the framework since the feature will be applicable to gcp and azure as well. Note that I removed all the related code in python and java. Although s3_credentials_refresh_offset exists for a long time, there was not really a way to set it in python and java. I added it for storage options provider only, but now it is no longer needed. Note that recently I added credentials vending server side support in DirectoryNamespace for all 3 clouds. So we can now provide a generic solution to vend credentials and test everything end to end for GCP and Azure. However this PR still keeps the feature in AWS, we will do another PR to add GCP and Azure support officially --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Same as lance-format#5547, but that one is too hard to rebase at this moment. For easier integration with distributed engines, we want a way to get `dataset.latest_storage_options()` through the initial static storage options + the dynamic storage options provider. However, current implementation pushes this logic down to AWS. This PR lifts the logic up into a component `StorageOptionsAccessor`, so that we can invoke it at dataset level, and also just wrap it as a CredentialsProvider for AWS. We also make `ObjectStoreParams` now hold a `StorageOptionsAccessor` instead of just the `storage_options` map. For user API, we continue to let them input storage options and storage options provider, `StorageOptionsAccessor` is used only internally. We also remove a few features added that turned out to be unnecessary in the whole credentials vending code path: 1. ignore_namespace_storage_options: this just never turns out to be possible to be set to true, I was overthinking in the beginning, so remove related code. 2. s3_credentials_refresh_offset_seconds: we overloaded this feature to set credentials refresh lead time for the storage options provider-based AWS credentials provider. But this is now not fitting the framework since the feature will be applicable to gcp and azure as well. Note that I removed all the related code in python and java. Although s3_credentials_refresh_offset exists for a long time, there was not really a way to set it in python and java. I added it for storage options provider only, but now it is no longer needed. Note that recently I added credentials vending server side support in DirectoryNamespace for all 3 clouds. So we can now provide a generic solution to vend credentials and test everything end to end for GCP and Azure. However this PR still keeps the feature in AWS, we will do another PR to add GCP and Azure support officially --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Same as lance-format#5547, but that one is too hard to rebase at this moment. For easier integration with distributed engines, we want a way to get `dataset.latest_storage_options()` through the initial static storage options + the dynamic storage options provider. However, current implementation pushes this logic down to AWS. This PR lifts the logic up into a component `StorageOptionsAccessor`, so that we can invoke it at dataset level, and also just wrap it as a CredentialsProvider for AWS. We also make `ObjectStoreParams` now hold a `StorageOptionsAccessor` instead of just the `storage_options` map. For user API, we continue to let them input storage options and storage options provider, `StorageOptionsAccessor` is used only internally. We also remove a few features added that turned out to be unnecessary in the whole credentials vending code path: 1. ignore_namespace_storage_options: this just never turns out to be possible to be set to true, I was overthinking in the beginning, so remove related code. 2. s3_credentials_refresh_offset_seconds: we overloaded this feature to set credentials refresh lead time for the storage options provider-based AWS credentials provider. But this is now not fitting the framework since the feature will be applicable to gcp and azure as well. Note that I removed all the related code in python and java. Although s3_credentials_refresh_offset exists for a long time, there was not really a way to set it in python and java. I added it for storage options provider only, but now it is no longer needed. Note that recently I added credentials vending server side support in DirectoryNamespace for all 3 clouds. So we can now provide a generic solution to vend credentials and test everything end to end for GCP and Azure. However this PR still keeps the feature in AWS, we will do another PR to add GCP and Azure support officially --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Same as #5547, but that one is too hard to rebase at this moment.
For easier integration with distributed engines, we want a way to get
dataset.latest_storage_options()through the initial static storage options + the dynamic storage options provider.However, current implementation pushes this logic down to AWS. This PR lifts the logic up into a component
StorageOptionsAccessor, so that we can invoke it at dataset level, and also just wrap it as a CredentialsProvider for AWS. We also makeObjectStoreParamsnow hold aStorageOptionsAccessorinstead of just thestorage_optionsmap.For user API, we continue to let them input storage options and storage options provider,
StorageOptionsAccessoris used only internally.We also remove a few features added that turned out to be unnecessary in the whole credentials vending code path:
Note that recently I added credentials vending server side support in DirectoryNamespace for all 3 clouds. So we can now provide a generic solution to vend credentials and test everything end to end for GCP and Azure. However this PR still keeps the feature in AWS, we will do another PR to add GCP and Azure support officially