
feat: Add credential provider utility classes for AWS, GCP #19297

Merged: 16 commits merged into pola-rs:main on Oct 18, 2024


nameexhaustion (Collaborator) commented Oct 18, 2024

ref #19271 (comment)

  • Adds CredentialProviderAWS and CredentialProviderGCP

CredentialProviderAWS can be used to select a different AWS profile:

```python
import os

import polars as pl

lf = pl.scan_parquet(
    "s3://...",
    credential_provider=pl.CredentialProviderAWS(profile_name="..."),
)

# `CredentialProviderAWS()` will also pick up `AWS_PROFILE` from the environment
os.environ["AWS_PROFILE"] = "dummy"

lf = pl.scan_parquet(
    "s3://...",
    credential_provider=pl.CredentialProviderAWS(),
)
```
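The profile selection above can be modelled as a simple precedence rule. This is a minimal sketch under the assumption that an explicit `profile_name` takes priority over `AWS_PROFILE`, which in turn takes priority over the `"default"` profile. `resolve_aws_profile` is a hypothetical helper, not part of Polars or boto3, and boto3's real resolution (config files, other environment variables) is more involved.

```python
import os
from typing import Mapping, Optional


def resolve_aws_profile(
    profile_name: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Hypothetical helper modelling the assumed profile precedence.

    An explicit ``profile_name`` wins; otherwise a non-empty ``AWS_PROFILE``
    environment variable is used; otherwise fall back to "default".
    """
    if profile_name:
        return profile_name
    env_profile = environ.get("AWS_PROFILE")
    if env_profile:
        return env_profile
    return "default"


print(resolve_aws_profile("analytics", {"AWS_PROFILE": "dummy"}))  # analytics
print(resolve_aws_profile(environ={"AWS_PROFILE": "dummy"}))       # dummy
print(resolve_aws_profile(environ={}))                              # default
```

Passing `environ` explicitly keeps the sketch testable without mutating the real process environment.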

Todos:

  • In local testing, GCP credentials are not being cached properly
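The Todo above concerns credential caching. A common approach, sketched here under stated assumptions, is to wrap a provider and reuse its result until shortly before the reported expiry. The sketch assumes a provider is a zero-argument callable returning a credentials dict and an optional expiry in Unix seconds; `CachingProvider` and `refresh_margin` are hypothetical names, not the code in this PR.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Assumed shape: a provider returns (credentials, expiry as Unix seconds or None).
Credentials = Dict[str, str]
ProviderResult = Tuple[Credentials, Optional[int]]
ProviderFn = Callable[[], ProviderResult]


class CachingProvider:
    """Hypothetical wrapper that reuses credentials until shortly before expiry."""

    def __init__(
        self,
        provider: ProviderFn,
        refresh_margin: int = 60,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._provider = provider
        self._margin = refresh_margin
        self._clock = clock
        self._cached: Optional[ProviderResult] = None

    def __call__(self) -> ProviderResult:
        if self._cached is not None:
            _, expiry = self._cached
            # No expiry means the cached credentials never need refreshing.
            if expiry is None or self._clock() < expiry - self._margin:
                return self._cached
        # Cache is empty or about to expire: fetch fresh credentials.
        self._cached = self._provider()
        return self._cached


# Usage with a fake clock and a fake provider that counts its calls.
calls = []
now = [1_000.0]


def fake_provider() -> ProviderResult:
    calls.append(now[0])
    return {"token": f"t{len(calls)}"}, int(now[0]) + 3600


cached = CachingProvider(fake_provider, clock=lambda: now[0])
cached()
cached()        # served from the cache
now[0] += 3600  # move past (expiry - refresh_margin)
cached()        # triggers a refresh
print(len(calls))  # 2
```

Injecting `clock` lets the expiry logic be exercised deterministically instead of waiting in real time.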

github-actions bot added the enhancement (New feature or an improvement of an existing feature), python (Related to Python Polars), and rust (Related to Rust Polars) labels Oct 18, 2024
nameexhaustion (Collaborator, Author) commented Oct 18, 2024

Here are the differences when the "auto" behavior is enabled, compared to the existing behavior:

  • If boto3 is installed, the Python-side CredentialProviderAWS will be used by default for S3 URLs. The one new effect is that CredentialProviderAWS respects the AWS_PROFILE environment variable if it is set; otherwise, it behaves identically to the current behavior.
  • If google-auth is installed, the Python-side CredentialProviderGCP will be used by default for GS URLs. It behaves identically to what object-store already does, except that in the error case the message is slightly more informative:
    • (CredentialProviderGCP) google.auth.exceptions.DefaultCredentialsError: Your default credentials were not found. To set up Application Default Credentials, see https://cloud.google.com/docs/authentication/external/set-up-adc for more information.
    • (object-store (existing)) polars.exceptions.ComputeError: Generic GCS error: Error performing token request: Error after 2 retries in 3.251593125s, max_retries:2, retry_timeout:10s, source:error sending request for url (http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token?audience=https%3A%2F%2Fwww.googleapis.com%2Foauth2%2Fv4%2Ftoken)
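The "auto" selection described in the list above amounts to dispatching on the URL scheme and on whether the optional dependency is importable. This is a minimal sketch, assuming only the `s3` and `gs` schemes matter and returning provider names as strings rather than constructing real Polars objects; `select_auto_provider` and `is_installed` are hypothetical helpers, not Polars internals.

```python
import importlib.util
from typing import Callable, Optional
from urllib.parse import urlparse


def is_installed(module: str) -> bool:
    """Return True if `module` can be imported, without importing it fully."""
    try:
        return importlib.util.find_spec(module) is not None
    except ModuleNotFoundError:
        # Raised for dotted names like "google.auth" when the parent is missing.
        return False


def select_auto_provider(
    url: str, installed: Callable[[str], bool] = is_installed
) -> Optional[str]:
    """Hypothetical model of the "auto" choice; None means use object-store."""
    scheme = urlparse(url).scheme
    if scheme == "s3" and installed("boto3"):
        return "CredentialProviderAWS"
    if scheme == "gs" and installed("google.auth"):
        return "CredentialProviderGCP"
    # Fall back to object-store's built-in credential handling.
    return None


print(select_auto_provider("s3://bucket/data.parquet", installed=lambda m: m == "boto3"))
print(select_auto_provider("gs://bucket/data.parquet", installed=lambda m: False))
```

The `installed` parameter makes the decision testable regardless of which packages happen to be present.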

nameexhaustion marked this pull request as ready for review October 18, 2024 12:23
nameexhaustion marked this pull request as draft October 18, 2024 13:10
nameexhaustion marked this pull request as ready for review October 18, 2024 13:11

```rst
Cloud Credentials
~~~~~~~~~~~~~~~~~
Configuration for cloud credential provisioning.
```
nameexhaustion (Collaborator, Author) commented on the docs snippet above, Oct 18, 2024:

Also added to the Python docs, but everything has been marked unstable.

ritchie46 (Member) commented:

Nice, we should quickly set up an Azure env as well.

ritchie46 merged commit a3401dc into pola-rs:main Oct 18, 2024 (25 of 26 checks passed)
c-peters added the accepted (Ready for implementation) label Oct 21, 2024
nameexhaustion deleted the creds-aws branch October 28, 2024 04:53
Projects
Archived in project