Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLI: Add support for AWS named profiles #6161

Merged
merged 2 commits into from
May 4, 2023

Conversation

mr-brobot
Copy link
Contributor

Which issue does this PR close?

Closes #6157.

Rationale for this change

See #6157.

What changes are included in this PR?

  • Enabled aws_profile feature of object_store
  • Upgrade to object_store 0.5.5 to include fix related to AWS profiles with static credentials
  • Update CLI docs related to using AWS_PROFILE with S3 data sources

Are these changes tested?

I didn't include any tests because this is a feature of a dependency that has it's own tests in the Arrow repo. Happy to add tests here if you think it necessary. :)

Are there any user-facing changes?

Yes, just another option for supplying AWS credentials. Docs updated.

* build with object_store aws_profiles feature
* use object_store 0.5.5 to include aws profile support for static credentials
* update docs
@tustvold
Copy link
Contributor

tustvold commented Apr 29, 2023

I wonder if we could make this a feature flag on datafusion-cli, disabled by default? It is a very heavy feature to always enable, it brings in an entire AWS SDK which I suspect not all users will want?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @mr-brobot

I agree with tustvold making this an optional datafusion-cli feature, (enabled by default) might be nice (e.g. the 500+ lines of new stuff in datafusion-cli/Cargo.lock). However, it seems fine to me as well to merge like this given datafusion-cli is mostly a testing / development tool

Let's see if anyone else has additional thoughts

@mr-brobot
Copy link
Contributor Author

mr-brobot commented Apr 30, 2023

Thank you for the feedback, @tustvold and @alamb!

I agree that we should limit the CLI size, and this feature adds weight (+4 MB) that isn't always useful, especially for non-AWS users.

However, the CLI already includes both GCP and AWS features from object_store by default, and folks usually use one or the other (or neither for local files). If minimizing CLI size is a priority, it might make sense to add broader aws and gcp features allowing users to opt-in to cloud support, for only the cloud they need.

One implication is how users would opt-in to CLI features. No changes required for support in Cargo installation, and very simple to add Docker support via build args. However, I think package managers like Homebrew would only support the default features?

If we decide on optional CLI features, I will add supporting Docker build args and related installation docs. 😄

@tustvold
Copy link
Contributor

tustvold commented Apr 30, 2023

However, the CLI already includes both GCP and AWS features from object_store by default,

Care has been taken upstream to ensure that the cloud provider features do not bring in any additional features beyond an HTTP client, see apache/arrow-rs#2176, there is therefore likely to be limited benefit from adding per-provider feature flags.

The aws_profile feature was added as an escape valve, but as documented it comes with some not insignificant drawbacks. Despite being the person to add this functionality to object_store, I would still encourage people to use a solution like aws-vault, that doesn't store credentials in plain text on disk.

I will add supporting Docker build args and related installation docs

That would be wonderful, although I am curious how profiles work with docker given they rely on a home directory

@mr-brobot
Copy link
Contributor Author

@tustvold Thank you for that explanation! aws-vault seems like the more secure approach, and I'll probably start using that personally. However, as you probably already know, as long as AWS docs/tools direct users to configure named profiles, AWS users (e.g., me) will bring their silly profile configs to the DataFusion CLI. 😓

That would be wonderful, although I am curious how profiles work with docker given they rely on a home directory

We'll need to mount the host .aws directory and set the AWS_PROFILE env var in the docker run command. I'll implement the optional feature + Docker support later tonight/tomorrow. 😄

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @mr-brobot

Here is my proposal: add a single feature to the datafusion-cli tool called cloud_object_store

This feature is enabled by default

When cloud_object_store is enabled, use

object_store = { version = "0.5.5", features = ["aws", "gcp", "aws_profile"] }

When cloud_object_store is not enabled, use

object_store = { version = "0.5.5" }

@tustvold
Copy link
Contributor

tustvold commented May 1, 2023

I still would prefer aws_profile not be enabled by default, but if others feel strongly I'm not going to stand in the way of that

@alamb
Copy link
Contributor

alamb commented May 1, 2023

I still would prefer aws_profile not be enabled by default, but if others feel strongly I'm not going to stand in the way of that

As I understood the issue is that AWS suggests to users to use that auth mode so it will be common for anyone trying to use datafusion-cli with AWS...

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, you've convinced me, I guess we can't unilaterally fix AWS's frankly bananas auth story 😅

@tustvold
Copy link
Contributor

tustvold commented May 2, 2023

apache/arrow-rs#4163 might longer-term be an alternative way to handle this, and would avoid issues like apache/arrow-rs#4137 where object_store doesn't behave in the same way as the AWS CLI

@mr-brobot
Copy link
Contributor Author

@alamb @tustvold Thank you for the thoughtful review!

Just want to make sure no one is waiting on me. At one point an optional cloud_object_store feature was proposed, but since then you both approved. If you'd like me to still implement that, just let me know!

Also, there are some failing checks but looks like a transient network error? 502 Gateway Error

@alamb
Copy link
Contributor

alamb commented May 2, 2023

Just want to make sure no one is waiting on me. At one point an optional cloud_object_store feature was proposed, but since then you both approved. If you'd like me to still implement that, just let me know!

I think we should just merge this one and I will do so over the next day or two if there are no other reviews

Also, there are some failing checks but looks like a transient network error? 502 Gateway Error

Yeah, not sure what happened there - I restarted the tests.

@alamb
Copy link
Contributor

alamb commented May 2, 2023

Thanks again for your help @mr-brobot

@alamb alamb merged commit c615df8 into apache:main May 4, 2023
@andygrove andygrove added the enhancement New feature or request label May 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CLI: Support named profiles via AWS_PROFILE environment variable
4 participants