-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLI: Add support for AWS named profiles #6161
Conversation
* build with object_store aws_profiles feature * use object_store 0.5.5 to include aws profile support for static credentials * update docs
I wonder if we could make this a feature flag on datafusion-cli, disabled by default? It is a very heavy feature to always enable, it brings in an entire AWS SDK which I suspect not all users will want? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @mr-brobot
I agree with tustvold making this an optional datafusion-cli
feature, (enabled by default) might be nice (e.g. the 500+ lines of new stuff in datafusion-cli/Cargo.lock). However, it seems fine to me as well to merge like this given datafusion-cli is mostly a testing / development tool
Let's see if anyone else has additional thoughts
Thank you for the feedback, @tustvold and @alamb! I agree that we should limit the CLI size, and this feature adds weight (+4 MB) that isn't always useful, especially for non-AWS users. However, the CLI already includes both GCP and AWS features from object_store by default, and folks usually use one or the other (or neither for local files). If minimizing CLI size is a priority, it might make sense to add broader One implication is how users would opt-in to CLI features. No changes required for support in Cargo installation, and very simple to add Docker support via build args. However, I think package managers like Homebrew would only support the default features? If we decide on optional CLI features, I will add supporting Docker build args and related installation docs. 😄 |
Care has been taken upstream to ensure that the cloud provider features do not bring in any additional features beyond an HTTP client, see apache/arrow-rs#2176, there is therefore likely to be limited benefit from adding per-provider feature flags. The aws_profile feature was added as an escape valve, but as documented it comes with some not insignificant drawbacks. Despite being the person to add this functionality to object_store, I would still encourage people to use a solution like aws-vault, that doesn't store credentials in plain text on disk.
That would be wonderful, although I am curious how profiles work with docker given they rely on a home directory |
@tustvold Thank you for that explanation!
We'll need to mount the host |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @mr-brobot
Here is my proposal: add a single feature to the datafusion-cli tool called cloud_object_store
This feature is enabled by default
When cloud_object_store
is enabled, use
object_store = { version = "0.5.5", features = ["aws", "gcp", "aws_profile"] }
When cloud_object_store
is not enabled, use
object_store = { version = "0.5.5" }
I still would prefer aws_profile not be enabled by default, but if others feel strongly I'm not going to stand in the way of that |
As I understood the issue is that AWS suggests to users to use that auth mode so it will be common for anyone trying to use datafusion-cli with AWS... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, you've convinced me, I guess we can't unilaterally fix AWS's frankly bananas auth story 😅
apache/arrow-rs#4163 might longer-term be an alternative way to handle this, and would avoid issues like apache/arrow-rs#4137 where object_store doesn't behave in the same way as the AWS CLI |
@alamb @tustvold Thank you for the thoughtful review! Just want to make sure no one is waiting on me. At one point an optional Also, there are some failing checks but looks like a transient network error? |
I think we should just merge this one and I will do so over the next day or two if there are no other reviews
Yeah, not sure what happened there - I restarted the tests. |
Thanks again for your help @mr-brobot |
Which issue does this PR close?
Closes #6157.
Rationale for this change
See #6157.
What changes are included in this PR?
aws_profile
feature of object_storeAWS_PROFILE
with S3 data sourcesAre these changes tested?
I didn't include any tests because this is a feature of a dependency that has it's own tests in the Arrow repo. Happy to add tests here if you think it necessary. :)
Are there any user-facing changes?
Yes, just another option for supplying AWS credentials. Docs updated.