use s3fs authentication if provided #33639

Closed
cc-jj opened this issue Apr 18, 2020 · 2 comments
Labels: IO CSV (read_csv, to_csv)

cc-jj commented Apr 18, 2020

I have a use case where I need to read DataFrames from multiple S3 buckets, each with different credentials.

By default, s3fs determines credentials from environment variables such as AWS_PROFILE and AWS_ACCESS_KEY_ID. That does not work for me, because I need different credentials for different buckets.

The fs-s3fs docs (the PyFilesystem2 S3 backend, a separate package from the s3fs that pandas uses) show that you can instead authenticate like this:
https://fs-s3fs.readthedocs.io/en/latest/#authentication

from fs import open_fs  # PyFilesystem2; the s3:// opener is provided by fs-s3fs
s3fs = open_fs('s3://<access key>:<secret key>@mybucket')

I tried the same idea with pandas:

df = pd.read_csv("s3://<access key>:<secret key>@mybucket/csv_key")

But this raised an exception deep within s3fs complaining of an invalid bucket name, possibly caused by the stripping logic here:
https://github.com/pandas-dev/pandas/blob/master/pandas/io/s3.py#L29
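To illustrate why the bucket name could come out invalid: if the path is split into bucket and key at the first "/" after the scheme (a simplified, hypothetical model of the path handling, not the actual pandas or s3fs code), the credentials end up inside the bucket name. S3 bucket names may only contain lowercase letters, digits, dots and hyphens, so the ":" and "@" alone make it invalid.

```python
from typing import Tuple


def split_bucket_and_key(path: str) -> Tuple[str, str]:
    # Simplified model (an assumption, not the real s3fs code):
    # strip the "s3://" prefix, then split at the first "/".
    if path.startswith("s3://"):
        path = path[len("s3://"):]
    bucket, _, key = path.partition("/")
    return bucket, key


bucket, key = split_bucket_and_key("s3://AKIAEXAMPLE:secret@mybucket/csv_key")
print(bucket)  # AKIAEXAMPLE:secret@mybucket
print(key)     # csv_key
```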

I think we could easily support authentication using this syntax:

pd.read_csv("s3://<access key>:<secret key>@mybucket/csv_key")

This could be done by modifying the code here:
https://github.com/pandas-dev/pandas/blob/master/pandas/io/s3.py#L27

The idea is to first match filepath_or_buffer against a pattern that captures the access key and secret key. If it matches, we pass them to s3fs.S3FileSystem:

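import re
import s3fs

# Hypothetical pattern for "s3://<access key>:<secret key>@<bucket>/<key>"
pattern = r"^s3://([^:@/]+):([^@]+)@([^/]+)/(.*)$"
match = re.match(pattern, filepath_or_buffer)
if match is not None:
    access_key, secret_key, bucket_name, key = match.groups()
    fs = s3fs.S3FileSystem(key=access_key, secret=secret_key)
...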
cc-jj changed the title from "use s3fs authorization if provided" to "use s3fs authentication if provided" on Apr 18, 2020
alimcmaster1 added the IO CSV (read_csv, to_csv) label on Apr 18, 2020
alimcmaster1 (Member)

Thanks for the report.

Can you show the actual error you are getting?

Yes, I agree that there is currently no way to pass kwargs through to s3fs.S3FileSystem. Including the credentials in the path should work fine, though.

I don't think the stripping logic you mentioned is the problem; it just removes "s3://".

I'm -1 on using any regex matching to implement this, though.
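For comparison, a regex-free way to pull credentials out of such a URL is the standard library's urllib.parse. This is a minimal sketch: parse_s3_url and S3Url are hypothetical names, not pandas or s3fs API, and it assumes any "/" in the secret is percent-encoded as %2F, since an unencoded "/" would end the host part of the URL.

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote, urlsplit


class S3Url(NamedTuple):
    key: Optional[str]     # access key ID embedded in the URL, if any
    secret: Optional[str]  # secret key embedded in the URL, if any
    path: str              # "bucket/object-key" with the credentials removed


def parse_s3_url(url: str) -> S3Url:
    """Hypothetical helper: split s3://<key>:<secret>@bucket/path without a regex."""
    parts = urlsplit(url)
    if parts.scheme != "s3":
        raise ValueError(f"expected an s3:// URL, got {url!r}")
    # .username/.password are not percent-decoded, so decode them here;
    # this is what lets a secret containing "/" be written as %2F.
    key = unquote(parts.username) if parts.username is not None else None
    secret = unquote(parts.password) if parts.password is not None else None
    # Keep the bucket's original spelling (.hostname would lowercase it).
    bucket = parts.netloc.rpartition("@")[2]
    # Note: any "?query" or "#fragment" part is dropped by this sketch.
    return S3Url(key, secret, bucket + parts.path)


url = parse_s3_url("s3://AKIAEXAMPLE:abc%2Fdef@mybucket/dir/data.csv")
print(url.key, url.secret, url.path)  # AKIAEXAMPLE abc/def mybucket/dir/data.csv
```

The extracted key and secret could then be passed to s3fs.S3FileSystem(key=..., secret=...), and path used with that filesystem's open method.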

Related discussion on moving to fsspec: #33452

alimcmaster1 (Member)

#35381 closes this. You should now be able to use the storage_options kwarg to pass in "aws_access_key_id" & "aws_secret_key"
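For the original use case (different credentials per bucket), one approach is to choose a storage_options dict per bucket before each call. This is a sketch under stated assumptions: CREDENTIALS_BY_BUCKET and storage_options_for are hypothetical names, the credentials are placeholders, and the "key"/"secret" names are the credential parameters documented by s3fs.S3FileSystem, to which pandas forwards storage_options for s3:// paths.

```python
from typing import Dict, Mapping
from urllib.parse import urlsplit

# Hypothetical per-bucket credentials; in practice load these from a secrets store.
CREDENTIALS_BY_BUCKET: Dict[str, Dict[str, str]] = {
    "bucket-a": {"key": "AKIA_EXAMPLE_A", "secret": "secret-a"},
    "bucket-b": {"key": "AKIA_EXAMPLE_B", "secret": "secret-b"},
}


def storage_options_for(url: str, creds: Mapping[str, Mapping[str, str]]) -> Dict[str, str]:
    """Return a copy of the credential dict for the bucket named in an s3:// URL."""
    parts = urlsplit(url)
    if parts.scheme != "s3":
        raise ValueError(f"expected an s3:// URL, got {url!r}")
    bucket = parts.netloc
    try:
        return dict(creds[bucket])
    except KeyError:
        raise KeyError(f"no credentials configured for bucket {bucket!r}") from None
```

With pandas 1.2 or later, a read would then look like `pd.read_csv(path, storage_options=storage_options_for(path, CREDENTIALS_BY_BUCKET))`, so each call carries its own credentials instead of relying on process-wide environment variables.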
