Read/Write files on specific S3 accounts #16692

Closed
amelio-vazquez-reina opened this issue Jun 13, 2017 · 6 comments
Labels: Docs; IO CSV (read_csv, to_csv); IO Network (Local or Cloud (AWS, GCS, etc.) IO Issues)

Comments

@amelio-vazquez-reina
Contributor

amelio-vazquez-reina commented Jun 13, 2017

Say I want to save a file to S3 using a specific account:

df.to_csv('s3://foo/bar/temp.csv')

where my accounts are listed in ~/.aws/credentials:

[default]
aws_access_key_id = XXXX
aws_secret_access_key = XXXX

[foo]
aws_access_key_id = XXXX
aws_secret_access_key = XXXX

[bar]
aws_access_key_id = XXXX
aws_secret_access_key = XXXX

What's the best or recommended way to do this with Pandas 0.20.2?

Is there a way to specify which account to use when several are configured?

Perhaps related: Does Pandas use boto or boto3?

amelio-vazquez-reina changed the title from "read_csv and to_csv on specific S3 account" to "Read/Write files on specific S3 accounts" on Jun 13, 2017
@TomAugspurger
Contributor

As of 0.20, pandas uses s3fs (http://s3fs.readthedocs.io/en/latest/) for S3 access.

I believe you should be able to do:

import pandas as pd
import s3fs

fs = s3fs.S3FileSystem(profile_name='foo')

f = fs.open("my-bucket/file.csv", "wb")
df.to_csv(f)

Could you try that out, and if it works make a pull request for the documentation? I don't have a test bucket handy at the moment.

TomAugspurger added the IO Data (IO issues that don't fit into a more specific label), Docs, and IO CSV (read_csv, to_csv) labels on Jun 13, 2017
TomAugspurger added this to the Next Major Release milestone on Jun 13, 2017
@eronsdc

eronsdc commented Dec 11, 2018

I know this post is quite old at this point, but @TomAugspurger's solution certainly works. For py3, I made one small change: using 'w' instead of 'wb'.
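
For reference, the Python 3 variant described in this comment can be sketched roughly as follows. The helper name `write_csv` is hypothetical and not part of pandas or s3fs; it accepts any filesystem object with an fsspec-style `open(path, mode)` method, such as the `s3fs.S3FileSystem` instance from the earlier comment. The bucket path in the usage comment is the placeholder from that comment.

```python
import pandas as pd


def write_csv(fs, path, df):
    """Write df as CSV through a filesystem object with an fsspec-style open().

    `write_csv` is a hypothetical helper for illustration only.
    """
    # Text mode ("w"): to_csv emits str, which is why "wb" needed
    # changing on Python 3 in this thread.
    # The context manager closes the handle, which is what flushes
    # the upload when `fs` is an S3 filesystem.
    with fs.open(path, "w") as f:
        df.to_csv(f, index=False)


# Usage sketch (needs s3fs, real credentials, and an existing bucket):
#
#   import s3fs
#   fs = s3fs.S3FileSystem(profile_name="foo")  # keyword as used earlier in this thread
#   write_csv(fs, "my-bucket/file.csv", pd.DataFrame({"a": [1, 2]}))
```

Passing the filesystem in as an argument keeps the credential choice (which profile) separate from the write itself, so the same helper works for each account listed in ~/.aws/credentials.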

@shughes-uk

Would a solution be to allow a dask-style storage_options parameter on the pandas read_x functions? It's a little frustrating not being able to just pass these things through. Most often I'm trying to pass in credentials rather than let boto search my system for them.

@TomAugspurger
Contributor

Yes, I think that request has come up in a few places. I'd be happy to see something like that.

jbrockmendel added the IO Network (Local or Cloud (AWS, GCS, etc.) IO Issues) label and removed the IO Data (IO issues that don't fit into a more specific label) label on Dec 12, 2019
@alimcmaster1
Member

Ref similar issue: #33639

@alimcmaster1
Member

#35381 closes this. You should now be able to use the storage_options kwarg to pass in "profile".
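
A minimal sketch of what that could look like, assuming a pandas version that has the `storage_options` keyword (1.2 or later) and an s3fs version whose `S3FileSystem` accepts `profile`. The helper names `profile_options`, `write_csv_to_s3`, and `read_csv_from_s3` are hypothetical, and the URL and profile in the usage comment are the placeholders from the original question.

```python
import pandas as pd


def profile_options(profile_name):
    """Build a storage_options dict selecting a named AWS profile.

    pandas forwards this dict to the underlying filesystem (s3fs for
    s3:// URLs); the "profile" key is an s3fs keyword, assumed here.
    """
    return {"profile": profile_name}


def write_csv_to_s3(df, url, profile_name):
    """Write df to an s3:// URL with credentials from the given profile."""
    df.to_csv(url, index=False, storage_options=profile_options(profile_name))


def read_csv_from_s3(url, profile_name):
    """Read a CSV from an s3:// URL with credentials from the given profile."""
    return pd.read_csv(url, storage_options=profile_options(profile_name))


# Usage sketch (needs s3fs installed, a [foo] profile in
# ~/.aws/credentials, and write access to the bucket):
#
#   df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
#   write_csv_to_s3(df, "s3://foo/bar/temp.csv", "foo")
#   print(read_csv_from_s3("s3://foo/bar/temp.csv", "foo"))
```

Because the dict is passed through to the filesystem constructor, other s3fs keywords (for example explicit `key` and `secret` credentials) should be usable the same way, which covers the request made earlier in the thread.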
