
Unable to specify AWS profile when saving parquet file with s3 url #19904

Closed
tomers opened this issue Feb 26, 2018 · 6 comments
Labels
Enhancement · IO Network · IO Parquet

Comments


tomers commented Feb 26, 2018

I would like to be able to specify the AWS named profile to use when uploading a dataframe as a parquet file to S3. This feature seems to be missing. The workaround is to make the account I want to use the default profile, but that does not let me choose the profile programmatically.

Code Sample

$ python
Python 3.6.2 (default, Jul 17 2017, 16:44:45) 
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import pandas as pd
>>> 
>>> s3_url = 's3://my_bucket/foo/bar/example.parquet'
>>> df = pd.DataFrame({'one': [-1, np.nan, 2.5], 'two': ['foo', 'bar', 'baz'], 'three': [True, False, True]})
>>> 
>>> df.to_parquet(s3_url, engine='fastparquet', profile='production')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/user/mytool/env/lib/python3.6/site-packages/pandas/core/frame.py", line 1691, in to_parquet
    compression=compression, **kwargs)
  File "/Users/user/mytool/env/lib/python3.6/site-packages/pandas/io/parquet.py", line 248, in to_parquet
    return impl.write(df, path, compression=compression, **kwargs)
  File "/Users/user/mytool/env/lib/python3.6/site-packages/pandas/io/parquet.py", line 210, in write
    compression=compression, **kwargs)
TypeError: write() got an unexpected keyword argument 'profile'
>>> 

I also tried the profile_name argument, with the same result.
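(Until pandas exposes this, one workaround is to select the profile through the environment, since boto3, which s3fs builds on, honors the AWS_PROFILE variable. A minimal sketch, assuming a profile named 'production' exists in ~/.aws/credentials:)

```python
import os

# Select the AWS named profile via the environment before any S3 I/O.
# boto3 (used by s3fs) reads AWS_PROFILE when no explicit profile is given.
# 'production' is a hypothetical profile name from ~/.aws/credentials.
os.environ['AWS_PROFILE'] = 'production'

# With the environment set, the plain call needs no extra keyword
# (commented out here because it needs real credentials and a bucket):
# df.to_parquet('s3://my_bucket/foo/bar/example.parquet', engine='fastparquet')
```

This still is not a per-call setting, so it doesn't help when different dataframes must go to different accounts within one process.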

Output of pd.show_versions():

INSTALLED VERSIONS

commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0.dev0+273.gcd484cc52
pytest: 3.4.0
pip: 9.0.1
setuptools: 28.8.0
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: 0.8.0
xarray: None
IPython: None
sphinx: None
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: 1.2.2
pymysql: None
psycopg2: None
jinja2: None
s3fs: 0.1.3
fastparquet: 0.1.4
pandas_gbq: None
pandas_datareader: None

@TomAugspurger
Contributor

Can you create the S3File manually? Something like

import s3fs

fs = s3fs.S3FileSystem(profile_name='production')
with fs.open(s3_url, mode='wb') as f:
    df.to_parquet(f, engine='fastparquet')

We'll need a more general solution to handling this for all the readers / writers and all the various cloud systems. Something like dask's backend_kwargs.

@gfyoung
Member

gfyoung commented Mar 2, 2018

@TomAugspurger : That would make the most sense, since it's generic enough.

@IvoMerchiers

Related to the more general issue at #16692

@jbrockmendel jbrockmendel added the IO Network Local or Cloud (AWS, GCS, etc.) IO Issues label Dec 11, 2019
@alimcmaster1 alimcmaster1 self-assigned this Apr 19, 2020
@nivcoh

nivcoh commented Aug 28, 2020

Can you create the S3File manually? Something like

import s3fs

fs = s3fs.S3FileSystem(profile_name='production')
with fs.open(s3_url, mode='wb') as f:
    df.to_parquet(f, engine='fastparquet')

We'll need a more general solution to handling this for all the readers / writers and all the various cloud systems. Something like dask's backend_kwargs.

Hi @TomAugspurger
I upgraded s3fs from 0.1.5 to 0.5.0 and I keep getting:
__init__() got an unexpected keyword argument 'profile_name'
Any idea?

@TomAugspurger
Contributor

fsspec/s3fs#324

I believe that this was properly fixed by #35381. Once that change is released, the syntax would be df.to_parquet("s3://...", storage_options={"profile_name": "production"}).

@drewfustin
Contributor

In case someone shows up here: the kwarg fsspec is looking for is "profile", so this should be df.to_parquet("s3://...", storage_options={"profile": "production"}).
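(Putting the thread's conclusion together, a minimal sketch of the supported call, assuming pandas >= 1.2 with s3fs installed and a hypothetical profile named 'production':)

```python
import pandas as pd

df = pd.DataFrame({'one': [-1, 2.5], 'two': ['foo', 'bar']})

# fsspec forwards storage_options to s3fs.S3FileSystem, whose keyword
# is 'profile' (not 'profile_name'); 'production' is a hypothetical
# profile name defined in ~/.aws/credentials.
storage_options = {'profile': 'production'}

# Needs network access and valid credentials, so shown commented out:
# df.to_parquet('s3://my_bucket/foo/bar/example.parquet',
#               storage_options=storage_options)
```

The same storage_options dict works for read_parquet and the other fsspec-backed readers/writers.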


8 participants