BUG: read_csv stopped working with s3 file system #34519

Closed
hellocoldworld opened this issue Jun 1, 2020 · 5 comments
Labels: Bug, IO CSV, IO Network

Comments

@hellocoldworld

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [x] (optional) I have confirmed this bug exists on the master branch of pandas.
    Checked against pandas 1.1.0.dev0+1732.g2428cdda3

Problem description

read_csv in pandas 1.0.4 has stopped working with s3fs.

On pandas 1.0.3

import os
import pandas as pd
import s3fs

# Create a filesystem with explicit credentials taken from the environment
s3fs.S3FileSystem(anon=False, key=os.environ.get("STORE_USERNAME"), secret=os.environ.get("STORE_PASSWORD"))
df = pd.read_csv(filepath_or_buffer="s3://my-private-bucket/my_dataframe.csv")
print(df.shape)

prints the correct output, whilst with pandas 1.0.4 it raises the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nico/.local/lib/python3.7/site-packages/pandas/io/parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/nico/.local/lib/python3.7/site-packages/pandas/io/parsers.py", line 431, in _read
    filepath_or_buffer, encoding, compression
  File "/home/nico/.local/lib/python3.7/site-packages/pandas/io/common.py", line 212, in get_filepath_or_buffer
    filepath_or_buffer, encoding=encoding, compression=compression, mode=mode
  File "/home/nico/.local/lib/python3.7/site-packages/pandas/io/s3.py", line 52, in get_filepath_or_buffer
    file, _fs = get_file_and_filesystem(filepath_or_buffer, mode=mode)
  File "/home/nico/.local/lib/python3.7/site-packages/pandas/io/s3.py", line 42, in get_file_and_filesystem
    file = fs.open(_strip_schema(filepath_or_buffer), mode)
  File "/home/nico/.local/lib/python3.7/site-packages/fsspec/spec.py", line 775, in open
    **kwargs
  File "/home/nico/.local/lib/python3.7/site-packages/s3fs/core.py", line 378, in _open
    autocommit=autocommit, requester_pays=requester_pays)
  File "/home/nico/.local/lib/python3.7/site-packages/s3fs/core.py", line 1097, in __init__
    cache_type=cache_type)
  File "/home/nico/.local/lib/python3.7/site-packages/fsspec/spec.py", line 1065, in __init__
    self.details = fs.info(path)
  File "/home/nico/.local/lib/python3.7/site-packages/s3fs/core.py", line 530, in info
    Key=key, **version_id_kw(version_id), **self.req_kw)
  File "/home/nico/.local/lib/python3.7/site-packages/s3fs/core.py", line 200, in _call_s3
    return method(**additional_kwargs)
  File "/home/nico/.local/lib/python3.7/site-packages/botocore/client.py", line 316, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/nico/.local/lib/python3.7/site-packages/botocore/client.py", line 622, in _make_api_call
    operation_model, request_dict, request_context)
  File "/home/nico/.local/lib/python3.7/site-packages/botocore/client.py", line 641, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/home/nico/.local/lib/python3.7/site-packages/botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "/home/nico/.local/lib/python3.7/site-packages/botocore/endpoint.py", line 132, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "/home/nico/.local/lib/python3.7/site-packages/botocore/endpoint.py", line 116, in create_request
    operation_name=operation_model.name)
  File "/home/nico/.local/lib/python3.7/site-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/home/nico/.local/lib/python3.7/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/home/nico/.local/lib/python3.7/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/home/nico/.local/lib/python3.7/site-packages/botocore/signers.py", line 90, in handler
    return self.sign(operation_name, request)
  File "/home/nico/.local/lib/python3.7/site-packages/botocore/signers.py", line 160, in sign
    auth.add_auth(request)
  File "/home/nico/.local/lib/python3.7/site-packages/botocore/auth.py", line 357, in add_auth
    raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials
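
The traceback ends in botocore's request signing, which found no credentials through its default lookup. botocore reads the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables, not `STORE_USERNAME` or `STORE_PASSWORD`. A possible workaround, which is an assumption and not something confirmed in this thread, is to copy the custom variables into the standard names before calling `read_csv`. The helper name `export_store_credentials` is hypothetical.

```python
import os


def export_store_credentials(environ=None):
    """Copy STORE_USERNAME / STORE_PASSWORD into the names botocore looks up.

    `environ` defaults to os.environ; passing a plain dict is handy for testing.
    This is an illustrative helper, not part of pandas, s3fs, or botocore.
    """
    if environ is None:
        environ = os.environ

    key = environ.get("STORE_USERNAME")
    secret = environ.get("STORE_PASSWORD")
    if not key or not secret:
        raise KeyError("STORE_USERNAME and STORE_PASSWORD must both be set")

    # Standard variable names read by botocore's default credential lookup.
    environ["AWS_ACCESS_KEY_ID"] = key
    environ["AWS_SECRET_ACCESS_KEY"] = secret
    return environ
```

Calling `export_store_credentials()` once at start-up, before `pd.read_csv("s3://...")`, would let a filesystem created without explicit keys pick the credentials up from the environment.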

Output of pd.show_versions()

using pandas 1.0.3

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-46-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : es_AR.UTF-8
LOCALE : es_AR.UTF-8

pandas : 1.0.3
numpy : 1.18.4
pytz : 2020.1
dateutil : 2.8.1
pip : 9.0.1
setuptools : 39.0.1
Cython : None
pytest : 5.4.0
hypothesis : None
sphinx : 1.6.7
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : 0.999999999
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.4.0
pyxlsb : None
s3fs : 0.4.2
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

using pandas 1.0.4

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-46-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : es_AR.UTF-8
LOCALE : es_AR.UTF-8

pandas : 1.0.4
numpy : 1.18.4
pytz : 2020.1
dateutil : 2.8.1
pip : 9.0.1
setuptools : 39.0.1
Cython : None
pytest : 5.4.0
hypothesis : None
sphinx : 1.6.7
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : 0.999999999
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.4.0
pyxlsb : None
s3fs : 0.4.2
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

hellocoldworld added the Bug and Needs Triage labels Jun 1, 2020
hellocoldworld changed the title from "BUG:" to "BUG: read_csv stopped working with s3 file system" Jun 1, 2020
jbrockmendel added the IO CSV and IO Network labels and removed the Needs Triage label Jun 5, 2020
alimcmaster1 self-assigned this Jun 6, 2020
@alimcmaster1
Member

alimcmaster1 commented Jun 6, 2020

Hi, thanks for the report. Does this work with an AWS credentials file? We made this throw an error when no credentials are defined, as per #32486.

Can you try reading the CSV as in #16692 (comment)?
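
One way to read the file without relying on pandas to create the S3 filesystem is to open it through an explicitly configured filesystem object and hand the open handle to `read_csv`. The sketch below assumes that is the kind of approach meant here; the linked comment is not reproduced in this thread. The helper names `read_csv_via_filesystem` and `example_usage` are hypothetical.

```python
import pandas as pd


def read_csv_via_filesystem(fs, path, **read_csv_kwargs):
    """Open `path` on an fsspec-style filesystem and parse it with pandas.

    `fs` is any object with an `open(path, mode)` method that returns a
    file-like object, for example an s3fs.S3FileSystem built with explicit keys.
    """
    with fs.open(path, "rb") as handle:
        return pd.read_csv(handle, **read_csv_kwargs)


def example_usage():
    # Not called automatically: needs s3fs, network access and real credentials.
    import os
    import s3fs

    fs = s3fs.S3FileSystem(
        anon=False,
        key=os.environ.get("STORE_USERNAME"),
        secret=os.environ.get("STORE_PASSWORD"),
    )
    df = read_csv_via_filesystem(fs, "my-private-bucket/my_dataframe.csv")
    print(df.shape)
```

Because the credentials are attached to the `fs` object that actually opens the file, this path does not depend on how pandas builds its own filesystem internally.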

@simonjayhawkins
Member

@alimcmaster1 @jorisvandenbossche is this a regression in 1.0.4?

@TomAugspurger
Contributor

#34626 and this are duplicates. There's discussion over there, so closing this.

@simonjayhawkins it is a regression in 1.0.4, but I don't think a fix has been implemented yet. I'm not sure that we should delay the 1.0.5 release for this.

@alimcmaster1
Member

> #34626 and this are duplicates. There's discussion over there, so closing this.
>
> @simonjayhawkins it is a regression in 1.0.4, but I don't think a fix has been implemented yet. I'm not sure that we should delay the 1.0.5 release for this.

@TomAugspurger #34632 will have fixed this for 1.0.5.

@TomAugspurger
Contributor

TomAugspurger commented Jun 16, 2020 via email
