read_csv encode operation on source #29233

Closed
amcoder-mb opened this issue Oct 26, 2019 · 5 comments · Fixed by #30246
Labels
Error Reporting Incorrect or improved errors from pandas IO CSV read_csv, to_csv

amcoder-mb commented Oct 26, 2019

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> df = pd.read_csv("historical_dataset.csv")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 709, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 449, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 818, in __init__
    self._make_engine(self.engine)
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 1049, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 1695, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 402, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 718, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: File b'historical_dataset.csv' does not exist

Problem description

I noticed that when the source file does not exist, the filename in the error message carries a leading 'b'. On Python 3, is it necessary to do the encode operation on line 667 of https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/parsers.pyx? The message looks fine on Python 2.

Output of pd.show_versions()

```
INSTALLED VERSIONS
------------------
commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 5.0.0-31-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.1
numpy : 1.14.5
pytz : 2018.5
dateutil : 2.8.0
pip : 19.2.3
setuptools : 41.2.0
Cython : 0.29.13
pytest : None
hypothesis : None
sphinx : 1.7.6
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.2.3
html5lib : None
pymysql : 0.9.3
psycopg2 : None
jinja2 : 2.10
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.2.3
matplotlib : 2.2.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.1.0
sqlalchemy : 1.3.8
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
```

@jbrockmendel jbrockmendel added the IO CSV read_csv, to_csv label Oct 26, 2019
@gfyoung gfyoung added the Error Reporting Incorrect or improved errors from pandas label Oct 29, 2019

gfyoung commented Oct 29, 2019

@abellerarj : The reason for the b is that we encode the name of the file to account for special characters cross-OS (xref: #24758). In Python2, the visual representation of encoded and non-encoded strings looked one and the same, whereas in Python3, a distinction is made visually.
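
To illustrate the visual difference described here, a minimal sketch in plain Python 3 follows. The helper `missing_file_message` is hypothetical and only mimics the shape of the message in the traceback; it is not the pandas implementation.

```python
# Hypothetical helper that mimics the shape of the error message.
# It is NOT pandas code; it only shows how repr() treats str vs. bytes.
def missing_file_message(source):
    return f"File {source!r} does not exist"


filename = "historical_dataset.csv"
encoded = filename.encode("utf-8")  # bytes object, as after an encode step

# In Python 3 the repr of bytes carries a b prefix; the repr of str does not.
print(missing_file_message(encoded))
print(missing_file_message(filename))
```

The first call prints the name with the `b'...'` prefix seen in the report, while the second prints it as an ordinary quoted string.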


gfyoung commented Oct 29, 2019

Now for the error message, we don't necessarily have to pass in the encoded version. We could use the unencoded one instead. You are welcome to investigate.
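
One way to picture this idea, as a rough sketch only: keep the encoded name for the filesystem check but build the message from a decoded, human-readable name. The function `check_source_exists` is a hypothetical stand-in, the UTF-8 encoding is an assumption, and none of this reflects how pandas actually sets up its parser source.

```python
import os


def check_source_exists(source):
    """Hypothetical stand-in for a parser's source check (not pandas code)."""
    # Assumption: the name is encoded as UTF-8 before touching the filesystem.
    encoded = source.encode("utf-8") if isinstance(source, str) else source

    if not os.path.exists(encoded):
        # Use a decoded name for display so no b'...' prefix leaks into the message.
        display = os.fsdecode(encoded)
        raise FileNotFoundError(f"File {display} does not exist")
    return encoded


try:
    check_source_exists("definitely_missing_file_29233.csv")
except FileNotFoundError as exc:
    print(exc)
```

Here the encoded bytes are still what gets checked on disk; only the text shown to the user changes.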

jbrockmendel commented

@gfyoung To get the non-prefixed filename, would we need to reverse the encoding done in #24758? That is now done in a .c file that I don't want to futz with.


gfyoung commented Dec 12, 2019

> would we need to reverse the encoding done in #24758

That would be a logical first step to investigate.

@gfyoung gfyoung closed this as completed Dec 12, 2019

gfyoung commented Dec 12, 2019

Fat fingers, my bad.
