read_csv(filename_with_asian_locale) failed in python 3.6 for windows #16602

Closed
mfmain opened this issue Jun 5, 2017 · 3 comments
Labels: Duplicate Report (duplicate issue or pull request), IO CSV (read_csv, to_csv), Unicode (Unicode strings)

Comments

@mfmain

mfmain commented Jun 5, 2017

Code:

Python 3.6.0 |Anaconda 4.3.1 (64-bit)| (default, Dec 23 2016, 11:57:41) [MSC v.1900 64 bit (AMD64)] on win32
>>> import pandas as pd
>>> pd.__version__
'0.20.1'
>>> import platform
>>> platform.platform()
'Windows-7-6.1.7601-SP1'
>>> df = pd.read_csv(r'c:\tmp\中文.csv')
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-6-0cd6317422e5>", line 1, in <module>
    df = pd.read_csv(r'c:\tmp\中文.csv')
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 655, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 405, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 762, in __init__
    self._make_engine(self.engine)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 966, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1582, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas\_libs\parsers.pyx", line 394, in pandas._libs.parsers.TextReader.__cinit__ (pandas\_libs\parsers.c:4209)
  File "pandas\_libs\parsers.pyx", line 712, in pandas._libs.parsers.TextReader._setup_parser_source (pandas\_libs\parsers.c:8895)
OSError: Initializing from file failed

Problem description

On Windows, Python 3.6 changed sys.getfilesystemencoding() to return "utf-8" instead of "mbcs"; see PEP 529.
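
To make the mismatch concrete, here is a minimal sketch of how the same file name turns into different bytes under the two encodings. It uses "gbk" as a stand-in for the ANSI code page of a Chinese-locale Windows machine; that choice is an assumption for illustration, since the "mbcs" codec only exists on Windows and maps to whatever code page is active.

```python
import sys

filename = "中文.csv"

# On Windows, Python 3.6+ reports "utf-8" here (PEP 529);
# earlier versions reported "mbcs".
print(sys.getfilesystemencoding())

utf8_bytes = filename.encode("utf-8")
# Assumption: "gbk" stands in for the active ANSI code page ("mbcs")
# on a Chinese-locale Windows machine.
ansi_bytes = filename.encode("gbk")

print(utf8_bytes)
print(ansi_bytes)

# Reading the UTF-8 bytes as if they were ANSI bytes yields a different name,
# which is what a byte-oriented C open() effectively does.
misread = utf8_bytes.decode("gbk", errors="replace")
print(misread == filename)
```

A C-level open() that receives the UTF-8 bytes but interprets them in the ANSI code page looks for the misread name, not the real one.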

How to fix

The problem is here, in parsers.pyx:

if isinstance(source, basestring):
    if not isinstance(source, bytes):
        source = source.encode(sys.getfilesystemencoding() or 'utf-8')

The source parameter is our filename. In Python 3.6 it is encoded as 'utf-8' rather than the legacy 'mbcs', then passed to open() in io.c:new_file_source, which interprets it as an mbcs string. So the "File not found" failure is not surprising.
Maybe Cython should handle this for Python 3.6 by using the Unicode version of the Windows API, but for now we can just replace sys.getfilesystemencoding() with "mbcs".
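
A rough sketch of what that replacement could look like, written as a standalone function rather than as the actual pandas code. The name encode_path_for_c_open and its platform parameter are hypothetical and exist only to make the idea testable outside Windows.

```python
import sys


def encode_path_for_c_open(source, platform=sys.platform):
    """Encode a file name for a byte-oriented C open() call.

    Hypothetical helper illustrating the proposed fix; not pandas code.
    """
    if isinstance(source, bytes):
        # Already bytes: pass through unchanged.
        return source
    if platform == "win32":
        # The C runtime's open() expects the ANSI code page on Windows,
        # so use "mbcs" regardless of what Python 3.6+ reports.
        encoding = "mbcs"
    else:
        encoding = sys.getfilesystemencoding() or "utf-8"
    return source.encode(encoding)
```

Note that "mbcs" can only represent characters in the active code page, so this would still fail for names outside it. Using the wide-character Windows API would be the more complete fix.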

mfmain changed the title from "read_csv(filename_with_asian_locale) failed in python 3.6" to "read_csv(filename_with_asian_locale) failed in python 3.6 for windows" on Jun 5, 2017
@mfmain
Author

mfmain commented Jun 5, 2017

There is a workaround, at the cost of speed:

df = pd.read_csv(r'c:\tmp\中文.csv', engine='python')

But it is tedious to modify every single call to read_csv in all your projects.
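
Another possible workaround is to open the file in Python and pass the file object to the reader, so the file name never reaches the C-level open(). The sketch below is only an illustration: read_via_handle and demo_roundtrip are hypothetical names, and the demo uses the stdlib csv module so it runs without pandas.

```python
import csv
import os
import tempfile


def read_via_handle(path, reader, encoding="utf-8", **kwargs):
    """Open `path` in Python and hand the file object to `reader`."""
    with open(path, "r", encoding=encoding, newline="") as handle:
        # The reader must consume the file before the handle is closed.
        return reader(handle, **kwargs)


def demo_roundtrip():
    """Write a small CSV under a non-ASCII name and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "中文.csv")
        with open(path, "w", encoding="utf-8", newline="") as out:
            out.write("a,b\n1,2\n")
        return read_via_handle(path, lambda f: list(csv.reader(f)))
```

With pandas this would be called as read_via_handle(r'c:\tmp\中文.csv', pd.read_csv). Since read_csv accepts file objects, this should sidestep the file name encoding step while keeping the default C engine. The encoding argument here refers to the file contents, not the file name.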

@jreback
Contributor

jreback commented Jun 5, 2017

This is a duplicate of #15086.

There was a PR attached, but unfortunately it was blown away.

We would certainly take a fix for this.

jreback closed this as completed on Jun 5, 2017
jreback added the Duplicate Report, IO CSV, and Unicode labels on Jun 5, 2017
jreback added this to the "No action" milestone on Jun 5, 2017
@yuquant

yuquant commented Mar 21, 2018

Do not use Chinese characters in the file name; rename it in English.
