CSV inside ZIP which is not UTF-8 encoded causes UnicodeDecodeError #74

Closed
craiga opened this issue Sep 16, 2020 · 0 comments · Fixed by #75

craiga commented Sep 16, 2020

pyexcel-io assumes that every file inside a CSVZ archive is UTF-8 encoded.

To demonstrate the issue, this zip file contains one CSV file which is UTF-32 encoded.
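The same failure can be reproduced without pyexcel, using only the standard library. The sketch below is a stand-in for the linked file, not the file itself. It assumes a UTF-32 little-endian member with a byte-order mark, builds a zip containing it in memory, and then decodes that member the way `readers/csvz.py` does at line 46 (`content.decode("utf-8")`). The helper names `make_csvz_with_utf32_sheet` and `read_member_as_utf8` are hypothetical. A UTF-32 LE byte-order mark starts with `0xff`, which can never begin a UTF-8 sequence, so the error points at position 0.

```python
import codecs
import io
import zipfile


def make_csvz_with_utf32_sheet(text: str, name: str = "sheet.csv") -> bytes:
    """Hypothetical helper: build an in-memory zip whose single CSV member
    is UTF-32 LE with a leading byte-order mark (FF FE 00 00)."""
    payload = codecs.BOM_UTF32_LE + text.encode("utf-32-le")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, payload)
    return buf.getvalue()


def read_member_as_utf8(archive: bytes, name: str) -> str:
    """Hypothetical helper mimicking the failing line in csvz.py:
    read the raw member bytes and decode them as UTF-8."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        content = zf.read(name)
    return content.decode("utf-8")


archive = make_csvz_with_utf32_sheet("name,city\nZoë,Zürich\n")
try:
    read_member_as_utf8(archive, "sheet.csv")
except UnicodeDecodeError as exc:
    print(exc)
    # prints: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
```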

Passing this zip file through pyexcel yields the following error:

  …
  File "…/views/upload_spreadsheets.py", line 67, in save_files
    yield (file.name, dict(self.save_book(file.get_book(), share_with_org)))
  File "…/site-packages/pyexcel_webio/__init__.py", line 203, in get_book
    return pe.get_book(**params)
  File "…/site-packages/pyexcel/core.py", line 47, in get_book
    book_stream = sources.get_book_stream(**keywords)
  File "…/site-packages/pyexcel/internal/core.py", line 39, in get_book_stream
    sheets = a_source.get_data()
  File "…/site-packages/pyexcel/plugins/sources/memory_input.py", line 40, in get_data
    sheets = self.__parser.parse_file_content(
  File "…/site-packages/pyexcel/plugins/parsers/excel.py", line 27, in parse_file_content
    return self._parse_any(
  File "…/site-packages/pyexcel/plugins/parsers/excel.py", line 40, in _parse_any
    sheets = get_data(anything, file_type=file_type, **keywords)
  File "…/site-packages/pyexcel_io/io.py", line 72, in get_data
    data, _ = _get_data(
  File "…/site-packages/pyexcel_io/io.py", line 91, in _get_data
    return load_data(**keywords)
  File "…/site-packages/pyexcel_io/io.py", line 216, in load_data
    result = reader.read_all()
  File "…/site-packages/pyexcel_io/book.py", line 157, in read_all
    result[sheet.name] = self.read_sheet(sheet)
  File "…/site-packages/pyexcel_io/readers/csvz.py", line 46, in read_sheet
    sheet = StringIO(content.decode("utf-8"))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
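One possible way to remove the hard-coded UTF-8 is sketched below. This is an assumption and not necessarily what #75 does. The sketch decodes each member with a caller-supplied encoding when one is given. Otherwise it sniffs a byte-order mark and falls back to UTF-8. `read_csvz_sheets` and `guess_encoding` are hypothetical names, not pyexcel-io API. BOM sniffing cannot identify encodings that carry no BOM, such as Latin-1, so the explicit `encoding` override is kept.

```python
import codecs
import csv
import io
import zipfile
from typing import Dict, List, Optional

# (BOM, codec) pairs, checked in order. UTF-32 LE must come before UTF-16 LE
# because its BOM (FF FE 00 00) starts with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Hypothetical helper: choose a codec from a leading BOM, else use default."""
    for bom, codec in _BOMS:
        if raw.startswith(bom):
            # The "utf-32", "utf-16" and "utf-8-sig" codecs consume the BOM.
            return codec
    return default


def read_csvz_sheets(
    archive: bytes, encoding: Optional[str] = None
) -> Dict[str, List[List[str]]]:
    """Hypothetical CSVZ reader: one sheet per zip member, decoded with an
    explicit encoding if given, otherwise with a BOM-based guess."""
    sheets: Dict[str, List[List[str]]] = {}
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for name in zf.namelist():
            raw = zf.read(name)
            text = raw.decode(encoding or guess_encoding(raw))
            # newline="" leaves line-ending handling to the csv module.
            sheets[name] = list(csv.reader(io.StringIO(text, newline="")))
    return sheets


# Usage: the same kind of UTF-32 member that breaks a UTF-8-only reader.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "sheet.csv",
        codecs.BOM_UTF32_LE + "name,city\nZoë,Zürich\n".encode("utf-32-le"),
    )
print(read_csvz_sheets(buf.getvalue()))
# {'sheet.csv': [['name', 'city'], ['Zoë', 'Zürich']]}
```

The explicit `encoding` argument takes precedence over the BOM guess, so callers can still handle BOM-less legacy encodings. Files that do carry a BOM, such as the UTF-32 file in this report, decode without any extra configuration.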