Unable to support XLS and XLSX uploads in Python 3.9 #66

Open
craiga opened this issue Nov 16, 2020 · 3 comments
Comments

craiga commented Nov 16, 2020

On Python 3.9 with pyexcel-xls and pyexcel-xlsx installed, I'm not able to upload .xlsx files.

AttributeError: 'ElementTree' object has no attribute 'getiterator'
Saving workbook from spreadsheet.xlsx
Internal Server Error: /portfolios/my-portfolio/upload
Traceback (most recent call last):
    File "/app/.heroku/python/lib/python3.9/site-packages/django/core/handlers/exception.py", line 47, in inner
        response = get_response(request)
    File "/app/.heroku/python/lib/python3.9/site-packages/django/core/handlers/base.py", line 179, in _get_response
        response = wrapped_callback(request, *callback_args, **callback_kwargs)
    File "/app/.heroku/python/lib/python3.9/site-packages/sentry_sdk/integrations/django/views.py", line 67, in sentry_wrapped_callback
        return callback(request, *args, **kwargs)
    File "/app/.heroku/python/lib/python3.9/site-packages/django/views/generic/base.py", line 70, in view
        return self.dispatch(request, *args, **kwargs)
    File "/app/.heroku/python/lib/python3.9/site-packages/django/contrib/auth/mixins.py", line 85, in dispatch
        return super().dispatch(request, *args, **kwargs)
    File "/app/.heroku/python/lib/python3.9/site-packages/django/contrib/auth/mixins.py", line 52, in dispatch
        return super().dispatch(request, *args, **kwargs)
    File "/app/.heroku/python/lib/python3.9/site-packages/django/views/generic/base.py", line 98, in dispatch
        return handler(request, *args, **kwargs)
    File "/app/.heroku/python/lib/python3.9/site-packages/django/views/generic/edit.py", line 142, in post
        return self.form_valid(form)
    File "/app/portfolios/views/upload_spreadsheets.py", line 40, in form_valid
        result = dict(self.save_files(self.request.FILES.getlist("file")))
    File "/app/portfolios/views/upload_spreadsheets.py", line 55, in save_files
        yield (file.name, dict(self.save_book(file.get_book())))
    File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel_webio/__init__.py", line 203, in get_book
        return pe.get_book(**params)
    File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel/core.py", line 47, in get_book
        book_stream = sources.get_book_stream(**keywords)
    File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel/internal/core.py", line 39, in get_book_stream
        sheets = a_source.get_data()
    File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel/plugins/sources/memory_input.py", line 40, in get_data
        sheets = self.__parser.parse_file_content(
    File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel/plugins/parsers/excel.py", line 27, in parse_file_content
        return self._parse_any(
    File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel/plugins/parsers/excel.py", line 40, in _parse_any
        sheets = get_data(anything, file_type=file_type, **keywords)
    File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel_io/io.py", line 86, in get_data
        data, _ = _get_data(
    File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel_io/io.py", line 105, in _get_data
        return load_data(**keywords)
    File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel_io/io.py", line 193, in load_data
        reader.open_content(file_content, **keywords)
    File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel_io/reader.py", line 58, in open_content
        self.reader = self.reader_class(
    File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel_xls/xlsr.py", line 186, in __init__
        super().__init__(file_type, file_contents=file_content, **keywords)
    File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel_xls/xlsr.py", line 146, in __init__
        self.xls_book = self.get_xls_book(**xlrd_params)
    File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel_xls/xlsr.py", line 167, in get_xls_book
        xls_book = xlrd.open_workbook(**xlrd_params)
    File "/app/.heroku/python/lib/python3.9/site-packages/xlrd/__init__.py", line 130, in open_workbook
        bk = xlsx.open_workbook_2007_xml(
    File "/app/.heroku/python/lib/python3.9/site-packages/xlrd/xlsx.py", line 812, in open_workbook_2007_xml
        x12book.process_stream(zflo, 'Workbook')
    File "/app/.heroku/python/lib/python3.9/site-packages/xlrd/xlsx.py", line 266, in process_stream
        for elem in self.tree.iter() if Element_has_iter else self.tree.getiterator():
AttributeError: 'ElementTree' object has no attribute 'getiterator'

pyexcel appears to prefer pyexcel-xls over pyexcel-xlsx for parsing xlsx files.

pyexcel-xls works fine for reading xls files, but the XML parsing in the underlying (and unmaintained) xlrd library seems to rely on a method which has been removed from ElementTree. I haven't looked too far into this, but I did see this in What's New in Python 3.9:

Methods getchildren() and getiterator() of classes ElementTree and Element in the ElementTree module have been removed. They were deprecated in Python 3.2. Use iter(x) or list(x) instead of x.getchildren() and x.iter() or list(x.iter()) instead of x.getiterator(). (Contributed by Serhiy Storchaka in bpo-36543.)
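
To illustrate the replacements named in that note, here is a minimal sketch using only the standard library. The XML snippet and variable names are made up for illustration and are not taken from xlrd; the point is only that `iter()` and `list(x)` work where `getiterator()` and `getchildren()` used to.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the workbook XML found inside an .xlsx archive.
XML = b"<workbook><sheets><sheet name='A'/><sheet name='B'/></sheets></workbook>"

tree = ET.parse(io.BytesIO(XML))

# Instead of tree.getiterator(): walk every element in document order.
tags = [elem.tag for elem in tree.iter()]

# iter() also accepts a tag name to filter on.
sheet_names = [elem.get("name") for elem in tree.iter("sheet")]

# Instead of root.getchildren(): an Element is itself iterable.
children = list(tree.getroot())

# On Python 3.9+ this is False, which is why the traceback above ends in
# AttributeError once the code path falls through to getiterator().
has_getiterator = hasattr(tree, "getiterator")

print(tags, sheet_names, len(children), has_getiterator)
```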

I tried solving this issue at pyexcel/pyexcel-io#99 with no luck.

Author

craiga commented Dec 29, 2020

Member

chfw commented Dec 29, 2020

Yep, please update to the latest pyexcel-xls.

Author

craiga commented Apr 7, 2021

Apologies for taking so long to get back to this.

Updating to the latest pyexcel-xls doesn't solve this problem. In fact, we only see the above error message when we're on the latest version of pyexcel-xls (if I roll back to the previous version, I instead get xlrd.biffh.XLRDError: Excel xlsx file; not supported, as xlrd is no longer pinned).

As far as I can tell, there are two possible solutions to this issue:

  • remove XLSX support from pyexcel-xls (since this is what xlrd has done, it seems like the sensible approach to me)
  • somehow get lml to prefer pyexcel-xlsx for XLSX files (I looked into this but couldn't figure out how to make this happen)
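
As a rough illustration of the second option from the caller's side, the sketch below picks a plugin name from the file extension. The `PREFERRED_LIBRARY` mapping and the `preferred_library` helper are hypothetical names, not part of pyexcel or lml. It assumes that pyexcel's reading functions accept a `library` keyword to select a specific plugin and that pyexcel-webio's `get_book` forwards keywords to pyexcel; I have not verified either against the versions in the traceback.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the plugin we would like used.
PREFERRED_LIBRARY = {
    ".xlsx": "pyexcel-xlsx",
    ".xlsm": "pyexcel-xlsx",
    ".xls": "pyexcel-xls",
}


def preferred_library(file_name):
    """Return the plugin name to request for file_name, or None to let
    pyexcel choose on its own."""
    return PREFERRED_LIBRARY.get(Path(file_name).suffix.lower())


# Assumed usage inside the upload view (unverified, so left as a comment):
#
#     library = preferred_library(file.name)
#     keywords = {"library": library} if library else {}
#     book = file.get_book(**keywords)

print(preferred_library("spreadsheet.xlsx"))
```

This only sidesteps the plugin ordering for one caller; it does not change which plugin lml prefers by default.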
