
Unable to support XLS and XLSX uploads in Python 3.9 #66

Open
@craiga


On Python 3.9, with pyexcel-xls and pyexcel-xlsx both installed, I'm not able to upload .xlsx files.

AttributeError: 'ElementTree' object has no attribute 'getiterator'
Saving workbook from spreadsheet.xlsx
Internal Server Error: /portfolios/my-portfolio/upload
Traceback (most recent call last):
    File "/app/.heroku/python/lib/python3.9/site-packages/django/core/handlers/exception.py", line 47, in inner
        response = get_response(request)
    File "/app/.heroku/python/lib/python3.9/site-packages/django/core/handlers/base.py", line 179, in _get_response
        response = wrapped_callback(request, *callback_args, **callback_kwargs)
    File "/app/.heroku/python/lib/python3.9/site-packages/sentry_sdk/integrations/django/views.py", line 67, in sentry_wrapped_callback
        return callback(request, *args, **kwargs)
    File "/app/.heroku/python/lib/python3.9/site-packages/django/views/generic/base.py", line 70, in view
        return self.dispatch(request, *args, **kwargs)
    File "/app/.heroku/python/lib/python3.9/site-packages/django/contrib/auth/mixins.py", line 85, in dispatch
        return super().dispatch(request, *args, **kwargs)
    File "/app/.heroku/python/lib/python3.9/site-packages/django/contrib/auth/mixins.py", line 52, in dispatch
        return super().dispatch(request, *args, **kwargs)
    File "/app/.heroku/python/lib/python3.9/site-packages/django/views/generic/base.py", line 98, in dispatch
        return handler(request, *args, **kwargs)
    File "/app/.heroku/python/lib/python3.9/site-packages/django/views/generic/edit.py", line 142, in post
        return self.form_valid(form)
    File "/app/portfolios/views/upload_spreadsheets.py", line 40, in form_valid
        result = dict(self.save_files(self.request.FILES.getlist("file")))
    File "/app/portfolios/views/upload_spreadsheets.py", line 55, in save_files
        yield (file.name, dict(self.save_book(file.get_book())))
    File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel_webio/__init__.py", line 203, in get_book
        return pe.get_book(**params)
    File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel/core.py", line 47, in get_book
        book_stream = sources.get_book_stream(**keywords)
    File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel/internal/core.py", line 39, in get_book_stream
        sheets = a_source.get_data()
    File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel/plugins/sources/memory_input.py", line 40, in get_data
        sheets = self.__parser.parse_file_content(
    File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel/plugins/parsers/excel.py", line 27, in parse_file_content
        return self._parse_any(
    File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel/plugins/parsers/excel.py", line 40, in _parse_any
        sheets = get_data(anything, file_type=file_type, **keywords)
    File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel_io/io.py", line 86, in get_data
        data, _ = _get_data(
    File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel_io/io.py", line 105, in _get_data
        return load_data(**keywords)
    File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel_io/io.py", line 193, in load_data
        reader.open_content(file_content, **keywords)
    File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel_io/reader.py", line 58, in open_content
        self.reader = self.reader_class(
    File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel_xls/xlsr.py", line 186, in __init__
        super().__init__(file_type, file_contents=file_content, **keywords)
    File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel_xls/xlsr.py", line 146, in __init__
        self.xls_book = self.get_xls_book(**xlrd_params)
    File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel_xls/xlsr.py", line 167, in get_xls_book
        xls_book = xlrd.open_workbook(**xlrd_params)
    File "/app/.heroku/python/lib/python3.9/site-packages/xlrd/__init__.py", line 130, in open_workbook
        bk = xlsx.open_workbook_2007_xml(
    File "/app/.heroku/python/lib/python3.9/site-packages/xlrd/xlsx.py", line 812, in open_workbook_2007_xml
        x12book.process_stream(zflo, 'Workbook')
    File "/app/.heroku/python/lib/python3.9/site-packages/xlrd/xlsx.py", line 266, in process_stream
        for elem in self.tree.iter() if Element_has_iter else self.tree.getiterator():
AttributeError: 'ElementTree' object has no attribute 'getiterator'

pyexcel appears to be preferring pyexcel-xls over pyexcel-xlsx for parsing xlsx files.

pyexcel-xls works fine for reading .xls files, but the XML parsing in the underlying (and unmaintained) xlrd library seems to rely on a method that has been removed from ElementTree. I haven't looked too far into this, but I did see this in What's New in Python 3.9:

Methods getchildren() and getiterator() of classes ElementTree and Element in the ElementTree module have been removed. They were deprecated in Python 3.2. Use iter(x) or list(x) instead of x.getchildren() and x.iter() or list(x.iter()) instead of x.getiterator(). (Contributed by Serhiy Storchaka in bpo-36543.)
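To illustrate the migration the What's New note describes, here is a minimal sketch using only the standard-library xml.etree.ElementTree. It is not xlrd's code: `iter_elements` is a hypothetical helper, and the XML snippet is a toy stand-in rather than a real workbook part from an .xlsx file. It shows `x.iter()` replacing `x.getiterator()`, `list(x)` replacing `x.getchildren()`, and the same AttributeError as the traceback when the removed method is called on Python 3.9+.

```python
import io
import sys
import xml.etree.ElementTree as ET


def iter_elements(node):
    """Iterate over an ElementTree or Element and all of its descendants.

    Hypothetical helper: prefers iter() (available since Python 3.2) and only
    falls back to getiterator(), which was removed in Python 3.9.
    """
    if hasattr(node, "iter"):
        return node.iter()
    return node.getiterator()


# Toy XML document for illustration only, not taken from a real .xlsx file.
xml_bytes = b"<workbook><sheets><sheet name='A'/><sheet name='B'/></sheets></workbook>"
tree = ET.parse(io.BytesIO(xml_bytes))

# x.iter() replaces x.getiterator(): yields the root and every descendant.
all_tags = [elem.tag for elem in iter_elements(tree)]
print(all_tags)  # ['workbook', 'sheets', 'sheet', 'sheet']

# list(x) replaces x.getchildren(): yields only the direct children.
child_tags = [child.tag for child in list(tree.getroot())]
print(child_tags)  # ['sheets']

# On Python 3.9+ calling the removed method fails the same way xlrd does above.
if sys.version_info >= (3, 9):
    try:
        tree.getiterator()
    except AttributeError as exc:
        print(exc)
```

Checking for `iter` with `hasattr` rather than comparing version numbers keeps the helper working on any interpreter that has the newer method.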

I tried solving this issue at pyexcel/pyexcel-io#99 with no luck.
