Prefer XLSX plugin when reading XLSX files. #99

Closed
wants to merge 3 commits into from

craiga
Contributor

craiga commented Oct 16, 2020

Reading xlsx files with xlrd appears to be broken in Python 3.9.

    def process_stream(self, stream, heading=None):
        if self.verbosity >= 2 and heading is not None:
            fprintf(self.logfile, "\n=== %s ===\n", heading)
        self.tree = ET.parse(stream)
        getmethod = self.tag2meth.get
>       for elem in self.tree.iter() if Element_has_iter else self.tree.getiterator():
E       AttributeError: 'ElementTree' object has no attribute 'getiterator'

Reading xls files still works fine, but xlrd's XML parsing seems to rely on a method which has been removed from ElementTree. I haven't looked too far into this, but I did see this in "What's New in Python 3.9":

Methods getchildren() and getiterator() of classes ElementTree and Element in the ElementTree module have been removed. They were deprecated in Python 3.2. Use iter(x) or list(x) instead of x.getchildren() and x.iter() or list(x.iter()) instead of x.getiterator(). (Contributed by Serhiy Storchaka in bpo-36543.)
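
For illustration, the replacement described in that note can be sketched with the standard library alone. The helper name `iter_elements` and the sample XML are made up for this example and are not part of xlrd; the helper checks the parsed tree itself for `iter()` and only falls back to `getiterator()` when `iter()` is missing.

```python
import io
import xml.etree.ElementTree as ET


def iter_elements(tree):
    """Iterate over every element in the tree, preferring iter()."""
    if hasattr(tree, "iter"):
        return tree.iter()
    # Only reached on ElementTree implementations that predate iter().
    return tree.getiterator()


# Hypothetical sample document, standing in for a part of an xlsx archive.
sample = io.BytesIO(b"<workbook><sheet name='a'/><sheet name='b'/></workbook>")
tree = ET.parse(sample)

tags = [elem.tag for elem in iter_elements(tree)]
print(tags)
```

On Python 3.9 the fallback branch is never taken, because `ElementTree.iter()` is always available there.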

This means that I can't parse both xlsx and xls files under Python 3.9, as pyexcel-xls is preferred over pyexcel-xlsx.

This PR should resolve this, but I don't know what the knock-on effects of this might be.

What do you think?

  • Has test cases written?
  • Has all code lines tested?
  • Has make format been run?
  • Please update CHANGELOG.yml (not CHANGELOG.rst)
  • Passes all Travis CI builds
  • Has fair amount of documentation if your change is complex
  • Agree on NEW BSD License for your contribution

@codecov-io

codecov-io commented Oct 16, 2020

Codecov Report

Merging #99 (099c977) into dev (7adcec9) will increase coverage by 0.07%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##              dev      #99      +/-   ##
==========================================
+ Coverage   97.83%   97.91%   +0.07%     
==========================================
  Files          52       52              
  Lines        3332     3360      +28     
==========================================
+ Hits         3260     3290      +30     
+ Misses         72       70       -2     

| Impacted Files | Coverage Δ |
|---|---|
| pyexcel_io/utils.py | 100.00% <ø> (ø) |
| pyexcel_io/database/common.py | 100.00% <0.00%> (ø) |
| pyexcel_io/database/importers/django.py | 100.00% <0.00%> (ø) |
| tests/test_django_book.py | 99.67% <0.00%> (+0.01%) ⬆️ |
| pyexcel_io/plugins.py | 96.39% <0.00%> (+0.90%) ⬆️ |
| pyexcel_io/database/importers/sqlalchemy.py | 98.27% <0.00%> (+1.97%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@chfw
Member

chfw commented Oct 16, 2020

Makes sense.

Please update CHANGELOG.yml and tick the NEW BSD License box.

Thanks for adding 3.9-dev. If you want that change to stay, the right place (a strange place, admittedly) is .moban.d/custom_travis.yml.jj2. Our tool will then render it as .travis.yml.

@chfw
Member

chfw commented Oct 17, 2020

Just a note: this change has no material effect; it only makes the helper message list xlsx before xls. Inside pyexcel, there is no preference mechanism.

@chfw
Member

chfw commented Oct 31, 2020

@craiga , what's your plan for this PR?

If you want to force pyexcel-io to use pyexcel-xlsx, you can pass a parameter: "library=..".
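
A minimal sketch of that suggestion, assuming both plugins are installed and that `get_data` from pyexcel-io accepts the `library` keyword as described above. The helper names `choose_library` and `read_rows` and the extension-to-plugin mapping are hypothetical, introduced only for illustration.

```python
import os

# Hypothetical mapping from file extension to the plugin that should read it.
PREFERRED_LIBRARY = {
    ".xlsx": "pyexcel-xlsx",
    ".xls": "pyexcel-xls",
}


def choose_library(filename):
    """Return the plugin name for a file, or None to let pyexcel-io decide."""
    extension = os.path.splitext(filename)[1].lower()
    return PREFERRED_LIBRARY.get(extension)


def read_rows(filename):
    """Read a spreadsheet, forcing the plugin chosen above when there is one."""
    # Imported lazily so choose_library() works without pyexcel-io installed.
    from pyexcel_io import get_data

    library = choose_library(filename)
    if library is None:
        return get_data(filename)
    return get_data(filename, library=library)
```

Keeping the choice in one small function means the xlsx reader can be pinned per call without changing which plugins are installed.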

@craiga
Contributor Author

craiga commented Nov 15, 2020

@chfw As you mentioned, I've just verified that this change doesn't fix the issue I'm seeing in my app. I've been trying to replicate the problem in a test inside pyexcel-io without luck. I'm going to close this PR and do some more investigation.

craiga closed this Nov 15, 2020

@chfw
Member

chfw commented Nov 15, 2020

Please use the ‘library’ option to force it to use pyexcel-xlsx.

@craiga
Contributor Author

craiga commented Nov 16, 2020

@chfw That approach works, but unfortunately I can't see a good opportunity to specify a library when the file is uploaded using django_excel. I've logged this as an issue on that project (pyexcel-webwares/django-excel#66), and would love to hear your thoughts.
