TextIOWrapper mistakenly classified as binary #615

Closed
htInEdin opened this issue May 5, 2021 · 0 comments · Fixed by #616

htInEdin commented May 5, 2021

Bug report

Bug is in converter.PDFConverter._is_binary_stream

I have only just caught up with pdfminer.six; I was previously using pdfminer.20191103.
The older version allowed me to use a TextIOWrapper as the output stream for a TextConverter, but this fails with pdfminer.six because _is_binary_stream fails to recognise TextIOWrapper as non-binary.

  • Steps to reproduce the bug.
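        from io import BytesIO, TextIOWrapper
        from pdfminer.converter import TextConverter
        from pdfminer.layout import LAParams
        from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
        from pdfminer.pdfpage import PDFPage

        # Text-mode wrapper around an in-memory byte buffer
        text_io = TextIOWrapper(BytesIO())
        rsrcmgr = PDFResourceManager(caching=True)
        converter = TextConverter(rsrcmgr, text_io,
                                  laparams=LAParams(), imagewriter=None)
        interpreter = PDFPageInterpreter(rsrcmgr, converter)
        ...
        for page in PDFPage.get_pages(...):
            # Read page contents
            interpreter.process_page(page)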
  • If relevant, include the output and/or error stacktrace.
  ...
    render(ltpage)
  ...
  File "/usr/lib/python3.6/site-packages/pdfminer/converter.py", line 211, in render
    self.write_text(item.get_text())
  File "/usr/lib/python3.6/site-packages/pdfminer/converter.py", line 202, in write_text
    self.outfp.write(text)
TypeError: write() argument must be str, not bytes
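
For anyone reading the traceback: the same TypeError can be triggered with the standard library alone, which suggests (this is my reading, not something confirmed from the pdfminer.six source here) that the converter encoded the text to bytes because it judged the stream to be binary. A minimal sketch, with no pdfminer involved:

```python
import io

# A text-mode wrapper over an in-memory byte buffer, as in the report.
text_io = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")

# Writing bytes to a text stream is rejected.
try:
    text_io.write("hello".encode("utf-8"))
    raised = False
except TypeError:
    raised = True

# Writing str works; the wrapper does the encoding itself.
text_io.write("hello")
text_io.flush()
data = text_io.buffer.getvalue()
```

So the wrapper is happy as long as it is handed str rather than bytes.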

The fix is trivial, see pull request shortly...
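
To illustrate the kind of check I have in mind, here is a rough sketch of a stream classifier. It is not the pdfminer.six implementation and not necessarily what the pull request does; `looks_binary` is a hypothetical name, and treating unknown objects as binary is an assumption on my part.

```python
import io


def looks_binary(stream) -> bool:
    """Hypothetical helper: guess whether `stream` expects bytes."""
    # StringIO, TextIOWrapper and files opened in text mode all
    # derive from io.TextIOBase, so they expect str.
    if isinstance(stream, io.TextIOBase):
        return False
    # Files returned by open() carry a mode string; "b" marks binary.
    mode = getattr(stream, "mode", None)
    if isinstance(mode, str):
        return "b" in mode
    # Assumption: anything else is treated as binary.
    return True


wrapped = io.TextIOWrapper(io.BytesIO())
results = (
    looks_binary(io.BytesIO()),
    looks_binary(io.StringIO()),
    looks_binary(wrapped),
)
```

Checking `io.TextIOBase` first means a TextIOWrapper built directly around a BytesIO is classified as text even though it was not created by `open()`.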
