TextIOWrapper mistakenly classified as binary #615

htInEdin · 2021-05-05T16:33:48Z

Bug report

The bug is in converter.PDFConverter._is_binary_stream

I have only just caught up with pdfminer.six; I was previously using pdfminer.20191103.
That version allowed me to use a TextIOWrapper as the output stream for a TextConverter, but this fails with pdfminer.six because _is_binary_stream fails to recognise TextIOWrapper as non-binary, so the converter writes bytes to a stream that only accepts str.

Steps to reproduce the bug.

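        from io import BytesIO, TextIOWrapper

        from pdfminer.converter import TextConverter
        from pdfminer.layout import LAParams
        from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
        from pdfminer.pdfpage import PDFPage

        # A text-mode stream wrapped around an in-memory byte buffer
        text_io = TextIOWrapper(BytesIO())
        rsrcmgr = PDFResourceManager(caching=True)
        converter = TextConverter(rsrcmgr, text_io,
                                  laparams=LAParams(), imagewriter=None)
        interpreter = PDFPageInterpreter(rsrcmgr, converter)
        ...
        for page in PDFPage.get_pages(...):
            # Read page contents
            interpreter.process_page(page)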

If relevant, include the output and/or error stacktrace.

  ...
    render(ltpage)
  ...
  File "/usr/lib/python3.6/site-packages/pdfminer/converter.py", line 211, in render
    self.write_text(item.get_text())
  File "/usr/lib/python3.6/site-packages/pdfminer/converter.py", line 202, in write_text
    self.outfp.write(text)
TypeError: write() argument must be str, not bytes
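
The final TypeError can be reproduced without pdfminer at all, which may help confirm that the stream itself is behaving correctly and that the problem is the bytes being handed to it. The snippet below is only an illustrative sketch using the standard library; the variable names are made up for this example.

```python
import io

# A text-mode wrapper around an in-memory byte buffer, as in the report.
text_io = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")

# Writing bytes to a text stream is rejected with a TypeError.
try:
    text_io.write("hello".encode("utf-8"))
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Writing str works; the wrapper encodes it into the underlying buffer.
text_io.write("hello")
text_io.flush()
written = text_io.buffer.getvalue()
print(written)
```

So the wrapper accepts str and encodes it itself; it only fails when the caller has already encoded the text.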

The fix is trivial, see pull request shortly...
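
For readers who want a rough idea of what such a check could look like, here is a minimal sketch. It is not the code from pdfminer.six or from the pull request; `looks_binary` is a hypothetical helper, and the fallback to the `mode` attribute is an assumption made for this example. The idea is simply to test for `io.TextIOBase` first, since `TextIOWrapper` and `StringIO` both derive from it.

```python
import io


def looks_binary(outfp):
    """Hypothetical helper: guess whether outfp expects bytes rather than str."""
    # StringIO, TextIOWrapper and files opened in text mode are TextIOBase.
    if isinstance(outfp, io.TextIOBase):
        return False
    # BytesIO, raw files and buffered binary files fall in these classes.
    if isinstance(outfp, (io.RawIOBase, io.BufferedIOBase)):
        return True
    # Assumption: for other file-like objects, consult `mode` if present
    # and treat an object without `mode` as binary.
    return "b" in getattr(outfp, "mode", "b")


print(looks_binary(io.TextIOWrapper(io.BytesIO())))
print(looks_binary(io.BytesIO()))
print(looks_binary(io.StringIO()))
```

With a check along these lines, a TextIOWrapper around a BytesIO is treated as a text stream even though the underlying buffer is binary.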

htInEdin mentioned this issue May 5, 2021

Bug: _is_binary_stream should recognize TextIOWrapper as non-binary, escaped \r\n should be removed #616

Merged

pietermarsman closed this as completed in #616 Sep 27, 2021
