Type check for obj in PDFPageInterpreter #441

tongbaojia · 2020-06-09T14:06:12Z

Bug report

A Python TypeError is raised when obj is not of type PDFStream.
obj can be of type bytes, and there is no type check before the in operator is applied to it:
https://github.com/pdfminer/pdfminer.six/blob/develop/pdfminer/pdfinterp.py#L840
The PDF file is attached as sample.pdf.
To reproduce:
python pdf2txt.py "sample.pdf" --pagenos "1"
Stack trace:

Traceback (most recent call last):
  File "pdf2txt.py", line 188, in <module>
    sys.exit(main())
  File "pdf2txt.py", line 182, in main
    outfp = extract_text(**vars(A))
  File "pdf2txt.py", line 56, in extract_text
    pdfminer.high_level.extract_text_to_fp(fp, **locals())
  File "***/lib/python3.6/site-packages/pdfminer/high_level.py", line 86, in extract_text_to_fp
    interpreter.process_page(page)
  File "***/lib/python3.6/site-packages/pdfminer/pdfinterp.py", line 895, in process_page
    self.render_contents(page.resources, page.contents, ctm=ctm)
  File "***/lib/python3.6/site-packages/pdfminer/pdfinterp.py", line 908, in render_contents
    self.execute(list_value(streams))
  File "***/lib/python3.6/site-packages/pdfminer/pdfinterp.py", line 933, in execute
    func(*args)
  File "***/lib/python3.6/site-packages/pdfminer/pdfinterp.py", line 840, in do_EI
    if 'W' in obj and 'H' in obj:
TypeError: a bytes-like object is required, not 'str'
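
The error in the last frame can be reproduced without pdfminer. The sketch below assumes obj reaches do_EI as raw bytes rather than a PDFStream; the value assigned to obj is made up for illustration and is not taken from sample.pdf.

```python
# Minimal reproduction of the failure mode, independent of pdfminer.
# Assumption: obj arrives as raw bytes instead of a PDFStream.
obj = b"inline image data"  # hypothetical payload, not from sample.pdf

message = None
try:
    # A str on the left of `in` with bytes on the right is not allowed:
    # bytes membership only accepts bytes-like objects or integers.
    "W" in obj
except TypeError as exc:
    message = str(exc)

print(message)
```

With a PDFStream the same expression is a lookup in the stream's attributes, which is why the check only fails for some PDF files.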


tongbaojia · 2020-06-26T16:41:52Z

sample.pdf

pietermarsman · 2020-06-29T18:44:13Z

I can replicate this issue using the newest pdfminer.six.

pietermarsman added the type: bug label Jun 29, 2020

tongbaojia mentioned this issue Jul 1, 2020

Check the obj type before calling the in operation #451

Merged


pietermarsman closed this as completed in #451 Jul 5, 2020
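
For readers landing here, this is a rough sketch of the kind of guard the title of #451 describes: check the type before using the in operator. It is not the merged diff. StandInStream and has_image_dimensions are hypothetical names used only so the example runs without pdfminer installed.

```python
class StandInStream:
    """Hypothetical stand-in for pdfminer's PDFStream (not the real class)."""

    def __init__(self, attrs):
        self.attrs = attrs

    def __contains__(self, name):
        # Membership is a lookup in the attribute dictionary.
        return name in self.attrs


def has_image_dimensions(obj):
    """Return True only for stream objects that carry both W and H."""
    # The isinstance check short-circuits, so `in` is never
    # evaluated against bytes or any other non-stream object.
    return isinstance(obj, StandInStream) and "W" in obj and "H" in obj


stream_result = has_image_dimensions(StandInStream({"W": 10, "H": 20}))
bytes_result = has_image_dimensions(b"raw bytes")
print(stream_result, bytes_result)
```

Putting the isinstance check first means non-stream objects are skipped rather than raising, which matches the behavior requested in this report.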
