pdf set to None followed by an attribute check on pdf #1107

jlshin · 2022-07-14T16:06:01Z

Environment

Python 3.8.13 with PyPDF2==2.5.0

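import PyPDF2

# `file` is the path to (or an open binary handle on) the affected PDF, which is not attached.
reader = PyPDF2.PdfFileReader(file)
number_of_pages = reader.numPages
for page_number in range(number_of_pages):
    page = reader.getPage(page_number)
    page_content = page.extractText()  # raises the AttributeError shown in the traceback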

Traceback

    page_content = page.extractText()
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/_page.py", line 1340, in extractText
    return self.extract_text(Tj_sep=Tj_sep, TJ_sep=TJ_sep)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/_page.py", line 1317, in extract_text
    return self._extract_text(self, self.pdf, space_width, PG.CONTENTS)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/_page.py", line 1139, in _extract_text
    content = ContentStream(content, pdf, "bytes")
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/generic.py", line 1196, in __init__
    self.__parse_content_stream(stream_bytes)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/generic.py", line 1226, in __parse_content_stream
    operands.append(read_object(stream, None, self.forced_encoding))
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/generic.py", line 1329, in read_object
    return DictionaryObject.read_from_stream(stream, pdf, forced_encoding)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/generic.py", line 808, in read_from_stream
    if pdf.strict:
AttributeError: 'NoneType' object has no attribute 'strict'

I cannot attach the PDF I am using, but I can explain what I think the bug is:

In ContentStream.__parse_content_stream (generic.py), read_object is called with None in place of the pdf argument:
https://github.com/py-pdf/PyPDF2/blob/1e4c2c9b4649449241b0ae166e7e90f6bc61596d/PyPDF2/generic.py#L1226

So by the time we get to:
https://github.com/py-pdf/PyPDF2/blob/1e4c2c9b4649449241b0ae166e7e90f6bc61596d/PyPDF2/generic.py#L808-L811

The error above is raised because pdf is None, and None has no attribute strict.

I have worked around it by changing line 808 to:

if pdf is not None and pdf.strict:
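
To illustrate the workaround without the PDF, here is a minimal sketch of the guard. It does not use PyPDF2 at all: `FakeReader`, `is_strict_unguarded`, `is_strict_guarded`, and `unguarded_fails_on_none` are hypothetical stand-ins I made up to mirror the check on line 808, and the sketch assumes that treating a missing reader as non-strict is acceptable.

```python
from typing import Optional


class FakeReader:
    """Hypothetical stand-in for a reader object; it only carries a strict flag."""

    def __init__(self, strict: bool) -> None:
        self.strict = strict


def is_strict_unguarded(pdf: Optional[FakeReader]) -> bool:
    # Mirrors the original check: attribute access fails when pdf is None.
    return bool(pdf.strict)


def is_strict_guarded(pdf: Optional[FakeReader]) -> bool:
    # Mirrors the workaround: a missing reader is treated as non-strict.
    return pdf is not None and pdf.strict


def unguarded_fails_on_none() -> bool:
    # Returns True if the unguarded check raises AttributeError for None.
    try:
        is_strict_unguarded(None)
    except AttributeError:
        return True
    return False
```

With the guard, a None reader simply skips the strict branch instead of raising, which is the behaviour I see after patching line 808.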


MartinThoma · 2022-07-14T20:39:41Z

As a quick fix, this makes sense. I'm not sure what a clean solution would look like.

Do you want to open a PR with your fix?


jlshin changed the title ~~pdf set to None following by an attribute check on pdf~~ pdf set to None followed by an attribute check on pdf Jul 14, 2022

jlshin added a commit to jlshin/PyPDF2 that referenced this issue Jul 14, 2022

Check if pdf variable is None

3bff9e5

Closes py-pdf#1107

jlshin mentioned this issue Jul 14, 2022

Check if pdf variable is None #1113

Merged

MartinThoma closed this as completed in #1113 Jul 15, 2022

MartinThoma pushed a commit that referenced this issue Jul 15, 2022

BUG: None-check in DictionaryObject.read_from_stream (#1113)

9bbe827

Guard pdf.strict with check if pdf is None in DictionaryObject.read_from_stream Closes #1107

mtd91429 pushed a commit to mtd91429/PyPDF2 that referenced this issue Jul 15, 2022

BUG: None-check in DictionaryObject.read_from_stream (py-pdf#1113)

d6c3100

Guard pdf.strict with check if pdf is None in DictionaryObject.read_from_stream Closes py-pdf#1107
