Error while reading bookmarks/outlines "TypeError: argument of type 'NoneType' is not iterable" #1059

hassanseoul123 · 2022-07-05T03:00:58Z

"TypeError: argument of type 'NoneType' is not iterable"
Got this when I tried to read the outlines of a PDF file with PdfReader.outlines.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Windows-10-10.0.19044-SP0

$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.4.1

Code + PDF

Example PDF file: sample.pdf (Yes, you can use this file for tests)

from PyPDF2 import PdfReader
reader = PdfReader("sample.pdf")
print(reader.outlines)

Traceback

This is the complete Traceback I see:

C:\Users\Hassan\AppData\Local\Programs\Python\Python310\lib\site-packages\PyPDF2\_reader.py:1089: PdfReadWarning: Object 86 0 not defined.
  warnings.warn(
Traceback (most recent call last):
  File "C:\Users\Hassan\Desktop\main.py", line 3, in <module>
    outlines = reader.outlines
  File "C:\Users\Hassan\AppData\Local\Programs\Python\Python310\lib\site-packages\PyPDF2\_reader.py", line 674, in outlines
    return self._get_outlines()
  File "C:\Users\Hassan\AppData\Local\Programs\Python\Python310\lib\site-packages\PyPDF2\_reader.py", line 694, in _get_outlines
    if "/First" in lines:
TypeError: argument of type 'NoneType' is not iterable


MartinThoma · 2022-07-05T05:49:14Z

Thank you for reporting the issue ❤️

PdfReader.outlines is the one you should use. The others do the same thing, but they are deprecated (see CHANGELOG)


MartinThoma · 2022-07-05T08:29:26Z

The PDF is not standard-compliant. You can see a warning:

PdfReadWarning: Object 86 0 not defined

Via https://demo.verapdf.org/ you can see several issues... I'm not certain, though, if they are connected to the problem you face. I think the xref table might just be wrong. I don't know how atril can recover the outlines from it.


mtd91429 · 2022-07-19T19:51:36Z

Yes, the problem in this file is the xref objects.

PyPDF2 reads a PDF by locating the xref table and parsing it. It then uses additional dictionaries within the file, in conjunction with the xref table, to locate the various objects at their byte offsets.
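To make the xref lookup concrete, here is a minimal sketch of parsing a classic (non-stream) xref section into a map from (object number, generation) to byte offset. This is not PyPDF2's code; the function name `parse_xref_section` and the sample offsets are hypothetical and only illustrate the table layout.

```python
# Hypothetical, hand-written xref section; the offsets are illustrative only.
XREF_TEXT = """xref
0 3
0000000000 65535 f
0000000017 00000 n
0000000081 00000 n
"""


def parse_xref_section(text):
    """Map (object number, generation) -> byte offset for in-use entries."""
    lines = text.strip().splitlines()
    if lines[0].strip() != "xref":
        raise ValueError("not an xref section")
    table = {}
    i = 1
    while i < len(lines):
        # Subsection header: first object number and number of entries.
        first, count = (int(n) for n in lines[i].split())
        for k in range(count):
            offset, generation, kind = lines[i + 1 + k].split()
            if kind == "n":  # "f" marks a free (unused) entry
                table[(first + k, int(generation))] = int(offset)
        i += 1 + count
    return table


xref = parse_xref_section(XREF_TEXT)
print(xref)
# An object that the table never mentions simply has no offset.
print(xref.get((86, 0)))
```

A reference to an object that has no in-use entry in the table cannot be resolved to a byte offset, which is the situation described for object 86 0 in this file.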

For the outline in this example, PyPDF2 processes the trailer, which points to the document /Root. /Root points to the Outline dictionary at object 86 (ID number) 0 (generation number). This object is missing. The Outline dictionary is supposed to point to the First and Last children (outline items) and is the starting point for building the outline tree. The Outline dictionary does exist within the document, just at a different location (i.e., not at 86 0 R).

Fixing such an issue is possible with some commercially available PDF software, such as Adobe Acrobat or PDF XChange. However, from what I can tell, fixing it is currently beyond PyPDF2's "plug-n-play" capabilities. I think it could be done with some one-off code specifically for this situation, but it is probably easiest to simply open and re-save the document in Adobe Acrobat.
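The lookup chain described here (catalog, then the Outline dictionary, then /First) can be illustrated with a toy model that reproduces the failure mode from the traceback. Everything below is a simplified stand-in, not PyPDF2's implementation: the `objects` store, `resolve`, and both `get_outlines_*` functions are hypothetical, and the guarded variant is only an assumption about what a None check might look like.

```python
# Toy object store keyed by (id, generation). Object 86 0 is deliberately
# absent, mimicking the file in this issue.
objects = {
    (1, 0): {"/Type": "/Catalog", "/Outlines": (86, 0)},
}


def resolve(ref):
    """Return the referenced object, or None if it is not defined."""
    return objects.get(ref)


def get_outlines_unguarded(catalog):
    lines = resolve(catalog["/Outlines"])
    if "/First" in lines:  # raises TypeError when lines is None
        return [lines["/First"]]
    return []


def get_outlines_guarded(catalog):
    lines = resolve(catalog["/Outlines"])
    if lines is None:  # missing object: report an empty outline instead
        return []
    if "/First" in lines:
        return [lines["/First"]]
    return []


catalog = objects[(1, 0)]
error_message = None
try:
    get_outlines_unguarded(catalog)
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(get_outlines_guarded(catalog))
```

With the guard in place the caller gets an empty outline rather than an exception, but the outline itself is still not recovered.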

For the code base, we could consider adding logic to the PdfReader._get_outlines() method: if the /Catalog contains a reference to the /Outlines dictionary but that reference is missing from the xref table, manually parse the document's objects, attempt to infer the Outline dictionary from the attributes defined in Table 152 of the PDF 1.7 specification, and then update the /Catalog /Outlines pointer. That would probably be best implemented as part of a larger framework for handling misplaced and/or unreferenced objects rather than as a one-off for this particular niche bug.
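A rough sketch of that idea, again on a toy object store rather than on PyPDF2's real data structures. The helper names (`looks_like_outline_dict`, `repair_outlines_pointer`) and the object numbers are hypothetical. The heuristic assumes an Outline dictionary either declares /Type /Outlines or carries /First and /Last without the /Title and /Parent entries that outline items have.

```python
def looks_like_outline_dict(obj):
    """Heuristic: does obj resemble the document's Outline dictionary?"""
    if not isinstance(obj, dict):
        return False
    if obj.get("/Type") == "/Outlines":
        return True
    # /Type is optional, so fall back to structure. Outline items also have
    # /First and /Last, but they additionally carry /Title and /Parent.
    return (
        "/First" in obj
        and "/Last" in obj
        and "/Title" not in obj
        and "/Parent" not in obj
    )


def repair_outlines_pointer(objects, catalog):
    """Repoint catalog['/Outlines'] if its target is missing.

    Returns the (possibly updated) reference, or None if no unambiguous
    candidate was found.
    """
    ref = catalog.get("/Outlines")
    if ref is None or ref in objects:
        return ref  # nothing to repair
    candidates = [
        key for key, obj in objects.items() if looks_like_outline_dict(obj)
    ]
    if len(candidates) == 1:
        catalog["/Outlines"] = candidates[0]
        return candidates[0]
    return None  # zero or several candidates: do not guess


# Illustrative data only: the catalog points at the missing 86 0, while the
# real Outline dictionary lives at a different (made-up) object number.
objects = {
    (1, 0): {"/Type": "/Catalog", "/Outlines": (86, 0)},
    (90, 0): {"/First": (91, 0), "/Last": (92, 0), "/Count": 2},
    (91, 0): {"/Title": "Chapter 1", "/Parent": (90, 0), "/Next": (92, 0)},
    (92, 0): {"/Title": "Chapter 2", "/Parent": (90, 0), "/Prev": (91, 0)},
}
catalog = objects[(1, 0)]
repaired = repair_outlines_pointer(objects, catalog)
print(repaired)
```

Refusing to pick when there are several candidates is a deliberate choice in this sketch: a wrong guess would silently attach the wrong outline tree.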

MartinThoma · 2022-07-23T06:18:43Z

Outlines Chrome can extract:

Adjust `PdfReader._build_outline(...)` and `PdfReader._build_destination(...)` to handle outline items with and without valid destinations.

Closes #193: PdfReadError: Unexpected destination '/__WKANCHOR_2'
Closes #956: ValueError: Unresolved bookmark
#1059 no longer throws an exception, but the outlines are not extracted either.
Closes #1068: Skip NameObject when building outline

pubpub-zz · 2022-09-04T10:10:56Z

Retested with the latest dev version (2.10.4+ / 5?) in progress.
The same results as in Chrome can be observed.
Objects 86 and 88 can be retrieved successfully.

@MartinThoma, this issue should be closed

pubpub-zz · 2022-09-06T20:05:29Z

+1?

MartinThoma · 2022-09-07T16:32:01Z

Thank you!

MartinThoma added the following labels Jul 5, 2022: is-bug (From a user's perspective, this is a bug: a violation of the expected behavior with a compliant PDF), PdfReader (The PdfReader component is affected), Has MCVE (A minimal, complete and verifiable example helps a lot to debug / understand feature requests)

MartinThoma added a commit that referenced this issue Jul 5, 2022

ROB: Guard against None-value in _get_outlines

74aa65e

See #1059

MartinThoma mentioned this issue Jul 5, 2022

ROB: Guard against None-value in _get_outlines #1060

Merged

MartinThoma added a commit that referenced this issue Jul 5, 2022

ROB: Guard against None-value in _get_outlines

6fd0b6d

See #1059

MartinThoma added the is-robustness-issue label (From a user's perspective, this is about robustness) and removed the is-bug label (From a user's perspective, this is a bug: a violation of the expected behavior with a compliant PDF) Jul 5, 2022

MartinThoma added a commit that referenced this issue Jul 9, 2022

ROB: Guard against None-value in _get_outlines (#1060)

439c749

See #1059

MartinThoma mentioned this issue Jul 23, 2022

ROB: Handle outlines without destination #1076

Merged

MartinThoma closed this as completed Sep 7, 2022
