Error while reading bookmarks/outlines "TypeError: argument of type 'NoneType' is not iterable" #1059
Thank you for reporting the issue ❤️
The PDF is not standard-compliant. You can see a warning
Via https://demo.verapdf.org/ you can see several issues. I'm not certain, though, whether they are connected to the problem you face. I think the xref table might just be wrong. I don't know how atril can recover the outlines from it.
Yes, the problem in this file is the xref objects. PyPDF2 reads a PDF by searching for the xref table and parsing it. It then uses additional dictionaries within the file, in conjunction with the xref table, to locate the various objects at their byte locations.

For the outline in this example, PyPDF2 processes the /Trailer, which points to the document /Root. Root points to the Outline dictionary at object 86 (id number) 0 (generation number). This object is missing. The Outline Dictionary is supposed to point to the First and Last children (outline items) and is used as the starting point to build the outline tree. The Outline Dictionary does exist within the document, just at a different location (i.e., not at 86 0 R).

Fixing such an issue is possible with some commercially available PDF software, such as Adobe Acrobat or PDF XChange. However, from what I can tell, it is currently beyond PyPDF2's "plug-n-play" capabilities. I think it could be done with some one-off code written specifically for this situation, but it is probably easiest to simply open and re-save the document in Adobe Acrobat.

For the code base, we could consider adding some logic to the PdfReader code within
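To make the "exists, just at a different location" point concrete, here is a minimal sketch that ignores the xref table and scans the raw bytes for object headers, then reports which objects declare `/Type /Outlines`. It uses only the standard library and is not how PyPDF2 works internally. The function names and the sample bytes (including object number 90) are made up for illustration; only the `86 0 R` reference comes from the discussion above.

```python
import re

# Matches the header of an indirect object, e.g. b"86 0 obj".
OBJ_HEADER = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b")


def scan_objects(data: bytes) -> dict:
    """Map (object id, generation) -> byte offset by scanning the raw bytes.

    The xref table is deliberately ignored: this shows where objects
    really are, not where the table claims they are.
    """
    return {(int(m.group(1)), int(m.group(2))): m.start()
            for m in OBJ_HEADER.finditer(data)}


def find_outline_dicts(data: bytes) -> list:
    """Return (id, generation) keys of objects declaring /Type /Outlines."""
    found = []
    # Walk the objects in file order and inspect each body up to "endobj".
    for key, start in sorted(scan_objects(data).items(), key=lambda kv: kv[1]):
        end = data.find(b"endobj", start)
        body = data[start:end if end != -1 else len(data)]
        if re.search(rb"/Type\s*/Outlines\b", body):
            found.append(key)
    return found


# Hand-made fragment (hypothetical, not a valid PDF): the catalog points at
# 86 0 R, but the outline dictionary actually lives in object 90 0.
sample = (
    b"1 0 obj\n<< /Type /Catalog /Outlines 86 0 R >>\nendobj\n"
    b"90 0 obj\n<< /Type /Outlines /First 91 0 R /Last 92 0 R >>\nendobj\n"
)

print((86, 0) in scan_objects(sample))  # is the referenced object present?
print(find_outline_dicts(sample))       # where the outline dictionary really is
```

A scan like this is only a diagnostic aid. It can produce false positives from bytes inside streams, and it will not see objects stored in compressed object streams, so real repair logic would need more care.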
Adjust `PdfReader._build_outline(...)` and `PdfReader._build_destination(...)` to handle outline items with and without valid destinations.

Closes #193: PdfReadError: Unexpected destination '/__WKANCHOR_2'
Closes #956: ValueError: Unresolved bookmark
#1059 no longer throws an exception, but the outlines are not extracted either.
Closes #1068: Skip NameObject when building outline
Retested with the latest in-progress dev version (2.10.4+ / 5?). @MartinThoma, this issue should be closed.
+1?
Thank you!
"TypeError: argument of type 'NoneType' is not iterable"
Got this when I tried to read the outlines of a PDF file with `PdfReader.outlines`.
Environment
Which environment were you using when you encountered the problem?
$ python -m platform
Windows-10-10.0.19044-SP0
$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.4.1
Code + PDF
Example PDF file: sample.pdf (Yes, you can use this file for tests)
Traceback
This is the complete Traceback I see: