ROB: Handle missing /Type entry in Page tree #1845

Merged: 4 commits merged on May 20, 2023

pubpub-zz
Collaborator

@pubpub-zz pubpub-zz commented May 18, 2023

/Type is mandatory in page tree nodes according to the PDF specification, so supporting files that omit it is a robustness improvement.
Acrobat Reader can open such PDF documents as well.

Fixes #500

Criterion for treating a node as a page: /Kids is missing.
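The criterion above can be illustrated with a minimal Python sketch. It models PDF dictionaries as plain `dict`s keyed by names such as `"/Kids"`; the helpers `node_type` and `flatten_pages` are hypothetical and are not pypdf's actual code in `pypdf/_reader.py`. A node with an explicit /Type keeps it; otherwise, the presence of /Kids marks an intermediate /Pages node and its absence marks a leaf /Page. Testing for /Contents instead would misclassify blank pages, because /Contents is optional in page objects.

```python
from typing import Any, Dict, List

PdfDict = Dict[str, Any]  # hypothetical stand-in for a PDF dictionary object


def node_type(node: PdfDict) -> str:
    """Return the node's /Type, inferring it when the entry is missing."""
    if "/Type" in node:
        return node["/Type"]
    # Missing /Type: a node with /Kids is an intermediate node, otherwise a page.
    return "/Pages" if "/Kids" in node else "/Page"


def flatten_pages(node: PdfDict) -> List[PdfDict]:
    """Collect the leaf pages of a page tree in document order (no cycle check)."""
    if node_type(node) == "/Pages":
        pages: List[PdfDict] = []
        for kid in node.get("/Kids", []):
            pages.extend(flatten_pages(kid))
        return pages
    return [node]


# Usage: a tree whose intermediate node and first page both lack /Type.
blank_page = {"/MediaBox": [0, 0, 612, 792]}           # no /Type and no /Contents
text_page = {"/Type": "/Page", "/Contents": "stream"}  # placeholder content
inner = {"/Kids": [blank_page]}                        # intermediate node without /Type
root = {"/Type": "/Pages", "/Kids": [inner, text_page]}
print(len(flatten_pages(root)))  # 2
```

Because the blank page has no /Contents, a /Contents-based test would not recognize it as a page, while the /Kids-based test classifies it correctly.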

pubpub-zz added 3 commits May 18, 2023 10:03
Fixes py-pdf#500
These PDFs are not compliant with the standard (/Type is mandatory in page tree nodes), but Acrobat Reader can open them
Robustness improved
/Contents is optional
Checking that /Kids is absent is more reliable
@codecov

codecov bot commented May 18, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (dde4c79) 93.40% compared to head (7a4bcc6) 93.40%.

❗ Current head 7a4bcc6 differs from pull request most recent head 48bc503. Consider uploading reports for the commit 48bc503 to get more accurate results.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1845   +/-   ##
=======================================
  Coverage   93.40%   93.40%           
=======================================
  Files          34       34           
  Lines        6627     6629    +2     
  Branches     1299     1300    +1     
=======================================
+ Hits         6190     6192    +2     
  Misses        285      285           
  Partials      152      152           
| Impacted Files | Coverage Δ |
|---|---|
| pypdf/_reader.py | 91.27% <100.00%> (+0.01%) ⬆️ |
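The figures in the coverage diff are internally consistent: hits, misses and partials add up to the line count, and coverage is hits divided by lines. The sketch below checks this with integer arithmetic. It assumes Codecov truncates (rounds down) to two decimals, which is its default setting but an assumption about this repository's configuration; `coverage_pct` is a hypothetical helper.

```python
def coverage_pct(hits: int, lines: int) -> str:
    """Coverage as a percentage string, rounded down to 2 decimals."""
    hundredths = hits * 10000 // lines  # coverage in units of 0.01 %
    return f"{hundredths // 100}.{hundredths % 100:02d}%"


# Numbers from the Coverage Diff: (hits, misses, partials, lines)
base = (6190, 285, 152, 6627)
head = (6192, 285, 152, 6629)

for hits, misses, partials, lines in (base, head):
    # Every tracked line falls into exactly one bucket.
    assert hits + misses + partials == lines
    print(coverage_pct(hits, lines))  # 93.40% for both base and head

# Patch coverage: the 2 added lines are both hits.
print(coverage_pct(head[0] - base[0], head[3] - base[3]))  # 100.00%
```

Rounding down explains why both base (about 93.406%) and head (about 93.407%) display as 93.40%.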


@pubpub-zz
Collaborator Author

@MartinThoma
all yours 🙂

@MartinThoma MartinThoma changed the title Iss500 ROB: Fixes #500 May 18, 2023
@MartinThoma MartinThoma changed the title ROB: Fixes #500 ROB: Handle missing /Kids entry in pages May 18, 2023
@MartinThoma MartinThoma added the is-robustness-issue From a user's perspective, this is about robustness label May 18, 2023
@MartinThoma MartinThoma changed the title ROB: Handle missing /Kids entry in pages ROB: Handle missing /Type entry in pages May 18, 2023
@MartinThoma MartinThoma changed the title ROB: Handle missing /Type entry in pages ROB: Handle missing /Type entry in Pages tree May 18, 2023
@MartinThoma MartinThoma changed the title ROB: Handle missing /Type entry in Pages tree ROB: Handle missing /Type entry in Page tree May 18, 2023
@MartinThoma
Member

I've just got a stylistic question where I would like to hear your opinion. Besides that, I'm fine with the PR :-)

Thank you 🤗

@MartinThoma MartinThoma added the soon PRs that are almost ready to be merged, issues that get solved pretty soon label May 18, 2023
@MartinThoma MartinThoma merged commit 29e7eb9 into py-pdf:main May 20, 2023
MartinThoma added a commit that referenced this pull request May 21, 2023
New Features (ENH)
-  Simplify metadata input (Document Information Dictionary) (#1851)
-  Extend cmap compatibility to GBK_EUC_H/V (#1812)

Bug Fixes (BUG)
-  Prevent infinite loop when no character follows after a comment (#1828)
-  get_contents does not return ContentStream (#1847)
-  Accept XYZ destination with zoom missing (default to zoom=0.0) (#1844)
-  Cope with 1 Bit images (#1815)

Robustness (ROB)
-  Handle missing /Type entry in Page tree (#1845)

Documentation (DOC)
-  Expand file size explanations (#1835)
-  Add comparison with pdfplumber (#1837)
-  Clarify that PyPDF2 is dead (#1827)
-  Add Hunter King as Contributor for #1806

Maintenance (MAINT)
-  Refactor internal Encryption class (#1821)
-  Add R parameter to generate_values (#1820)
-  Make encryption_key parameter of write_to_stream optional (#1819)
-  Prepare for adding AES encryption support (#1818)

Code Style (STY)
-  Iterate directly over the list instead of using range (#1839)
-  Minor refactorings in _encryption.py (#1822)

[Full Changelog](3.8.1...3.9.0)
pubpub-zz added a commit to pubpub-zz/pypdf that referenced this pull request May 23, 2023
Forgot that the code was duplicated
MartinThoma pushed a commit that referenced this pull request Jun 3, 2023
@pubpub-zz pubpub-zz deleted the iss500 branch September 2, 2023 09:44
Labels
is-robustness-issue From a user's perspective, this is about robustness soon PRs that are almost ready to be merged, issues that get solved pretty soon
Development

Successfully merging this pull request may close these issues.

Getting KeyError: '/Kids' for getNumPages
2 participants