Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ROB: Ignore EOF in NumberObject.read_from_stream #1505

Closed
wants to merge 2 commits into from

Conversation

rraval
Copy link
Contributor

@rraval rraval commented Dec 16, 2022

Use ignore_eof=True just like NameObject does, which is the only other caller to read_until_regex in this module.

This helps prevent arcane exceptions when trying to parse a number:

PyPDF2.errors.PdfStreamError: Stream has ended unexpectedly

The motivation is essentially identical to the change that introduced ignore_eof=True on NameObjects:
431ba70

@rraval
Copy link
Contributor Author

rraval commented Dec 16, 2022

Sadly I do not have a reproducing PDF that I can add to the testing corpus. This came up when exercising PdfMerger with a bunch of private files.

@codecov
Copy link

codecov bot commented Dec 16, 2022

Codecov Report

Base: 91.84% // Head: 92.03% // Increases project coverage by +0.19% 🎉

Coverage data is based on head (b1d7aa7) compared to base (e711846).
Patch coverage: 100.00% of modified lines in pull request are covered.

❗ Current head b1d7aa7 differs from pull request most recent head ec8b349. Consider uploading reports for the commit ec8b349 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1505      +/-   ##
==========================================
+ Coverage   91.84%   92.03%   +0.19%     
==========================================
  Files          33       32       -1     
  Lines        6202     5976     -226     
  Branches     1228     1163      -65     
==========================================
- Hits         5696     5500     -196     
+ Misses        326      312      -14     
+ Partials      180      164      -16     
Impacted Files Coverage Δ
PyPDF2/generic/_base.py 99.64% <100.00%> (ø)
pypdf/_reader.py
pypdf/generic/_fit.py
pypdf/generic/_outline.py
pypdf/types.py
pypdf/_codecs/zapfding.py
pypdf/_version.py
pypdf/_codecs/std.py
pypdf/_codecs/pdfdoc.py
pypdf/_codecs/__init__.py
... and 56 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@rraval rraval changed the title Ignore EOF in NumberObject.read_from_stream ROB: Ignore EOF in NumberObject.read_from_stream Dec 16, 2022
Use `ignore_eof=True` just like `NameObject` does, which is the only other
caller to `read_until_regex` in this module.

This helps prevent arcane exceptions when trying to parse a number:

```
PyPDF2.errors.PdfStreamError: Stream has ended unexpectedly
```

The motivation is essentially identical to the change that introduced
`ignore_eof=True` on `NameObject`s:
py-pdf@431ba70
@rraval
Copy link
Contributor Author

rraval commented Dec 29, 2022

Hi there, I've rebased the branch off the latest main to permit clean fast-forward merging. Hopefully this can get upstreamed soon.

Copy link
Member

@MasterOdin MasterOdin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So now all usages of read_until_regex override the default ignore_eof value of False, right? If we're going this route, should probably then flip the default value to True and then remove using the third argument in the two places...

rraval added a commit to rraval/PyPDF2 that referenced this pull request Dec 29, 2022
This was initially motivated by `NumberObject.read_from_stream`, which
was calling `read_until_regex` with the default value of
`ignore_eof=False` and thus raising exceptions like:

```
PyPDF2.errors.PdfStreamError: Stream has ended unexpectedly
```

py-pdf@431ba70
demonstrates a similar fix for `NameObject.read_from_stream`.

From discussion in py-pdf#1505, it was
realized that the change to `NumberObject.read_from_stream` had now made
ALL callers of `read_until_regex` pass `ignore_eof=True`. It's cleaner
to remove the parameter entirely and change the default behaviour.
@rraval
Copy link
Contributor Author

rraval commented Dec 29, 2022

So now all usages of read_until_regex override the default ignore_eof value of False, right? If we're going this route, should probably then flip the default value to True and then remove using the third argument in the two places...

I've opened a separate PR along these lines: #1521

Feel free to pick one and close out the other. I'm not familiar enough with this codebase to judge which is "better".

rraval added a commit to rraval/PyPDF2 that referenced this pull request Jan 9, 2023
This was initially motivated by `NumberObject.read_from_stream`, which
was calling `read_until_regex` with the default value of
`ignore_eof=False` and thus raising exceptions like:

```
PyPDF2.errors.PdfStreamError: Stream has ended unexpectedly
```

py-pdf@431ba70
demonstrates a similar fix for `NameObject.read_from_stream`.

From discussion in py-pdf#1505, it was
realized that the change to `NumberObject.read_from_stream` had now made
ALL callers of `read_until_regex` pass `ignore_eof=True`. It's cleaner
to remove the parameter entirely and change the default behaviour.
@MartinThoma
Copy link
Member

MartinThoma commented Jan 21, 2023

I'm closing this one as the parameter was now removed with #1521 :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants