ROB: Ignore EOF in NumberObject.read_from_stream #1505

rraval · 2022-12-16T04:42:30Z

Use ignore_eof=True just like NameObject does, which is the only other caller to read_until_regex in this module.

This helps prevent arcane exceptions when trying to parse a number:

PyPDF2.errors.PdfStreamError: Stream has ended unexpectedly

The motivation is essentially identical to the change that introduced ignore_eof=True on NameObjects:
431ba70

rraval · 2022-12-16T04:43:18Z

Sadly I do not have a reproducing PDF that I can add to the testing corpus. This came up when exercising PdfMerger with a bunch of private files.

codecov · 2022-12-16T04:51:30Z

Codecov Report

Base: 91.84% // Head: 92.03% // Increases project coverage by +0.19% 🎉

Coverage data is based on head (b1d7aa7) compared to base (e711846).
Patch coverage: 100.00% of modified lines in pull request are covered.

❗ Current head b1d7aa7 differs from pull request most recent head ec8b349. Consider uploading reports for the commit ec8b349 to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1505      +/-   ##
==========================================
+ Coverage   91.84%   92.03%   +0.19%     
==========================================
  Files          33       32       -1     
  Lines        6202     5976     -226     
  Branches     1228     1163      -65     
==========================================
- Hits         5696     5500     -196     
+ Misses        326      312      -14     
+ Partials      180      164      -16

Impacted Files	Coverage Δ
PyPDF2/generic/_base.py	`99.64% <100.00%> (ø)`
pypdf/_reader.py
pypdf/generic/_fit.py
pypdf/generic/_outline.py
pypdf/types.py
pypdf/_codecs/zapfding.py
pypdf/_version.py
pypdf/_codecs/std.py
pypdf/_codecs/pdfdoc.py
pypdf/_codecs/__init__.py
... and 56 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Use `ignore_eof=True` just like `NameObject` does, which is the only other caller to `read_until_regex` in this module. This helps prevent arcane exceptions when trying to parse a number: ``` PyPDF2.errors.PdfStreamError: Stream has ended unexpectedly ``` The motivation is essentially identical to the change that introduced `ignore_eof=True` on `NameObject`s: py-pdf@431ba70

rraval · 2022-12-29T02:21:41Z

Hi there, I've rebased the branch off the latest main to permit clean fast-forward merging. Hopefully this can get upstreamed soon.

MasterOdin

So now all usages of read_until_regex override the default ignore_eof value of False, right? If we're going this route, should probably then flip the default value to True and then remove using the third argument in the two places...

This was initially motivated by `NumberObject.read_from_stream`, which was calling `read_until_regex` with the default value of `ignore_eof=False` and thus raising exceptions like: ``` PyPDF2.errors.PdfStreamError: Stream has ended unexpectedly ``` py-pdf@431ba70 demonstrates a similar fix for `NameObject.read_from_stream`. From discussion in py-pdf#1505, it was realized that the change to `NumberObject.read_from_stream` had now made ALL callers of `read_until_regex` pass `ignore_eof=True`. It's cleaner to remove the parameter entirely and change the default behaviour.

rraval · 2022-12-29T16:28:02Z

So now all usages of read_until_regex override the default ignore_eof value of False, right? If we're going this route, should probably then flip the default value to True and then remove using the third argument in the two places...

I've opened a separate PR along these lines: #1521

Feel free to pick one and close out the other. I'm not familiar enough with this codebase to judge which is "better".

This was initially motivated by `NumberObject.read_from_stream`, which was calling `read_until_regex` with the default value of `ignore_eof=False` and thus raising exceptions like: ``` PyPDF2.errors.PdfStreamError: Stream has ended unexpectedly ``` py-pdf@431ba70 demonstrates a similar fix for `NameObject.read_from_stream`. From discussion in py-pdf#1505, it was realized that the change to `NumberObject.read_from_stream` had now made ALL callers of `read_until_regex` pass `ignore_eof=True`. It's cleaner to remove the parameter entirely and change the default behaviour.

MartinThoma · 2023-01-21T12:28:13Z

I'm closing this one as the parameter was now removed with #1521 :-)

rraval force-pushed the number-object-ignore-eof branch from 7ec8040 to b4cb2fa Compare December 16, 2022 05:59

rraval changed the title ~~Ignore EOF in NumberObject.read_from_stream~~ ROB: Ignore EOF in NumberObject.read_from_stream Dec 16, 2022

rraval force-pushed the number-object-ignore-eof branch from b1d7aa7 to c93f755 Compare December 29, 2022 02:19

MasterOdin reviewed Dec 29, 2022

View reviewed changes

rraval mentioned this pull request Dec 29, 2022

ROB: ignore_eof everywhere for read_until_regex #1521

Merged

Merge branch 'main' into number-object-ignore-eof

ec8b349

MartinThoma closed this Jan 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ROB: Ignore EOF in NumberObject.read_from_stream #1505

ROB: Ignore EOF in NumberObject.read_from_stream #1505

rraval commented Dec 16, 2022

rraval commented Dec 16, 2022

codecov bot commented Dec 16, 2022 •

edited

Loading

rraval commented Dec 29, 2022

MasterOdin left a comment

rraval commented Dec 29, 2022

MartinThoma commented Jan 21, 2023 •

edited

Loading

ROB: Ignore EOF in NumberObject.read_from_stream #1505

ROB: Ignore EOF in NumberObject.read_from_stream #1505

Conversation

rraval commented Dec 16, 2022

rraval commented Dec 16, 2022

codecov bot commented Dec 16, 2022 • edited Loading

Codecov Report

rraval commented Dec 29, 2022

MasterOdin left a comment

Choose a reason for hiding this comment

rraval commented Dec 29, 2022

MartinThoma commented Jan 21, 2023 • edited Loading

codecov bot commented Dec 16, 2022 •

edited

Loading

MartinThoma commented Jan 21, 2023 •

edited

Loading