BUG: catch the case where w[0] is an IndirectObject instead of an int #2154
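For context: in pypdf's `_cmap.py`, `w` appears to hold a CIDFont's /W (glyph widths) array. There, an entry's leading CID can be stored as an indirect reference rather than a plain integer, and passing it to `range()` then fails with "object cannot be interpreted as an integer". The sketch below is not the code merged in this PR. It uses a hypothetical `IndirectRef` stand-in (not pypdf's `IndirectObject`) and hypothetical helpers `resolve` and `parse_w_array` to show the general idea: resolve every element before using it as an integer. It follows the two /W entry forms from the PDF specification.

```python
from typing import Any, Dict, List


class IndirectRef:
    """Hypothetical stand-in for an indirect PDF object (not pypdf's class)."""

    def __init__(self, value: Any) -> None:
        self._value = value

    def get_object(self) -> Any:
        # Resolve the reference to the object it points at.
        return self._value


def resolve(obj: Any) -> Any:
    """Follow indirect references until a direct object is reached."""
    while isinstance(obj, IndirectRef):
        obj = obj.get_object()
    return obj


def parse_w_array(w: List[Any]) -> Dict[int, Any]:
    """Map CIDs to widths from a CIDFont /W array.

    Entries take one of two forms:
      c [w1 w2 ... wn]   -> consecutive widths starting at CID c
      c_first c_last w   -> one width for CIDs c_first..c_last (inclusive)
    Any element, including the leading CID, may be an indirect reference.
    """
    widths: Dict[int, Any] = {}
    i = 0
    while i < len(w):
        first = resolve(w[i])  # the leading CID may be indirect
        second = resolve(w[i + 1])
        if isinstance(second, list):
            for offset, width in enumerate(second):
                widths[first + offset] = resolve(width)
            i += 2
        else:
            width = resolve(w[i + 2])
            for cid in range(first, second + 1):
                widths[cid] = width
            i += 3
    return widths


if __name__ == "__main__":
    sample = [IndirectRef(1), [500, 600], 10, IndirectRef(12), 250]
    print(parse_w_array(sample))
    # prints {1: 500, 2: 600, 10: 250, 11: 250, 12: 250}
```

Without the `resolve(w[i])` call, the leading `IndirectRef(1)` or `IndirectRef(12)` would reach `+` or `range()` unresolved and raise a `TypeError`.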

rchen19 · 2023-09-05T21:49:45Z

pubpub-zz · 2023-09-05T22:01:22Z

👍
Can you please also add a test that uses the file you referenced and just extracts the text of page[0] (no need to assert any results)? That will prevent the issue from coming back.

rchen19 · 2023-09-05T22:03:42Z


Will do. Would you like it in an existing test file or in a separate one?

And I assume the file goes to resources/, right?

codecov · 2023-09-05T22:04:53Z

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (0ca4d37) 94.34% compared to head (dc0c8b5) 94.34%.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #2154   +/-   ##
=======================================
  Coverage   94.34%   94.34%           
=======================================
  Files          43       43           
  Lines        7572     7572           
  Branches     1488     1488           
=======================================
  Hits         7144     7144           
  Misses        263      263           
  Partials      165      165
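The figures in the diff are consistent with each other: Hits + Misses + Partials = 7144 + 263 + 165 = 7572 Lines, and 7144 / 7572 ≈ 94.348%. Assuming Codecov's default settings (partially covered lines counted as not covered, two-decimal precision, rounding down), that gives the reported 94.34%. A small check, with `coverage_percent` as a hypothetical helper name:

```python
def coverage_percent(hits: int, misses: int, partials: int) -> str:
    """Return hits / (hits + misses + partials) as a percentage string,
    rounded down to two decimal places (assumed Codecov default)."""
    lines = hits + misses + partials
    # Integer arithmetic avoids float rounding: result is in hundredths of a percent.
    hundredths = hits * 10000 // lines
    return f"{hundredths // 100}.{hundredths % 100:02d}"


print(coverage_percent(7144, 263, 165))  # prints 94.34
```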

| Files Changed | Coverage Δ |
|---|---|
| pypdf/_cmap.py | `93.68% <100.00%> (ø)` |


pubpub-zz · 2023-09-06T05:12:06Z

Adding it to test_cmap.py sounds good.

stefan6419846 · 2023-09-06T18:54:03Z

> And I assume the file goes to resources/, right?

You will find some examples of files being downloaded on the fly inside the test code; see

pypdf/tests/test_writer.py, lines 1040 to 1044 in 05f2a65:

    @pytest.mark.enable_socket()
    def test_iss471():
        url = "https://github.com/py-pdf/pypdf/files/9139245/book.pdf"
        name = "book_471.pdf"
        reader = PdfReader(BytesIO(get_data_from_url(url, name=name)))

for example. This is especially important if you do not hold all the required rights to the test files themselves and therefore cannot comply with the terms in https://github.com/py-pdf/sample-files#licenses.
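pypdf's test suite uses a helper, `get_data_from_url(url, name=...)`, which appears to download a sample file once and cache it under a local name. The sketch below is a hypothetical stand-in that illustrates that caching idea using only the standard library. The name `fetch_cached` and its `cache_dir` parameter are assumptions, and the real helper's behavior may differ.

```python
import urllib.request
from pathlib import Path


def fetch_cached(url: str, name: str, cache_dir: Path) -> bytes:
    """Download `url` once and reuse the local copy `cache_dir / name` afterwards.

    Hypothetical helper illustrating the idea; not pypdf's get_data_from_url.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if not target.exists():
        # Only hit the network when no cached copy exists yet.
        with urllib.request.urlopen(url) as response:
            target.write_bytes(response.read())
    return target.read_bytes()
```

In a test, this would be used as `PdfReader(BytesIO(fetch_cached(url, "book_471.pdf", Path("pdf_cache"))))`. The file is then fetched only on the first run, and the repository does not have to redistribute it.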

rchen19 · 2023-09-06T21:45:30Z


The file I need is from arXiv, so it should be easy to get a direct link. Thanks for the pointers.

pubpub-zz · 2023-09-06T21:48:13Z

You should use https://github.com/py-pdf/pypdf/files/12489914/Morris.et.al.-.2020.-.TextAttack.A.Framework.for.Adversarial.Attacks.Data.Augmentation.and.Adversarial.Training.in.NLP.pdf, which is already in the discussion.


rchen19 · 2023-09-06T23:39:21Z


The URL is too long to pass the code style check, so I used the file directly from arXiv instead. If a file hosted on GitHub is preferred, I can shorten the file name and upload it again.
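A long URL does not have to fail a line-length check. Python joins adjacent string literals at compile time, so the GitHub-hosted URL can be split across lines inside parentheses. A minimal sketch, assuming a line-length limit of roughly 80 characters:

```python
# Adjacent string literals inside parentheses are concatenated at compile
# time, so the URL stays intact while each source line stays short.
url = (
    "https://github.com/py-pdf/pypdf/files/12489914/"
    "Morris.et.al.-.2020.-.TextAttack.A.Framework.for.Adversarial.Attacks."
    "Data.Augmentation.and.Adversarial.Training.in.NLP.pdf"
)
# In the test itself, calling reader.pages[0].extract_text() without binding
# the result avoids an "unused variable" lint warning.
```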


MartinThoma · 2023-09-10T13:24:26Z

Well done, @rraval !

I've just merged it and will release soon. If you want, I can add you to https://pypdf.readthedocs.io/en/latest/meta/CONTRIBUTORS.html

## What's new

### Security (SEC)
- Infinite recursion caused by IndirectObject clone (#2156)

### New Features (ENH)
- Ease access to ViewerPreferences (#2144)

### Bug Fixes (BUG)
- catch the case where w[0] is an IndirectObject instead of an int (#2154)
- Cope with indirect objects in filters and remove deprecated code (#2177)
- Cope with extra space (#2151)
- Merge pages without resources (#2150)
- getcontents() shall return None if contents is NullObject (#2161)
- Fix conversion from 1 to LA (#2175)
- Accept tabs in cmaps (#2174)

### Robustness (ROB)
- Accept XYZ with no arguments (#2178)

[Full Changelog](3.15.5...3.16.0)

rchen19 · 2023-09-11T21:26:25Z


Sure, that sounds nice. Thanks.

BUG: catch the case where w[0] is an IndirectObject instead of an int

295728b

rchen19 added 2 commits September 6, 2023 14:53

BUG: add test for bug fix for issue py-pdf#2137

f92244e

- a pdf file from arxiv is included

BUG: fix code stype errors in test for issue py-pdf#2137

6d9f1fe

- URL too long
- file name too long
- variable declared but not used

Merge branch 'main' into BUG-TypeError-IndirectObject-object-cannot-be-interpreted-as-an-integer

dc0c8b5

MartinThoma merged commit 4657df5 into py-pdf:main Sep 10, 2023
14 checks passed

