Prevent endless loops during Page->getText() because of faulty recursionStack #457

Nickmanbear · 2021-09-03T19:26:10Z

The $recursionStack used in the getText() function of PDFObject to prevent "Do" commands from causing endless loops removes items from its stack prematurely while pages are being parsed. As a result, the Do command can still cause endless loops, and does.

Resetting the recursionStack only at the end of the getText() function for every page prevents this bug.
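To make the difference between the two cleanup strategies concrete, here is a minimal sketch. It is not the library's code: the `Node` class, `buildChain()` and `extractPageText()` are hypothetical stand-ins, and the scenario (an object referenced twice by its parent, plus a back-reference to the root) is an assumption about one way premature popping can hurt, not a confirmed description of the attached PDFs. In this toy model, popping on return makes the number of calls grow exponentially with nesting depth, which looks like a hang in practice, while keeping the IDs until the page is finished visits each object once.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a PDF content object; NOT Smalot\PdfParser\PDFObject.
final class Node
{
    /** @var int[] IDs seen while extracting text (analogue of $recursionStack). */
    public static array $recursionStack = [];

    /** Number of getText() calls, used to compare the two strategies. */
    public static int $visits = 0;

    /** @var Node[] Objects referenced by "Do"-like commands. */
    public array $children = [];

    public int $id;

    public function __construct(int $id)
    {
        $this->id = $id;
    }

    public function getText(bool $popOnReturn): string
    {
        ++self::$visits;
        self::$recursionStack[] = $this->id;

        $text = "[{$this->id}]";
        foreach ($this->children as $child) {
            // Skip anything whose ID is still recorded.
            if (!in_array($child->id, self::$recursionStack, true)) {
                $text .= $child->getText($popOnReturn);
            }
        }

        if ($popOnReturn) {
            // Old behaviour in this model: forget the object as soon as it returns.
            array_pop(self::$recursionStack);
        }

        return $text;
    }
}

/** Builds nodes 0..$depth; each references the next one twice and the root once. */
function buildChain(int $depth): Node
{
    $nodes = [];
    for ($i = 0; $i <= $depth; ++$i) {
        $nodes[$i] = new Node($i);
    }
    for ($i = 0; $i < $depth; ++$i) {
        $nodes[$i]->children = [$nodes[$i + 1], $nodes[$i + 1], $nodes[0]];
    }
    $nodes[$depth]->children = [$nodes[0]]; // back-reference closing the cycle

    return $nodes[0];
}

/** Extracts one "page" and resets the stack once at the end. Returns [text, visits]. */
function extractPageText(Node $root, bool $popOnReturn): array
{
    Node::$visits = 0;
    $text = $root->getText($popOnReturn);
    Node::$recursionStack = []; // single cleanup per page

    return [$text, Node::$visits];
}

[, $visitsWithPop] = extractPageText(buildChain(10), true);
[, $visitsWithoutPop] = extractPageText(buildChain(10), false);
echo "pop on return: {$visitsWithPop} calls\n";
echo "reset per page: {$visitsWithoutPop} calls\n";
```

With a depth of 10, the pop-on-return variant makes 2047 calls and the reset-per-page variant makes 11. The trade-off in this model is that an object referenced twice on the same page contributes its text only once.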

amooij · 2021-09-06T07:07:09Z

This is great, @Nickmanbear. It solves headaches for many people. Great work!

Nickmanbear · 2021-09-06T07:26:21Z

@amooij Thanks!

k00ni · 2021-09-06T07:55:19Z

@Nickmanbear thank you for your pull request!

Is there a way to "demonstrate" your change? Like a test which triggers an endless loop?

Nickmanbear · 2021-09-06T09:37:33Z

@k00ni These are a few of the files we have that cause endless loops. The first two will get stuck when you call getText() on the first page and the third one when you call getText() on the second page.

20210514101853.pdf
2021081795357.pdf
202108201456185.pdf

With my fix these files are all able to be parsed just fine. I could add one of these files to the Page integration tests to prove it now works and make sure it continues to work in the future. But I'm not sure how I would write a meaningful test to trigger this endless loop when this fix should prevent it from occurring.

k00ni · 2021-09-06T10:15:27Z

> But I'm not sure how I would write a meaningful test to trigger this endless loop when this fix should prevent it from occurring.

It's a good start to have such a test, so one can deactivate the fix and see whether such a loop gets triggered.

Nickmanbear · 2021-09-10T07:08:27Z

@k00ni Are there any more changes you think I should make?

k00ni · 2021-09-10T08:05:47Z

I plan to get back to you next week. For now all I can say is that it is looking good.

k00ni · 2021-09-20T14:20:37Z

I am still super busy and am trying to find some time to finalize my review. I hope it's not urgent to merge it.

@j0k3r @smalot feel free to take over and get this one going.

Nickmanbear · 2021-09-22T07:09:43Z

@k00ni We have switched to using this fork for our production environment. So merging this is not that urgent to us. Of course we would still like it to be in the next release, so we can switch back to that.

k00ni

Looks good to me. Thanks for your work!

…ionStack (smalot#457)

* Reset recursionStack when getting text
* Don't pop the recursionStack during text parse
* Move recursionStack cleanup to after page getText()
* Missing semicolon
* Add file containing containing loop for testing
* Add test for issue fixed by Pull Request 457
* Update tests/Integration/PageTest.php
  Co-authored-by: Konrad Abicht <hi@inspirito.de>
* Add comment to recursionStack fix
* Update src/Smalot/PdfParser/Page.php
  Co-authored-by: Konrad Abicht <hi@inspirito.de>
* Clear memory to not influence other tests
* Call gc_collect_cycles() after testGetTextPullRequest457()
* Set a baseline memory level before memory usage test
* use baseline as minimum memory as well
* Update src/Smalot/PdfParser/Page.php
  Co-authored-by: Jérémy Benoist <j0k3r@users.noreply.github.com>

Co-authored-by: Konrad Abicht <hi@inspirito.de>
Co-authored-by: Kees Vaes <kees@superscanner.nl>
Co-authored-by: Jérémy Benoist <j0k3r@users.noreply.github.com>

Nickmanbear added 4 commits September 3, 2021 21:18

Reset recursionStack when getting text

54af006

Don't pop the recursionStack during text parse

75d1966

Move recursionStack cleanup to after page getText()

78c3c8c

Missing semicolon

b91611d

k00ni added the enhancement label Sep 6, 2021

Add file containing containing loop for testing

c9caa7c

Add test for issue fixed by Pull Request 457

44d366f

k00ni reviewed Sep 6, 2021

tests/Integration/PageTest.php

k00ni reviewed Sep 6, 2021

src/Smalot/PdfParser/Page.php

Nickmanbear and others added 2 commits September 6, 2021 13:33

Update tests/Integration/PageTest.php

fea25ae

Co-authored-by: Konrad Abicht <hi@inspirito.de>

Add comment to recursionStack fix

e887f4c

k00ni reviewed Sep 6, 2021

src/Smalot/PdfParser/Page.php

Nickmanbear and others added 3 commits September 6, 2021 13:56

Update src/Smalot/PdfParser/Page.php

81c6c3a

Co-authored-by: Konrad Abicht <hi@inspirito.de>

Clear memory to not influence other tests

9c4bee2

Call gc_collect_cycles() after testGetTextPullRequest457()

abd987f

k00ni reviewed Sep 8, 2021

tests/Integration/PageTest.php

Kees Vaes added 2 commits September 8, 2021 16:50

Set a baseline memory level before memory usage test

4286d18

use baseline as minimum memory as well

7cdd192

j0k3r reviewed Sep 22, 2021

src/Smalot/PdfParser/Page.php

Update src/Smalot/PdfParser/Page.php

d182bba

Co-authored-by: Jérémy Benoist <j0k3r@users.noreply.github.com>

k00ni approved these changes Sep 23, 2021

k00ni added fix and removed enhancement labels Sep 23, 2021

k00ni merged commit 1b3b6eb into smalot:master Sep 23, 2021

PrinsFrank mentioned this pull request Oct 15, 2021

Call for testers: Please test our latest release candidate v1.2.0-RC2 #468

Closed
