BUG: Non-deterministic accidental object reuse #1995

sjoerdjob · 2023-07-21T07:47:32Z

The following code can occasionally produce the wrong output: pages from a PDF file that was merged earlier are used in place of pages from the later file.

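```python
from pypdf import PdfReader, PdfWriter

# some_file, other_file and third_file are placeholder paths.
writer = PdfWriter()

reader1 = PdfReader(some_file)
id_reader1 = id(reader1)
writer.add_page(reader1.pages[0])
del reader1  # reader1's memory may now be reused

reader2 = PdfReader(other_file)
id_reader2 = id(reader2)  # can equal id_reader1
writer.add_page(reader2.pages[0])
del reader2

writer.write(third_file)
```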

Because `reader1` is no longer in memory when `reader2` is initialized, its memory is free for reuse (in CPython, `id()` is the object's memory address), so `id_reader1` and `id_reader2` might end up with the same value.

Because PyPDF uses `id(reader)` internally as the key of an object cache, `writer.add_page(reader2.pages[0])` would sometimes duplicate `reader1.pages[0]` instead.
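The failure mode can be illustrated without pypdf. The following is a minimal sketch, not the library's actual code: `Reader`, `id_cache`, and `weak_cache` are hypothetical names. It shows that a cache keyed by `id(obj)` keeps its entry after the object is gone, whereas a `WeakKeyDictionary` keyed by the object itself drops the entry once the object is collected.

```python
import gc
from weakref import WeakKeyDictionary


class Reader:
    """Hypothetical stand-in for PdfReader; not pypdf code."""

    def __init__(self, name):
        self.name = name


# Cache keyed by id(reader): the entry outlives the reader itself.
id_cache = {}
# Cache keyed by the reader object, which is only referenced weakly.
weak_cache = WeakKeyDictionary()

reader1 = Reader("first.pdf")
id_cache[id(reader1)] = "page 0 of first.pdf"
weak_cache[reader1] = "page 0 of first.pdf"

del reader1
gc.collect()  # make sure the object has really been reclaimed

# The id-keyed entry is now stale: a new object that happens to be
# allocated at the same address would be mistaken for reader1.
stale_entries = len(id_cache)
# The weak-key cache removed its entry when reader1 died.
live_entries = len(weak_cache)

reader2 = Reader("second.pdf")
# True or False depending on the allocator; this is the non-determinism.
maybe_collision = id(reader2) in id_cache
```

Whether `maybe_collision` is true varies from run to run, which matches the intermittent behaviour described above.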

Fixes #1788.



MartinThoma · 2023-08-13T07:19:34Z

@sjoerdjob Your PR cannot be merged like this as the CI fails:

pypdf/_protocols.py:68: error: Name "WeakKeyDictionary" is not defined

There are now also a couple of merge conflicts.

pubpub-zz · 2023-08-13T07:38:34Z

@MartinThoma This looks similar (with a different approach) to a PR of yours about the random error. I'm not sure this is still required.

MartinThoma · 2023-10-08T09:48:09Z

#1788 was fixed via #1841. test_merging_many_temporary_files succeeds as well.


MartinThoma · 2023-10-08T09:54:22Z

We didn't have a test that could detect this type of issue, so I added the test from this PR via #2244.

@sjoerdjob Sorry that it took so long. I'm closing this PR now, as the problem was fixed via #1841 and your test was added via #2244.

MartinThoma · 2023-10-08T09:54:47Z

Thank you for your support 🤗 If you want, I'll add you to https://pypdf.readthedocs.io/en/latest/meta/CONTRIBUTORS.html :-)

sjoerdjob added 3 commits July 21, 2023 09:36

Change type to more plain type.

ad9ad60

Using a WeakKeyDictionary was an implementation detail that does not have to be matched in the type.
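As an illustration of that commit message, an attribute can be annotated with a general mapping type while a `WeakKeyDictionary` is used at runtime. This is a sketch only, not the PR's actual change: the class `Writer`, the attribute `_cache`, and the method `remember` are hypothetical.

```python
from typing import Any, Dict, MutableMapping
from weakref import WeakKeyDictionary


class Writer:
    """Hypothetical writer with a per-reader translation cache."""

    def __init__(self) -> None:
        # Annotated with the abstract interface; the weak-key behaviour
        # is an implementation detail that the type does not expose.
        self._cache: MutableMapping[Any, Dict[int, int]] = WeakKeyDictionary()

    def remember(self, reader: Any, old_id: int, new_id: int) -> None:
        # Record that object old_id in this reader maps to new_id.
        self._cache.setdefault(reader, {})[old_id] = new_id
```

With an annotation like this, the module that declares the type needs no reference to `WeakKeyDictionary`; only the module that constructs the cache has to import it.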

Switch to stringly-typed type.

5e21882

MartinThoma added a commit that referenced this pull request Oct 8, 2023

TST: Regression test against non-deterministic accidental object reuse

6641bda

Full credit to sjoerdjob for this contribution via #1995.
See #1788.
Co-authored-by: Sjoerd Job Postmus <sjoerdjob@sjec.nl>

MartinThoma mentioned this pull request Oct 8, 2023

TST: Regression test against non-deterministic accidental object reuse #2244

Merged

MartinThoma added a commit that referenced this pull request Oct 8, 2023

TST: Regression test against non-deterministic accidental object reuse (#2244)

126f6be

Full credit to sjoerdjob for this contribution via #1995.
See #1788.
Co-authored-by: Sjoerd Job Postmus <sjoerdjob@sjec.nl>

MartinThoma closed this Oct 8, 2023

biredel mentioned this pull request Mar 29, 2024

BUG: Ambiguous translated references #2558

Draft
