BUG: encode_pdfdocencoding() always returns bytes #2440

sbourlon · 2024-02-05T20:57:49Z

In the function encode_pdfdocencoding, cast its return value from bytearray to bytes to match its function signature.

This casting is necessary because bytearray is duck type compatible with bytes in mypy, however this library expects only bytes in its Encryption class.

sbourlon · 2024-02-05T21:54:02Z

I don't have any Windows machine to troubleshoot the test 🤷

stefan6419846 · 2024-02-06T08:27:27Z

Thanks for the PR. Are you able to add a corresponding test here as well?

sbourlon · 2024-02-06T08:55:43Z

I can add a test function that verifies that encode_pdfdocencoding() always returns bytes.

Would it be enough?

stefan6419846 · 2024-02-06T09:24:47Z

Yes, this should be sufficient in my opinion.

codecov · 2024-02-06T20:08:45Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (3fb63f7) 94.43% compared to head (8299131) 94.43%.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #2440   +/-   ##
=======================================
  Coverage   94.43%   94.43%           
=======================================
  Files          49       49           
  Lines        8008     8008           
  Branches     1616     1616           
=======================================
  Hits         7562     7562           
  Misses        276      276           
  Partials      170      170

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

tests/test_generic.py

In the function encode_pdfdocencoding, cast its return value from bytearray to bytes to match its function signature. This casting is necessary because bytearray is duck type compatible with bytes in mypy, however this library expects only bytes in its Encryption class.

sbourlon · 2024-02-07T09:07:54Z

I think we are good now.

BTW, I noticed that the tests test_iss1723 and test_iss1767 have the same test file name but with different URLs. Can this create a conflict? See https://github.com/py-pdf/pypdf/blob/main/tests/test_writer.py#L1405-L1423

stefan6419846 · 2024-02-07T09:13:19Z

The test file name in test_iss1767 seems to be wrong and should be fixed. Looking at the logs, it seems like the issue arises from the fact that the filenames are identical and thus it depends on which test we execute first and thus which file gets cached.

For this reason, we probably do not need the initial download, but it should be sufficient to just use different/correct target names.

sbourlon · 2024-02-07T09:28:37Z

What about naming the file test_iss1767.pdf to be consistent with the naming pattern of test_iss1723?

stefan6419846 · 2024-02-07T09:29:30Z

This is what I would propose as well.

sbourlon · 2024-02-07T19:27:02Z

Sent a separate PR to fix the naming conflict: #2445

sbourlon · 2024-02-07T20:16:39Z

Is this PR good for approval?

tests/example_files.yaml

tests/test_writer.py

stefan6419846

LGTM.

MartinThoma · 2024-02-20T20:14:19Z

Nice! Thanks for taking care of it - and nice to see that this process works technically and in practice 🎉 Made my day :-)

sbourlon · 2024-02-20T20:23:33Z

You are very welcome. Thank you for maintaining this library.

stefan6419846 · 2024-02-21T07:46:59Z

@MartinThoma It seems like your message has been a little bit misleading, but yes, the recent changes with the write permissions have worked 👍

@pubpub-zz

## What's new Generating name objects (`NameObject`) without a leading slash is considered deprecated now. Previously, just a plain warning would be logged, leading to possibly invalid PDF files. According to our deprecation policy, this will log a *DeprecationWarning* for now. ### New Features (ENH) - Add get_pages_from_field (#2494) by @pubpub-zz - Add reattach_fields function (#2480) by @pubpub-zz - Automatic access to pointed object for IndirectObject (#2464) by @pubpub-zz ### Bug Fixes (BUG) - Missing error on name without leading / (#2387) by @Rak424 - encode_pdfdocencoding() always returns bytes (#2440) by @sbourlon - BI in text content identified as image tag (#2459) by @pubpub-zz ### Robustness (ROB) - Missing basefont entry in type 3 font (#2469) by @pubpub-zz ### Documentation (DOC) - Improve lossless compression example (#2488) by @j-t-1 - Amend robustness documentation (#2479) by @j-t-1 ### Developer Experience (DEV) - Fix changelog for UTF-8 characters (#2462) by @stefan6419846 ### Maintenance (MAINT) - Add _get_page_number_from_indirect in writer (#2493) by @pubpub-zz - Remove user assignment for feature requests (#2483) by @stefan6419846 - Remove reference to old 2.0.0 branch (#2482) by @stefan6419846 ### Testing (TST) - Fix benchmark failures (#2481) by @stefan6419846 - Broken test due to expired test file URL (#2468) by @pubpub-zz - Resolve file naming conflict in test_iss1767 (#2445) by @sbourlon [Full Changelog](4.0.2...4.1.0)

@pubpub-zz

## What's new Generating name objects (`NameObject`) without a leading slash is considered deprecated now. Previously, just a plain warning would be logged, leading to possibly invalid PDF files. According to our deprecation policy, this will log a *DeprecationWarning* for now. ### New Features (ENH) - Add get_pages_from_field (#2494) by @pubpub-zz - Add reattach_fields function (#2480) by @pubpub-zz - Automatic access to pointed object for IndirectObject (#2464) by @pubpub-zz ### Bug Fixes (BUG) - Missing error on name without leading / (#2387) by @Rak424 - encode_pdfdocencoding() always returns bytes (#2440) by @sbourlon - BI in text content identified as image tag (#2459) by @pubpub-zz ### Robustness (ROB) - Missing basefont entry in type 3 font (#2469) by @pubpub-zz ### Documentation (DOC) - Improve lossless compression example (#2488) by @j-t-1 - Amend robustness documentation (#2479) by @j-t-1 ### Developer Experience (DEV) - Fix changelog for UTF-8 characters (#2462) by @stefan6419846 ### Maintenance (MAINT) - Add _get_page_number_from_indirect in writer (#2493) by @pubpub-zz - Remove user assignment for feature requests (#2483) by @stefan6419846 - Remove reference to old 2.0.0 branch (#2482) by @stefan6419846 ### Testing (TST) - Fix benchmark failures (#2481) by @stefan6419846 - Broken test due to expired test file URL (#2468) by @pubpub-zz - Resolve file naming conflict in test_iss1767 (#2445) by @sbourlon [Full Changelog](4.0.2...4.1.0)

sbourlon mentioned this pull request Feb 5, 2024

generic/_utils.py: function create_string_object not working with the bytearray type #2434

Closed

sbourlon force-pushed the fix-base-func-encode_pdfdocencoding branch from 095ac87 to e94f369 Compare February 6, 2024 19:05

stefan6419846 reviewed Feb 7, 2024

View reviewed changes

tests/test_generic.py Outdated Show resolved Hide resolved

sbourlon changed the title ~~BUG: encode_pdfdocencoding() always returns bytes~~ [WIP] BUG: encode_pdfdocencoding() always returns bytes Feb 7, 2024

sbourlon changed the title ~~[WIP] BUG: encode_pdfdocencoding() always returns bytes~~ BUG: encode_pdfdocencoding() always returns bytes Feb 7, 2024

sbourlon force-pushed the fix-base-func-encode_pdfdocencoding branch from 404c0f6 to e3f1680 Compare February 7, 2024 08:53

sbourlon force-pushed the fix-base-func-encode_pdfdocencoding branch from e3f1680 to 25d5c93 Compare February 7, 2024 08:57

This was referenced Feb 7, 2024

Fix test_iss1723 #2441

Closed

Test test_iss1723 is broken #2442

Closed

sbourlon mentioned this pull request Feb 7, 2024

TST: resolve file naming conflict in test_iss1767 #2445

Merged

stefan6419846 reviewed Feb 8, 2024

View reviewed changes

tests/example_files.yaml Outdated Show resolved Hide resolved

stefan6419846 reviewed Feb 8, 2024

View reviewed changes

tests/test_writer.py Outdated Show resolved Hide resolved

sbourlon force-pushed the fix-base-func-encode_pdfdocencoding branch from 25d5c93 to 8299131 Compare February 9, 2024 19:03

stefan6419846 approved these changes Feb 10, 2024

View reviewed changes

pubpub-zz approved these changes Feb 12, 2024

View reviewed changes

stefan6419846 merged commit efe7790 into py-pdf:main Feb 20, 2024
15 checks passed

biredel mentioned this pull request May 8, 2024

ENH: Extract Text Enhancement (whitespaces) #1084

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: encode_pdfdocencoding() always returns bytes #2440

BUG: encode_pdfdocencoding() always returns bytes #2440

sbourlon commented Feb 5, 2024 •

edited

Loading

sbourlon commented Feb 5, 2024

stefan6419846 commented Feb 6, 2024

sbourlon commented Feb 6, 2024

stefan6419846 commented Feb 6, 2024

codecov bot commented Feb 6, 2024 •

edited

Loading

sbourlon commented Feb 7, 2024

stefan6419846 commented Feb 7, 2024

sbourlon commented Feb 7, 2024

stefan6419846 commented Feb 7, 2024

sbourlon commented Feb 7, 2024

sbourlon commented Feb 7, 2024

stefan6419846 left a comment

MartinThoma commented Feb 20, 2024

sbourlon commented Feb 20, 2024

stefan6419846 commented Feb 21, 2024

BUG: encode_pdfdocencoding() always returns bytes #2440

BUG: encode_pdfdocencoding() always returns bytes #2440

Conversation

sbourlon commented Feb 5, 2024 • edited Loading

sbourlon commented Feb 5, 2024

stefan6419846 commented Feb 6, 2024

sbourlon commented Feb 6, 2024

stefan6419846 commented Feb 6, 2024

codecov bot commented Feb 6, 2024 • edited Loading

Codecov Report

sbourlon commented Feb 7, 2024

stefan6419846 commented Feb 7, 2024

sbourlon commented Feb 7, 2024

stefan6419846 commented Feb 7, 2024

sbourlon commented Feb 7, 2024

sbourlon commented Feb 7, 2024

stefan6419846 left a comment

Choose a reason for hiding this comment

MartinThoma commented Feb 20, 2024

sbourlon commented Feb 20, 2024

stefan6419846 commented Feb 21, 2024

sbourlon commented Feb 5, 2024 •

edited

Loading

codecov bot commented Feb 6, 2024 •

edited

Loading