TST: Add LzwCodec for encoding #2883
Can you clarify what you intend to do with the encoding?
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #2883 +/- ##
==========================================
+ Coverage 96.24% 96.27% +0.02%
==========================================
Files 51 52 +1
Lines 8625 8692 +67
Branches 1722 1734 +12
==========================================
+ Hits 8301 8368 +67
Misses 187 187
Partials 137 137
☔ View full report in Codecov by Sentry.
I want to have an easy way to check if the decoding does the right thing. This allows us to change the LZW implementation with the confidence that we don't break workflows.
@Lucas-C Maybe the encoder is interesting for fpdf2? It wasn't in any discussion, but only mentioned in py-pdf/fpdf2#691
I would recommend making the encoder roughly the same as the decoder, i.e. passing
@stefan6419846 Done :-) 👍
Are we sure that we want to expose this as a public module while we do not officially support encoding objects with LZW? I tend to make
Codecs might be kept private and called directly from filters. As for the tests, as they currently stand, you have shown that you have a function and its inverse. I would have liked to see at least a minimal test that also checks the compressed data.
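To illustrate the kind of test asked for here, the following is a minimal sketch that pins the encoder output to a known (non-encoded, encoded) pair instead of only checking the round trip. It assumes a textbook LZW that emits plain integer codes; it is not the bit-packed PDF `LZWDecode` format (no 9 to 12 bit packing, no clear-table or end-of-data markers), and `lzw_encode_codes` is a hypothetical helper, not a pypdf function.

```python
from typing import Dict, List


def lzw_encode_codes(data: bytes) -> List[int]:
    """Textbook LZW encoder returning integer codes.

    Assumption: illustrative only. No bit packing and no clear/EOD
    markers, so this is NOT the byte stream a PDF LZW filter produces.
    """
    # Start with one table entry per possible byte value.
    table: Dict[bytes, int] = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes: List[int] = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            # Keep extending the current match.
            current = candidate
        else:
            # Emit the longest known prefix and learn the new sequence.
            codes.append(table[current])
            table[candidate] = next_code
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


# A pinned (plain, encoded) pair: a consistent mistake in both the
# encoder and the decoder would still pass a round-trip test, but it
# would fail this comparison.
PLAIN = b"TOBEORNOTTOBEORTOBEORNOT"
EXPECTED = [84, 79, 66, 69, 79, 82, 78, 79, 84,
            256, 258, 260, 265, 259, 261, 263]
assert lzw_encode_codes(PLAIN) == EXPECTED
```

A test of this shape fails as soon as the encoder changes its output, even when the decoder is changed in the same way.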
Good point, I made it private. I'm uncertain about the module name, but as it is private it should not matter too much.
@pubpub-zz You're absolutely right. I've added two examples. I would feel even better if there were documented (non-encoded, encoded) pairs that we could add to the test suite, but for now that should be fine.
Yes, it could make for an interesting addition! I opened py-pdf/fpdf2#1271 to suggest this feature. Just to be clear @MartinThoma: are you explicitly allowing
## What's new

### New Features (ENH)
- Add `layout_mode_font_height_weight` argument to `PageObject.extract_text()` (#2920) by @hpierre001

### Bug Fixes (BUG)
- Fix font specificier for FreeText annotation (#2893) by @ssjkamei
- Line breaks are not generated due to incorrect calculation of text leading (#2890) by @ssjkamei
- Improve handling of spaces in text extraction (#2882) by @ssjkamei

### Robustness (ROB)
- Soft failure for flate encode image mode 1 with wrong LUT size (#2900) by @stefan6419846

### Documentation (DOC)
- Use latest package versions (#2907) by @stefan6419846
- Correct example of reading FileAttachment annotation (#2906) by @j-t-1

### Developer Experience (DEV)
- Update pinned requirements (#2918) by @stefan6419846
- Make make_release.py compatible with Windows environment (#2894) by @pubpub-zz

### Maintenance (MAINT)
- Remove references to outdated Python versions (#2919) by @stefan6419846
- Generalize the method of obtaining space_code (#2891) by @ssjkamei
- Unnecessary character mapping process (#2888) by @ssjkamei
- New LZW decoding implementation (#2887) by @MartinThoma

### Testing (TST)
- Add LzwCodec for encoding (#2883) by @MartinThoma

### Code Style (STY)
- Capitalize error messages (#2903) by @j-t-1
- Modify error messages in PdfWriter (#2902) by @j-t-1

[Full Changelog](5.0.1...5.1.0)
This is a change I wanted to do for a while :-)
While we might only need decoding for pypdf, having both decoding and encoding in one class massively helps with testing. We can still get it wrong, but it's harder to get both the encoder and the decoder wrong in a consistent way.
This PR adds an abstract `Codec` class as well as an `LzwCodec` implementation.

We could even use hypothesis for property-based testing for all codecs :-)
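As a rough illustration of the idea, here is a minimal sketch of an abstract codec interface together with a randomized round-trip check. The `encode`/`decode` signatures are an assumption about what such a class could look like, `RunLengthCodec` is a toy stand-in rather than the `LzwCodec` from this PR, and `check_round_trip` is a hypothetical helper that uses the standard library's `random` module in place of hypothesis.

```python
import random
from abc import ABC, abstractmethod


class Codec(ABC):
    """Assumed interface: a pair of inverse byte transformations."""

    @abstractmethod
    def encode(self, data: bytes) -> bytes:
        """Return the encoded form of ``data``."""

    @abstractmethod
    def decode(self, data: bytes) -> bytes:
        """Return the original bytes for encoded ``data``."""


class RunLengthCodec(Codec):
    """Toy stand-in codec: a sequence of (count, byte) pairs, count in 1..255."""

    def encode(self, data: bytes) -> bytes:
        out = bytearray()
        i = 0
        while i < len(data):
            run = 1
            # Extend the run while the same byte repeats (capped at 255).
            while i + run < len(data) and run < 255 and data[i + run] == data[i]:
                run += 1
            out += bytes([run, data[i]])
            i += run
        return bytes(out)

    def decode(self, data: bytes) -> bytes:
        if len(data) % 2:
            raise ValueError("Encoded data must consist of (count, byte) pairs")
        out = bytearray()
        for i in range(0, len(data), 2):
            out += bytes([data[i + 1]]) * data[i]
        return bytes(out)


def check_round_trip(codec: Codec, trials: int = 200, seed: int = 0) -> None:
    """Property: decode(encode(x)) == x for many random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randint(0, 64)
        # A small alphabet makes repeated runs likely.
        data = bytes(rng.choice(b"ab\x00") for _ in range(length))
        assert codec.decode(codec.encode(data)) == data


check_round_trip(RunLengthCodec())
```

Because the check only depends on the `Codec` interface, the same function could be reused for any other codec that implements both directions.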