Fix converting path to multiple rectangles #371
Updated from commit 8fdc37e to 80dbb3b.
Your code changes look good!
I would like to have some unit tests for this, instead of a PDF integration test.
A suggestion: could you add a reference to Section 4.4 of the PDF Reference ("Path Construction and Painting") in the docstring of paint_path()? That makes it easier for everyone to understand this method.
@cheungpat this is a friendly reminder that this PR still needs some work :)
@cheungpat Thanks!
Description
For a path that consists of a series of rectangles (shape 'mlllhmlllh...'), call paint_path again with each group of 5 points. The result is multiple rects instead of a single curve.
This fixes #369, which affects PDFs generated from Excel and by Microsoft Print to PDF. PDFs generated this way draw table borders using a single path instead of multiple rectangles. With this change, tables are drawn as expected when using `pdf2txt -t html`, and also when using pdfplumber (which does not consider rect edges from curves).
Fixes #369
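The splitting step described above can be sketched in Python. This is a minimal sketch, not the code from this PR: it assumes each path segment is a tuple whose first element is the operator ('m' = moveto, 'l' = lineto, 'h' = closepath) followed by its coordinates, and `split_rectangle_path` is a hypothetical helper name. In the actual fix, each resulting group would be passed back to paint_path.

```python
import re
from typing import List, Tuple

# Assumed segment format: (operator, *coordinates), e.g. ("m", x, y),
# ("l", x, y) or ("h",). 'm' = moveto, 'l' = lineto, 'h' = closepath.
Segment = Tuple[object, ...]


def split_rectangle_path(path: List[Segment]) -> List[List[Segment]]:
    """Split a path shaped 'mlllhmlllh...' into one 5-segment group per rectangle.

    Any other shape is returned unchanged as a single group.
    """
    shape = "".join(str(seg[0]) for seg in path)
    if re.fullmatch(r"(mlllh)+", shape):
        # Each rectangle is exactly one moveto, three linetos and a closepath.
        return [path[i:i + 5] for i in range(0, len(path), 5)]
    return [path]


# Illustrative input: two rectangles drawn as a single path.
path = [
    ("m", 0, 0), ("l", 10, 0), ("l", 10, 5), ("l", 0, 5), ("h",),
    ("m", 20, 0), ("l", 30, 0), ("l", 30, 5), ("l", 20, 5), ("h",),
]
for group in split_rectangle_path(path):
    print("".join(str(seg[0]) for seg in group))
# mlllh
# mlllh
```

Matching the whole shape with a single regular expression keeps the change narrow: only paths made purely of closed four-corner subpaths are split, and every other path takes the existing code path.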
How Has This Been Tested?
Added a test case that parses such a PDF, and inspected the changes by calling `pdf2txt -t html <pdf>` before and after the change. `pdf2txt -t xml` will also generate `<rect>` instead of `<curve>` elements.
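The `<rect>` versus `<curve>` difference depends on whether a single closed 'mlllh' subpath is recognized as a rectangle. The sketch below assumes that check is an axis-alignment test on the four corner points. `is_axis_aligned_rect` and `classify` are hypothetical names, and transformation of the points by the current transformation matrix is ignored for simplicity.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def is_axis_aligned_rect(points: List[Point]) -> bool:
    """Return True if four corner points, in drawing order, form an axis-aligned rectangle."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = points
    # First edge vertical, then the edges alternate horizontal/vertical back to the start.
    vertical_first = x0 == x1 and y1 == y2 and x2 == x3 and y3 == y0
    # First edge horizontal, then the edges alternate vertical/horizontal back to the start.
    horizontal_first = y0 == y1 and x1 == x2 and y2 == y3 and x3 == x0
    return vertical_first or horizontal_first


def classify(points: List[Point]) -> str:
    """Hypothetical mapping to the element name that XML output would use."""
    return "rect" if is_axis_aligned_rect(points) else "curve"


print(classify([(0, 0), (10, 0), (10, 5), (0, 5)]))  # rect
print(classify([(0, 0), (10, 1), (10, 5), (0, 5)]))  # curve
```

This is why splitting matters: a 10-segment 'mlllhmlllh' path never matches a four-corner test, so without the split it can only be reported as a curve.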
Checklist