When using Excel's Print to PDF function, the borders in the generated PDF do not appear in the HTML output. When converting to XML output, the borders become <curve> instead of <rect>.
It appears that the borders are drawn as a single path, which is therefore interpreted as a curve instead of a rect.
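To illustrate why one combined path ends up as a curve, here is a simplified Python sketch of how a layout analyzer might classify a painted path by its sequence of operators. The function name classify_path and the tuple representation of path operations are hypothetical; this is an illustration of the idea, not pdfminer's actual paint_path code.

```python
from typing import List, Tuple, Union

# Hypothetical representation of one path operation: the operator letter
# followed by its coordinates, e.g. ("m", 0, 0), ("l", 10, 0) or ("h",).
PathOp = Tuple[Union[str, float], ...]


def classify_path(path: List[PathOp]) -> str:
    """Simplified classification of a painted path (illustrative sketch).

    'ml'                                           -> 'line'
    'mlllh' whose corners form an axis-aligned box -> 'rect'
    anything else                                  -> 'curve'
    """
    shape = "".join(str(op[0]) for op in path)
    if shape == "ml":
        return "line"
    if shape == "mlllh":
        # Corner points given by the 'm' and the three 'l' operators.
        (x0, y0), (x1, y1), (x2, y2), (x3, y3) = [
            (float(op[1]), float(op[2])) for op in path[:4]
        ]
        # Axis-aligned if the edges alternate vertical/horizontal,
        # starting with either orientation.
        if (x0 == x1 and y1 == y2 and x2 == x3 and y3 == y0) or (
            y0 == y1 and x1 == x2 and y2 == y3 and x3 == x0
        ):
            return "rect"
    # Several rectangles in one path ('mlllhmlllh...') fall through to here.
    return "curve"
```

Under this scheme a single 'mlllh' box is a rect, but two boxes drawn in one path have the shape 'mlllhmlllh' and are reported as a curve, which matches the symptom described above.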
For a path that consists of a series of rectangles
(shape is 'mlllhmlllh...'), call paint_path again with each group of
5 operations (one rectangle). The result is multiple rects instead of a single curve.
fixes pdfminer#369
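The splitting described in this change can be sketched in Python as follows. The helpers path_shape and split_rectangles and the tuple representation of path operations are hypothetical; per the description, the actual fix calls paint_path again on each group rather than returning a list.

```python
from typing import List, Tuple, Union

# Hypothetical representation of one path operation: the operator letter
# followed by its coordinates, e.g. ("m", 0, 0), ("l", 10, 0) or ("h",).
PathOp = Tuple[Union[str, float], ...]


def path_shape(path: List[PathOp]) -> str:
    """Concatenate the operator letters of a path, e.g. 'mlllh'."""
    return "".join(str(op[0]) for op in path)


def split_rectangles(path: List[PathOp]) -> List[List[PathOp]]:
    """Split a path shaped 'mlllh' repeated k > 1 times into k sub-paths.

    Any other path (including a single rectangle) is returned unchanged
    as the only element of the result.
    """
    shape = path_shape(path)
    k = len(shape) // 5
    if k > 1 and shape == "mlllh" * k:
        # Each group of 5 operations (m, l, l, l, h) is one rectangle.
        return [path[i:i + 5] for i in range(0, len(path), 5)]
    return [path]
```

Each returned sub-path has the shape 'mlllh', so it can be classified on its own and, if axis-aligned, becomes a separate rect.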
* Fix converting path to multiple rectangles
fixes #369
* Reduce pdf size by removing font
* Add unittest for PDFLayoutAnalyzer.paint_path()
* Add line to CHANGELOG.md
* Add reference to pdf reference manual
* Cleanup function paint_path a bit
* Reduce line length of tests
Co-authored-by: Pieter Marsman <pietermarsman@gmail.com>
Describe the bug
When using Excel's Print to PDF function, the borders in the generated PDF do not appear in the HTML output. When converting to XML output, the borders become <curve> instead of <rect>. It appears that the borders are drawn as a single path, which is therefore interpreted as a curve instead of a rect.
To Reproduce
Run the following on output.pdf:
pdf2txt.py -t html output.pdf > output.html
output.html: (some borders are missing)
When converting to XML, the borders become <curve> instead of <rect>.
Expected behavior
output.html: (borders should be shown)
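One way to check whether the borders come out as rects is to produce XML output (pdf2txt.py -t xml output.pdf > output.xml) and count the shape elements. Below is a minimal sketch using only the Python standard library; count_shapes is a hypothetical helper, and it only assumes that shapes appear as <rect>, <curve> and <line> elements somewhere in the XML tree.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_shapes(root: ET.Element) -> Counter:
    """Count <rect>, <curve> and <line> elements anywhere under root."""
    shape_tags = {"rect", "curve", "line"}
    return Counter(el.tag for el in root.iter() if el.tag in shape_tags)
```

For example, count_shapes(ET.parse("output.xml").getroot()) would be expected to report mostly rect elements for the borders once the combined path is split, rather than curve elements.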