ValueError: min() arg is an empty sequence #53
Comments
I have run into the same issue with my PDF, and it is not a scanned document. I have checked all the bug fixes and the problem still persists.
I had a similar error and found out that my PDF contained empty LTFigures. These empty objects cause the error: l.x0, l.y0, l.x1 and l.y1 don't exist for them, so the mapping ends up empty, i.e. min() arg is an empty sequence. I solved it by not adding empty LTFigures while constructing the elements of the PDF. You need to add a single if statement in the function processor(m) of pdf_utils.py (pdftotree.utils.pdf.pdf_utils). See # ADD THIS.
Duplicate of #42
Describe the bug
When I run pdftotree on a PDF file, I get a runtime exception:
ValueError: min() arg is an empty sequence
To Reproduce
Steps to reproduce the behavior:
Download this PDF file: performance-smart-networks.pdf
Execute the following code:
html = pdftotree.parse(pdf_file="performance-smart-networks.pdf", html_path=None, model_type=None, model_path=None, favor_figures=True, visualize=False)
Expected behavior
The variable html should contain the HTML markup with the text from the PDF.
Error Logs/Screenshots
Here is the full error stack trace:
Environment (please complete the following information):
pdftotree
Version: v0.4.0