Wrong coordinates of words when using function extract_words() #799
Comments
Hi @datdao1998, could you provide the PDF that you're using? Without it, it will be very difficult to diagnose your issue.
Hi @datdao1998, just checking back on this. Are you able to provide the PDF? You might also try repairing the PDF and seeing if that fixes the problem you've encountered.
Thanks, @sandzone. Please do email me the PDF; my email address is in my profile. And have you tried repairing the PDF?
Thanks. You are correct. Repairing the PDF resolved the issue. However, Ghostscript couldn't repair it; I had to use the Poppler command-line utilities for that. Is there a way to integrate PDF repair as a part of pdfplumber's extraction features?
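For anyone hitting the same problem, here is a minimal sketch of the repair-then-extract workaround described above. The thread does not say which Poppler utility was used, so `pdftocairo -pdf` is an assumption, and the helper names (`build_repair_command`, `repair_pdf`, `extract_words_from_repaired`) are hypothetical and not part of pdfplumber.

```python
import shutil
import subprocess
from pathlib import Path


def build_repair_command(src, dst):
    """Return the pdftocairo invocation that rewrites src as a fresh PDF at dst."""
    # Assumption: pdftocairo is the Poppler tool used; the thread does not name one.
    return ["pdftocairo", "-pdf", str(src), str(dst)]


def repair_pdf(src, dst):
    """Rewrite src through Poppler's pdftocairo and return the repaired path."""
    if shutil.which("pdftocairo") is None:
        raise RuntimeError("pdftocairo (Poppler) was not found on PATH")
    # check=True raises CalledProcessError if pdftocairo exits with an error.
    subprocess.run(build_repair_command(src, dst), check=True)
    return Path(dst)


def extract_words_from_repaired(src, dst, page_index=0):
    """Repair src, then run extract_words() on one page of the repaired copy."""
    import pdfplumber  # imported lazily so the helpers above work without it

    repaired = repair_pdf(src, dst)
    with pdfplumber.open(str(repaired)) as pdf:
        return pdf.pages[page_index].extract_words()
```

Calling `extract_words_from_repaired("input.pdf", "repaired.pdf")` would write the repaired copy next to the original and return the word dictionaries from its first page. Rewriting a PDF this way can change other properties of the file, so it is worth comparing the extracted text before and after.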
Description
When using the function extract_words(), the coordinates of some extracted words are wrong: in my case, word['x0'] equals word['x1'], although word['text'] is still correct.
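The symptom can be checked for programmatically. The following is a rough sketch, not the reporter's original code: `find_zero_width_words` is a hypothetical helper that takes a list of word dictionaries shaped like the output of extract_words() and returns those whose x0 and x1 coincide. The sample data is made up for illustration.

```python
def find_zero_width_words(words, tolerance=1e-6):
    """Return the words whose bounding box has (almost) no horizontal extent."""
    return [
        w for w in words
        if abs(float(w["x1"]) - float(w["x0"])) <= tolerance
    ]


# Made-up sample in the shape of extract_words() output.
sample_words = [
    {"text": "Hello", "x0": 72.0, "x1": 98.5, "top": 100.0, "bottom": 112.0},
    {"text": "world", "x0": 101.2, "x1": 101.2, "top": 100.0, "bottom": 112.0},
]

suspect = find_zero_width_words(sample_words)
print([w["text"] for w in suspect])
```

With a real file, the list passed in would be `page.extract_words()` for a page opened through pdfplumber.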
Code to reproduce the problem
Screenshots
Output
Visualize text box
Environment