relative=True in page.extract_text() not working #391
Comments
Hi @LiutongZhou The issue you are facing is not necessarily a bug. The reason you are getting The bounding box
Hi @samkit-jain, is the above statement still true even if I set I was expecting that Please help me understand it. Thanks
Hi @LiutongZhou, I think you may be misunderstanding the units of a bounding box.
Hi @jsvine. Thank you for your explanation. But I was hoping that I assume this is the intention for having this optional parameter Is my understanding wrong?
Ah, now I better understand your question. Thank you for clarifying. Here is an explanation of the
Okay, this is confusing :D I would not be able to understand it if I hadn’t read the whole issue #245. So
Yep, exactly! That's a great summary.
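For readers landing here later, a minimal sketch of the two interpretations discussed in this thread. It assumes, per pdfplumber's documentation, that a bounding box is (x0, top, x1, bottom) in PDF points and that relative=True treats the box as an offset from the top-left of the page's own bounding box, not as 0..1 fractions of the page. The helper names below are hypothetical and are not part of pdfplumber.

```python
from typing import Tuple

# (x0, top, x1, bottom), measured in PDF points
BBox = Tuple[float, float, float, float]


def fractions_to_points(frac_box: BBox, width: float, height: float) -> BBox:
    """Scale a box given as 0..1 fractions of the page into points.

    Hypothetical helper: this is what a caller would need if they
    expected `relative=True` to accept fractional coordinates.
    """
    fx0, ftop, fx1, fbottom = frac_box
    return (fx0 * width, ftop * height, fx1 * width, fbottom * height)


def offset_to_absolute(rel_box: BBox, parent_box: BBox) -> BBox:
    """Shift a box measured from the parent's top-left corner into
    absolute coordinates (the assumed meaning of `relative=True`)."""
    px0, ptop, _, _ = parent_box
    x0, top, x1, bottom = rel_box
    return (x0 + px0, top + ptop, x1 + px0, bottom + ptop)


# A box covering the whole of a 612 x 792 point page, given as fractions.
full_page = fractions_to_points((0, 0, 1, 1), 612, 792)

# A box measured from the corner of an already-cropped region.
inner = offset_to_absolute((10, 20, 110, 120), (100, 200, 400, 500))
```

With a real page, the first helper could presumably be combined with the page's width and height attributes before calling crop, so that the box passed in is already expressed in points.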
Hi @jsvine, when I extract the text in a specified area, images on the page affect the accuracy of the extraction, and the spaces in the specified area are removed. If I use Adobe Acrobat to delete the picture and then extract the specified area, the extraction works normally. Please help, thank you very much.
The Bug
Setting relative box coordinates in crop and then calling extract_text is not working:
page.crop(box_coordinates, relative=True)
Code to reproduce the problem
PDF file
https://s22.q4cdn.com/407748750/files/doc_financials/2020/ar/2020-Proxy-Card.pdf
Expected behavior
Return the text of the page
Actual behavior
Return nothing
Environment