Fix missing coordinates in paragraphs continuation #1076
I didn't see the problem in the previous PR, sorry!
Neither did I when I was developing it. The structure viewer app (https://structure-vision.streamlit.app/) is quite helpful in validating the stream order of PDF extraction.
It seems a few paragraphs are still left un-annotated. I checked with a PDF. Any fixes?
Could you please share an example?
Please check this article: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10510434 (page 9 of the article). Many other articles show the same problem.
Those issues are not related to this PR. In that case the problem is that part of the text is misclassified as a figure. I've referenced your comment in a separate issue. This will likely be solved, or at least mitigated, by #963 (WIP).
@lfoppiano Thanks, looking forward to the fix.
When a paragraph continues after an interruption (e.g., a reference callout), the coordinates of the continuation are lost.
This PR solves this issue.
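To make the expected outcome concrete, here is a toy sketch (not the code changed in this PR) of a paragraph whose text is split in two by a reference callout. It assumes a `coords` value is a `;`-separated list of `page,x,y,width,height` boxes; the `Box` class, the `coords_attribute` helper, and the numbers are hypothetical and only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """One bounding box on a page (hypothetical helper type)."""
    page: int
    x: float
    y: float
    width: float
    height: float


def coords_attribute(segments):
    """Join the boxes of every text segment of a paragraph into one string.

    Each segment is the list of boxes for one run of text; a callout in the
    middle of a paragraph splits it into several segments.
    """
    boxes = [box for segment in segments for box in segment]
    return ";".join(
        f"{b.page},{b.x:.2f},{b.y:.2f},{b.width:.2f},{b.height:.2f}"
        for b in boxes
    )


# Made-up boxes for the text before and after the callout.
before_callout = [Box(1, 53.8, 194.57, 239.11, 9.57)]
after_callout = [Box(1, 53.8, 205.53, 120.0, 9.57)]

# Both segments should contribute to the paragraph coordinates.
paragraph_coords = coords_attribute([before_callout, after_callout])
```

In this toy model, the bug would correspond to passing only `before_callout`, so the box for the continuation never reaches the output.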
This PR also makes a small change to the frontend so that paragraph coordinates are extracted when "add coordinates" is selected and "segment sentence" is not.
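The frontend rule can be sketched roughly as follows. This is an illustration of the selection logic only, not the actual frontend code: the function name `build_form_params` is hypothetical, the request is assumed to use the `teiCoordinates` and `segmentSentences` form parameters, and asking for sentence (`s`) coordinates when both options are ticked is an assumption about the pre-existing behaviour.

```python
def build_form_params(add_coordinates: bool, segment_sentences: bool) -> dict:
    """Build the form parameters for a full-text request (illustrative only)."""
    params = {}
    if segment_sentences:
        params["segmentSentences"] = "1"
    if add_coordinates:
        # Assumption: sentence coordinates when sentences are segmented,
        # paragraph coordinates otherwise (the case this PR describes).
        params["teiCoordinates"] = ["s"] if segment_sentences else ["p"]
    return params


# "add coordinates" ticked, "segment sentence" not ticked.
paragraph_case = build_form_params(add_coordinates=True, segment_sentences=False)
```

The real frontend may request coordinates for more element types than shown here; the sketch only covers the paragraph versus sentence choice.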