Added checks for spurious lines in malformed PDFs #689
Merged: pietermarsman merged 14 commits into pdfminer:develop from jwyawney:449-remove-empty-lines on Feb 22, 2022
…r new line characters
@jwyawney I simplified the code a bit by moving the empty check to the already existing … I think that has the same effect. Can you check whether this achieves the same thing?
pietermarsman approved these changes on Feb 8, 2022
This pull request fixes issue #449. I had a similar problem with PDFs containing lines made up only of spaces, which were grouped with adjacent lines based on the layout thresholds. The fix required checking for newlines in other lines within the LTTextLineHorizontal class, and also excluding these spurious lines in the LTLayoutContainer class.
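To illustrate why a whitespace-only line causes trouble, here is a minimal, self-contained sketch. It does not use pdfminer's real classes: `Char`, `TextLine`, `make_line`, `group_lines` and the `skip_empty` flag are all hypothetical stand-ins, and the grouping is a simplified single pass (loosely modelled on the idea behind `LAParams.line_margin`) rather than pdfminer's actual neighbour search. It shows how a line of spaces sitting between two paragraphs can "bridge" them into one box, and how dropping empty lines before grouping keeps them apart.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Char:
    # Hypothetical positioned character (NOT pdfminer's LTChar).
    text: str
    x0: float
    y0: float
    y1: float


@dataclass
class TextLine:
    # Hypothetical horizontal text line (NOT pdfminer's LTTextLineHorizontal).
    chars: List[Char] = field(default_factory=list)

    def get_text(self) -> str:
        return "".join(c.text for c in self.chars)

    def is_empty(self) -> bool:
        # Treat a line with no characters, or only spaces/newlines, as empty.
        text = self.get_text()
        return not text or text.isspace()

    @property
    def y0(self) -> float:
        return min(c.y0 for c in self.chars)

    @property
    def y1(self) -> float:
        return max(c.y1 for c in self.chars)


def make_line(text: str, y0: float, y1: float) -> TextLine:
    # Hypothetical helper: lay characters out left to right at a fixed width.
    return TextLine([Char(ch, i * 5.0, y0, y1) for i, ch in enumerate(text)])


def group_lines(lines: List[TextLine], line_margin: float,
                skip_empty: bool = True) -> List[List[TextLine]]:
    """Group vertically adjacent lines into boxes (simplified, hypothetical)."""
    if skip_empty:
        # Exclude spurious whitespace-only lines before grouping.
        lines = [ln for ln in lines if not ln.is_empty()]
    ordered = sorted(lines, key=lambda ln: -ln.y1)  # top of the page first
    boxes: List[List[TextLine]] = []
    for ln in ordered:
        if boxes:
            prev = boxes[-1][-1]
            height = max(prev.y1 - prev.y0, ln.y1 - ln.y0)
            gap = prev.y0 - ln.y1  # vertical distance between the two lines
            if gap <= line_margin * height:
                boxes[-1].append(ln)
                continue
        boxes.append([ln])
    return boxes


if __name__ == "__main__":
    a = make_line("Hello", 100, 110)
    blank = make_line("   \n", 88, 98)  # spurious line of spaces
    b = make_line("World", 76, 86)
    print(len(group_lines([a, blank, b], 0.5, skip_empty=False)))
    print(len(group_lines([a, blank, b], 0.5)))
```

With `skip_empty=False` the blank line is close enough to both neighbours that all three lines end up in one box; with the empty check in place, "Hello" and "World" are too far apart (a gap of 14 against a threshold of 5) and stay in separate boxes.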
I have tested this against my own files, which were causing errors, and have also added the contributed samples of malformed PDFs along with the tests/test_malformed.py unittest file. Please run the test from the top-level directory using:
Checklist