Extra \n characters #93
Hi David, thanks for the report. Are we looking at the same PDF? The one you linked to does not contain "artículo 378.7" anywhere in the document.
Hi James, sorry, the PDF is http://boe.es/borme/dias/2011/08/23/pdfs/BORME-B-2011-160-28.pdf Thanks again.
OK, I can reproduce it here. It's not ideal: the layout algorithms in lib/pdf/reader/page_layout.rb could definitely be improved. One idea might be to detect "blocks" of text that appear to be close together vertically and render them as one. I'm pressed for time at the moment, so I probably can't look into it much for now, but I'd happily accept any pull requests for review.
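A rough Ruby sketch of the block-detection idea from the comment above, assuming each piece of text on the page is available with an x/y position. `TextRun`, `group_into_lines`, `group_into_blocks`, `render_blocks` and the default tolerances (1.0 and 14.0 points) are all hypothetical and are not pdf-reader's API; the sketch only shows one way "close together vertically" could be turned into "render as one".

```ruby
# Hypothetical stand-in for one positioned piece of text on the page.
# pdf-reader's own text-run objects are not assumed to look like this.
TextRun = Struct.new(:x, :y, :text)

# Group runs whose baselines lie within line_tolerance points of each
# other into lines. PDF y grows upward, so sort by -y for top-to-bottom.
def group_into_lines(runs, line_tolerance: 1.0)
  lines = []
  runs.sort_by { |r| [-r.y, r.x] }.each do |run|
    current = lines.last
    if current && (current.first.y - run.y).abs <= line_tolerance
      current << run
    else
      lines << [run]
    end
  end
  # Order the runs inside each line from left to right.
  lines.map { |line| line.sort_by(&:x) }
end

# Put consecutive lines into the same block while the vertical distance
# between their baselines is at most max_gap points.
def group_into_blocks(lines, max_gap:)
  blocks = []
  lines.each do |line|
    previous = blocks.last && blocks.last.last
    if previous && previous.first.y - line.first.y <= max_gap
      blocks.last << line
    else
      blocks << [line]
    end
  end
  blocks
end

# Render each block as a single line of text, one block per output line.
def render_blocks(runs, max_gap: 14.0)
  lines = group_into_lines(runs)
  blocks = group_into_blocks(lines, max_gap: max_gap)
  blocks.map { |block|
    block.map { |line| line.map(&:text).join(" ") }.join(" ")
  }.join("\n")
end

runs = [
  TextRun.new(50, 700, "Heading"),
  TextRun.new(150, 650.5, "of the"),
  TextRun.new(50, 650, "first line"),
  TextRun.new(50, 638, "paragraph continues")
]
puts render_blocks(runs)
# prints:
# Heading
# first line of the paragraph continues
```

The choice of `max_gap` is the main design decision: it should be roughly the line spacing of body text, since a larger value starts merging separate paragraphs and a smaller one splits wrapped lines apart again.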
Hi James, I think the problem is the line:
Should be:
(Be careful, I'm a beginner in Ruby.)
I think the problem is that the method interesting_rows receives an invalid (empty) element in the page parameter, which is then joined with "\n". Right?
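The diagnosis in the previous comment can be illustrated in isolation. The commenter's original line and proposed fix are not shown here, so this is only a minimal Ruby sketch, assuming the page is rendered as an array of row strings that are then joined with "\n". `join_rows` and its `squeeze_blank:` option are hypothetical and not part of pdf-reader.

```ruby
# Hypothetical helper, not pdf-reader's code: joins the rows of a
# rendered page (one string per row) into the final page text.
# With squeeze_blank: true, rows that contain only whitespace are
# dropped, so two text rows are always separated by exactly one "\n".
def join_rows(rows, squeeze_blank: false)
  rows = rows.map(&:rstrip)
  rows = rows.reject(&:empty?) if squeeze_blank
  rows.join("\n")
end

# A blank row sitting between two rows of text, as can happen when the
# page is laid out on a fixed character grid.
rows = ["first part of the line", "      ", "rest of the line"]

p join_rows(rows)                      # prints "first part of the line\n\nrest of the line"
p join_rows(rows, squeeze_blank: true) # prints "first part of the line\nrest of the line"
```

Dropping every blank row is a blunt fix: it also removes intentional blank lines between paragraphs, which is why a change in the layout logic itself may be the better place to address this.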
Hi,
I'm extracting text (I'm not the author of the PDF) from
http://boe.es/borme/dias/2011/08/23/pdfs/BORME-B-2011-160-28.pdf
On the first page, in the first line (not counting the titles), the character '\n' appears twice, and I think it should appear only once. Here is the output:
I mean the '\n' characters in the middle of the string.
ruby 1.9.3p0
pdf-reader 1.3.3
Thanks and nice job!