Is there any way to include blank lines when extracting texts? #516
Comments
Hi! Something like this has been a longstanding request — see, e.g., #10 from 2016. I think it's probably time to really try adding this feature! Or at least something useful enough, if not perfect. Thanks for the nudge. In the meantime, there are a few ways you could handle this, though the best approach will depend on your specific PDF. One approach you might try:
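A minimal sketch of one way to do this (an illustration, not necessarily the exact snippet from this comment): group the extracted words into lines by their vertical position, then emit a blank line wherever the gap between consecutive lines is noticeably larger than usual. The function name `text_with_blank_lines` and the `gap_factor` parameter are hypothetical. The word dictionaries are assumed to carry `text`, `x0`, and `top` keys, as the dictionaries returned by pdfplumber's `page.extract_words()` do.

```python
from statistics import median


def text_with_blank_lines(words, y_tolerance=3, gap_factor=1.5):
    """Join word dicts into text, inserting blank lines at large vertical gaps.

    Hypothetical helper: `words` is a list of dicts with "text", "x0", "top".
    """
    if not words:
        return ""

    # Group words into lines: a word joins the current line when its `top`
    # is within `y_tolerance` of the first word on that line.
    lines = []
    for word in sorted(words, key=lambda w: w["top"]):
        if lines and abs(word["top"] - lines[-1][0]["top"]) <= y_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])

    # Distance between the tops of consecutive lines; the median is taken
    # as the "normal" line spacing for this page.
    tops = [line[0]["top"] for line in lines]
    pitches = [b - a for a, b in zip(tops, tops[1:])]
    typical = median(pitches) if pitches else 0

    out = []
    for i, line in enumerate(lines):
        # A gap clearly larger than normal is treated as one or more blank lines.
        if i > 0 and typical > 0 and pitches[i - 1] > gap_factor * typical:
            out.extend([""] * max(1, round(pitches[i - 1] / typical) - 1))
        # Within a line, order words left to right.
        out.append(" ".join(w["text"] for w in sorted(line, key=lambda w: w["x0"])))
    return "\n".join(out)
```

With pdfplumber, the input would come from something like `page.extract_words()`. Suitable values for `y_tolerance` and `gap_factor` depend on the font size and line spacing of the PDF in question, so they are worth tuning against a sample page.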
Closing this issue due to the similarity to #10, but feel free to continue the discussion here.
Regarding doctop_clusters: what values of x_tolerance and y_tolerance (currently x_tolerance=3 and y_tolerance=3) do I need to set so that blank lines are exported?
@flycattt Try using
That's a very old version of
Hi, I am using this fabulous library to extract text from PDFs. In my PDFs, records are separated by blank lines. However, after extraction I only get a single line break between records, as marked in the screenshot. This makes parsing the records difficult, because some records start without a recognizable pattern string. I looked through the manual but didn't find a solution. I would much appreciate it if you could help me with this!