Line breaks being lost between headers and paragraphs #608
Comments
I'm fairly certain that what's going on here is that the "macOS Pages' export function" is using the graphics state
You'll notice that all the […]
If the document is re-saved in Adobe Acrobat as a reduced-size PDF, then the output above becomes:
Here you'll see that the […]
Since PdfParser ignores […]
@GreyWyvern thanks for investigating. So it seems that PdfParser is lacking the ability to process […]
In this case the […]
I'm working on this, hopefully soon. :)
Hi @robt-dice. Are we able to use your Example PDF.pdf in the PdfParser test suite? Is it free to use? It's actually good for at least a couple tests of mine. :)
Please, feel free to use it.
… On 14 Aug 2023, at 14:25, Brian Huisman ***@***.***> wrote:
Hi @robt-dice <https://github.com/robt-dice>. Are we able to use your Example PDF.pdf <https://github.com/smalot/pdfparser/files/11848963/Example.PDF.pdf> in the PdfParser test suite? Is it free to use?
It's actually good for at least a couple tests of mine. :)
Description:
Line breaks between headings and paragraph text are being lost. Also, the subsequent line breaks after each paragraph are being lost. The PDF was created using macOS Pages' export function, using the following settings:

PDF input
Example PDF.pdf
Expected output & actual output
expected.txt
actual.txt
Code