When converting to text the first line don't follow layout when enabled #612

Closed
3 tasks done
wendy0402 opened this issue Dec 20, 2024 · 1 comment · Fixed by #615
Labels
bug Something isn't working

Comments

@wendy0402

Prerequisites

  • I have written a descriptive issue title

  • I have searched existing issues to ensure it has not already been reported

  • I agree to follow the Code of Conduct that this project adheres to

API/app/plugin version

7.2.2

Node.js version

20.14.0

Operating system

macOS

Operating system version (i.e. 20.04, 11.3, 10)

Sonoma(14.6.1)

Description

First of all thank you for this awesome library!

When I convert a PDF to text with the layout maintained, the first line of the page does not respect the layout. This seems to be because the poppler output is trimmed after the response is received from poppler: https://github.com/Fdawgs/node-poppler/blob/main/src/index.js#L1533
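To illustrate the suspected cause, here is a minimal sketch that does not use node-poppler at all. The `rawOutput` string is a hypothetical stand-in for the first lines of layout-preserving output, and using `trimEnd()` is only one possible way to avoid the problem, not necessarily what the library does or how the fix was implemented.

```javascript
// Hypothetical stand-in for the start of layout-preserving pdftotext output.
const rawOutput = "         WALDEN\n\n        BY\nHENRY DAVID THOREAU\n\n";

// String.prototype.trim() strips whitespace from both ends of the string,
// so the indentation of the very first line is lost.
const trimmed = rawOutput.trim();

// String.prototype.trimEnd() strips trailing whitespace only,
// so the first line keeps its leading spaces.
const trimmedEnd = rawOutput.trimEnd();

console.log(JSON.stringify(trimmed.split("\n")[0]));    // "WALDEN"
console.log(JSON.stringify(trimmedEnd.split("\n")[0])); // "         WALDEN"
```

Only the first line is affected because `trim()` works on the string as a whole: the indentation of later lines sits after a newline and is left untouched, which matches the output shown in this report.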

Steps to Reproduce

Test file: TextAlignCenter.pdf

Result of parsing the PDF with poppler directly on the command line:

         WALDEN

        BY
HENRY DAVID THOREAU




              Here we have
         some centered text lines
          with background color
 "fillc:#3277d3, bgcol:#beded9, rot:0"

/// truncated because too long

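Result of parsing the same PDF with node-poppler:

const { Poppler } = require("node-poppler");
const poppler = new Poppler();
// Run inside an async function; `file` is the path to the PDF.
const output = await poppler.pdfToText(file, undefined, { maintainLayout: true });

Output: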

WALDEN

        BY
HENRY DAVID THOREAU




              Here we have
         some centered text lines
          with background color
 "fillc:#3277d3, bgcol:#beded9, rot:0"




                 1854
                                      94
// truncated because too long

Expected Behaviour

Expected result:
     WALDEN

    BY

HENRY DAVID THOREAU

          Here we have
     some centered text lines
      with background color

"fillc:#3277d3, bgcol:#beded9, rot:0"

/// truncated because too long

wendy0402 added the bug label Dec 20, 2024
@Fdawgs
Owner

Fdawgs commented Dec 23, 2024

Good spot @wendy0402, thanks for raising this! I'll take a look after the holidays.

Fdawgs linked a pull request Jan 13, 2025 that will close this issue