Parsing text from PDF #137

Closed
nunofgs opened this issue Jul 14, 2019 · 3 comments

nunofgs commented Jul 14, 2019

Hi @Hopding, thank you for the great lib.

Apologies if this is a newbie question, but I can't seem to find a way to parse text out of an existing PDF. I'm looking to retrieve a string from a PDF in order to determine which page it's on.

Any idea how I could accomplish this?

dasilvacontin commented

I'm personally looking to find some text and replace the "field"'s contents

Hopding (Owner) commented Jul 20, 2019

Hello @nunofgs!

It is not currently possible to parse plain text out of a document with pdf-lib (but you can extract the content of acroform fields). I'd suggest you consider using PDF.js to parse/extract text.

Of course, this isn't an ideal solution since it requires two different libraries for a seemingly simple task. But it's the best approach I know of for now, until pdf-lib gains support for text parsing.
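For anyone landing here, a minimal sketch of the PDF.js route for the "which page is this string on?" question. It is written against small structural types (`DocumentLike`, `PageLike`, `TextContentLike`) that I made up to mirror the parts of the PDF.js document and page objects it relies on (`numPages`, `getPage`, `getTextContent`, and `str` on text items). The function names are hypothetical too. Treat it as a starting point under those assumptions, not a tested integration.

```typescript
// Hypothetical structural types covering only the PDF.js members used below.
interface TextContentLike {
  items: Array<{ str?: string }>; // marked-content items carry no `str`
}
interface PageLike {
  getTextContent(): Promise<TextContentLike>;
}
interface DocumentLike {
  numPages: number;
  getPage(pageNumber: number): Promise<PageLike>; // page numbers are 1-based
}

// Returns the 1-based numbers of the pages whose text contains `needle`.
function pagesMatching(pageTexts: string[], needle: string): number[] {
  const hits: number[] = [];
  pageTexts.forEach((text, index) => {
    if (text.includes(needle)) hits.push(index + 1);
  });
  return hits;
}

// Collects one string per page by joining that page's text items with spaces.
async function extractPageTexts(doc: DocumentLike): Promise<string[]> {
  const texts: string[] = [];
  for (let n = 1; n <= doc.numPages; n++) {
    const page = await doc.getPage(n);
    const content = await page.getTextContent();
    texts.push(content.items.map((item) => item.str ?? "").join(" "));
  }
  return texts;
}

// Convenience wrapper: extract every page's text, then search it.
async function findPagesContaining(doc: DocumentLike, needle: string): Promise<number[]> {
  return pagesMatching(await extractPageTexts(doc), needle);
}
```

With PDF.js itself, the document would come from something like `await pdfjsLib.getDocument(url).promise` and then be passed to `findPagesContaining`. One caveat: PDFs often split a visible phrase across several text items, so joining items with a space can break a match that looks contiguous on the page. Normalizing whitespace on both sides may be needed.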

Hopding (Owner) commented Jul 20, 2019

@dasilvacontin Is the field you are working with just plain text? Or is it an acroform field? If it is raw text, I'm afraid pdf-lib doesn't have the necessary features to parse it (but as I mentioned above, you could use PDF.js instead).

However, if it's in an acroform, pdf-lib should be able to do what you need. pdf-lib's acroform support isn't currently well documented, so I'd suggest taking a look at some of the existing acroform issues. Please let me know if you have any questions!
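As a rough illustration of the acroform case, here is a sketch of "find some text in a field and replace it". It assumes a form object exposing `getTextField(name)`, and a text field exposing `getText()` and `setText()`. The `FormLike` and `TextFieldLike` interfaces and the `replaceInTextField` helper are hypothetical names introduced for this example. Whether your pdf-lib version offers a matching form API is something to check, since it was not documented at the time of this thread.

```typescript
// Hypothetical structural types describing the form members assumed above.
interface TextFieldLike {
  getText(): string | undefined;
  setText(text: string | undefined): void;
}
interface FormLike {
  // Assumed to throw if no text field with this name exists.
  getTextField(name: string): TextFieldLike;
}

// Replaces every occurrence of `search` in the named text field.
// Returns true if the field was changed, false if `search` was not found.
function replaceInTextField(
  form: FormLike,
  fieldName: string,
  search: string,
  replacement: string
): boolean {
  const field = form.getTextField(fieldName);
  const current = field.getText() ?? ""; // an empty field may report undefined
  if (!current.includes(search)) return false;
  // split/join replaces all occurrences without regex escaping concerns.
  field.setText(current.split(search).join(replacement));
  return true;
}
```

In later pdf-lib releases that ship a form API, `form` would be the result of `pdfDoc.getForm()` on a document loaded with `PDFDocument.load(bytes)`, and the change is written out with `pdfDoc.save()`.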
