[Feature Request]: Use PDF as a template, similar to mail merge in word #177

Misiu · 2019-09-02T12:44:17Z

I'm looking for a client-side library that will let me create a preview of a document for a client.
The idea is to create a simple HTML form with a couple of inputs and a preview button that generates a PDF based on a template.
I'm aware that I can load an existing PDF and add an overlay to it (as shown in the samples), but I'd like to replace text. For example, I'd like to replace {{name}} with John and {{surname}} with Smith.

I've searched the issues and found #33 and #137. As I understand it, your library doesn't support reading text, so please consider this a feature request.
With this one feature, your library would be an ideal solution for client-side PDF manipulation.


jtraulle · 2019-09-18T20:58:18Z

I am adding my vote here, as this is something that could be really handy for my use case ;)

Hopding · 2019-12-26T22:04:56Z

Hello @Misiu! This is an interesting idea. I can certainly see its utility.

There is currently work being done by @cshenks to develop an AcroForms API (see here). When this is complete, it should be fairly straightforward to parse and modify the content of AcroForm text fields with pdf-lib. This seems pretty much like what you are describing here. Do you think this would address your use case?

@jtraulle @gustavodipietrodeus @DaveLo Since you've all expressed interest in this, I'd very much like to hear your thoughts as well.

DaveLo · 2019-12-27T13:44:44Z

@Hopding, I'm interested in this functionality for use in variable data printing. At my company we send a customized instruction booklet to customers.

Our current solution converts HTML to PDF, but we are reaching the point where development is constraining our design team, since every change means a lengthy rebuild.

Using this library, there are places where I can easily put dynamic objects in blank areas (images, barcodes, etc.), but there are other places where string interpolation would be hugely helpful (Hello, {{name}} welcome to {{service}}).
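
For reference, the string interpolation being asked for is simple once the text is available as an ordinary string. Below is a minimal sketch; `interpolate` is a hypothetical helper for illustration and not part of pdf-lib.

```typescript
// Hypothetical helper (not a pdf-lib API): fills {{key}} markers in a plain string.
function interpolate(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (marker: string, key: string) =>
    // Leave unknown markers untouched so missing data is easy to spot.
    Object.prototype.hasOwnProperty.call(values, key) ? (values[key] as string) : marker
  );
}

const greeting = interpolate("Hello, {{name}} welcome to {{service}}", {
  name: "John",
  service: "Acme",
});
// greeting is "Hello, John welcome to Acme"
```

This only covers the string-level substitution. It assumes the template text has already been obtained as a string, which a PDF does not provide directly.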

Misiu · 2019-12-31T11:25:21Z

@Hopding AcroForms will be useful, but I support @DaveLo's idea. I'd like to put placeholders in a PDF and replace them with content.
Thinking about it now, it won't be that easy. If a placeholder is replaced with longer content, the whole text must be reflowed (some of it might move to a new line).
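
The reflow concern can be illustrated with a small sketch. It uses a greedy word wrap and assumes every character has the same width, which is a simplification (real fonts are usually proportional). The function name and the 24-character line width are made up for the example.

```typescript
// Greedy word wrap. Assumes a fixed character width, which real PDF fonts rarely have.
function wrap(text: string, maxChars: number): string[] {
  const lines: string[] = [];
  let current = "";
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  for (const word of words) {
    const candidate = current === "" ? word : current + " " + word;
    if (candidate.length <= maxChars || current === "") {
      // The word fits, or it is the first word on the line and must go here anyway.
      current = candidate;
    } else {
      lines.push(current);
      current = word;
    }
  }
  if (current !== "") lines.push(current);
  return lines;
}

// The template fits on two lines; a longer replacement pushes text onto a third.
const shortLines = wrap("Hello, {{name}} welcome to our service", 24);
const longLines = wrap("Hello, Maximilian Alexander welcome to our service", 24);
```

Here `shortLines` has two entries and `longLines` has three, so everything below the paragraph would have to move.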

jtraulle · 2019-12-31T14:54:20Z

Like @DaveLo and @Misiu, I think that AcroForms do not fulfill the same purpose as placeholders, and placeholders would be more appropriate for my use case (being able to search and replace, on the client side, markers/placeholders that were placed in the PDF during server-side rendering) 😉

Hopding · 2019-12-31T18:06:16Z

Implementing this feature without using AcroForms presents three main challenges:

1. Locating the placeholders. This requires pdf-lib to sift through all the content streams in a document and locate all the text drawing operators. This wouldn't be too difficult to do. The challenging part is mapping the glyph IDs to Unicode text. This would be a significant undertaking. The PDF specification defines a ridiculous number of ways to store fonts and encode text. Writing code to support all of them is entirely possible. It would just take a lot of time and effort. The final step is to process all the Unicode text and produce a list of words/sentences/paragraphs in the document. You might think this last step would be simple, but it is not. PDF does not store text in a structured format like HTML. It just says to draw characters at X/Y coordinates. So you'd need to convert these spatial coordinates to structured text.
2. Encoding the replacement text. Presumably, you'd want this feature to automatically draw the replacement text in the same font as the placeholders. This is also much harder than you might expect. For example, the font the placeholders were drawn in might have been subsetted, meaning it might not support the replacement text. And even if it does, you'd need to extract all the font objects for the placeholder font and figure out how to encode the new text (because, again, the PDF spec allows all sorts of fonts and encodings).
3. Laying out the new text block. As @Misiu mentioned, it's highly unlikely that the replacement text will have the same length as the placeholder text. This means that you'd need to handle laying out the text already present in the document, not just the placeholders. And not necessarily just the sentence or paragraph to which the placeholders belonged. If the replacement text is long enough, it might require other paragraphs to be laid out again. And what happens if you end up exceeding the page length? And this is assuming you're dealing with simple paragraphs of text. Many PDF documents have all sorts of fancy images and complicated layouts that would be extremely difficult to identify and handle automatically.
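
To give a feel for the "spatial coordinates to structured text" step, here is a rough sketch that groups already-decoded text runs into lines by their baseline. The `TextRun` shape, the tolerance value, and the joining with single spaces are all assumptions made for illustration. pdf-lib does not expose decoded runs like this, and a real implementation would have to infer spacing from glyph widths and gaps.

```typescript
// Hypothetical shape for one decoded text run (not a pdf-lib type).
interface TextRun {
  text: string;
  x: number; // left edge, in page units
  y: number; // baseline, in page units (PDF y grows upward)
}

// Groups runs whose baselines are within `tolerance` of the line's first run,
// ordered top to bottom and then left to right.
function groupIntoLines(runs: TextRun[], tolerance: number = 2): string[] {
  const sorted = runs.slice().sort((a, b) => b.y - a.y || a.x - b.x);
  const lines: { y: number; runs: TextRun[] }[] = [];
  for (const run of sorted) {
    const last = lines[lines.length - 1];
    if (last !== undefined && Math.abs(last.y - run.y) <= tolerance) {
      last.runs.push(run);
    } else {
      lines.push({ y: run.y, runs: [run] });
    }
  }
  return lines.map((line) =>
    line.runs
      .sort((a, b) => a.x - b.x)
      .map((r) => r.text)
      .join(" ") // assumption: one space between runs
  );
}

const pageLines = groupIntoLines([
  { text: "welcome", x: 120, y: 700 },
  { text: "Hello,", x: 50, y: 700.5 },
  { text: "{{name}}", x: 85, y: 700 },
  { text: "Second line", x: 50, y: 686 },
]);
// pageLines is ["Hello, {{name}} welcome", "Second line"]
```

Even this toy version needs a tolerance for slightly different baselines, and it ignores columns, rotated text, and placeholders split across several runs.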

There are some shortcuts that could be taken if we placed some restrictions on the feature. For example, we could make (1) much easier if we required the placeholder text to be tagged with marked content operators (see section 14.6, Marked Content, of the PDF spec). But this would require the placeholders to be created in a special way, so it wouldn't be able to identify arbitrary strings of text like {{foo}}.
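
As a rough illustration of the marked content idea, the sketch below scans an already-decoded content stream for `BMC ... EMC` sequences and collects the literal strings shown with `Tj` inside them. The `Placeholder` tag name and the `findMarkedText` function are hypothetical. The regular expressions deliberately ignore `BDC` property lists, nested marked content, hex strings, `TJ` arrays, and escaped parentheses, so this is nowhere near a real content stream parser.

```typescript
interface MarkedText {
  tag: string;  // marked content tag, without the leading slash
  text: string; // raw bytes of the literal strings, not yet mapped to Unicode
}

// Simplistic scan of a decoded content stream for "/Tag BMC ... EMC" blocks.
function findMarkedText(content: string): MarkedText[] {
  const found: MarkedText[] = [];
  const block = /\/([A-Za-z0-9]+)\s+BMC\b([\s\S]*?)\bEMC\b/g;
  let b: RegExpExecArray | null;
  while ((b = block.exec(content)) !== null) {
    // Collect every "(literal) Tj" inside the block; escapes are not handled.
    const literal = /\(([^()\\]*)\)\s*Tj/g;
    let text = "";
    let s: RegExpExecArray | null;
    while ((s = literal.exec(b[2])) !== null) {
      text += s[1];
    }
    found.push({ tag: b[1], text });
  }
  return found;
}

const sampleStream =
  "BT /F1 12 Tf 50 700 Td (Hello, ) Tj /Placeholder BMC ({{name}}) Tj EMC ( welcome) Tj ET";
const marked = findMarkedText(sampleStream);
// marked is [{ tag: "Placeholder", text: "{{name}}" }]
```

The tagged block tells you where the placeholder is without any coordinate analysis, but the string bytes still have to be decoded through the font's encoding.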

We could make (2) much easier as well, if we didn't try to automatically extract and reuse the font that the placeholders were drawn in. This step would be fairly straightforward if we required you to embed/provide your own font, just like you'd do for PDFPage.drawText.

But as for (3), I'm not too sure what could be done to simplify this. I'm open to ideas though! I'm sure other PDF libraries (such as iText or PDFBox) support text extraction and replacement in some form/fashion. So it'd be interesting to see how they handle this part.

DaveLo · 2020-01-02T13:55:40Z

@Hopding, I think restrictions make a ton of sense here. In general, forcing tradeoffs for VDP-style usage is reasonable: if the user needs full customization, then drawing text on the page in an empty block is a better, already available solution than variable interpolation.

1. I'm probably not knowledgeable enough to speak on (1) very well, but would it make sense to define the whole text block as the marked content and then, once you pull the string out, use interpolation on the variable pieces?
2. This is a perfectly reasonable ask. If you are automating document creation, you probably have the font somewhere accessible.
3. This might be unworkable, but what if you forced a same-or-fewer character limit for the substitution? At least to start, this limits the complexity: you wouldn't have to worry about page overflow or interactions with existing image layouts, since you'd consume the same or less space with the text.
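
A minimal sketch of the same-or-fewer character rule could look like the following. `fitReplacement` is a hypothetical helper, and it counts characters rather than measuring glyph widths, so with a proportional font a replacement of equal length could still be wider than the marker.

```typescript
// Hypothetical rule from the discussion: a replacement may not be longer than its marker.
function fitReplacement(marker: string, replacement: string): string {
  if (replacement.length > marker.length) {
    throw new RangeError(
      `"${replacement}" is longer than ${marker} (${marker.length} characters)`
    );
  }
  // Pad with trailing spaces so the result has the same character count as the marker.
  let padded = replacement;
  while (padded.length < marker.length) {
    padded += " ";
  }
  return padded;
}

const fitted = fitReplacement("{{name}}", "John");
// fitted is "John" followed by four spaces (8 characters, like the marker)
```

Rejecting long values up front keeps the surrounding layout untouched, at the cost of forcing template authors to size their markers for the longest expected value.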

kevin8479 · 2020-02-28T11:31:20Z

Hi guys, is this feature still in the works?

Hopding · 2020-03-01T14:28:57Z

@kevin8479 This is not something I am actively working on. There are a number of other features in much higher demand that I'd need to finish before turning to this. But as always, if any enterprising individual would like to try implementing this themselves, I'm happy to provide advice and answer questions!

fabioselau077 · 2020-09-06T00:15:00Z

Same issue.
Any news or updates about this?

github-actions · 2021-09-20T22:08:33Z

This issue is stale because it has been open 2 weeks with no activity. It will be closed in 2 days unless there is new activity. See MAINTAINERSHIP.md#issues for details.

Misiu · 2021-09-21T10:44:40Z

This is still a valid feature request. Maybe the stale bot can leave alone the issues with the `feature-request` label?

Hopding · 2021-09-22T00:12:56Z

Hey @Misiu 👋! I'm revamping how issues/discussions are handled on this repo (see MAINTAINERSHIP.md#issues for details). Going forward issues will only be kept open for long periods of time if they have a clear path to implementation or somebody is actively working on them (or they're high-impact bugs).

This is definitely still a valid feature request, but it's been open for 2 years now with no clear path forward. Since nobody is working on it (or likely to be anytime soon), I don't think it needs to be tracked as an open issue anymore. However, since there's been some good discussion in this thread, I've added it to #998 so it doesn't get totally buried.

Hopding · 2021-09-24T22:59:23Z

Closing this as its status is now being tracked on the roadmap.

Hopding added the enhancement label Dec 26, 2019

Hopding added feature-request and removed enhancement labels Jan 1, 2020

Hopding changed the title ~~[Feature request] Use PDF as a template, similar to mail merge in word~~ [Feature Request]: Use PDF as a template, similar to mail merge in word Jan 1, 2020

Hopding mentioned this issue Mar 7, 2020

Replace the existing text from pdf file #374

Closed

ChuckJonas mentioned this issue Jun 6, 2020

Possible solution for "merge template"? #474

Closed

Hopding mentioned this issue Sep 16, 2020

replace texts in original pdf #590

Closed

se181018 mentioned this issue Mar 12, 2021

[FEATURE] Get all texts/images/shapes/SVGs from PDF page #581

Closed

bcholmes mentioned this issue Sep 4, 2021

can I edit text in pdf with PDF-lib? #950

Closed

github-actions bot added the stale label Sep 20, 2021

github-actions bot removed the stale label Sep 22, 2021

This was referenced Sep 22, 2021

[Feature Request] API to sanitize documents #433

Closed

Is there any way to extract text from a PDF using pdf-lib library i.e. using x , y coordinates #892

Closed

Hopding closed this as completed Sep 24, 2021
