acro form fields blank in either downloaded or streamed PDF #488

Closed
faxemaxee opened this issue Jun 15, 2020 · 6 comments

@faxemaxee

Hey, we are using your great lib to fill form fields in a PDF and send it back to the user in two ways. First, we stream the PDF directly to update a PDF preview on the client. Once editing is finished, we persist the PDF and deliver it via a GET URL.

We built a small helper that generates the filled PDF and returns the bytes from pdfDoc.save(). We use the exact same method in both cases, but depending on whether we stream it directly or save the file, we get either blank fields or filled fields. We had an old implementation (found somewhere in this repo) which only showed the filled fields when we persisted the PDF, not in preview mode. We then updated this implementation as described in the Update to V1.X.X guide in the readme; this made the preview work but broke the persisted PDF. After that we started from scratch with the solution provided in #205, which seemed promising, but again only the preview worked, not the persisted PDF.

So we are kind of out of ideas here. Do you have any further suggestions?

pdfhelper.js (only working as persisted PDF)

/* eslint-disable no-await-in-loop */
/** PDF helper function module
 * @module pdfhelper
 */

'use strict';

const { PDFName, PDFNumber, PDFString, PDFDict, PDFContentStream, drawLinesOfText, degrees, rgb } = require('pdf-lib');

const pdfhelper = {

    getAcroFields: function (pdfDoc) {
        if (!pdfDoc.catalog.get(PDFName.of('AcroForm'))) { return []; }
        const acroForm = pdfDoc.context.lookup(pdfDoc.catalog.get(PDFName.of('AcroForm')));
        if (!acroForm.get(PDFName.of('Fields'))) { return []; }
        const acroFields = acroForm.context.lookup(acroForm.get(PDFName.of('Fields')));
        return acroFields.array.map(ref => acroForm.context.lookup(ref));
    },

    findAcroFieldByName: function (pdfDoc, name) {
        const acroFields = this.getAcroFields(pdfDoc);
        return acroFields.find((acroField) => {
            const fieldName = acroField.get(PDFName.of('T'));
            return !!fieldName && fieldName.value === name;
        });
    },

    fillAcroTextField: function (
        pdfDoc,
        acroField,
        fontObject,
        text,
        fontSize = 15,
    ) {
        const fieldRect = acroField.get(PDFName.of('Rect'));
        const fieldWidth = fieldRect.get(2).number - fieldRect.get(0).number;
        const fieldHeight = fieldRect.get(3).number - fieldRect.get(1).number;

        const dict = fontObject.doc.context.obj({
            Type: 'XObject',
            Subtype: 'Form',
            FormType: 1,
            BBox: [0, 0, fieldWidth, fieldHeight],
            Resources: { Font: { F0: fontObject.ref } },
        });

        const appearanceStream = fontObject.doc.context.register(
            PDFContentStream.of(
                dict,
                drawLinesOfText(text.split('\n'), {
                    color: rgb(0, 0, 0),
                    font: fontObject.name,
                    size: fontSize,
                    rotate: degrees(0),
                    xSkew: degrees(0),
                    ySkew: degrees(0),
                    x: 0,
                    y: 0,
                    lineHeight: fontSize + 2,
                })
            ),
        );

        acroField.set(PDFName.of('V'), PDFString.of(text));
        acroField.set(PDFName.of('Ff'), PDFNumber.of(1 /* Read Only */));
        acroField.set(PDFName.of('AP'), acroField.context.obj({ N: appearanceStream }));
    },

    lockField: function (acroField) {
        acroField.set(PDFName.of('Ff'), PDFNumber.of(1 << 0 /* Read Only */));
    }
};

module.exports = pdfhelper;

helper.js (only working as persisted PDF)

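const fs = require('fs');
const { PDFDocument, StandardFonts } = require('pdf-lib');
const fontkit = require('@pdf-lib/fontkit');
const moment = require('moment');
// Local modules; these relative paths are assumed, adjust them to your project layout.
const pdfhelper = require('./pdfhelper');
const util = require('./util');

const generatePDF = async function (language) {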
    const pdfDoc = await PDFDocument.load(fs.readFileSync('./some/path/pdf_' + language + '.pdf'));
    pdfDoc.registerFontkit(fontkit);
    const fontObject = await pdfDoc.embedFont(StandardFonts.Helvetica);

    const fillInField = (fieldName, text, fontSize = 12) => {
        const field = pdfhelper.findAcroFieldByName(pdfDoc, fieldName);
        if (!field) {
            throw new Error(`Missing AcroField: ${fieldName}`);
        }
        pdfhelper.fillAcroTextField(pdfDoc, field, fontObject, text, fontSize);
    };

    const date = moment(new Date()).locale(util.convertLanguageForDate(language)).format(util.getDateFormat(language));
    fillInField('Principal_Place, Date', date);
    fillInField('Agent_Place, Date', date);

    fillInField('Principal', 'John Doe');
    fillInField('PrincipalRole', 'CTO');

    fillInField('Agent', 'Jane Doe');
    fillInField('AgentRole', 'Managing Director');

    fillInField(
        'Company',
        'SomeStreet 123' + '\n'
        + '54321 SomeCity' + '\n'
        + 'SomeCountry'
    );

    const pdfBytes = await pdfDoc.save();
    return pdfBytes;
}

routes.js (excerpt)

router.route('/:id/legal')
    .get(/* ... */, (async (req, res, next) => {
       
        /* ... */

        const pdfBytes = await helper.generatePDF(login, customer);
        res.attachment('some.pdf');
        res.contentType('application/pdf');
        res.send(Buffer.from(pdfBytes));
    }))
@lalaman

lalaman commented Jun 15, 2020

Hey, I am not sure if this will help, but I had a similar issue this morning: after I ran pdfDoc.save(), all the fields on the first page of my document disappeared.

So instead of using PDFString.of:

acroField.set(PDFName.of('V'), PDFString.of(text));

I replaced it with PDFHexString.fromText like below and it was fine.

acroField.set(PDFName.of('V'), PDFHexString.fromText(text.toString()));

My text may not actually have been a string, so I added .toString() to make sure it worked.
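
For anyone wondering why a hex string might behave differently: a PDF literal string is delimited by parentheses, so unbalanced parentheses and backslashes in the text need escaping, while a hex string contains only hex digits and can carry UTF-16BE text. The sketch below only illustrates that encoding idea. `toPdfHexText` is a made-up helper, not a pdf-lib function, and it is not a claim about how `PDFHexString.fromText` is implemented internally.

```javascript
// Illustrative only: builds the textual form of a PDF hex string that holds
// UTF-16BE text with a byte order mark. `toPdfHexText` is a hypothetical
// helper and is not part of pdf-lib.
function toPdfHexText(value) {
  const text = String(value); // coerce numbers, dates, etc. to text first
  let hex = 'FEFF'; // UTF-16BE byte order mark
  for (let i = 0; i < text.length; i += 1) {
    // charCodeAt returns one UTF-16 code unit (0..65535) per index
    hex += text.charCodeAt(i).toString(16).toUpperCase().padStart(4, '0');
  }
  return `<${hex}>`;
}

console.log(toPdfHexText('CTO')); // prints "<FEFF00430054004F>"
```

The `String(value)` call plays the same role as the `.toString()` above: a number or date passed by accident is turned into text before it is encoded.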

@faxemaxee
Author

Thank you for your suggestion @lalaman, I will try it and let you know. In the meantime I have some more debug info. We're using PDF.js by Mozilla to render the PDF in the frontend. It is throwing some warnings; an excerpt:

Warning: Skipping command M: expected 1 args, but received 0 args.
pdf.worker.js:1705 Warning: Unknown command "anaging".
pdf.worker.js:1705 Warning: Unknown command "Director".
pdf.worker.js:1705 Warning: Skipping command Tj: expected 1 args, but received 0 args.

We are trying to fill one field with "Managing Director", and something doesn't seem to work properly when adding the AcroForm field values: the text gets chopped into pieces. I'm not that familiar with all this PDF stuff... maybe @Hopding can use this info.
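
A possible reading of these warnings, offered as a guess rather than a diagnosis: the text ends up in the appearance stream as bare words instead of as a string operand, so the viewer tries to interpret each word as a drawing command. The sketch below is dependency-free, and every name in it (`hexOperand`, `showTextRaw`, `showTextHex`, `tokenize`) is made up for illustration. The tokenizer is deliberately naive and does not reproduce how PDF.js actually splits the stream.

```javascript
// Hypothetical helpers for illustration; none of these are pdf-lib functions.

// Encode single-byte text as a PDF hex string operand, e.g. "A" -> "<41>".
function hexOperand(text) {
  let hex = '';
  for (let i = 0; i < text.length; i += 1) {
    hex += text.charCodeAt(i).toString(16).toUpperCase().padStart(2, '0');
  }
  return `<${hex}>`;
}

// What a content stream looks like if the raw text is written unencoded.
function showTextRaw(text) {
  return `${text} Tj`;
}

// What it looks like when the text is passed as one hex string operand.
function showTextHex(text) {
  return `${hexOperand(text)} Tj`;
}

// Naive tokenizer: a <...> hex string is one token, anything else is
// split on whitespace. Real PDF lexers are more involved.
function tokenize(stream) {
  return stream.match(/<[^>]*>|\S+/g) || [];
}

// Raw text: three bare tokens, so the words look like unknown commands.
console.log(tokenize(showTextRaw('Managing Director')));
// Hex operand: one string token followed by the Tj operator.
console.log(tokenize(showTextHex('Managing Director')));
```

If this is what is happening, one thing to check (an assumption to verify against the installed pdf-lib version) is whether `drawLinesOfText` expects already-encoded lines, for example each line passed through `fontObject.encodeText(line)`, rather than the plain strings from `text.split('\n')`.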

@faxemaxee
Author

faxemaxee commented Jun 16, 2020

Okay, I found out even more info about this. I changed everything back to the implementation recommended in #205 (comment) because I had a hunch that it might not be the PDF itself but the different apps/previews which I used to view it. Turns out I wasn't completely wrong. Here is what is (not) working now:

  • The web preview using Mozilla PDF.js displays the filled form fields but gives two warnings, one about an undefined function and one about a missing font:
Warning: TT: undefined function: 32
pdf.worker.js:1705 Warning: Font "Courier" is not available -- attempting to fallback to a default font.
  • When the PDF is downloaded and opened in Chrome or Adobe Acrobat, the fields are also filled out correctly.
  • When the PDF is opened in "Preview" on Mac, the fields are blank.
  • When the PDF is sent via Microsoft Teams and opened in the Teams web preview, the filled fields load but instantly fade away, resulting in a blank PDF.
  • When the PDF is sent via Slack, the little preview image displays the filled fields, even in the correct font (the only client/preview that does so), but when the PDF is opened in the Slack preview all fields are blank again.

I have no clue how or why this happens; I just felt that documenting it in its entirety would be helpful at some point.

We decided to go with the "industry standard" Adobe Reader. So for now we're good. I'd be happy to help with any further investigation on this topic.

@LorenzoSantoro94

The same thing is happening to me, but only if I copy the filled pages to a new document and then save the second one.
If I save the document that was read from the file system directly, the filled form is shown correctly.
Also, when I copy the pages, the AcroForm fields are no longer available in the second document, so I can't copy the pages first and then fill the form.

@lalaman

lalaman commented Jul 14, 2020

Unfortunately, this issue has been brought up many times in this repository, as copyPages does not copy over AcroForm fields from the original pages.

What I ended up doing was using pdf-lib to remove pages (since that maintains the AcroForm fields) and then pdf-merge to join the documents together. This way, the PDFs are merged and retain the AcroForm fields. After joining, you still have to load the result with PDFDocument.load() and set NeedAppearances to true in order to see the text inside the form fields.

Here is an example:

const fs = require('fs');
const { PDFBool, PDFDocument, PDFName } = require('pdf-lib');
const PDFMerge = require('pdf-merge');

// Helper function for removing pages in PDF
const removePDFPages = async (doc, start, end) => {
  for (let i = end; i >= start; i -= 1) {
    doc.removePage(i);
  }
  const docBytes = await doc.save();
  const filepath = `${Date.now()}.pdf`;
  fs.writeFileSync(filepath, docBytes);
  return filepath;
};

// Join pages 1-10 of doc1 and 1-5 of doc2
const combinePDFs = async (doc1filepath, doc2filepath) => {
  let tempFilepaths = [];

  try {
    const doc1 = await PDFDocument.load(fs.readFileSync(doc1filepath));
    const doc1pages = doc1.getPages();

    const doc2 = await PDFDocument.load(fs.readFileSync(doc2filepath));
    const doc2pages = doc2.getPages();

    // getPages() returns an array, so use .length to get the last page index
    const temp1filepath = await removePDFPages(doc1, 10, doc1pages.length - 1);
    tempFilepaths.push(temp1filepath);

    const temp2filepath = await removePDFPages(doc2, 5, doc2pages.length - 1);
    tempFilepaths.push(temp2filepath);

    const mergedBuffer = await PDFMerge([
      temp1filepath,
      temp2filepath
    ]);

    // Set appearances to true so that your form fields will not
    // appear as white text
    const mergedDoc = await PDFDocument.load(mergedBuffer);
    const acroForm = mergedDoc.context.lookup(
      mergedDoc.catalog.get(PDFName.of('AcroForm')),
    );
    acroForm.set(PDFName.of('NeedAppearances'), PDFBool.True);

    const mergedDocBytes = await mergedDoc.save();
    const destination = `${Date.now()}-final.pdf`;
    fs.writeFileSync(destination, mergedDocBytes);

    return destination;
  } catch (err) {
    console.log('Unable to combine PDFs', err);
    // Rethrow the original error so its message and stack trace are preserved
    throw err;
  } finally {
    // Delete temp files
    for (const path of tempFilepaths) {
      if (fs.existsSync(path)) {
        fs.unlinkSync(path);
      }
    }
  }
};

@Hopding
Owner

Hopding commented Sep 20, 2020

Hello @faxemaxee @lalaman @freirg! pdf-lib has a new forms API that should solve this problem. See the README and API docs for details.

(Note that the new forms API does not allow forms to be copied from other documents. That's still an issue. See #218 and related issues for details.)
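
For readers landing here later, a minimal sketch of what filling text fields through the forms API might look like, assuming a pdf-lib release that includes it. `fillTextFields` and its `readOnly` option are made-up names for illustration. The `getForm`, `getTextField`, `setText` and `enableReadOnly` calls are used as described in the API docs, so check them against the version you have installed.

```javascript
// Sketch only: fill text fields through the forms API instead of editing
// AcroForm dictionaries by hand. `fillTextFields` is a hypothetical helper.
// `pdfDoc` is assumed to be a pdf-lib PDFDocument that supports getForm().
function fillTextFields(pdfDoc, values, { readOnly = false } = {}) {
  const form = pdfDoc.getForm();
  for (const [name, text] of Object.entries(values)) {
    const field = form.getTextField(name); // expected to throw if the field is missing
    field.setText(String(text)); // coerce non-string values to text
    if (readOnly) {
      field.enableReadOnly(); // replaces setting the Ff flag manually
    }
  }
  return form;
}

// Possible usage inside an async function (file path is a placeholder):
//   const fs = require('fs');
//   const { PDFDocument } = require('pdf-lib');
//   const pdfDoc = await PDFDocument.load(fs.readFileSync('./form.pdf'));
//   fillTextFields(pdfDoc, { Principal: 'John Doe', PrincipalRole: 'CTO' });
//   const pdfBytes = await pdfDoc.save();
```

Taking the document as a parameter keeps the helper independent of how the PDF is loaded, so the same function can serve both the streamed preview and the persisted file.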

@Hopding Hopding closed this as completed Sep 20, 2020