Fix missing fields in acroform #134

jimmoffet · 2024-05-16T00:01:31Z

Working with low level objects isn't super fun in pdf-lib, haha, but I got there.

Ultimately, the offending PDF was poorly authored (surprise): some, but not all, of its widgets were added to the AcroForm object in the catalog. I'm guessing the author just gave up at the prospect of adding all those checkboxes to the acroform. Unfortunately, pdf-lib does not check for loose widgets that were never added to the acroform object; PDFForm is essentially a 1:1 wrapper around the acroform.

The solution was to iterate over all of the annotations with the Widget subtype and add them to the doc's acroform manually. After that, we can grab the PDFForm, call .getFields(), and get back those nice PDFField objects just like we did before.

This solution should generically capture any widgets missing from the acroform. I'm not sure if there will be a situation where there are widgets that the author intentionally kept outside of the acroform that we don't want in the acroform, but I can't think of one...
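To make the idea concrete, here is a minimal sketch of the backfill step on a simplified model rather than on pdf-lib objects. The `Ref` strings, `AcroFormModel`, and `backfillFields` are hypothetical names used only for illustration. Note that this variant merges missing widgets into an existing field list instead of rebuilding the list from scratch, which is an assumption about what one might want, not what the diff in this PR does.

```typescript
// Simplified model: object references are plain strings such as "12 0 R".
// None of these names come from pdf-lib; they are illustrative only.
type Ref = string;

interface AcroFormModel {
  fields: Ref[];
}

// Returns a copy of the AcroForm model whose field list keeps every existing
// entry and appends each widget ref that was not already listed.
function backfillFields(acroForm: AcroFormModel, widgetRefs: Ref[]): AcroFormModel {
  const seen = new Set<Ref>(acroForm.fields);
  const merged = [...acroForm.fields];
  for (const ref of widgetRefs) {
    if (!seen.has(ref)) {
      seen.add(ref);
      merged.push(ref);
    }
  }
  return { ...acroForm, fields: merged };
}

// The form lists two fields; the document actually contains three widgets,
// one of which ("6 0 R") is already listed.
const backfilled = backfillFields(
  { fields: ['5 0 R', '6 0 R'] },
  ['6 0 R', '7 0 R', '8 0 R']
);
console.log(backfilled.fields);
```

Merging rather than overwriting keeps whatever the author did register, at the cost of one extra set lookup per widget.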

packages/forms/src/documents/pdf/extract.ts

jimmoffet · 2024-05-16T02:35:05Z

packages/forms/src/documents/__tests__/fill-pdf.test.ts

        type: 'not-supported',
-        name: 'CHARACTER IMAGE',
+        name: 'CHARACTER IMAGE.undefined',


edge case, looking into it
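One way a name like this could arise, offered as a guess rather than a diagnosis: if a widget has a parent field but no partial name of its own, and the qualified name is built by joining parent and child with a dot, the missing child name is interpolated as the text "undefined". The sketch below reproduces that on a hypothetical `FieldNode` model (not a pdf-lib type) and shows a guarded alternative.

```typescript
// Hypothetical stand-in for a form-field node; not a pdf-lib type.
interface FieldNode {
  partialName?: string; // a bare widget may have no name of its own
  parent?: FieldNode;
}

// Naive join: a missing partial name is interpolated as the text "undefined".
function naiveQualifiedName(node: FieldNode): string | undefined {
  if (!node.parent) return node.partialName;
  return `${naiveQualifiedName(node.parent)}.${node.partialName}`;
}

// Guarded join: walk up the parent chain and skip nodes without a name.
function guardedQualifiedName(node: FieldNode): string | undefined {
  const parts: string[] = [];
  for (let current: FieldNode | undefined = node; current; current = current.parent) {
    if (current.partialName !== undefined) parts.unshift(current.partialName);
  }
  return parts.length > 0 ? parts.join('.') : undefined;
}

const parentField: FieldNode = { partialName: 'CHARACTER IMAGE' };
const bareWidget: FieldNode = { parent: parentField };

const naiveName = naiveQualifiedName(bareWidget); // "CHARACTER IMAGE.undefined"
const guardedName = guardedQualifiedName(bareWidget); // "CHARACTER IMAGE"
console.log(naiveName, guardedName);
```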

jimmoffet · 2024-05-16T02:50:31Z

packages/forms/src/documents/pdf/extract.ts

+  const pdfDoc = await PDFDocument.load(pdfBytes);
+  const widgets = await getWidgets(pdfDoc);
+
+  pdfDoc.catalog.set(
+    PDFName.of('AcroForm'),
+    pdfDoc.context.obj({
+      Fields: widgets.map(widget => pdfDoc.context.getObjectRef(widget)), // array of widget refs
+    })
+  );
+


This is where we add all of the widgets to the acroform, so they now get pulled into the form via pdfDoc.getForm() below.

danielnaab: Interesting, so this backfills poorly annotated fields into the source PDF? I like that approach: it gives us a clear picture of the "error" in the source PDF, which we could perhaps report to the user.
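As a rough illustration of the reporting idea raised here, the sketch below compares the widgets found in a document against the fields the acroform already lists and produces a short message. It uses plain string refs instead of pdf-lib objects, and `diffWidgets`, `BackfillReport`, and `describeReport` are hypothetical helpers, not part of this PR.

```typescript
// Simplified model: refs are plain strings; all names here are illustrative.
type Ref = string;

interface BackfillReport {
  alreadyListed: Ref[]; // widgets the author did register in the acroform
  backfilled: Ref[]; // loose widgets we would have to add ourselves
}

// Splits the document's widgets by whether the acroform already lists them.
function diffWidgets(listedFields: Ref[], widgetRefs: Ref[]): BackfillReport {
  const listed = new Set<Ref>(listedFields);
  return {
    alreadyListed: widgetRefs.filter(ref => listed.has(ref)),
    backfilled: widgetRefs.filter(ref => !listed.has(ref)),
  };
}

// Turns the report into a message that could be surfaced to the user.
function describeReport(report: BackfillReport): string {
  if (report.backfilled.length === 0) {
    return 'All form widgets were registered in the source PDF.';
  }
  const total = report.alreadyListed.length + report.backfilled.length;
  return `${report.backfilled.length} of ${total} form widgets were missing from the form definition and were added automatically.`;
}

const report = diffWidgets(['5 0 R', '6 0 R'], ['5 0 R', '6 0 R', '7 0 R']);
const message = describeReport(report);
console.log(message);
```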

jimmoffet · 2024-05-16T02:51:38Z

packages/forms/src/documents/pdf/extract.ts

+// TODO: copied from pdf-lib acrofield internals, check if it's already exposed outside of acroform somewhere
+export const getWidgets = async (pdfDoc: PDFDocument): Promise<PDFDict[]> => {
+  return pdfDoc.context
+    .enumerateIndirectObjects()
+    .map(([, obj]) => obj)
+    .filter(
+      obj =>
+        obj instanceof PDFDict &&
+        obj.get(PDFName.of('Type')) === PDFName.of('Annot') &&
+        obj.get(PDFName.of('Subtype')) === PDFName.of('Widget')
+    )
+    .map(obj => obj as PDFDict);
+};
+


This seems useful, so I'm exporting it, but I don't feel strongly about it. It's ripped straight from pdf-lib, where it appears to be exposed only on the acrofield object, which isn't useful in our case...
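A possible refinement of the filter, sketched on a toy object model instead of pdf-lib classes: a type-guard predicate lets `filter` narrow the array without the trailing cast. The sketch also accepts dictionaries that omit the Type entry, on the assumption that it is optional for annotations while Subtype is required; whether that relaxation is wanted here is a judgment call. `PdfObjectModel`, `DictModel`, and `isWidgetDict` are hypothetical names.

```typescript
// Toy object model; these types are illustrative and not pdf-lib classes.
type PdfObjectModel =
  | { kind: 'dict'; entries: Record<string, string> }
  | { kind: 'number'; value: number };

type DictModel = Extract<PdfObjectModel, { kind: 'dict' }>;

// Type guard: true only for dictionaries whose Subtype is Widget and whose
// Type is either absent or Annot.
function isWidgetDict(obj: PdfObjectModel): obj is DictModel {
  if (obj.kind !== 'dict') return false;
  const type = obj.entries['Type'];
  const typeOk = type === undefined || type === 'Annot';
  return typeOk && obj.entries['Subtype'] === 'Widget';
}

const objects: PdfObjectModel[] = [
  { kind: 'dict', entries: { Type: 'Annot', Subtype: 'Widget' } },
  { kind: 'dict', entries: { Subtype: 'Widget' } }, // no Type entry
  { kind: 'dict', entries: { Type: 'Annot', Subtype: 'Link' } },
  { kind: 'number', value: 42 },
];

// Because isWidgetDict is a type predicate, the result is DictModel[]
// with no `as` cast needed.
const widgetDicts: DictModel[] = objects.filter(isWidgetDict);
console.log(widgetDicts.length);
```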

jimmoffet added 2 commits May 15, 2024 16:59

iterate over widgets and add to acroform

95fbb4a

comments

d37a2ea

jimmoffet commented May 16, 2024


don't need to reload the doc

ab3984a

jimmoffet changed the title ~~Jim/acrofail~~ Fix missing fields in acroform May 16, 2024

jimmoffet marked this pull request as draft May 16, 2024 01:06

jimmoffet requested a review from danielnaab May 16, 2024 01:06

jimmoffet added 3 commits May 15, 2024 18:41

fix tests

4752db6

refactor a bit

284da65

unused imports

fd5cb2d

jimmoffet commented May 16, 2024


jimmoffet marked this pull request as ready for review May 16, 2024 02:38

comments

5cdf4a6

jimmoffet commented May 16, 2024


danielnaab approved these changes May 16, 2024


Merge remote-tracking branch 'origin/main' into jim/acrofail

96832d0

danielnaab approved these changes May 21, 2024


danielnaab merged commit c273122 into main May 21, 2024
2 checks passed

danielnaab deleted the jim/acrofail branch May 21, 2024 12:39
