Fix missing fields in acroform #134
```ts
type: 'not-supported',
name: 'CHARACTER IMAGE',
name: 'CHARACTER IMAGE.undefined',
```
Edge case, looking into it.
```ts
const pdfDoc = await PDFDocument.load(pdfBytes);
const widgets = await getWidgets(pdfDoc);

pdfDoc.catalog.set(
  PDFName.of('AcroForm'),
  pdfDoc.context.obj({
    Fields: widgets.map(widget => pdfDoc.context.getObjectRef(widget)), // array of widget refs
  })
);
```
This is where we add all of the widgets to the AcroForm; they then get pulled into the form via `pdfDoc.getForm()` below.
Interesting, so this backfills poorly annotated fields into the source PDF? Nice approach. I like that it gives us a clear picture of the "error" in the source PDF, which we could perhaps report to the user.
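If we do want to tell the user about the "error", a minimal sketch of the diagnostic could look like the block below, assuming pdf-lib's low-level `PDFContext`/`PDFDict` API. `findOrphanWidgets` and `reachableFromFields` are hypothetical names. It walks `/AcroForm /Fields` through `/Kids` and reports every widget annotation that the walk never reaches. It matches on `/Subtype /Widget` alone, because the PDF specification makes `/Type` optional in annotation dictionaries.

```ts
import { PDFArray, PDFDict, PDFDocument, PDFName, PDFRef } from 'pdf-lib';

// Hypothetical helper: refs of every node reachable from /AcroForm /Fields via /Kids.
const reachableFromFields = (pdfDoc: PDFDocument): Set<string> => {
  const { context, catalog } = pdfDoc;
  const seen = new Set<string>();
  const acroForm = catalog.lookupMaybe(PDFName.of('AcroForm'), PDFDict);
  const stack = acroForm?.lookupMaybe(PDFName.of('Fields'), PDFArray)?.asArray() ?? [];
  while (stack.length > 0) {
    const ref = stack.pop();
    // Skip direct objects and anything already visited (guards against cycles).
    if (!(ref instanceof PDFRef) || seen.has(ref.toString())) continue;
    seen.add(ref.toString());
    const kids = context.lookupMaybe(ref, PDFDict)?.lookupMaybe(PDFName.of('Kids'), PDFArray);
    if (kids) stack.push(...kids.asArray());
  }
  return seen;
};

// Hypothetical helper: widget annotations that exist in the file but are not part of the form tree.
export const findOrphanWidgets = (pdfDoc: PDFDocument): PDFRef[] => {
  const reachable = reachableFromFields(pdfDoc);
  return pdfDoc.context
    .enumerateIndirectObjects()
    .filter(
      ([ref, obj]) =>
        obj instanceof PDFDict &&
        obj.get(PDFName.of('Subtype')) === PDFName.of('Widget') &&
        !reachable.has(ref.toString())
    )
    .map(([ref]) => ref);
};
```

Calling this before backfilling gives a count (or list of refs) that could be surfaced as a warning about the source PDF.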
```ts
import { PDFDict, PDFDocument, PDFName } from 'pdf-lib';

// TODO: copied from pdf-lib acrofield internals, check if it's already exposed outside of acroform somewhere
// Returns every indirect dict marked /Type /Annot and /Subtype /Widget.
export const getWidgets = async (pdfDoc: PDFDocument): Promise<PDFDict[]> => {
  return pdfDoc.context
    .enumerateIndirectObjects()
    .map(([, obj]) => obj)
    .filter(
      (obj): obj is PDFDict =>
        obj instanceof PDFDict &&
        obj.get(PDFName.of('Type')) === PDFName.of('Annot') &&
        obj.get(PDFName.of('Subtype')) === PDFName.of('Widget')
    );
};
```
Seems useful, so I'm exporting it, but I don't feel strongly about it. It's ripped straight from pdf-lib, but it seems to be exposed only on the acrofield object, which isn't useful in our case...
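One possible variation on `getWidgets`, sketched under the assumption that callers mostly need refs: keep the `PDFRef` that `enumerateIndirectObjects()` already yields, so the caller does not have to recover it with `context.getObjectRef(widget)` for each widget (in the pdf-lib source I'm aware of, that method appears to scan the whole indirect-object table). `getWidgetEntries` and `WidgetEntry` are hypothetical names, and the function is synchronous because nothing in it is awaited.

```ts
import { PDFDict, PDFDocument, PDFName, PDFRef } from 'pdf-lib';

// Hypothetical type: a widget dict paired with the ref it is stored under.
export type WidgetEntry = [ref: PDFRef, widget: PDFDict];

// Same filter as getWidgets, but the type guard keeps the [ref, dict] pair intact.
export const getWidgetEntries = (pdfDoc: PDFDocument): WidgetEntry[] =>
  pdfDoc.context
    .enumerateIndirectObjects()
    .filter((entry): entry is WidgetEntry => {
      const obj = entry[1];
      return (
        obj instanceof PDFDict &&
        obj.get(PDFName.of('Type')) === PDFName.of('Annot') &&
        obj.get(PDFName.of('Subtype')) === PDFName.of('Widget')
      );
    });
```

With this, the `Fields` array in the diff could be built as `getWidgetEntries(pdfDoc).map(([ref]) => ref)`.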
Working with low-level objects isn't super fun in pdf-lib, haha, but I got there.
Ultimately, the offending PDF was poorly authored (surprise): some, but not all, widgets were added to the AcroForm catalog object. I'm guessing the author just gave up at the prospect of adding all those checkboxes to the AcroForm. Unfortunately, pdf-lib does not check for loose widgets that weren't added to the AcroForm object.
`PDFForm` is essentially a 1:1 wrapper around the AcroForm. The solution was to iterate over all of the annotations with the widget type and add them to the doc's AcroForm manually. After that, we can grab the `PDFForm`, call `.getFields()`, and get back those nice `PDFField` objects just like we did before.
This solution should generically capture any widgets missing from the AcroForm. I'm not sure whether there will be a situation where the author intentionally kept widgets outside of the AcroForm that we don't want in it, but I can't think of one...
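A minimal end-to-end sketch of that flow (backfill, then `getForm().getFields()`), assuming pdf-lib's low-level `PDFContext`/`PDFDict`/`PDFArray` API; `listFieldNames` and `rootFieldRef` are hypothetical names. It differs from the diff in two deliberate ways: it reuses the existing `/AcroForm` dictionary and `/Fields` array (so entries such as `/DA` and `/DR` survive) and only appends refs that are not already listed, and it follows each widget's `/Parent` chain so that a widget which is a kid of a field is reached through that top-level field instead of being listed on its own.

```ts
import { PDFArray, PDFDict, PDFDocument, PDFName, PDFRef } from 'pdf-lib';

// Hypothetical helper: follow /Parent links up to the top-level field (cycle-safe).
const rootFieldRef = (pdfDoc: PDFDocument, ref: PDFRef): PDFRef => {
  const visited = new Set<string>();
  let current = ref;
  while (!visited.has(current.toString())) {
    visited.add(current.toString());
    const parent = pdfDoc.context.lookupMaybe(current, PDFDict)?.get(PDFName.of('Parent'));
    if (!(parent instanceof PDFRef)) break;
    current = parent;
  }
  return current;
};

export const listFieldNames = async (pdfBytes: Uint8Array): Promise<string[]> => {
  const pdfDoc = await PDFDocument.load(pdfBytes);
  const { context, catalog } = pdfDoc;

  // Reuse the existing /AcroForm and /Fields when present instead of replacing them.
  let acroForm = catalog.lookupMaybe(PDFName.of('AcroForm'), PDFDict);
  if (!acroForm) {
    acroForm = PDFDict.withContext(context);
    catalog.set(PDFName.of('AcroForm'), acroForm);
  }
  let fields = acroForm.lookupMaybe(PDFName.of('Fields'), PDFArray);
  if (!fields) {
    fields = PDFArray.withContext(context);
    acroForm.set(PDFName.of('Fields'), fields);
  }

  // Append the top-level field of every widget whose root is not already listed.
  const listed = new Set(fields.asArray().map(String));
  for (const [ref, obj] of context.enumerateIndirectObjects()) {
    if (!(obj instanceof PDFDict)) continue;
    if (obj.get(PDFName.of('Subtype')) !== PDFName.of('Widget')) continue;
    const root = rootFieldRef(pdfDoc, ref);
    if (!listed.has(root.toString())) {
      fields.push(root);
      listed.add(root.toString());
    }
  }

  // getForm() is called only after the backfill, so PDFForm sees the new entries.
  return pdfDoc.getForm().getFields().map(field => field.getName());
};
```

Calling `getForm()` after the mutation matters: pdf-lib builds the `PDFForm` from the catalog's AcroForm, so doing the backfill first keeps the two in sync.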