Fix missing fields in acroform #134

Merged
merged 8 commits into from
May 21, 2024
Conversation

jimmoffet
Contributor

@jimmoffet jimmoffet commented May 16, 2024

Working with low level objects isn't super fun in pdf-lib, haha, but I got there.

Ultimately, the offending PDF was poorly authored (surprise): some, but not all, widgets were added to the acroform catalog object. I'm guessing the author just gave up at the prospect of adding all those checkboxes to the acroform. Unfortunately, pdf-lib does not check for loose widgets that weren't added to the acroform object. PDFForm is essentially a 1:1 wrapper on acroform.

The solution was to iterate over all of the annotations with the widget type and add them to the doc's acroform manually. After that, we can grab the PDFForm and call .getFields() and get back those nice PDFField objects just like we did before.

This solution should generically capture any widgets missing from the acroform. I'm not sure if there will be a situation where there are widgets that the author intentionally kept outside of the acroform that we don't want in the acroform, but I can't think of one...

@jimmoffet jimmoffet changed the title Jim/acrofail Fix missing fields in acroform May 16, 2024
@jimmoffet jimmoffet marked this pull request as draft May 16, 2024 01:06
@jimmoffet jimmoffet requested a review from danielnaab May 16, 2024 01:06
type: 'not-supported',
name: 'CHARACTER IMAGE',
name: 'CHARACTER IMAGE.undefined',
Contributor Author

edge case, looking into it

@jimmoffet jimmoffet marked this pull request as ready for review May 16, 2024 02:38
Comment on lines +33 to +42
const pdfDoc = await PDFDocument.load(pdfBytes);
const widgets = await getWidgets(pdfDoc);

pdfDoc.catalog.set(
  PDFName.of('AcroForm'),
  pdfDoc.context.obj({
    Fields: widgets.map(widget => pdfDoc.context.getObjectRef(widget)), // array of widget refs
  })
);

Contributor Author

This is where we add all of the widgets to the acroform, which will now get pulled into the form via pdfDoc.getForm() below.
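
Note that the snippet above sets a fresh AcroForm dictionary containing only Fields, so any entries the source PDF already had there (for example DA or NeedAppearances) would not carry over. If that matters, one option is to merge instead of replace. The following is a minimal, dependency-free sketch of that idea, assuming the AcroForm is modeled as a plain object and refs as strings. The `AcroFormModel` type and `mergeWidgetRefs` helper are hypothetical stand-ins for illustration, not pdf-lib API.

```typescript
// Hypothetical plain-object model of an AcroForm dictionary (not a pdf-lib type).
// Refs are modeled as strings such as "12 0 R".
type Ref = string;

interface AcroFormModel {
  Fields: Ref[];
  // Any other entries the source PDF defined, e.g. DA or NeedAppearances.
  [key: string]: unknown;
}

// Return a copy of the AcroForm model with missing widget refs appended.
// Existing entries and the order of existing fields are left untouched.
function mergeWidgetRefs(
  acroForm: AcroFormModel | undefined,
  widgetRefs: Ref[]
): AcroFormModel {
  const base: AcroFormModel = acroForm ?? { Fields: [] };
  const seen = new Set<Ref>(base.Fields);
  const merged = [...base.Fields];
  for (const ref of widgetRefs) {
    if (!seen.has(ref)) {
      seen.add(ref); // also de-duplicates repeated widget refs
      merged.push(ref);
    }
  }
  return { ...base, Fields: merged };
}
```

In this model, a widget ref that is already listed in Fields is left where it is, and only the loose ones are appended.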

Contributor

Interesting, so this is backfilling poorly annotated fields into the source PDF? That seems like an interesting approach. I like that it gives us a clear picture of the "error" of the source PDF, that we could perhaps inform the user about.
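
One way to surface that "error" to the user is to compare the widgets found in the document against those reachable from the acroform field tree (Fields plus nested Kids). The following is a minimal, dependency-free sketch, assuming indirect objects are modeled as plain records keyed by a string ref. `FieldNode`, `reachableFromFields`, and `findLooseWidgets` are hypothetical names for illustration, not pdf-lib API.

```typescript
// Hypothetical model: each indirect object is a plain record keyed by a string ref.
type Ref = string;

interface FieldNode {
  Kids?: Ref[]; // child fields or widgets, as in the PDF field hierarchy
}

// Collect every ref reachable from the AcroForm's top-level Fields via Kids.
function reachableFromFields(fields: Ref[], objects: Map<Ref, FieldNode>): Set<Ref> {
  const seen = new Set<Ref>();
  const stack = [...fields];
  while (stack.length > 0) {
    const ref = stack.pop() as Ref;
    if (seen.has(ref)) continue; // guard against cycles in malformed files
    seen.add(ref);
    stack.push(...(objects.get(ref)?.Kids ?? []));
  }
  return seen;
}

// Widgets that exist in the document but are not part of the field tree.
function findLooseWidgets(
  fields: Ref[],
  objects: Map<Ref, FieldNode>,
  widgetRefs: Ref[]
): Ref[] {
  const reachable = reachableFromFields(fields, objects);
  return widgetRefs.filter(ref => !reachable.has(ref));
}

// Example: widget "3 0 R" is a kid of field "1 0 R"; widget "4 0 R" is loose.
const objects = new Map<Ref, FieldNode>([
  ['1 0 R', { Kids: ['3 0 R'] }],
  ['3 0 R', {}],
  ['4 0 R', {}],
]);
const loose = findLooseWidgets(['1 0 R'], objects, ['3 0 R', '4 0 R']);
console.log(`${loose.length} widget(s) missing from the acroform:`, loose);
```

The list of loose refs could feed a warning shown to the user, and the same list is what would need to be backfilled into the acroform.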

Comment on lines +16 to +29
// TODO: copied from pdf-lib acrofield internals, check if it's already exposed outside of acroform somewhere
export const getWidgets = async (pdfDoc: PDFDocument): Promise<PDFDict[]> => {
  return pdfDoc.context
    .enumerateIndirectObjects()
    .map(([, obj]) => obj)
    .filter(
      obj =>
        obj instanceof PDFDict &&
        obj.get(PDFName.of('Type')) === PDFName.of('Annot') &&
        obj.get(PDFName.of('Subtype')) === PDFName.of('Widget')
    )
    .map(obj => obj as PDFDict);
};

Contributor Author

@jimmoffet jimmoffet May 16, 2024

Seems useful, so I'm exporting, but I don't feel strongly about it. It's ripped straight from pdf-lib, but seems to be only exposed on the acrofield object, which isn't useful in our case...
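
One caveat that may be worth checking: the PDF specification makes the Type entry optional in annotation dictionaries, so a filter that requires both Type and Subtype could skip widgets that omit Type. The following is a minimal sketch of a more lenient predicate, using plain records instead of pdf-lib's PDFDict. `DictModel` and `isWidget` are hypothetical names, and whether the looser check is appropriate for a given set of PDFs is an assumption to verify.

```typescript
// Hypothetical plain-record model of a PDF dictionary;
// name values are stored without the leading slash.
type DictModel = Record<string, string | undefined>;

// Treat a dictionary as a widget when Subtype is Widget
// and Type is either absent or Annot.
function isWidget(dict: DictModel): boolean {
  const type = dict['Type'];
  return dict['Subtype'] === 'Widget' && (type === undefined || type === 'Annot');
}

const candidates: DictModel[] = [
  { Type: 'Annot', Subtype: 'Widget' }, // matched by the strict and lenient checks
  { Subtype: 'Widget' },                // matched only by the lenient check
  { Type: 'Annot', Subtype: 'Link' },   // not a widget
];
const widgets = candidates.filter(isWidget);
```

With pdf-lib objects the same condition would be expressed against the dictionary's Type and Subtype entries rather than plain strings.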

@danielnaab danielnaab merged commit c273122 into main May 21, 2024
2 checks passed
@danielnaab danielnaab deleted the jim/acrofail branch May 21, 2024 12:39
2 participants