Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't store complex data in PDFDocument.formInfo, and replace the fields object with a hasFields boolean instead #12483

Merged
merged 2 commits into from
Oct 16, 2020

Conversation

Snuffleupagus
Copy link
Collaborator

This patch is based on a couple of smaller things that I noticed when working on PR #12479.

  • Don't store the /Fields on the formInfo getter, since that feels like overloading it with unintended (and too complex) data, and utilize a hasFields boolean instead.
    This functionality was originally added in PR Improve AcroForm/XFA form type detection #12271, to help determine what kind of form data a PDF document contains, and I think that we should ensure that the return value of formInfo only consists of "simple" data.
    With these changes the fieldObjects getter instead has to look-up the /Fields manually, however that shouldn't be a problem since the access is guarded by a formInfo.hasFields check which ensures that the data both exists and is valid. Furthermore, most documents doesn't even have any /AcroForm data anyway.

  • Determine the hasFields property first, to ensure that it's always correct even if there's errors when checking e.g. the /XFA or /SigFlags entires, since the fieldObjects getter depends on it.

  • Simplify a loop in fieldObjects, since the object being accessed is a Map and those have built-in iteration support.

  • Use a higher logging level for errors in the formInfo getter, and include the actual error message, since that'd have helped with fixing PR Fix the "should get form info when AcroForm is present" unit-test #12479 a lot quicker.

  • Update the JSDoc comment in src/display/api.js to list the return values correctly, and also slightly extend/improve the description.

…fields` object with a `hasFields` boolean instead

*This patch is based on a couple of smaller things that I noticed when working on PR 12479.*

 - Don't store the /Fields on the `formInfo` getter, since that feels like overloading it with unintended (and too complex) data, and utilize a `hasFields` boolean instead.
   This functionality was originally added in PR 12271, to help determine what kind of form data a PDF document contains, and I think that we should ensure that the return value of `formInfo` only consists of "simple" data.
   With these changes the `fieldObjects` getter instead has to look-up the /Fields manually, however that shouldn't be a problem since the access is guarded by a `formInfo.hasFields` check which ensures that the data both exists and is valid. Furthermore, most documents doesn't even have any /AcroForm data anyway.

 - Determine the `hasFields` property *first*, to ensure that it's always correct even if there's errors when checking e.g. the /XFA or /SigFlags entires, since the `fieldObjects` getter depends on it.

 - Simplify a loop in `fieldObjects`, since the object being accessed is a `Map` and those have built-in iteration support.

 - Use a higher logging level for errors in the `formInfo` getter, and include the actual error message, since that'd have helped with fixing PR 12479 a lot quicker.

 - Update the JSDoc comment in `src/display/api.js` to list the return values correctly, and also slightly extend/improve the description.
…ethod

If this method is ever passed invalid/unexpected data, or if during the course of parsing (since it's used recursively) such data is found, it will fail in a non-graceful way.
Hence this patch, which ensures that we don't attempt to access non-existent properties and also that errors such as the one fixed in PR 12479 wouldn't have occured.
@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_unittest from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/7e67a54fdbe408f/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_unittest from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/c33de2a33d093d1/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/7e67a54fdbe408f/output.txt

Total script time: 4.20 mins

  • Unit Tests: FAILED

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/c33de2a33d093d1/output.txt

Total script time: 5.68 mins

  • Unit Tests: Passed

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_unittest from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/324e5d2b0c0ac40/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_unittest from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/218e81a0d1c189a/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/324e5d2b0c0ac40/output.txt

Total script time: 4.26 mins

  • Unit Tests: FAILED

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/218e81a0d1c189a/output.txt

Total script time: 5.25 mins

  • Unit Tests: Passed

@timvandermeij timvandermeij merged commit 32bceae into mozilla:master Oct 16, 2020
@timvandermeij
Copy link
Contributor

Thank you for finetuning this!

@Snuffleupagus Snuffleupagus deleted the formInfo-hasFields branch October 16, 2020 21:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants