Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that all necessary /Font resources are included when saving a WidgetAnnotation-instance (issue 12294) #12361

Merged
merged 1 commit into from
Sep 14, 2020

Conversation

Snuffleupagus
Copy link
Collaborator

This patch contains a possible approach for fixing issue #12294, which compared to other PRs is purposely limited to the affected WidgetAnnotation code.

As mentioned elsewhere, considering that we're (at least for now) trying to fix one specific case, I think that we should avoid modifying the Dict primitive[1] and/or avoid a solution that (indirectly) modifies an existing Dict-instance[2].
This patch simply fixes the issue at hand, since that seems easiest for now, and I'd suggest that we worry about a more general approach if/when that actually becomes necessary.

Hence the solution implemented here, for WidgetAnnotation, is to simply use a combination of the local and AcroForm /DR resources during OperatorList-parsing to ensure that things work correctly regardless of where a particular /Font resource is found.
For saving of form-data, on the other hand, we want to avoid increasing the file-size unnecessarily and need to be smarter than just merging all of the available resources. To achive this, a new WidgetAnnotation._getSaveFieldResources method will when necessary produce a combined resources Dict with only the minimum amount of data from the AcroForm /DR resources included.


[1] You want to avoid anything that could cause the general Dict implementation to become slower, or more complex, just for handling an edge-case in my opinion.

[2] If an existing Dict-instance is modified unexpectedly, that could very easily lead to problems elsewhere since e.g. Dict-instances created during parsing are not expected to be changed.

@timvandermeij
Copy link
Contributor

/botio test

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @timvandermeij received. Current queue size: 1

Live output at: http://54.215.176.217:8877/dbd5385ba047c9e/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/3edc5b302b40b53/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/3edc5b302b40b53/output.txt

Total script time: 27.27 mins

  • Font tests: Passed
  • Unit tests: FAILED
  • Regression tests: FAILED

Image differences available at: http://54.67.70.0:8877/3edc5b302b40b53/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://54.215.176.217:8877/dbd5385ba047c9e/output.txt

Total script time: 31.00 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.215.176.217:8877/dbd5385ba047c9e/reftest-analyzer.html#web=eq.log

@timvandermeij
Copy link
Contributor

/botio-linux preview

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/82f6e0a15419ea1/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/82f6e0a15419ea1/output.txt

Total script time: 3.66 mins

Published

@timvandermeij
Copy link
Contributor

timvandermeij commented Sep 11, 2020

The functionality here not only works for saving, but also for printing, so could you add a reference test for the document from the issue so this code is covered by tests? (Bonus points if it can be reduced, but if that's difficult a linked test case will definitely suffice.)

Moreover, I entered "test2" in the "Full name" field of the document from the issue and printed it to a file. The result is here:
out.pdf

Is this expected? If I copy/paste some of the characters from the PDF file itself, I get:
out2.pdf

I do notice that some fonts are not embedded in the file, so maybe that explains the garbled characters here. Since I'm not familiar with this language, I'm not really sure how to check the correctness here. Do you have any suggestions?

@calixteman
Copy link
Contributor

@timvandermeij, I tested this patch on top of #12292 and it works well (print & save).
The patch in #12292 aims to fix the encoding issue you met.
Anyway r+ for me.

@timvandermeij
Copy link
Contributor

Alright, that's good to know. I think it would still be good to add a reference test in this PR to test the added logic, and it will give us confirmation that the other patch also works as expected since that should improve the reference test added here.

@Snuffleupagus
Copy link
Collaborator Author

Bonus points if it can be reduced, but if that's difficult a linked test case will definitely suffice.

While that'd have been technically possible, it would have meant a whole lot of hand-editing of PDF data to make it work. All-in-all, it really didn't seem worth the time/effort; hence a regular linked test-case is included here.

Furthermore, I'm not completely sure if the annotationStorage data used here makes sense and/or if it needs to be encoded differently. Given that the encoding issues will be handled elsewhere, I hope that we can just leave this as-is for now.

…`WidgetAnnotation`-instance (issue 12294)

This patch contains a possible approach for fixing issue 12294, which compared to other PRs is purposely limited to the affected `WidgetAnnotation` code.

As mentioned elsewhere, considering that we're (at least for now) trying to fix *one specific* case, I think that we should avoid modifying the `Dict` primitive[1] and/or avoid a solution that (indirectly) modifies an existing `Dict`-instance[2].
This patch simply fixes the issue at hand, since that seems easiest for now, and I'd suggest that we worry about a more general approach if/when that actually becomes necessary.

Hence the solution implemented here, for `WidgetAnnotation`, is to simply use a combination of the local *and* AcroForm /DR resources during OperatorList-parsing to ensure that things work correctly regardless of where a particular /Font resource is found.
For saving of form-data, on the other hand, we want to avoid increasing the file-size unnecessarily and need to be smarter than just merging all of the available resources. To achive this, a new `WidgetAnnotation._getSaveFieldResources` method will when necessary produce a combined resources `Dict` with only the minimum amount of data from the AcroForm /DR resources included.

---
[1] You want to avoid anything that could cause the general `Dict` implementation to become slower, or more complex, just for handling an edge-case in my opinion.

[2] If an existing `Dict`-instance is modified unexpectedly, that could very easily lead to problems elsewhere since e.g. `Dict`-instances created during parsing are not expected to be changed.
@Snuffleupagus
Copy link
Collaborator Author

/botio test

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/2789547aa2326d5/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/b3b3ae0fa9ec7a4/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/2789547aa2326d5/output.txt

Total script time: 27.53 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.67.70.0:8877/2789547aa2326d5/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://54.215.176.217:8877/b3b3ae0fa9ec7a4/output.txt

Total script time: 29.91 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.215.176.217:8877/b3b3ae0fa9ec7a4/reftest-analyzer.html#web=eq.log

@timvandermeij timvandermeij merged commit b0c7a74 into mozilla:master Sep 14, 2020
@timvandermeij
Copy link
Contributor

Nice work!

/botio makeref

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_makeref from @timvandermeij received. Current queue size: 1

Live output at: http://54.215.176.217:8877/aa9a551266c0547/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_makeref from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/0102d3e4718d9a3/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/0102d3e4718d9a3/output.txt

Total script time: 25.69 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/aa9a551266c0547/output.txt

Total script time: 28.01 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@Snuffleupagus Snuffleupagus deleted the _getSaveFieldResources branch September 15, 2020 07:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants