Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[api-minor] Change Font.exportData to use an explicit white-list of exportable properties, and stop exporting internal/unused properties #11773

Merged
merged 4 commits into from
Apr 5, 2020

Conversation

Snuffleupagus
Copy link
Collaborator

Please refer to the individual commit messages for addition details.


Note that there's still a number of potentially very large data-structures, containing e.g. Arrays and Objects, being exported despite being completely unused on the main-thread and/or in the API.
However, since we cannot be sure that they're completely unused in all custom implementations of the PDF.js library, we probably cannot just remove them without also providing users with a way to still get this information (that we've been accidentally exposing for years).

To not unnecessarily delay the changes in this PR, which on their own should already be beneficial in my opinion, I thus decided to post-phone any further clean-up to a separate PR.

@pdfjsbot
Copy link

pdfjsbot commented Apr 2, 2020

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/b1e2e89e6bdb950/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Apr 2, 2020

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/da4e7af1c522d7c/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Apr 2, 2020

From: Bot.io (Windows)


Failed

Full output at http://54.215.176.217:8877/da4e7af1c522d7c/output.txt

Total script time: 1.81 mins

  • Font tests: FAILED
  • Unit tests: FAILED
  • Regression tests: FAILED

Image differences available at: http://54.215.176.217:8877/da4e7af1c522d7c/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

pdfjsbot commented Apr 2, 2020

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/b1e2e89e6bdb950/output.txt

Total script time: 60.00 mins

@timvandermeij
Copy link
Contributor

/botio-linux test

@pdfjsbot
Copy link

pdfjsbot commented Apr 2, 2020

From: Bot.io (Linux m4)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/d3f94fac19e00f7/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Apr 2, 2020

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/d3f94fac19e00f7/output.txt

Total script time: 19.83 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.67.70.0:8877/d3f94fac19e00f7/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

pdfjsbot commented Apr 3, 2020

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 1

Live output at: http://54.215.176.217:8877/4184d4e3b11ee2c/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Apr 3, 2020

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/99a88402f616602/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Apr 3, 2020

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/99a88402f616602/output.txt

Total script time: 19.73 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.67.70.0:8877/99a88402f616602/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

pdfjsbot commented Apr 3, 2020

From: Bot.io (Windows)


Failed

Full output at http://54.215.176.217:8877/4184d4e3b11ee2c/output.txt

Total script time: 60.00 mins

@timvandermeij
Copy link
Contributor

/botio-windows test

@pdfjsbot
Copy link

pdfjsbot commented Apr 3, 2020

From: Bot.io (Windows)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.215.176.217:8877/c2e654819f62b2f/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Apr 3, 2020

From: Bot.io (Windows)


Failed

Full output at http://54.215.176.217:8877/c2e654819f62b2f/output.txt

Total script time: 1.78 mins

  • Font tests: FAILED
  • Unit tests: FAILED
  • Regression tests: FAILED

Image differences available at: http://54.215.176.217:8877/c2e654819f62b2f/reftest-analyzer.html#web=eq.log

@timvandermeij
Copy link
Contributor

/botio-windows test

@pdfjsbot
Copy link

pdfjsbot commented Apr 3, 2020

From: Bot.io (Windows)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.215.176.217:8877/f234c91e9f8c29a/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Apr 3, 2020

From: Bot.io (Windows)


Failed

Full output at http://54.215.176.217:8877/f234c91e9f8c29a/output.txt

Total script time: 1.78 mins

  • Font tests: FAILED
  • Unit tests: FAILED
  • Regression tests: FAILED

Image differences available at: http://54.215.176.217:8877/f234c91e9f8c29a/reftest-analyzer.html#web=eq.log

@timvandermeij
Copy link
Contributor

The bot should be fixed again.

/botio-windows test

@pdfjsbot
Copy link

pdfjsbot commented Apr 4, 2020

From: Bot.io (Windows)


Received

Command cmd_test from @timvandermeij received. Current queue size: 1

Live output at: http://54.215.176.217:8877/a9dca1a66aaf03e/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Apr 4, 2020

From: Bot.io (Windows)


Failed

Full output at http://54.215.176.217:8877/a9dca1a66aaf03e/output.txt

Total script time: 60.00 mins

@timvandermeij
Copy link
Contributor

/botio-windows test

@pdfjsbot
Copy link

pdfjsbot commented Apr 4, 2020

From: Bot.io (Windows)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.215.176.217:8877/7bc4d556021e372/output.txt

@Snuffleupagus
Copy link
Collaborator Author

It seems that the Windows bot have decided to take an extended vacation :-)

Anyway, given that the tests pass on the Linux bot (modulo the Chrome-intermittent ones) and that all tests obviously also pass locally on Windows for me; do we really need to wait for the Windows bot to be re-started again?
If this patch was fundamentally broken in some way, I'd have expected lots of errors everywhere...

@pdfjsbot
Copy link

pdfjsbot commented Apr 4, 2020

From: Bot.io (Windows)


Failed

Full output at http://54.215.176.217:8877/7bc4d556021e372/output.txt

Total script time: 60.00 mins

@timvandermeij
Copy link
Contributor

Looks good after the comment above is addressed.

@pdfjsbot
Copy link

pdfjsbot commented Apr 5, 2020

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/95a6c08f07db94f/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Apr 5, 2020

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/8c6de4f0e934b5b/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Apr 5, 2020

From: Bot.io (Windows)


Failed

Full output at http://54.215.176.217:8877/8c6de4f0e934b5b/output.txt

Total script time: 1.76 mins

  • Font tests: FAILED
  • Unit tests: FAILED
  • Regression tests: FAILED

Image differences available at: http://54.215.176.217:8877/8c6de4f0e934b5b/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

pdfjsbot commented Apr 5, 2020

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/95a6c08f07db94f/output.txt

Total script time: 19.89 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.67.70.0:8877/95a6c08f07db94f/reftest-analyzer.html#web=eq.log

…properties

This patch addresses an existing, and very long standing, TODO in the code such that it's no longer possible to send arbitrary/unnecessary font properties to the main-thread.
Furthermore, by having a white-list it's also very easy to see *exactly* which font properties are being exported.

Please note that in its current form, the list of exported properties contains *every* possible enumerable property that may exist in a `Font` instance.
In practice no single font will contain *all* of these properties, and e.g. embedded/non-embedded/Type3 fonts will all differ slightly with respect to what properties are being defined. Hence why only explicitly set properties are included in the exported data, to avoid half of them being `undefined`, which however should not be a problem for any existing consumer (since they'd already need to handle those cases).

Since a fair number of these font properties are completely *internal* functionality, and doesn't make any sense to expose on the main-thread and/or in the API, follow-up patch(es) will be required to trim down the list. (I purposely included all properties here for brevity and future documentation purposes.)
…properties

A number of *internal* font properties, which only make sense on the worker-thread, were previously exported. Some of these properties could also contain potentially large Arrays/Objects, which thus unnecessarily increases memory usage since we're forced to copy these to the main-thread and also store them there.

This patch stops exporting the following font properties:

 - "_shadowWidth": An internal property, which was never intended to be exported.

 - "charsCache": An internal cache, which was never intended to be exported and doesn't make any sense on the main-thread. Furthermore, by the time `Font.exportData` is called it's usually `undefined` or a mostly empty Object as well.

 - "cidEncoding": An internal property used with (some) composite fonts.
   As can be seen in the `PartialEvaluator.translateFont` method, `cidEncoding` will only be assigned a value when the font dictionary has an "Encoding" entry which is a `Name` (and not in the `Stream` case, since those obviously cannot be cloned).
   All-in-all this property doesn't really make sense on the main-thread and/or in the API, and note also that the resulting `cMap` property is (partially) available already.

 - "fallbackToUnicode": An internal map, part of the heuristics used to improve text-selection in (some) badly generated PDF documents with simple fonts. This was never intended to be exposed on the main-thread and/or in the API.

 - "glyphCache": An internal cache, which was never intended to be exported and which doesn't make any sense on the main-thread. Furthermore, by the time `Font.exportData` is called it's usually a mostly empty Object as well.

 - "isOpenType": An internal property, used only during font parsing on the worker-thread. In the *very* unlikely event that an API consumer actually needs that information, then `fontType` should be a (generally) much better property to use.

Finally, in the (hopefully) unlikely event that any of these properties become necessary on the main-thread, re-adding them to the white-list is easy to do.
With `Font.exportData` now only exporting white-listed properties, there should no longer be any reason to not use standard shadowing in the `Font.spaceWidth` method.
Furthermore, considering the amount of other changes to the code-base over the years it's not even clear to me that the special-case was necessary any more (regardless of the preceding patches).
…a boolean

Given that the `vertical` property is always accessed on the main-thread, ensuring that the property is explicitly defined seems like the correct thing to do since it also avoids boolean casting elsewhere in the code-base.
@pdfjsbot
Copy link

pdfjsbot commented Apr 5, 2020

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/8ec0cb75970abc2/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Apr 5, 2020

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/4ffa077a3967c3d/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Apr 5, 2020

From: Bot.io (Windows)


Failed

Full output at http://54.215.176.217:8877/4ffa077a3967c3d/output.txt

Total script time: 1.75 mins

  • Font tests: FAILED
  • Unit tests: FAILED
  • Regression tests: FAILED

Image differences available at: http://54.215.176.217:8877/4ffa077a3967c3d/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

pdfjsbot commented Apr 5, 2020

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/8ec0cb75970abc2/output.txt

Total script time: 19.64 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.67.70.0:8877/8ec0cb75970abc2/reftest-analyzer.html#web=eq.log

@timvandermeij timvandermeij merged commit 0400109 into mozilla:master Apr 5, 2020
@timvandermeij
Copy link
Contributor

Nice improvements!

@Snuffleupagus Snuffleupagus deleted the Font-exportData-1 branch April 5, 2020 18:53
@Snuffleupagus
Copy link
Collaborator Author

Nice improvements!

Thanks for the review; the real improvements are PR #11777 though :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants