Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[api-minor] Add the possibility to collect Javascript actions #12429

Merged
merged 1 commit into from
Oct 14, 2020

Conversation

calixteman
Copy link
Contributor

No description provided.

Copy link
Collaborator

@Snuffleupagus Snuffleupagus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on a very cursory look, I've added a few quick comments.

src/shared/util.js Outdated Show resolved Hide resolved
src/core/annotation.js Outdated Show resolved Hide resolved
src/core/annotation.js Outdated Show resolved Hide resolved
src/core/document.js Outdated Show resolved Hide resolved
src/core/annotation.js Outdated Show resolved Hide resolved
src/core/annotation.js Outdated Show resolved Hide resolved
src/core/annotation.js Outdated Show resolved Hide resolved
src/display/api.js Outdated Show resolved Hide resolved
src/core/annotation.js Outdated Show resolved Hide resolved
src/core/annotation.js Outdated Show resolved Hide resolved
@timvandermeij
Copy link
Contributor

I'm missing some context for this patch. Which issue does this solve? Why are we interested in collecting this information, and what is it used for?

@calixteman
Copy link
Contributor Author

I'm missing some context for this patch. Which issue does this solve? Why are we interested in collecting this information, and what is it used for?

I'll fix the nits tomorrow: thanks for that.
We plan to add support to exectute js stuff embedded in the PDF.
A first step has been made here: https://phabricator.services.mozilla.com/D91746 for the m-c version.
And we plan to add something else (probably quickjs in using emscripten) for the extension version.
Anyway if there is an open issue for js stuff, then please assign me.

@calixteman calixteman force-pushed the collect_js branch 2 times, most recently from f2286d8 to 6045c63 Compare October 1, 2020 08:22
src/core/worker.js Outdated Show resolved Hide resolved
Copy link
Collaborator

@Snuffleupagus Snuffleupagus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but the API implemented here is really not correct!

Similar to the regular getAnnotations method, which lives on the page rather than the document, this new API-method must also be on the page-level.

Please keep in mind that the PDF.js library is written such that data can be loaded when needed, using e.g. range requests and streaming, to explicitly not require that the entire document is loaded in order for rendering to start (see e.g. the disableAutoFetch option).

Hence we really can't/shouldn't add an API-method that essentially forces the entire document to load and all pages to be parsed, which would be particularly bad when invoked from the viewer. (It would force a lot of data to be requested very early on, possibly affecting general viewer performance negatively when loading longer documents.)

@calixteman
Copy link
Contributor Author

I thought about that of course and the problem is when a script is getting a value from a Field which hasn't been rendered: we can't await for its value.
Setting a value shouldn't be a problem: we can put it in annotationStorage if the Field hasn't been rendered and use this value when rendered for this first time.

What we can do:

  • only get all the annotations and then run the sandbox when at least one contains some JS
  • make the hypothesis that no script is trying to get a value from a non-rendered field (I guess it should be pretty rare but tbh I don't know) and then use a by-page approach.
  • or another good idea I didn't think about: any suggestion will be appreciated.

@calixteman
Copy link
Contributor Author

I moved the data collection stuff in PDFDocument in using Fields entry from Acroform dictionary.
So this way, we don't parse all the pages.
@Snuffleupagus does it sound better ?

@Snuffleupagus
Copy link
Collaborator

I moved the data collection stuff in PDFDocument in using Fields entry from Acroform dictionary.
So this way, we don't parse all the pages.
@Snuffleupagus does it sound better ?

I suppose I still don't really understand why this, similar to annotations in general, cannot be parsed on a page-by-page basis!?
Are you perhaps expecting that JavaScript actions on one page will affect fields on other pages?

(Generally speaking, given the lack of context/information in the commit message it's quite difficult to get a good overview of the entire situation and e.g. understand why you went with a particular approach.)

@calixteman
Copy link
Contributor Author

I moved the data collection stuff in PDFDocument in using Fields entry from Acroform dictionary.
So this way, we don't parse all the pages.
@Snuffleupagus does it sound better ?

I suppose I still don't really understand why this, similar to annotations in general, cannot be parsed on a page-by-page basis!?
Are you perhaps expecting that JavaScript actions on one page will affect fields on other pages?

Yes exactly

(Generally speaking, given the lack of context/information in the commit message it's quite difficult to get a good overview of the entire situation and e.g. understand why you went with a particular approach.)

My bad, you're right, sorry about that.
Anyway I gave more explanations in the different comments I made (e.g. #12429 (comment)).

@calixteman
Copy link
Contributor Author

Can we move forward on this patch please ?

Copy link
Contributor

@timvandermeij timvandermeij left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that it's finally clear to me why the extra method is needed, I think this approach can be used now with these comments addressed, in particular to add some documentation on why they are there (see the comment below).

src/core/annotation.js Outdated Show resolved Hide resolved
src/core/annotation.js Show resolved Hide resolved
src/core/annotation.js Show resolved Hide resolved
src/core/document.js Outdated Show resolved Hide resolved
src/core/document.js Outdated Show resolved Hide resolved
src/core/document.js Outdated Show resolved Hide resolved
src/shared/compatibility.js Outdated Show resolved Hide resolved
src/shared/util.js Outdated Show resolved Hide resolved
test/unit/annotation_spec.js Outdated Show resolved Hide resolved
test/unit/annotation_spec.js Show resolved Hide resolved
src/core/worker.js Outdated Show resolved Hide resolved
src/core/document.js Outdated Show resolved Hide resolved
src/core/document.js Outdated Show resolved Hide resolved
src/core/document.js Outdated Show resolved Hide resolved
@calixteman calixteman force-pushed the collect_js branch 2 times, most recently from dea74f0 to 932f441 Compare October 13, 2020 13:59
@timvandermeij
Copy link
Contributor

/botio unittest

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_unittest from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/73449564b1e981f/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_unittest from @timvandermeij received. Current queue size: 1

Live output at: http://54.215.176.217:8877/04c5dc96f5a5eb5/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/73449564b1e981f/output.txt

Total script time: 4.26 mins

  • Unit Tests: FAILED

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/04c5dc96f5a5eb5/output.txt

Total script time: 5.22 mins

  • Unit Tests: Passed

Copy link
Contributor

@timvandermeij timvandermeij left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved pending these final comments now that I did a full review, and passing tests afterwards (ignore the intermittent one above). Thanks!

src/core/annotation.js Show resolved Hide resolved
src/core/annotation.js Show resolved Hide resolved
src/core/annotation.js Outdated Show resolved Hide resolved
src/core/annotation.js Outdated Show resolved Hide resolved
src/core/document.js Outdated Show resolved Hide resolved
src/core/document.js Outdated Show resolved Hide resolved
src/shared/util.js Show resolved Hide resolved
test/unit/annotation_spec.js Outdated Show resolved Hide resolved
Copy link
Collaborator

@Snuffleupagus Snuffleupagus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine to me too, with just one more question; please see #12429 (comment)

@timvandermeij
Copy link
Contributor

/botio test

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.215.176.217:8877/c25b840b1a5ae14/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/c2770e8bee80edb/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/c2770e8bee80edb/output.txt

Total script time: 25.80 mins

  • Font tests: Passed
  • Unit tests: FAILED
  • Regression tests: FAILED

Image differences available at: http://54.67.70.0:8877/c2770e8bee80edb/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://54.215.176.217:8877/c25b840b1a5ae14/output.txt

Total script time: 31.36 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.215.176.217:8877/c25b840b1a5ae14/reftest-analyzer.html#web=eq.log

@timvandermeij timvandermeij changed the title Add the possibility to collect Javascript actions [api-minor] Add the possibility to collect Javascript actions Oct 14, 2020
@timvandermeij timvandermeij merged commit a373137 into mozilla:master Oct 14, 2020
@timvandermeij
Copy link
Contributor

Nice work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants