Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let Lexer.getNumber treat a single minus sign as zero (bug 1753983) #14543

Merged
merged 1 commit into from
Feb 9, 2022

Conversation

Snuffleupagus
Copy link
Collaborator

This appears to be consistent with the behaviour in both Adobe Reader and PDFium (in Google Chrome); this is essentially the same approach as used for a single decimal point in PR #9827.

@Snuffleupagus
Copy link
Collaborator Author

/botio test

@pdfjsbot
Copy link

pdfjsbot commented Feb 7, 2022

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/cdf6e45c9be24af/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Feb 7, 2022

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.193.163.58:8877/97cc794533cc10e/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Feb 7, 2022

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/cdf6e45c9be24af/output.txt

Total script time: 23.33 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED
  different ref/snapshot: 13
  different first/second rendering: 2

Image differences available at: http://54.241.84.105:8877/cdf6e45c9be24af/reftest-analyzer.html#web=eq.log

This appears to be consistent with the behaviour in both Adobe Reader and PDFium (in Google Chrome); this is essentially the same approach as used for a single decimal point in PR 9827.
@pdfjsbot
Copy link

pdfjsbot commented Feb 7, 2022

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/97cc794533cc10e/output.txt

Total script time: 60.00 mins

@Snuffleupagus
Copy link
Collaborator Author

/botio browsertest

@pdfjsbot
Copy link

pdfjsbot commented Feb 7, 2022

From: Bot.io (Linux m4)


Received

Command cmd_browsertest from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/acdddccfd0b8084/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Feb 7, 2022

From: Bot.io (Windows)


Received

Command cmd_browsertest from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.193.163.58:8877/82ea52438fe7778/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Feb 7, 2022

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/82ea52438fe7778/output.txt

Total script time: 19.34 mins

  • Regression tests: FAILED
  different ref/snapshot: 1

Image differences available at: http://54.193.163.58:8877/82ea52438fe7778/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

pdfjsbot commented Feb 7, 2022

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/acdddccfd0b8084/output.txt

Total script time: 19.69 mins

  • Regression tests: FAILED
  different ref/snapshot: 9
  different first/second rendering: 1

Image differences available at: http://54.241.84.105:8877/acdddccfd0b8084/reftest-analyzer.html#web=eq.log

// This is consistent with Adobe Reader (fixes issue9252.pdf).
warn("Lexer.getNumber - treating a single decimal point as zero.");
return 0;
if (divideBy === 10 && sign === 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shoudn't we return 0 whatever we have ?

In pdfium:
https://pdfium.googlesource.com/pdfium/+/refs/heads/main/core/fxcrt/fx_number.cpp#26
https://pdfium.googlesource.com/pdfium/+/refs/heads/main/core/fxcrt/fx_string.cpp#46
if understand correctly, it always returns 0 in this case.

Copy link
Collaborator Author

@Snuffleupagus Snuffleupagus Feb 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shoudn't we return 0 whatever we have ?

I'm not really sure if we want to change this "speculatively", since these code-paths only matter for corrupt documents and different PDF libraries probably use different fallback code-paths in these cases.
Especially for broken path-operators it's really quite easy to get somewhat "chaotic" rendering artefacts in corrupt documents, and I'd prefer not having to fix additional regressions here (since the original patch broke existing reference tests).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see the original patch. Did it return 0 and break reftests? It does seem like returning 0 always seems like safe bet.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see the original patch. Did it return 0 and break reftests?

The changes were in https://github.com/mozilla/pdf.js/compare/36e9b7447a2a1f1062613537a2559d66d2510456..64f3dbeb489c2fd08c6918ed5c6d73d1a5788805, since otherwise existing test-cases broke.
Given that the affected EvaluatorPreprocessor-functionality contains a bunch of heuristics, I thus worry about additional breakage (with "chaotic" rendering) if we unconditionally start returning 0 from Lexer.getNumber for invalid input.

@@ -4561,7 +4561,7 @@ class EvaluatorPreprocessor {
}

static get MAX_INVALID_PATH_OPS() {
return shadow(this, "MAX_INVALID_PATH_OPS", 20);
return shadow(this, "MAX_INVALID_PATH_OPS", 10);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain a bit how this relates to the lexer change below? I'm not really sure what this does.

Copy link
Collaborator Author

@Snuffleupagus Snuffleupagus Feb 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first version of the patch didn't include this, but that unfortunately caused rendering regressions; please note the bug1130815-eq-page1 regressions in #14543 (comment) above.
What happens is basically that the Lexer.getNumber changes are causing us to returning 0 in more cases than before, instead of immediately throwing, which means that rather than aborting parsing immediately we'll now continue until the following condition is satisfied:

pdf.js/src/core/evaluator.js

Lines 4643 to 4654 in d57f3a1

// Incomplete path operators, in particular, can result in fairly
// chaotic rendering artifacts. Hence the following heuristics is
// used to error, rather than just warn, once a number of invalid
// path operators have been encountered (fixes bug1443140.pdf).
if (
fn >= OPS.moveTo &&
fn <= OPS.endPath && // Path operator
++this._numInvalidPathOPS >
EvaluatorPreprocessor.MAX_INVALID_PATH_OPS
) {
throw new FormatError(`Invalid ${partialMsg}`);
}

In order to work-around those problems, the only simple solution (that I found) was to somewhat arbitrarily reduce this constant to now accept fewer corrupt path-operators than before. Obviously all of this is very hand-wavy, since it's not really clear how you're supposed to (in general) deal with this type of PDF corruption; hence why I'd very much prefer not implementing #14543 (comment) as well here since that'd mean accepting "new" corrupt path-operators which could easily cause additional rendering regressions elsewhere.
We should probably try to somehow re-factor and improve all of that in the EvaluatorPreprocessor, but I'd prefer not doing that here.

@brendandahl
Copy link
Contributor

/botio makeref

@pdfjsbot
Copy link

pdfjsbot commented Feb 9, 2022

From: Bot.io (Linux m4)


Received

Command cmd_makeref from @brendandahl received. Current queue size: 0

Live output at: http://54.241.84.105:8877/2a73b051635e735/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Feb 9, 2022

From: Bot.io (Windows)


Received

Command cmd_makeref from @brendandahl received. Current queue size: 0

Live output at: http://54.193.163.58:8877/aef3fcad494ad4d/output.txt

@brendandahl brendandahl merged commit f8b2a99 into mozilla:master Feb 9, 2022
@pdfjsbot
Copy link

pdfjsbot commented Feb 9, 2022

From: Bot.io (Windows)


Success

Full output at http://54.193.163.58:8877/aef3fcad494ad4d/output.txt

Total script time: 20.26 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@pdfjsbot
Copy link

pdfjsbot commented Feb 9, 2022

From: Bot.io (Linux m4)


Success

Full output at http://54.241.84.105:8877/2a73b051635e735/output.txt

Total script time: 21.17 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants