Let `Lexer.getNumber` treat a single minus sign as zero (bug 1753983) #14543

Snuffleupagus · 2022-02-07T15:23:17Z

This appears to be consistent with the behaviour in both Adobe Reader and PDFium (in Google Chrome); this is essentially the same approach as used for a single decimal point in PR #9827.

Snuffleupagus · 2022-02-07T15:27:14Z

/botio test

pdfjsbot · 2022-02-07T15:27:15Z

From: Bot.io (Linux m4)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/cdf6e45c9be24af/output.txt

pdfjsbot · 2022-02-07T15:27:15Z

From: Bot.io (Windows)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.193.163.58:8877/97cc794533cc10e/output.txt

pdfjsbot · 2022-02-07T15:50:36Z

From: Bot.io (Linux m4)

Failed

Full output at http://54.241.84.105:8877/cdf6e45c9be24af/output.txt

Total script time: 23.33 mins

Font tests: Passed
Unit tests: Passed
Integration Tests: Passed
Regression tests: FAILED

  different ref/snapshot: 13
  different first/second rendering: 2

Image differences available at: http://54.241.84.105:8877/cdf6e45c9be24af/reftest-analyzer.html#web=eq.log

This appears to be consistent with the behaviour in both Adobe Reader and PDFium (in Google Chrome); this is essentially the same approach as used for a single decimal point in PR 9827.

pdfjsbot · 2022-02-07T16:27:33Z

From: Bot.io (Windows)

Failed

Full output at http://54.193.163.58:8877/97cc794533cc10e/output.txt

Total script time: 60.00 mins

Snuffleupagus · 2022-02-07T16:30:54Z

/botio browsertest

pdfjsbot · 2022-02-07T16:30:55Z

From: Bot.io (Linux m4)

Received

Command cmd_browsertest from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/acdddccfd0b8084/output.txt

pdfjsbot · 2022-02-07T16:30:56Z

From: Bot.io (Windows)

Received

Command cmd_browsertest from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.193.163.58:8877/82ea52438fe7778/output.txt

pdfjsbot · 2022-02-07T16:50:17Z

From: Bot.io (Windows)

Failed

Full output at http://54.193.163.58:8877/82ea52438fe7778/output.txt

Total script time: 19.34 mins

Regression tests: FAILED

  different ref/snapshot: 1

Image differences available at: http://54.193.163.58:8877/82ea52438fe7778/reftest-analyzer.html#web=eq.log

pdfjsbot · 2022-02-07T16:50:38Z

From: Bot.io (Linux m4)

Failed

Full output at http://54.241.84.105:8877/acdddccfd0b8084/output.txt

Total script time: 19.69 mins

Regression tests: FAILED

  different ref/snapshot: 9
  different first/second rendering: 1

Image differences available at: http://54.241.84.105:8877/acdddccfd0b8084/reftest-analyzer.html#web=eq.log

calixteman · 2022-02-07T18:10:17Z

src/core/parser.js

        // This is consistent with Adobe Reader (fixes issue9252.pdf).
-        warn("Lexer.getNumber - treating a single decimal point as zero.");
-        return 0;
+        if (divideBy === 10 && sign === 0) {


Shoudn't we return 0 whatever we have ?

In pdfium:
https://pdfium.googlesource.com/pdfium/+/refs/heads/main/core/fxcrt/fx_number.cpp#26
https://pdfium.googlesource.com/pdfium/+/refs/heads/main/core/fxcrt/fx_string.cpp#46
if understand correctly, it always returns 0 in this case.

Shoudn't we return 0 whatever we have ?

I'm not really sure if we want to change this "speculatively", since these code-paths only matter for corrupt documents and different PDF libraries probably use different fallback code-paths in these cases.
Especially for broken path-operators it's really quite easy to get somewhat "chaotic" rendering artefacts in corrupt documents, and I'd prefer not having to fix additional regressions here (since the original patch broke existing reference tests).

I didn't see the original patch. Did it return 0 and break reftests? It does seem like returning 0 always seems like safe bet.

I didn't see the original patch. Did it return 0 and break reftests?

The changes were in https://github.com/mozilla/pdf.js/compare/36e9b7447a2a1f1062613537a2559d66d2510456..64f3dbeb489c2fd08c6918ed5c6d73d1a5788805, since otherwise existing test-cases broke.
Given that the affected EvaluatorPreprocessor-functionality contains a bunch of heuristics, I thus worry about additional breakage (with "chaotic" rendering) if we unconditionally start returning 0 from Lexer.getNumber for invalid input.

timvandermeij · 2022-02-09T18:48:58Z

src/core/evaluator.js

@@ -4561,7 +4561,7 @@ class EvaluatorPreprocessor {
  }

  static get MAX_INVALID_PATH_OPS() {
-    return shadow(this, "MAX_INVALID_PATH_OPS", 20);
+    return shadow(this, "MAX_INVALID_PATH_OPS", 10);


Could you explain a bit how this relates to the lexer change below? I'm not really sure what this does.

The first version of the patch didn't include this, but that unfortunately caused rendering regressions; please note the bug1130815-eq-page1 regressions in #14543 (comment) above.
What happens is basically that the Lexer.getNumber changes are causing us to returning 0 in more cases than before, instead of immediately throwing, which means that rather than aborting parsing immediately we'll now continue until the following condition is satisfied:

pdf.js/src/core/evaluator.js

Lines 4643 to 4654 in d57f3a1

// Incomplete path operators, in particular, can result in fairly

// chaotic rendering artifacts. Hence the following heuristics is

// used to error, rather than just warn, once a number of invalid

// path operators have been encountered (fixes bug1443140.pdf).

if (

fn >= OPS.moveTo &&

fn <= OPS.endPath && // Path operator

++this._numInvalidPathOPS >

EvaluatorPreprocessor.MAX_INVALID_PATH_OPS

) {

throw new FormatError(`Invalid ${partialMsg}`);

}

In order to work-around those problems, the only simple solution (that I found) was to somewhat arbitrarily reduce this constant to now accept fewer corrupt path-operators than before. Obviously all of this is very hand-wavy, since it's not really clear how you're supposed to (in general) deal with this type of PDF corruption; hence why I'd very much prefer not implementing #14543 (comment) as well here since that'd mean accepting "new" corrupt path-operators which could easily cause additional rendering regressions elsewhere.
We should probably try to somehow re-factor and improve all of that in the EvaluatorPreprocessor, but I'd prefer not doing that here.

brendandahl · 2022-02-09T22:06:30Z

/botio makeref

pdfjsbot · 2022-02-09T22:06:31Z

From: Bot.io (Linux m4)

Received

Command cmd_makeref from @brendandahl received. Current queue size: 0

Live output at: http://54.241.84.105:8877/2a73b051635e735/output.txt

pdfjsbot · 2022-02-09T22:06:31Z

From: Bot.io (Windows)

Received

Command cmd_makeref from @brendandahl received. Current queue size: 0

Live output at: http://54.193.163.58:8877/aef3fcad494ad4d/output.txt

pdfjsbot · 2022-02-09T22:26:47Z

From: Bot.io (Windows)

Success

Full output at http://54.193.163.58:8877/aef3fcad494ad4d/output.txt

Total script time: 20.26 mins

Lint: Passed
Make references: Passed
Check references: Passed

pdfjsbot · 2022-02-09T22:27:42Z

From: Bot.io (Linux m4)

Success

Full output at http://54.241.84.105:8877/2a73b051635e735/output.txt

Total script time: 21.17 mins

Lint: Passed
Make references: Passed
Check references: Passed

Snuffleupagus added core corrupted-pdf labels Feb 7, 2022

Let Lexer.getNumber treat a single minus sign as zero (bug 1753983)

64f3dbe

This appears to be consistent with the behaviour in both Adobe Reader and PDFium (in Google Chrome); this is essentially the same approach as used for a single decimal point in PR 9827.

Snuffleupagus force-pushed the bug-1753983 branch from 36e9b74 to 64f3dbe Compare February 7, 2022 16:10

calixteman reviewed Feb 7, 2022

View reviewed changes

timvandermeij reviewed Feb 9, 2022

View reviewed changes

brendandahl merged commit f8b2a99 into mozilla:master Feb 9, 2022

Snuffleupagus deleted the bug-1753983 branch February 10, 2022 11:36

Snuffleupagus mentioned this pull request Oct 20, 2022

Let Lexer.getNumber treat more invalid "numbers" as zero (issue 15604) #15606

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Let `Lexer.getNumber` treat a single minus sign as zero (bug 1753983) #14543

Let `Lexer.getNumber` treat a single minus sign as zero (bug 1753983) #14543

Snuffleupagus commented Feb 7, 2022

Snuffleupagus commented Feb 7, 2022

pdfjsbot commented Feb 7, 2022

pdfjsbot commented Feb 7, 2022

pdfjsbot commented Feb 7, 2022

pdfjsbot commented Feb 7, 2022

Snuffleupagus commented Feb 7, 2022

pdfjsbot commented Feb 7, 2022

pdfjsbot commented Feb 7, 2022

pdfjsbot commented Feb 7, 2022

pdfjsbot commented Feb 7, 2022

calixteman Feb 7, 2022

Snuffleupagus Feb 7, 2022 •

edited

Loading

brendandahl Feb 8, 2022

Snuffleupagus Feb 9, 2022

timvandermeij Feb 9, 2022

Snuffleupagus Feb 9, 2022 •

edited

Loading

brendandahl commented Feb 9, 2022

pdfjsbot commented Feb 9, 2022

pdfjsbot commented Feb 9, 2022

pdfjsbot commented Feb 9, 2022

pdfjsbot commented Feb 9, 2022

	// Incomplete path operators, in particular, can result in fairly
	// chaotic rendering artifacts. Hence the following heuristics is
	// used to error, rather than just warn, once a number of invalid
	// path operators have been encountered (fixes bug1443140.pdf).
	if (
	fn >= OPS.moveTo &&
	fn <= OPS.endPath && // Path operator
	++this._numInvalidPathOPS >
	EvaluatorPreprocessor.MAX_INVALID_PATH_OPS
	) {
	throw new FormatError(`Invalid ${partialMsg}`);
	}

Let Lexer.getNumber treat a single minus sign as zero (bug 1753983) #14543

Let Lexer.getNumber treat a single minus sign as zero (bug 1753983) #14543

Conversation

Snuffleupagus commented Feb 7, 2022

Snuffleupagus commented Feb 7, 2022

pdfjsbot commented Feb 7, 2022

From: Bot.io (Linux m4)

Received

pdfjsbot commented Feb 7, 2022

From: Bot.io (Windows)

Received

pdfjsbot commented Feb 7, 2022

From: Bot.io (Linux m4)

Failed

pdfjsbot commented Feb 7, 2022

From: Bot.io (Windows)

Failed

Snuffleupagus commented Feb 7, 2022

pdfjsbot commented Feb 7, 2022

From: Bot.io (Linux m4)

Received

pdfjsbot commented Feb 7, 2022

From: Bot.io (Windows)

Received

pdfjsbot commented Feb 7, 2022

From: Bot.io (Windows)

Failed

pdfjsbot commented Feb 7, 2022

From: Bot.io (Linux m4)

Failed

calixteman Feb 7, 2022

Choose a reason for hiding this comment

Snuffleupagus Feb 7, 2022 • edited Loading

Choose a reason for hiding this comment

brendandahl Feb 8, 2022

Choose a reason for hiding this comment

Snuffleupagus Feb 9, 2022

Choose a reason for hiding this comment

timvandermeij Feb 9, 2022

Choose a reason for hiding this comment

Snuffleupagus Feb 9, 2022 • edited Loading

Choose a reason for hiding this comment

brendandahl commented Feb 9, 2022

pdfjsbot commented Feb 9, 2022

From: Bot.io (Linux m4)

Received

pdfjsbot commented Feb 9, 2022

From: Bot.io (Windows)

Received

pdfjsbot commented Feb 9, 2022

From: Bot.io (Windows)

Success

pdfjsbot commented Feb 9, 2022

From: Bot.io (Linux m4)

Success

Let `Lexer.getNumber` treat a single minus sign as zero (bug 1753983) #14543

Let `Lexer.getNumber` treat a single minus sign as zero (bug 1753983) #14543

Snuffleupagus Feb 7, 2022 •

edited

Loading

Snuffleupagus Feb 9, 2022 •

edited

Loading