-
Notifications
You must be signed in to change notification settings - Fork 10k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Let Lexer.getNumber
treat a single minus sign as zero (bug 1753983)
#14543
Conversation
/botio test |
From: Bot.io (Linux m4)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.241.84.105:8877/cdf6e45c9be24af/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.193.163.58:8877/97cc794533cc10e/output.txt |
From: Bot.io (Linux m4)FailedFull output at http://54.241.84.105:8877/cdf6e45c9be24af/output.txt Total script time: 23.33 mins
Image differences available at: http://54.241.84.105:8877/cdf6e45c9be24af/reftest-analyzer.html#web=eq.log |
This appears to be consistent with the behaviour in both Adobe Reader and PDFium (in Google Chrome); this is essentially the same approach as used for a single decimal point in PR 9827.
36e9b74
to
64f3dbe
Compare
From: Bot.io (Windows)FailedFull output at http://54.193.163.58:8877/97cc794533cc10e/output.txt Total script time: 60.00 mins |
/botio browsertest |
From: Bot.io (Linux m4)ReceivedCommand cmd_browsertest from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.241.84.105:8877/acdddccfd0b8084/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_browsertest from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.193.163.58:8877/82ea52438fe7778/output.txt |
From: Bot.io (Windows)FailedFull output at http://54.193.163.58:8877/82ea52438fe7778/output.txt Total script time: 19.34 mins
Image differences available at: http://54.193.163.58:8877/82ea52438fe7778/reftest-analyzer.html#web=eq.log |
From: Bot.io (Linux m4)FailedFull output at http://54.241.84.105:8877/acdddccfd0b8084/output.txt Total script time: 19.69 mins
Image differences available at: http://54.241.84.105:8877/acdddccfd0b8084/reftest-analyzer.html#web=eq.log |
// This is consistent with Adobe Reader (fixes issue9252.pdf). | ||
warn("Lexer.getNumber - treating a single decimal point as zero."); | ||
return 0; | ||
if (divideBy === 10 && sign === 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shoudn't we return 0
whatever we have ?
In pdfium:
https://pdfium.googlesource.com/pdfium/+/refs/heads/main/core/fxcrt/fx_number.cpp#26
https://pdfium.googlesource.com/pdfium/+/refs/heads/main/core/fxcrt/fx_string.cpp#46
if understand correctly, it always returns 0 in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shoudn't we return
0
whatever we have ?
I'm not really sure if we want to change this "speculatively", since these code-paths only matter for corrupt documents and different PDF libraries probably use different fallback code-paths in these cases.
Especially for broken path-operators it's really quite easy to get somewhat "chaotic" rendering artefacts in corrupt documents, and I'd prefer not having to fix additional regressions here (since the original patch broke existing reference tests).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see the original patch. Did it return 0
and break reftests? It does seem like returning 0 always seems like safe bet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see the original patch. Did it return
0
and break reftests?
The changes were in https://github.com/mozilla/pdf.js/compare/36e9b7447a2a1f1062613537a2559d66d2510456..64f3dbeb489c2fd08c6918ed5c6d73d1a5788805, since otherwise existing test-cases broke.
Given that the affected EvaluatorPreprocessor
-functionality contains a bunch of heuristics, I thus worry about additional breakage (with "chaotic" rendering) if we unconditionally start returning 0
from Lexer.getNumber
for invalid input.
@@ -4561,7 +4561,7 @@ class EvaluatorPreprocessor { | |||
} | |||
|
|||
static get MAX_INVALID_PATH_OPS() { | |||
return shadow(this, "MAX_INVALID_PATH_OPS", 20); | |||
return shadow(this, "MAX_INVALID_PATH_OPS", 10); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain a bit how this relates to the lexer change below? I'm not really sure what this does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first version of the patch didn't include this, but that unfortunately caused rendering regressions; please note the bug1130815-eq-page1
regressions in #14543 (comment) above.
What happens is basically that the Lexer.getNumber
changes are causing us to returning 0
in more cases than before, instead of immediately throwing, which means that rather than aborting parsing immediately we'll now continue until the following condition is satisfied:
Lines 4643 to 4654 in d57f3a1
// Incomplete path operators, in particular, can result in fairly | |
// chaotic rendering artifacts. Hence the following heuristics is | |
// used to error, rather than just warn, once a number of invalid | |
// path operators have been encountered (fixes bug1443140.pdf). | |
if ( | |
fn >= OPS.moveTo && | |
fn <= OPS.endPath && // Path operator | |
++this._numInvalidPathOPS > | |
EvaluatorPreprocessor.MAX_INVALID_PATH_OPS | |
) { | |
throw new FormatError(`Invalid ${partialMsg}`); | |
} |
In order to work-around those problems, the only simple solution (that I found) was to somewhat arbitrarily reduce this constant to now accept fewer corrupt path-operators than before. Obviously all of this is very hand-wavy, since it's not really clear how you're supposed to (in general) deal with this type of PDF corruption; hence why I'd very much prefer not implementing #14543 (comment) as well here since that'd mean accepting "new" corrupt path-operators which could easily cause additional rendering regressions elsewhere.
We should probably try to somehow re-factor and improve all of that in the EvaluatorPreprocessor
, but I'd prefer not doing that here.
/botio makeref |
From: Bot.io (Linux m4)ReceivedCommand cmd_makeref from @brendandahl received. Current queue size: 0 Live output at: http://54.241.84.105:8877/2a73b051635e735/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_makeref from @brendandahl received. Current queue size: 0 Live output at: http://54.193.163.58:8877/aef3fcad494ad4d/output.txt |
From: Bot.io (Windows)SuccessFull output at http://54.193.163.58:8877/aef3fcad494ad4d/output.txt Total script time: 20.26 mins
|
From: Bot.io (Linux m4)SuccessFull output at http://54.241.84.105:8877/2a73b051635e735/output.txt Total script time: 21.17 mins
|
This appears to be consistent with the behaviour in both Adobe Reader and PDFium (in Google Chrome); this is essentially the same approach as used for a single decimal point in PR #9827.