-
Notifications
You must be signed in to change notification settings - Fork 10k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New XML parser #9573
New XML parser #9573
Conversation
9fccb49
to
ec77328
Compare
case 'amp': | ||
return '&'; | ||
case 'quot': | ||
return '\"'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old code also handled apos
, see https://github.com/mozilla/pdf.js/pull/9573/files#diff-3cb21ac6c8ce15118704e14df3dcc9b2L256; is that no longer necessary here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's HTML thing, but we can add it there too
ec77328
to
06452bf
Compare
/botio test |
From: Bot.io (Linux m4)ReceivedCommand cmd_test from @timvandermeij received. Current queue size: 0 Live output at: http://54.67.70.0:8877/4a8eb8977cfd151/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_test from @timvandermeij received. Current queue size: 0 Live output at: http://54.215.176.217:8877/0638ec9c7e0b33e/output.txt |
From: Bot.io (Linux m4)SuccessFull output at http://54.67.70.0:8877/4a8eb8977cfd151/output.txt Total script time: 18.17 mins
|
From: Bot.io (Windows)SuccessFull output at http://54.215.176.217:8877/0638ec9c7e0b33e/output.txt Total script time: 24.16 mins
|
/botio-linux preview |
From: Bot.io (Linux m4)ReceivedCommand cmd_preview from @timvandermeij received. Current queue size: 0 Live output at: http://54.67.70.0:8877/80445c7126554ab/output.txt |
From: Bot.io (Linux m4)SuccessFull output at http://54.67.70.0:8877/80445c7126554ab/output.txt Total script time: 2.70 mins Published |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general this looks good to me (since it already worked in Shumway, I glanced over the details of the XML parsing), but there are some final comments I'd like to see addressed before approving this. Thanks!
src/display/xml_parser.js
Outdated
|
||
function isWhitespaceString(s) { | ||
for (let i = 0; i < s.length; i++) { | ||
const ch = s[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just call isWhitespace(s[i])
here so we don't have to copy the logic. Also, let's cache s.length
using for (let i = 0, ii = s.length; i < ii; i++)
like we do elsewhere in the codebase.
src/display/metadata.js
Outdated
@@ -23,13 +23,15 @@ class Metadata { | |||
// Ghostscript may produce invalid metadata, so try to repair that first. | |||
data = this._repair(data); | |||
|
|||
// Convert the string to a DOM `Document`. | |||
// Convert the string to a XML document. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: an XML document
src/display/xml_parser.js
Outdated
name = s.substring(start, pos); | ||
skipWs(); | ||
while (pos < s.length && s[pos] !== '>' && | ||
s[pos] !== '/' && s[pos] !== '?') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should to be aligned with pos
for readability.
src/display/xml_parser.js
Outdated
const doctypeContent = | ||
s.substring(j + 8, q + (complexDoctype ? 1 : 0)); | ||
this.onDoctype(doctypeContent); | ||
// XXX pull entities ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this mean? Can we remove it?
src/display/xml_parser.js
Outdated
} else { | ||
do { | ||
/* skipping text items */ | ||
} while (j++ < s.length && s[j] !== '<'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a do .. while
loop here instead of a regular while
loop as used above?
src/display/xml_parser.js
Outdated
} | ||
} | ||
|
||
export class SimpleXMLParser extends XMLParserBase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually we put the exports at the bottom of the file. Can we do that here too for consistency (compare https://github.com/mozilla/pdf.js/pull/9573/files#diff-3cb21ac6c8ce15118704e14df3dcc9b2L405)?
super(); | ||
this._currentFragment = null; | ||
this._stack = null; | ||
this._errorCode = XMLParserErrorCode.NoError; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parseFromString
sets this too, so should we just set it too null
here too like the other two members?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just don't want to change shape of the object by keep _errorCode a number. I can add XMLParserErrorCode.Unknown value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. In that case, let's keep it like how it is now for simplicity. Thanks!
06452bf
to
655c8d3
Compare
/botio test |
From: Bot.io (Linux m4)ReceivedCommand cmd_test from @timvandermeij received. Current queue size: 0 Live output at: http://54.67.70.0:8877/bd3c487b19dd6fb/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_test from @timvandermeij received. Current queue size: 1 Live output at: http://54.215.176.217:8877/81d163b152949eb/output.txt |
From: Bot.io (Linux m4)SuccessFull output at http://54.67.70.0:8877/bd3c487b19dd6fb/output.txt Total script time: 18.20 mins
|
From: Bot.io (Windows)SuccessFull output at http://54.215.176.217:8877/81d163b152949eb/output.txt Total script time: 24.27 mins
|
Why is a custom XML parser used instead of the standard |
Please see issue #8903; and also https://bugzilla.mozilla.org/show_bug.cgi?id=1386676. |
Improves performance of the RegExp based SimpleXMLParser.
The code for XMLParserBase copied from
https://github.com/mozilla/shumway/blob/16451d8836fa85f4b16eeda8b4bda2fa9e2b22b0/src/avm2/natives/xml.ts