-
Notifications
You must be signed in to change notification settings - Fork 10k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[api-minor] Replace DOMParser
with SimpleXMLParser
#8912
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a nice simplification!
One question though: Would it maybe make more sense to place the SimpleXMLParser
(and related code) in a utility file instead (such as /src/display/dom_utils.js
)?
src/display/metadata.js
Outdated
do { | ||
lastLength = nodes.length; | ||
data = data.replace( | ||
/<([\w\:]+)((?:[\s\w:=]|'[^']*'|"[^"]*")*)(?:\/>|>([\d,]*)<\/[^>]+>)/g, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than creating a regular expression over and over in a loop, would it make sense to just define it once instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in the new commit, together with the code move to the utility file. Good points, thank you!
9d19e81
to
942b962
Compare
src/display/metadata.js
Outdated
|
||
// Convert the string to a DOM `Document`. | ||
let parser = new SimpleXMLParser(); | ||
data = parser.parseFromString(data, 'application/xml'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: It appears that the parseFromString
method (of the SimpleXMLParser
) only takes one argument, so presumably we no longer need to pass in 'application/xml'
unless I'm missing something here!?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
D'oh! You're right, since it's now specialized to an XML parser.
942b962
to
178b2d5
Compare
/botio-linux preview |
From: Bot.io (Linux m4)ReceivedCommand cmd_preview from @timvandermeij received. Current queue size: 0 Live output at: http://54.67.70.0:8877/65ed2537014a760/output.txt |
From: Bot.io (Linux m4)SuccessFull output at http://54.67.70.0:8877/65ed2537014a760/output.txt Total script time: 2.32 mins Published |
/botio test |
From: Bot.io (Windows)ReceivedCommand cmd_test from @timvandermeij received. Current queue size: 0 Live output at: http://54.215.176.217:8877/f66b21324b55e0b/output.txt |
From: Bot.io (Linux m4)ReceivedCommand cmd_test from @timvandermeij received. Current queue size: 0 Live output at: http://54.67.70.0:8877/cd1e2a316abe5db/output.txt |
From: Bot.io (Linux m4)SuccessFull output at http://54.67.70.0:8877/cd1e2a316abe5db/output.txt Total script time: 16.77 mins
|
From: Bot.io (Windows)SuccessFull output at http://54.215.176.217:8877/f66b21324b55e0b/output.txt Total script time: 29.87 mins
|
src/display/dom_utils.js
Outdated
parseFromString(data) { | ||
let nodes = []; | ||
|
||
data = data.replace(/<\?[\s\S]*?\?>|<!--[\s\S]*?-->/g, '').trim(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment before: // Remove all comments and processing instructions
return typeof this.metadata[name] !== 'undefined'; | ||
}, | ||
}; | ||
getAll() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keep 'metadata', just have deprecated() and getAll() calls inside the getter.
src/display/dom_utils.js
Outdated
let nodes = []; | ||
|
||
data = data.replace(/<\?[\s\S]*?\?>|<!--[\s\S]*?-->/g, '').trim(); | ||
data = data.replace(/>([^<][\s\S]*?)</g, (all, text) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment before: // Extract all text nodes and replace them with numeric index in the nodes.
src/display/dom_utils.js
Outdated
} | ||
return '>' + length + ',<'; | ||
}); | ||
data = data.replace(/<!\[CDATA\[([\s\S]*?)\]\]>/g, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Extract all CDATA nodes
src/display/dom_utils.js
Outdated
return length + ','; | ||
}); | ||
|
||
let regex = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment before: // Until such nodes without '<' and '>' content are present, replace these with numeric index in the nodes.
return length + ','; | ||
}); | ||
} while (lastLength < nodes.length); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// We shall have only one root index left, which will be last in the nodes.
src/display/metadata.js
Outdated
let parser = new SimpleXMLParser(); | ||
data = parser.parseFromString(data); | ||
} else if (!(data instanceof Document)) { | ||
throw new Error('Metadata: input is not a string or `Document`'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need instanceof Document
check? if not let's replace if (typeof data === 'string'
with assert.
DOMParser
with SimpleXMLParser
DOMParser
with SimpleXMLParser
178b2d5
to
71f87d4
Compare
The `DOMParser` is most likely overkill and may be less secure. Moreover, it is not supported in Node.js environments. This patch replaces the `DOMParser` with a simple XML parser. This should be faster and gives us Node.js support for free. The simple XML parser is a port of the one that existed in the examples folder with a small regex fix to make the parsing work correctly. The unit tests are extended for increased test coverage of the metadata code. The new method `getAll` is provided so the example does not have to access internal properties of the object anymore.
71f87d4
to
2281061
Compare
/botio-linux preview |
From: Bot.io (Linux m4)ReceivedCommand cmd_preview from @yurydelendik received. Current queue size: 0 Live output at: http://54.67.70.0:8877/20f73f6b7062bd2/output.txt |
From: Bot.io (Linux m4)SuccessFull output at http://54.67.70.0:8877/20f73f6b7062bd2/output.txt Total script time: 2.29 mins Published |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the patch.
/botio test |
From: Bot.io (Linux m4)ReceivedCommand cmd_test from @timvandermeij received. Current queue size: 0 Live output at: http://54.67.70.0:8877/2a20f49870c32f1/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_test from @timvandermeij received. Current queue size: 1 Live output at: http://54.215.176.217:8877/5551a66cea49d99/output.txt |
From: Bot.io (Linux m4)SuccessFull output at http://54.67.70.0:8877/2a20f49870c32f1/output.txt Total script time: 16.62 mins
|
From: Bot.io (Windows)SuccessFull output at http://54.215.176.217:8877/5551a66cea49d99/output.txt Total script time: 29.62 mins
|
[api-minor] Replace `DOMParser` with `SimpleXMLParser`
The
DOMParser
is most likely overkill and may be less secure. Moreover, it is not supported in Node.js environments.This patch replaces the
DOMParser
with a simple XML parser. This should be faster and gives us Node.js support for free. The simple XML parser is a port of the one that existed in the examples folder with improved regexes to make the parsing work properly.The unit tests are extended for increased test coverage of the metadata code. The new method
getAll
is provided so the example does not have to access internal properties of the object anymore.Fixes #8903.