Add support for detecting invalid XML that has unsupported content before root element #184
@@ -62,6 +43,28 @@ def test_xml_declaration_not_at_document_start
end
end

def test_garbage_text
Could you improve the test name? This is not a garbage text test now. test_comment?
OK.
I see.
doc = parse(<<~XML)
<?x y
<!--?><?x -->?>
<r/>
XML
If you want to use <<~, could you indent the content?
doc = parse(<<~XML)
<?x y
<!--?><?x -->?>
<r/>
XML
doc = parse(<<~XML)
  <?x y
  <!--?><?x -->?>
  <r/>
XML
BTW, I prefer <<- to <<~ for XML content because it's not program code.
I think indentation is unnecessary for non-code content embedded in a program.
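For reference, a minimal sketch of the difference between the two heredoc styles discussed in this comment. The method names are made up for illustration and are not part of the PR. `<<~` strips the common leading indentation from the content, while `<<-` keeps the content exactly as written and only allows the closing identifier to be indented.

```ruby
# Squiggly heredoc: the content may be indented with the code, and the
# common leading whitespace is removed from every line.
def squiggly_heredoc
  <<~XML
    <?x y
    <!--?><?x -->?>
    <r/>
  XML
end

# Dash heredoc: the content is kept verbatim, so it starts at column 0;
# only the closing identifier may be indented.
def dash_heredoc
  <<-XML
<?x y
<!--?><?x -->?>
<r/>
  XML
end
```

Both methods return the same string, so the choice between them is purely about how the embedded XML is laid out in the test file.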
OK.
I see.
assert_equal(["x", "y\n<!--"], [ doc.children[0].target, doc.children[0].content])
assert_equal(["x", "-->"], [ doc.children[1].target, doc.children[1].content])
Could you unify these assertions into one assert to make debugging easier?
assert_equal(["x", "y\n<!--"], [ doc.children[0].target, doc.children[0].content])
assert_equal(["x", "-->"], [ doc.children[1].target, doc.children[1].content])
assert_equal([["x", "y\n<!--"],
              ["x", "-->"]],
             [[doc.children[0].target, doc.children[0].content],
              [doc.children[1].target, doc.children[1].content]])
OK.
I see.
It looks good.
1722700 to 0f7f84d
## Why?

XML with content at the start of the document is invalid.

https://www.w3.org/TR/2006/REC-xml11-20060816/#document

```
[1] document ::= ( prolog element Misc* ) - ( Char* RestrictedChar Char* )
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-prolog

```
[22] prolog ::= XMLDecl Misc* (doctypedecl Misc*)?
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-XMLDecl

```
[23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-Misc

```
[27] Misc ::= Comment | PI | S
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-PI

```
[16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-PITarget

```
[17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-doctypedecl

```
[28] doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>'
```
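To make the productions above concrete, here is a rough sketch of a prolog check built on `StringScanner`. It is not REXML's implementation: the constant and method names are hypothetical, the XML declaration is treated as optional, comments and processing instructions are matched loosely, and only a DOCTYPE without an internal subset is recognized. It only illustrates the idea that nothing except `XMLDecl`, `Misc`, and `doctypedecl` may appear before the root element.

```ruby
require "strscan"

# Hypothetical, simplified patterns for the productions quoted above.
XML_DECL = /<\?xml\s.*?\?>/m
# Misc ::= Comment | PI | S. The lookahead rejects the reserved target "xml"
# in any letter case, as PITarget requires.
MISC = /\s+|<!--.*?-->|<\?(?!xml(?:\s|\?>))[^\s?]+(?:\s.*?)?\?>/mi
# Assumption: a DOCTYPE with no internal subset ("[ ... ]").
SIMPLE_DOCTYPE = /<!DOCTYPE\s[^\[>]*>/

# Returns true when only prolog content precedes the root element's start tag.
def only_prolog_before_root?(xml)
  scanner = StringScanner.new(xml)
  skip_misc = -> { nil while scanner.skip(MISC) }

  scanner.skip(XML_DECL)                        # allowed only at the very start
  skip_misc.call                                # Misc*
  skip_misc.call if scanner.skip(SIMPLE_DOCTYPE) # (doctypedecl Misc*)?
  !scanner.match?(/<[^!?\/\s]/).nil?            # next thing must open an element
end
```

Under these assumptions the document used in the test in this PR is accepted, while input such as `garbage<r/>`, or an XML declaration preceded by whitespace, is rejected.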
0f7f84d to 5ebd293
Thanks.
Why?
XML with content at the start of the document is invalid.
https://www.w3.org/TR/2006/REC-xml11-20060816/#document
https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-prolog
https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-XMLDecl
https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-Misc
https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-PI
https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-PITarget
https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-doctypedecl
See: #164 (comment)