fix: text that looks like an html tag but is not causes [$sanitize:badparse] error #8193
Thanks for the PR! Please check the items below to help us merge this faster. See the contributing docs for more information.
If you need to make changes to your pull request, you can update the commit with Thanks again for your help! |
Hi, thanks for re-opening this. Could you also add another unit test to show how this will work when this happens in text within a tag? For instance, Another case like |
I added the first test case but I cannot add the second one because of the way the tests are written. The internal |
You can simply run |
Thanks. Test added. |
@@ -81,6 +82,16 @@ describe('HTML', function() {
      expect(text).toEqual('text');
    });

    it('should parse unterminated tags as regular content', function() {
      htmlParser('<a text1 text2 <a text1 text2', handler);
      expect(text).toEqual('<a text1 text2 <a text1 text2');
this is wrong because it breaks html parsing rules. browsers either ignore this and throw it away or try to autocorrect the html.
yes, this is correct --- I guess we need a non-alpha character to precede the < (in testing, <ê, <3, <- are all fine). I don't think we can come even close to being as crazy as the actual parsing rules though, that would be way too much code.
- A sequence of bytes starting with a 0x3C byte (ASCII <), optionally a 0x2F byte (ASCII /), and finally a byte in the range 0x41-0x5A or 0x61-0x7A (an ASCII letter)
- Advance the position pointer so that it points at the next 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x3E (ASCII >) byte.
- Repeatedly get an attribute until no further attributes can be found, then jump to the step below labeled next byte.
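A minimal sketch of the simplified rule quoted above: a `<` is treated as the start of a tag only when it is followed by an ASCII letter, optionally with a `/` in between. The helper name `looksLikeTagStart` is hypothetical and is not part of ngSanitize; it only illustrates the check and ignores everything else a real HTML parser does.

```javascript
// Hypothetical helper, not part of ngSanitize: reports whether the "<" at
// index i could begin a tag under the simplified rule quoted above.
function looksLikeTagStart(text, i) {
  if (text.charAt(i) !== '<') return false;
  var next = text.charAt(i + 1);
  // An optional "/" may sit between "<" and the letter (end tag).
  if (next === '/') next = text.charAt(i + 2);
  // Only ASCII letters (0x41-0x5A, 0x61-0x7A) qualify.
  return /^[A-Za-z]$/.test(next);
}
```

Under this check, `<3`, `<-` and `<ê` are plain text, while `<a text1` is still treated as the start of a tag.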
Should sanitize convert the < to &lt; in text that looks like unterminated tags?
Yes (for example, http://jsfiddle.net/fMDyg/)
Can you take it from here and do the necessary change?
I can have a go at it :>
cait, can you own this? we should be careful about letting unescaped |
will do |
@IgorMinar we have tests in the tree which verify that we correctly parse invalid HTML ( angular.js/test/ngSanitize/sanitizeSpec.js Line 127 in 627b035
Since I'm fixing up this CL, should I change that too? Or would that be too much of a breaking change. (testing on jsfiddle, even ie8 behaves correctly here) |
… text content ngSanitize will now permit opening braces in text content, provided they are not followed by either an unescaped backslash, or by an ASCII letter (u+0041 - u+005A, u+0061 - u+007A), in compliance with rules of the parsing spec, without taking insertion mode into account. BREAKING CHANGE Previously, $sanitize would "fix" invalid markup in which a space preceded alphanumeric characters in a start-tag. Following this change, any opening angle bracket which is not followed by either a forward slash, or by an ASCII letter (a-z | A-Z) will not be considered a start tag delimiter, per the HTML parsing spec (http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html). Closes angular#8193
… text content ngSanitize will now permit opening braces in text content, provided they are not followed by either an unescaped backslash, or by an ASCII letter (u+0041 - u+005A, u+0061 - u+007A), in compliance with rules of the parsing spec, without taking insertion mode into account. BREAKING CHANGE Previously, $sanitize would "fix" invalid markup in which a space preceded alphanumeric characters in a start-tag. Following this change, any opening angle bracket which is not followed by either a forward slash, or by an ASCII letter (a-z | A-Z) will not be considered a start tag delimiter, per the HTML parsing spec (http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html). Closes #8212 Closes #8193
@caitp Great news! Thanks. Would you please backport the fix to v1.2, as this bug greatly affects us? Thanks. |
@sylvain-hamel good news, it was backported 36d2658 |
@caitp Great. I'll test the fix tomorrow. |
@sylvain-hamel actually I broke our ci-checks task porting that into v1.2.x, I think I need to revert that and sort that out, but if you try with that sha it should work anyways |
… text content ngSanitize will now permit opening braces in text content, provided they are not followed by either an unescaped backslash, or by an ASCII letter (u+0041 - u+005A, u+0061 - u+007A), in compliance with rules of the parsing spec, without taking insertion mode into account. BREAKING CHANGE Previously, $sanitize would "fix" invalid markup in which a space preceded alphanumeric characters in a start-tag. Following this change, any opening angle bracket which is not followed by either a forward slash, or by an ASCII letter (a-z | A-Z) will not be considered a start tag delimiter, per the HTML parsing spec (http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html). Closes angular#8212 Closes angular#8193
Hi @caitp, I tried your fix (from 36d2658) and it does not work as I expected. The following markup still causes the same failure (
My expectation is that Me:
You:
Note that the fix I had submitted did work, however it left the |
@sylvain-hamel this is how the HTML parsing rules work: if you have a $sanitize was changed (in that patch) to match this behaviour --- obviously it's a bit more complicated in a real HTML parser, but it's really the best we can do without a huge file size. So basically, that is working exactly how it should, and you should encode the |
that's fine if the thing being parsed is a element. But here it is not, IMO sanitize should detect that and not treat it as HTML and instead just make it safe by encoding it. What would be a good reason not to add this feature to sanitize? |
Because we have no concept of insertion modes in the $sanitize parser, it would be too complex to support that. Basically just escape your |
In my case this is user provided content. If the user entered some valid html, I want the rendered result. But if he just entered some text that looks like html then I need it to be escaped and rendered as-is.
In order for me to only encode the text that looks like html, I basically need to implement the feature I'm asking sanitize to support. Do you agree that no input should ever cause |
what exactly do you get out of binding the html |
given a view like this:
if the user enters this in the textarea:
I want the output to be this: b is smaller than a. I can even say: b<a and that's shorter. But with |
@caitp Does my use case make sense to you? |
No, I think this is something that we shouldn't support in ngSanitize. There is no shortage of ways around it, though. As far as I'm concerned, ngSanitize should honour |
You could write a version of ngBindHtml which doesn't throw, or just creates a text node with the contents if it does throw, but I don't think ngBindHtml should support non-html |
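A rough sketch of the fallback suggested above, written without Angular so it runs on its own. The names `sanitizeOrEscape` and `escapeHtml` are hypothetical, and `sanitize` stands in for any function that returns safe HTML or throws on input it cannot parse (as $sanitize does with badparse). A custom directive could call something like this instead of letting the error propagate.

```javascript
// Hypothetical helper: escape the characters that matter in text content.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical wrapper: try to sanitize the input as HTML; if the sanitizer
// throws, fall back to rendering the raw input as escaped text.
function sanitizeOrEscape(sanitize, input) {
  try {
    return sanitize(input);
  } catch (e) {
    return escapeHtml(input);
  }
}
```

The trade-off is that a single unparseable fragment causes the whole input to be shown as text, including any valid markup around it.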
The provided unit test fails with this error.
The provided change fixes it.
Can you please backport the fix into v1.2.x as this is a show stopper for us. Thanks