fix: text that looks like an html tag but is not causes [$sanitize:badparse] error #8193

sylvain-hamel · 2014-07-14T19:07:18Z

The provided unit test fails with this error.

Error: [$sanitize:badparse] The sanitizer was unable to parse the following block of html: <nonEndingTag href <nonEndingTag href

The provided change fixes it.

Can you please backport the fix into v1.2.x? This is a show-stopper for us. Thanks!

mary-poppins · 2014-07-14T19:07:21Z

Thanks for the PR! Please check the items below to help us merge this faster. See the contributing docs for more information.

Uses the issue template (#8193)

If you need to make changes to your pull request, you can update the commit with git commit --amend.
Then, update the pull request with git push -f.

Thanks again for your help!

caitp · 2014-07-14T19:14:17Z

Hi, thanks for re-opening this. Could you also add another unit test to show how this will work when this happens in text within a tag?

For instance, 10 < 100 --- we should have a test to ensure that that works correctly.

Another case like 10 < 100 would also be good.

sylvain-hamel · 2014-07-14T20:22:57Z

I added the first test case, but I can't add the second one because of the way the tests are written: the internal handler used in the tests was written to handle only one tag, not nested tags.

caitp · 2014-07-14T20:31:53Z

You can simply run sanitize() and check if we get the expected value.


sylvain-hamel · 2014-07-14T20:45:53Z

Thanks. Test added.

IgorMinar · 2014-07-15T15:17:17Z

test/ngSanitize/sanitizeSpec.js

@@ -81,6 +82,16 @@ describe('HTML', function() {
      expect(text).toEqual('text');
    });

+    it('should parse unterminated tags as regular content', function() {
+      htmlParser('<a text1 text2 <a text1 text2', handler);
+      expect(text).toEqual('<a text1 text2 <a text1 text2');


This is wrong because it breaks HTML parsing rules. Browsers either ignore this and throw it away or try to autocorrect the HTML.

Yes, this is correct. I guess we need a character other than an ASCII letter to follow the < (in testing, <ê, <3, <- are all fine). I don't think we can come even close to being as crazy as the actual parsing rules, though; that would be way too much code.

A sequence of bytes starting with a 0x3C byte (ASCII <), optionally a 0x2F byte (ASCII /), and finally a byte in the range 0x41-0x5A or 0x61-0x7A (an ASCII letter)

Advance the position pointer so that it points at the next 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x3E (ASCII >) byte.

Repeatedly get an attribute until no further attributes can be found, then jump to the step below labeled next byte.
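The first two quoted steps can be sketched in JavaScript to make the rule concrete. This is a minimal illustration, not the $sanitize source: the function names isTagOpen and endOfTagName are hypothetical, and the sketch works on a JS string rather than a byte stream, checking only ASCII letters exactly as the quoted rule does.

```javascript
// Hypothetical sketch of the quoted start-tag rule; NOT the $sanitize source.
// It operates on a JS string instead of a byte stream.

// Matches a single ASCII letter (0x41-0x5A or 0x61-0x7A).
const ASCII_LETTER = /^[A-Za-z]$/;

// Returns true if the "<" at position i begins a tag: "<", optionally "/",
// and finally an ASCII letter. Anything else (space, digit, "-", "ê") is text.
function isTagOpen(html, i) {
  if (html[i] !== '<') return false;
  let j = i + 1;
  if (html[j] === '/') j++; // optional "/" for end tags
  return j < html.length && ASCII_LETTER.test(html[j]);
}

// From the "<" at position i, advance to the next TAB, LF, FF, CR, space
// or ">" (the end of the tag name). Returns html.length if none is found.
function endOfTagName(html, i) {
  const stops = '\t\n\f\r >';
  let j = i + 1;
  while (j < html.length && !stops.includes(html[j])) j++;
  return j;
}
```

Under this check the examples from the comment above (<ê, <3, <-) are all text, while <a and </p> start tags. In the unterminated case '<a href', endOfTagName stops at the space, and the parser would then go on to collect attributes.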

Should sanitize convert the < to &lt; in text that looks like unterminated tags?

Yes (for example, http://jsfiddle.net/fMDyg/)

Can you take it from here and do the necessary change?

I can have a go at it :>

IgorMinar · 2014-07-15T15:48:51Z

Cait, can you own this?

We should be careful about letting unescaped < and > through, as that could result in security holes.

caitp · 2014-07-15T15:50:10Z

will do

caitp · 2014-07-15T20:03:51Z

@IgorMinar we have tests in the tree which verify that we correctly parse invalid HTML (angular.js/test/ngSanitize/sanitizeSpec.js, line 127 in 627b035):

    expectHTML('a< SCRIPT >A< SCRIPT >evil< / scrIpt >B< / scrIpt >c.').toEqual('ac.');

But what we're doing there isn't technically correct as far as the HTML spec is concerned (at least, modern browsers won't treat this as a start tag at all, and will instead just encode the < before script).

Since I'm fixing up this CL, should I change that too, or would that be too much of a breaking change? (Testing on jsfiddle, even IE8 behaves correctly here.)

… text content

ngSanitize will now permit opening angle brackets in text content, provided they are not followed by either a forward slash or an ASCII letter (U+0041 - U+005A, U+0061 - U+007A), in compliance with the rules of the parsing spec, without taking insertion mode into account.

BREAKING CHANGE: Previously, $sanitize would "fix" invalid markup in which a space preceded alphanumeric characters in a start-tag. Following this change, any opening angle bracket which is not followed by either a forward slash or an ASCII letter (a-z | A-Z) will not be considered a start tag delimiter, per the HTML parsing spec (http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html).

Closes angular#8193


sylvain-hamel · 2014-07-16T21:42:50Z

@caitp Great news, thanks! Would you please backport the fix to v1.2? This bug greatly affects us. Thanks.

caitp · 2014-07-16T21:44:13Z

@sylvain-hamel good news, it was backported 36d2658

sylvain-hamel · 2014-07-16T21:49:28Z

@caitp Great. I'll test the fix tomorrow.

caitp · 2014-07-16T22:12:46Z

@sylvain-hamel actually, I broke our ci-checks task when porting that into v1.2.x. I think I need to revert it and sort that out, but if you try with that SHA it should work anyway.


sylvain-hamel · 2014-07-23T19:17:44Z

Hi @caitp, I tried your fix (from 36d2658) and it does not work as I expected.

The following markup still causes the same failure ([$sanitize:badparse] The sanitizer was unable to parse the following block of html: <a href).

<div ng-init="value='<a href'"></div>
<div ng-bind-html="value"></div>

My expectation is that ng-bind-html should treat unterminated tags as regular content. In this case, I expected <a href to be converted to &lt;a href, as discussed previously in this thread:

Me:

Should sanitize convert the < to &lt; in text that looks like unterminated tags?

You:

Yes (for example, http://jsfiddle.net/fMDyg/)

Note that the fix I had submitted did work; however, it left the < unencoded, which is not clean.

caitp · 2014-07-23T19:21:40Z

@sylvain-hamel this is how the HTML parsing rules work: if you have a <, (optionally) followed by a /, followed by a character from the set [a-zA-Z], it is an opening tag, and the next step is to collect attributes.

$sanitize was changed (in that patch) to match this behaviour --- obviously it's a bit more complicated in a real HTML parser, but it's really the best we can do without a huge file size. So basically, that is working exactly how it should, and you should encode the < manually if you need a letter or slash to follow it.

sylvain-hamel · 2014-07-23T19:44:32Z

@caitp

this is how the HTML parsing rules work [...] next step is to collect attributes.

That's fine if the thing being parsed is an element. But here it is not. IMO, sanitize should detect that and, instead of treating it as HTML, just make it safe by encoding it.

What would be a good reason not to add this feature to sanitize?

caitp · 2014-07-23T19:47:05Z

Because we have no concept of insertion modes in the $sanitize parser, it would be too complex to support that.

Basically just escape your < the way you would in regular HTML if you need a < to precede a / or [a-zA-Z] --- if there's a space after, or any other character, it doesn't need to be encoded.
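caitp's rule of thumb amounts to a single regular-expression replacement. The sketch below assumes the string is meant to be displayed literally; escapeTagLikeLt is a hypothetical helper, not part of ngSanitize, and it also neutralizes genuine tags such as <b>.

```javascript
// Hypothetical helper, not part of ngSanitize: escape only the "<" characters
// that would be read as a tag opener, i.e. "<" followed by "/" or an ASCII
// letter. Every other "<" (as in "10 < 100" or "<3") is left alone.
// Caveat: real tags like <b> are escaped too, so use this only on text that
// should be shown literally.
function escapeTagLikeLt(text) {
  // Lookahead keeps the following character in place; only "<" is replaced.
  return text.replace(/<(?=\/|[A-Za-z])/g, '&lt;');
}
```

For example, '<a href' becomes '&lt;a href', while '10 < 100' passes through unchanged because a space follows the <.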

sylvain-hamel · 2014-07-23T20:02:44Z

In my case this is user-provided content. If the user enters valid HTML, I want the rendered result. But if they just enter some text that looks like HTML, I need it to be escaped and rendered as-is.

Basically just escape your < the way you would in regular HTML

In order for me to only encode the text that looks like html, I basically need to implement the feature I'm asking sanitize to support.

Do you agree that no input should ever cause sanitize to crash like this? It should be able to transform anything into something safe.

caitp · 2014-07-23T20:05:32Z

What exactly do you get out of binding the HTML <a ... anyway? What are you expecting to get out of that? Maybe it would be worth your while to just create a text node that looks like that manually; it would be trivial to write a directive to do that.

sylvain-hamel · 2014-07-23T20:23:39Z

given a view like this:

<textarea ng-model="value"></textarea>
<hr />
<div ng-bind-html="value"></div>

if the user enters this in the textarea:

b is <b>smaller than</b>  a. I can even say: b<a and that's shorter.

I want the output to be this:

b is smaller than a. I can even say: b<a and that's shorter.

But with ng-bind-html this input causes sanitize to crash.
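One way to get the behaviour described here, outside of ngSanitize, is to pre-process the input and escape only the tag-like < characters that are never terminated. This is a hypothetical heuristic (escapeUnterminatedTags is an invented name), sketched under the assumption that a tag counts as terminated when a > appears before the next < or the end of the string; its output still has to go through $sanitize.

```javascript
// Hypothetical pre-processing step for user input (not an ngSanitize feature).
// A tag-like "<" ("<" followed by an ASCII letter or "/" plus a letter) is
// escaped only when it is unterminated: no ">" appears before the next "<"
// or the end of the string. Well-formed tags such as <b>...</b> are kept.
// Heuristic only: a ">" inside a quoted attribute value, or a stray ">" later
// in the text, can make an unterminated tag look terminated.
function escapeUnterminatedTags(input) {
  let out = '';
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (ch === '<' && /^<\/?[A-Za-z]/.test(input.slice(i, i + 3))) {
      const nextGt = input.indexOf('>', i + 1);
      const nextLt = input.indexOf('<', i + 1);
      const terminated = nextGt !== -1 && (nextLt === -1 || nextGt < nextLt);
      out += terminated ? ch : '&lt;';
    } else {
      out += ch;
    }
  }
  return out;
}
```

On the sample input above, <b> and </b> survive while b<a becomes b&lt;a. The check is deliberately shallow, which reflects the trade-off discussed in this thread: without insertion modes or real attribute parsing, it can only approximate what a browser would do.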

sylvain-hamel · 2014-07-28T12:24:57Z

@caitp Does my use case make sense to you?

caitp · 2014-07-28T12:43:11Z

No, I think this is something that we shouldn't support in ngSanitize. There is no shortage of ways around it, though. As far as I'm concerned, ngSanitize should honour < a, but not <a, because <a is a tag, and if we can't find the end of the tag, then we have a problem.

caitp · 2014-07-28T12:49:55Z

You could write a version of ngBindHtml which doesn't throw, or which just creates a text node with the contents if it does throw, but I don't think ngBindHtml should support non-HTML.
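The fallback suggested here can be sketched without the directive plumbing. In the sketch, sanitize stands in for the injected $sanitize service (an assumption: any function that returns safe HTML or throws), and the stub sanitizer only imitates the badparse failure for demonstration; neither sanitizeOrEscape nor escapeHtml is an AngularJS API.

```javascript
// Hypothetical fallback, not an AngularJS API: try to sanitize the input as
// HTML; if the sanitizer throws, render the raw input as escaped text instead.

// Escape the five HTML-significant characters so the result is plain text.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;') // must run first so later entities are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// `sanitize` stands in for $sanitize: returns safe HTML or throws.
function sanitizeOrEscape(input, sanitize) {
  try {
    return sanitize(input);
  } catch (e) {
    // Parsing failed (e.g. [$sanitize:badparse]): fall back to literal text.
    return escapeHtml(input);
  }
}

// Hypothetical stub that mimics the badparse failure on an unterminated tag
// such as "<a href"; it passes everything else through unchanged.
const stubSanitize = (s) => {
  if (/<[A-Za-z][^>]*$/.test(s)) throw new Error('[$sanitize:badparse]');
  return s;
};

console.log(sanitizeOrEscape('<a href', stubSanitize)); // prints &lt;a href
```

In a real directive the returned string would be written into the element as HTML; because the escaped branch contains no live markup, it renders as the literal text the user typed.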

sylvain-hamel added cla: yes and removed cla: no labels Jul 14, 2014

fix: text that looks like an html tag but is not causes [$sanitize:badparse] error (26ff3f6)

Narretz added component: ngSanitize labels Jul 15, 2014

Narretz added this to the 1.3.0-beta.16 milestone Jul 15, 2014

IgorMinar reviewed Jul 15, 2014

IgorMinar assigned caitp Jul 15, 2014

caitp mentioned this pull request Jul 16, 2014

fix(ngSanitize): follow HTML parser rules for start tags / allow < in text content #8212

Closed

caitp closed this in f6681d4 Jul 16, 2014

theurere mentioned this pull request Aug 26, 2014

Update angular to 1.2.23 strukturag/spreed-webrtc#98

Merged
