Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Carefully constructed markup sneaks tags through as "text" #105

Closed
boutell opened this issue Oct 14, 2014 · 6 comments
Closed

Carefully constructed markup sneaks tags through as "text" #105

boutell opened this issue Oct 14, 2014 · 6 comments

Comments

@boutell
Copy link

boutell commented Oct 14, 2014

This code:

<<img src="javascript:evil"/>img src="javascript:evil"/>

Results in the following sequence of onopentag/ontext/onclosetag events:

text: < open: img (with the expected src attribute) close: img text: img src="javascript:evil"/>

Since the sanitize-html module trusts "text" coming from htmlparser2, and outputs it without further escaping (because htmlparser2 does not decode entities in text before delivering it), this results in an XSS attack vector if sanitize-html ignores the img tag (according to user-configured filter rules) but passes the text intact, as it must do to keep any text in documents.

I have verified that the bug still exists as of version 3.7.3.

@bkimminich
Copy link

👍

@fb55
Copy link
Owner

fb55 commented Oct 20, 2014

Sorry for the late response.

@boutell You can enable entity decoding using decodeEntities: true. Anyway, that looks like a bug, maybe switching to the tokenizer of high5 fixes it. Won't be fixed anytime soon though.

@boutell
Copy link
Author

boutell commented Oct 20, 2014

Thanks for the update. A pity it won't be fixed soon, but hey, we're not
paying you to fix it.

On Mon, Oct 20, 2014 at 2:44 PM, Felix Böhm notifications@github.com
wrote:

Sorry for the late response.

@boutell https://github.com/boutell You can enable entity decoding
using decodeEntities: true. Anyway, that looks like a bug, maybe
switching to the tokenizer of high5 fixes it. Won't be fixed anytime soon
though.


Reply to this email directly or view it on GitHub
#105 (comment).

*THOMAS BOUTELL, *DEV & OPS
P'UNK AVENUE | (215) 755-1330 | punkave.com

@fb55
Copy link
Owner

fb55 commented Jan 11, 2015

As this seems to be confusing for a lot of people: This is not a vulnerability, but instead a bug in @boutell's module. The behavior is in-line with the HTML spec (I wasn't sure about it in my previous comment).

Entities aren't decoded by default, only not to break backwards compatibility, but will be in the next major release (which will mainly consist of #114, I only need to take a day and add positional support to high5). It is recommended to always decoded entities, then use eg. entities to encode them again. Of course this has a performance penalty, but it eliminates this risk.

For now, I'll add a note to the wiki page recommending to always enable decodeEntities, which is pretty much everything that can be done here.

@fb55 fb55 closed this as completed Jan 11, 2015
@boutell
Copy link
Author

boutell commented Jan 11, 2015

OK, version 1.5.1 of sanitize-html uses decodeEntities: true and passes its filter evasion tests without the need for recursive invocation. Thanks.

If the behavior with decodeEntities: false is inherently unsafe I wonder if it should be offered at all in the next release. But maybe it has an application I'm not seeing.

@AlynxZhou
Copy link

Always depending on encode() is not a good opinion, because it also encode CJK chars into entities, which is hard to do string operations (chars and length are changed)...

AdamBarah pushed a commit to AdamBarah/retire.js that referenced this issue Jul 15, 2022
AdamBarah pushed a commit to AdamBarah/retire.js that referenced this issue Jul 15, 2022
false positive (albeit it could use more secure default)

see fb55/htmlparser2#105
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants