HTML code changed to character when sanitizing #190

Closed
benborra opened this issue Oct 1, 2019 · 3 comments

benborra commented Oct 1, 2019

When sanitizing an HTML string that contains "&amp;", the entity is, as expected, left unchanged.
However, when the same thing is done for "&atilde;", the returned value is ã.

This is, from what I can tell, incorrect behavior.

I haven't tested this extensively, but it also seems to occur for "&deg;" and probably for many other entities.

Owner

mganss commented Oct 1, 2019

This is expected behavior. HtmlSanitizer uses AngleSharp, a standards-compliant HTML parser that parses the input before it is sanitized. As a result, some entities are expanded to the characters they represent when the markup is written back out. It works the same way in a browser:

var e = document.createElement("div");
e.innerHTML = "&atilde; &amp;";
e.innerHTML // -> "ã &amp;"
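As a rough illustration of why ã comes back as a literal character while an ampersand stays escaped: when a parsed document is serialized, text nodes only have a handful of characters re-escaped (the ampersand, the non-breaking space, and the angle brackets). The sketch below mimics that one step in plain JavaScript so it can run outside a browser. The function name `escapeTextForSerialization` is made up for this example and is not part of HtmlSanitizer or AngleSharp.

```javascript
// Sketch of the text-node escaping step used when serializing HTML.
// Only these four characters are escaped; every other character,
// including "ã" and "°", is written out literally.
function escapeTextForSerialization(text) {
  return text
    .replace(/&/g, "&amp;")      // must run first so later entities are not double-escaped
    .replace(/\u00A0/g, "&nbsp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// After parsing "&atilde; &amp;", the text node holds "ã &".
// Serializing it again gives back the literal "ã" and an escaped ampersand.
console.log(escapeTextForSerialization("\u00E3 &")); // prints "ã &amp;"
```

The named entity is therefore not lost or corrupted. It is decoded once by the parser, and the serializer has no reason to turn it back into an entity.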

Author

benborra commented Oct 3, 2019

Hmm, OK. The issue originally arose from a signature creator tool for Outlook, which seems to have problems resolving the 'ã' character (along with a couple of other characters). Apparently this might be more of an issue with Outlook than with the sanitized HTML.
For now, I've added an IMarkupFormatter to process the characters that have caused issues.
It might be nice to have a way for the sanitizer to resolve this. However, I understand that this isn't really your issue to fix.

Owner

mganss commented Oct 3, 2019

Perhaps this is an encoding issue, either within Outlook or the code that is generating the email. I'd rather not add code to HtmlSanitizer to work around these kinds of issues in third-party software. Maybe you could add a short snippet of your IMarkupFormatter implementation here so other people who find this issue can also use it.
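
No IMarkupFormatter snippet was posted in this thread. For readers looking for the general idea, here is a minimal sketch in JavaScript (not the C# formatter described above, and not an AngleSharp API) that post-processes sanitized output by replacing every non-ASCII character with a numeric character reference, so the result no longer depends on how the consumer decodes the text. The function name `encodeNonAscii` is hypothetical, and the sketch assumes it is acceptable to encode all non-ASCII characters rather than a selected few.

```javascript
// Replace each non-ASCII character with a decimal numeric character
// reference (for example "ã" becomes "&#227;"). ASCII markup is untouched.
function encodeNonAscii(html) {
  // Array.from iterates by code point, so characters outside the
  // Basic Multilingual Plane become one reference, not two.
  return Array.from(html, (ch) => {
    const codePoint = ch.codePointAt(0);
    return codePoint > 0x7f ? `&#${codePoint};` : ch;
  }).join("");
}

// "\u00E3" is ã and "\u00B0" is °, written as escapes to avoid
// depending on the encoding of this file.
console.log(encodeNonAscii("<p>S\u00E3o Paulo, 25\u00B0</p>"));
// prints "<p>S&#227;o Paulo, 25&#176;</p>"
```

Numeric references are used here instead of named ones such as `&atilde;` because they need no lookup table. If the problem really is an encoding mismatch in the mail pipeline, fixing the declared charset would be the cleaner solution.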
