Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

&quot getting rendered as ", without a semicolon #362

Closed
tyler-shuhnicki opened this issue Jun 24, 2022 · 3 comments
Closed

&quot getting rendered as ", without a semicolon #362

tyler-shuhnicki opened this issue Jun 24, 2022 · 3 comments

Comments

@tyler-shuhnicki
Copy link

tyler-shuhnicki commented Jun 24, 2022

Hello!

We recently began sanitize sitewide of all requests and had a very strange production issue. We sanitize all incoming query string parameters, and only one our APIs began suddenly failing.

After many red herrings, the issue was discovered: our query string had multiple parameters, and the parameter quoteId was behind an &. See screenshot below for the translation found in our logs:

image

In the short term, we've capitalize the Q which solves the problem, but I believe &quot should not be rendered as " if there is no semicolon since the proper HTML entity is &quot ; (needed to add a space - Github renders it as " otherwise!)

Thanks - this project has been very helpful!

@mganss
Copy link
Owner

mganss commented Jun 25, 2022

I believe this is expected behavior. While it's true that the missing semicolon causes a parse error, according to the spec the parser behaves as if the semicolon was present:

This error occurs if the parser encounters a character reference that is not terminated by a U+003B (;) code point. Usually the parser behaves as if character reference is terminated by the U+003B (;) code point

The browsers I've tested behave the same way. Given the following HTML browsers show A " B:

<!DOCTYPE html>
<html>
<body>
    <p>A &quot B</p>
</body>
</html>

What's your use case? Is the query string you're sanitizing part of an HTTP request you're handling?

@mganss mganss closed this as completed Jun 25, 2022
@tyler-shuhnicki
Copy link
Author

Interesting - that's unexpected on our end, but if browsers follow the same spec, there's little to argue.

Yes - we've added sanatization in our middleware that strips out any XSS in the request path, body, and query string to further hardern our API. This was obviously a very unexpected edge case - and solved fairly easily by sending "quoteId" with a capital Q instead of its lowercase form.

@mganss
Copy link
Owner

mganss commented Jun 28, 2022

IMHO this is not the ideal level to apply sanitization at. Perhaps it's better to sanitize only those strings that are intended to be rendered back to HTML. So let your server side code decode the query string and apply validation, then sanitize only the relevant parameters that are supposed to be rendered back out later (like user comments etc). This would prevent the issue you have encountered. For example:

?quoteId=1234&comment=%3Cb%3Etest%3C%2Fb%3E%3Cscript%3Exyz%3C%2Fscript%3E

First, decode the query string into (pseudo code)

var quoteId = 1234;
var comment = "<b>test</b><script>xyz</script>";

Then sanitize only comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants