Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XSS Refactor: Replace showdown-xss with in-house sanitizer #724

Merged
merged 2 commits into from
Mar 22, 2018

Conversation

dmsnell
Copy link
Member

@dmsnell dmsnell commented Mar 10, 2018

See #681
See #694
See #721

The showdown-xss library wasn't really doing for us what we wanted.
It was transforming our checkbox inputs and making them display as plain
text instead of as actual HTML.

In this patch I've removed showdown-xss and created a new centralized
function to render HTML which calls our custom sanitization.

Why do it ourselves? We don't need the kind of sanitiation which only
removes the malicious code and leaves as much as it can untouched.
Instead we are free here to strip out basically everything except for
a few white-listed tags and attributes since we ourselves are the ones
producing the output; we don't have to support full HTML in the notes.

This patch should guard against everything on the OWASP list of XSS
attacks. It will remove significant styling and custom HTML but when a
tag is removed it will usually just take out the tag itself and leave
the inner content as plain text. Some tags are "forbidden" and all their
children are removed with them.

cc: @Jackymancs4

With the contributions and discussion about the checkboxes I really wanted
to get that bug fixed but I was also uneasy about the proposed solutions in
#721 and #714. I was uneasy because our XSS libraries were already opaque
and less-maintained than we would want. Digging into libraries also revealed
custom HTML parsing code that I wasn't sure I trusted.

In this proposal we're relying on the browser to parse the HTML and walk
through it node-by-node giving us the best possible parse with the protection
the browser already has coded into it against attacks that target the parser…

<
script
>…<
/script>

…since we're generating markup and we don't promise "everything Markdown
can do is available here" then we can produce whatever subset of Markdown
we want to support. I have chosen to try and mimic the allowed tags and attributes
as is currently supported in the web version of Simplenote. We can add more
tags such as <abbr> and <acronym> and I think if we wanted to we could
open up some styling or class attributes but for now I'm reaching for harmony
across the platforms more than one being better than others.

Testing

Try to break rendering with malicious attacks and also verify that checkboxes
and expected output appears.

Copy link
Contributor

@roundhill roundhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking really good @dmsnell! I tested with a bunch of malicious content and it was all stripped or removed. There is a conflict now to take care of 🙄

Found a few things that might be good to fix here or we can do so in followup PRs.

@@ -251,7 +242,9 @@ export const NoteEditor = React.createClass({
<div
style={printStyle}
className="note-print note-detail-markdown"
dangerouslySetInnerHTML={{ __html: noteContent }}
dangerouslySetInnerHTML={{
__html: markdownEnabled ? renderNoteToHtml(content) : content,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that my malicious note would be rendered quickly when printing (if markdown was disabled for the note), maybe we should always pass this through renderNoteToHtml for safety's sake?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh you know what…we shouldn't be setting __html to content if Markdown is disabled. it's not HTML so it shouldn't be treated that way - this is a separate issue so let's do it in another PR (because we'll have to make sure the styling all works out

{ markdownEnabled ? (
	<div
		style={printStyle}
		className="note-print note-detail-markdown"
		dangerouslySetInnerHTML={{ __html: renderNoteToHtml( content ) }}
	/>
) : (
	<div style={printStyle} className="note-print note-detail">{ content }</div>
) }

({ name, value }) =>
!isAllowedAttr(tagName, name) ||
// only valid http(s) URLs are allowed
(('href' === name || 'src' === name) && !validUrl.isWebUri(value))
Copy link
Contributor

@roundhill roundhill Mar 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about mailto links? The markdown for that is typically:

[emailme](mailto:dennis@raddude.com)

const tagName = node.nodeName.toLowerCase();

if ('input' === tagName) {
return 'checkbox' === node.getAttribute('type');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that checkboxes render with bullets, I think we should remove those so they look more like GitHub:

screen shot 2018-03-15 at 3 19 05 pm

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be more keen to fix this in a separate PR since this one right now is solely there for security. The rendering of bullets is going to be more related to how showdown generates the Markdown than how we sanitize it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok 👍

@dmsnell dmsnell force-pushed the refactor/xss-protection branch from 9e4eef8 to 4a7eee2 Compare March 21, 2018 06:30
See #681
See #694
See #721

The `showdown-xss` library wasn't really doing for us what we wanted.
It was transforming our checkbox inputs and making them display as plain
text instead of as actual HTML.

In this patch I've removed `showdown-xss` and created a new centralized
function to render HTML which calls our custom sanitization.

Why do it ourselves? We don't need the kind of sanitiation which only
removes the malicious code and leaves as much as it can untouched.
Instead we are free here to strip out basically everything except for
a few white-listed tags and attributes since we ourselves are the ones
producing the output; we don't have to support full HTML in the notes.

This patch should guard against everything on the OWASP list of XSS
attacks. It will remove significant styling and custom HTML but when a
tag is removed it will usually just take out the tag itself and leave
the inner content as plain text. Some tags are "forbidden" and all their
children are removed with them.
@dmsnell dmsnell force-pushed the refactor/xss-protection branch from 4a7eee2 to 3a7574d Compare March 21, 2018 06:41
@dmsnell
Copy link
Member Author

dmsnell commented Mar 21, 2018

Updated @roundhill to allow email links but I left the other comments for separate issues. Thoughts?

@roundhill
Copy link
Contributor

Nice, I think we're good to go here. :shipit:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants