Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Server: Permit simple HTML in chat messages again #1571

Closed
wants to merge 1 commit into from

Conversation

hoffie
Copy link
Member

@hoffie hoffie commented Apr 27, 2021

PR #939 disabled HTML usage in chat messages by escaping all user input at the server side for security reasons.

There is demand to keep basic text formatting working for sharing lyrics/chords in a sane way.

This PR re-enables certain safe tags again to enable such use cases.

This works by keeping the existing message escaping, but converting selected safe HTML tags back into their parsable form.

To avoid worsening security again, this PR deliberately

  • does not permit CSS or any color changes which could enable user
    impersonation,
  • does not accept any attributes in HTML tags,
  • does not accept invalid HTML (e.g. start tags without end tags),
  • does not accept nested HTML tags (except for <br>s within <pre>...</pre>).
  • does not handle uppercase tags or XML-style <br/>

References: #1021 #1524

This PR seems to work, but I'm marking it as draft because the topic hasn't been discussed fully yet.

Freel free to post feedback on the code here. Feedback regarding the best approach should rather be posted to the discussion thread.

cc @atsampson

PR jamulussoftware#939 disabled HTML usage in chat messages by escaping all user input
at the server side for security reasons.

There is demand to keep basic text formatting working for sharing
lyrics/chords in a sane way.

This PR re-enables certain safe tags again to enable such use cases.

This works by keeping the existing message escaping, but converting
selected safe HTML tags back into their parsable form.

To avoid worsening security again, this PR deliberately
- does not permit CSS or any color changes which could enable user
  impersonation,
- does not accept any attributes in HTML tags,
- does not accept invalid HTML (e.g. start tags without end tags),
- does not accept nested HTML tags (except for <br>s within
  <pre>...</pre>).

References:
jamulussoftware#1021
jamulussoftware#1524
@hoffie hoffie marked this pull request as draft April 27, 2021 21:45
@pljones
Copy link
Collaborator

pljones commented Apr 28, 2021

does not handle uppercase tags or XML-style <br/>

So you want <br></br>? No! ;) Empty tags with <... /> are the way forward... I've trained myself to use them now, you can't go taking them away... :D

Any reason for the lower case requirement?

@hoffie
Copy link
Member Author

hoffie commented Apr 28, 2021

does not handle uppercase tags or XML-style <br/>

So you want <br></br>?

No, <br> (no ending </br>).

Empty tags with <... /> are the way forward... I've trained myself to use them now, you can't go taking them away... :D

I feel the same, but with XHTML having failed and HTML5 being the accepted standard, I think <br> is the most-accepted form?

Any reason for the lower case requirement?

Both the <br/> limitation and the lower case limitation are just to keep the implementation simple. It's not a requirement. It can easily be extended to suppor that, but it'll need some more lines of code. I can do that if we want to move forward with this.

@pljones
Copy link
Collaborator

pljones commented Apr 28, 2021

I'm not really fussed... I'd actually rather avoid the feature entirely. Text chat should be plain text. 🤷

@reneknuvers
Copy link

I'm not really fussed... I'd actually rather avoid the feature entirely. Text chat should be plain text. 🤷

I think having chords in bold or italics is a good thing. Markdown support is over the top, but would be even safer I guess.

HTML export is widely available in different apps. I think it would be good to look at some example output from for example Onsong and other commonly used apps to see how tags are written and which tags are essential to readability of the lyrics and lead sheets.

@atsampson
Copy link
Contributor

I was thinking about how to do this the other day, and your approach looks pretty reasonable to me. I'd agree that the set of HTML elements it allows through needs to be really minimal, and forcing correct nesting and blocking attributes entirely are both good ideas - the Qt HTML parser really isn't designed for dealing with untrusted HTML.

I'd suggest adding some more explicit comments in your "unsanitisation" code to make it clear exactly what it's doing and why (basically the same stuff you've said in your pull request message) - to avoid someone coming along in the future and adding back features that can't be handled safely. The use of [^<>] in the middle of the regexps in particular is pretty subtle and would benefit from a comment that it's explicitly there to avoid matching something that's already been expanded.

This is very much the kind of code that'd benefit from some unit tests if anyone ever gets around to adding a test framework to Jamulus...

(One minor concern I'd have is that minimal regexp matching is expensive, so it might be possible for someone to submit a chat message that uses a lot more CPU time than you'd expect to sanitise. Hopefully the message length limit will put a brake on this, though. You could do it without the regexp searches by matching tags exactly and keeping track of the open elements on a stack, if needed.)

@pljones
Copy link
Collaborator

pljones commented Apr 30, 2021

This is very much the kind of code that'd benefit from some unit tests if anyone ever gets around to adding a test framework to Jamulus...

Something I seem to be thinking nearly every day...

foreach ( auto tag, qlPermittedChatTagNames )
{
QRegExp pattern = QRegExp ( "&lt;(" + tag + ")&gt;([^<>]+)&lt;/" + tag + "&gt;" );
pattern.setMinimal ( true ); // non-greedy matching
Copy link
Member

@softins softins May 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest it might be a good idea also to add the following:

pattern.setCaseSensitivity(Qt::CaseInsensitive);  // allow upper or lower case tags

Just in case the user is pasting in HTML that happens to use upper-case tags.

@hoffie
Copy link
Member Author

hoffie commented May 10, 2021

Thanks for all the technical feedback. It would of course be possible to incorporate it. However, it seems that the demand for actual multi-line, monospace input is higher (#1021) than demand for individual formatting.
In the interest of keeping things simple, I'm therefore closing this PR for now. However, if it turns out that the other approaches are not feasible, we could still follow up with this PR.

@hoffie hoffie closed this May 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants