This repository has been archived by the owner on Jan 29, 2020. It is now read-only.

Attribute escaping #21

Open
autowp opened this issue Jan 23, 2017 · 5 comments

Comments

@autowp

autowp commented Jan 23, 2017

Why does attribute escaping need to encode so many characters (everything matching [^a-z0-9,\.\-_])?
URLs in the HTML look ugly and are larger than necessary:

<a href="https&#x3A;&#x2F;&#x2F;www.example.com&#x2F;">
<a href="https://www.example.com/">
@Ocramius
Member

"Ugly" is not a problem in security-sensitive contexts. Also, most source viewers already make these attributes easy to read (Firefox does, for example).

As for the size, gzip compression generally deals with it.

@autowp
Author

autowp commented Jan 23, 2017

It's not easy to see where the security improvement is here.

For example, why is "dot" a safe character but "semicolon" is not?

As for the size, on my example Cyrillic page, where escapeHtmlAttr is partially used:
68988 bytes - only quotes and angle brackets escaped
83611 bytes - escaped by escapeHtmlAttr (+21%)

The same with gzip:
11116 bytes
11790 bytes (+6%)

Indeed, the size is not crucial.

@craigfrancis

Are you asking to add more characters to the whitelist, so they don't get encoded?

Maybe you could argue that certain characters like ":" don't need to be escaped, but it's easier to have a very small whitelist of "known good" characters (everything that [^a-z0-9,\.\-_] does not match) than to try to work out which characters are allowed in each context.
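
A minimal sketch of that whitelist idea, assuming PHP 7.2+ with the mbstring extension. The function name sketchEscapeAttr() is hypothetical, and this is not the library's actual implementation (which also handles things like control characters and named entities); it only shows how every character outside the small safe set is turned into a hex entity.

```php
<?php
/**
 * Hypothetical helper for illustration only: encode every character that is
 * not a letter, digit, comma, dot, hyphen or underscore as a hex entity.
 */
function sketchEscapeAttr(string $value): string
{
    $escaped = preg_replace_callback(
        '/[^a-z0-9,\.\-_]/iu',
        static function (array $match): string {
            // mb_ord() returns the Unicode code point of the matched character.
            return sprintf('&#x%02X;', mb_ord($match[0], 'UTF-8'));
        },
        $value
    );

    // With the /u modifier, preg_replace_callback() returns null on invalid UTF-8.
    if ($escaped === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    return $escaped;
}

echo sketchEscapeAttr('https://www.example.com/'), PHP_EOL;
// https&#x3A;&#x2F;&#x2F;www.example.com&#x2F;
echo sketchEscapeAttr('a b=c'), PHP_EOL;
// a&#x20;b&#x3D;c
```

Because the space and "=" are encoded too, the value cannot break out of an attribute even when the quotes are missing.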


For anyone not familiar with the background... the reason escapeHtmlAttr() encodes more aggressively than escapeHtml() is to protect unquoted attributes.

Let's say someone did:

$url = 'https://www.example.com/';
<a href=<?= $escaper->escapeHtmlAttr($url) ?>>

Notice that it does not include quote marks.

This creates the fairly "ugly" output:

<a href=https&#x3A;&#x2F;&#x2F;www.example.com&#x2F;>

What happens if $url was provided by the user (maybe a link to their website), and they set it to:

$url = 'https://www.example.com/ onclick=do_evil_thing';

Without using escapeHtmlAttr(), it would create the perfectly valid:

<a href=https://www.example.com/ onclick=do_evil_thing>

This means they can create an onclick event handler on your website :-)
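
To see this concretely, here is a small sketch (the variable names are made up for illustration). It shows that htmlspecialchars() leaves this payload untouched, because the payload contains none of the five characters that function encodes. In an unquoted attribute the space ends the href value, so the browser reads the rest as a second attribute.

```php
<?php
$url = 'https://www.example.com/ onclick=do_evil_thing';

// htmlspecialchars() only encodes & < > " and (with ENT_QUOTES) '.
$escaped = htmlspecialchars($url, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// None of those characters occur in the payload, so nothing changes.
var_dump($escaped === $url); // bool(true)

// Unquoted: the space terminates href, and onclick becomes its own attribute.
$unquoted = '<a href=' . $escaped . '>';

// Quoted: the whole string, space included, stays inside the href value.
$quoted = '<a href="' . $escaped . '">';

echo $unquoted, PHP_EOL;
echo $quoted, PHP_EOL;
```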


You could still use escapeHtml() or htmlspecialchars(), but you must make sure your attributes are quoted.

<a href="<?= $escaper->escapeHtml($url) ?>">

So that it creates:

<a href="https://www.example.com/">

Or, if you want to use htmlspecialchars(), don't forget to use it in full:

htmlspecialchars($url, ENT_QUOTES | ENT_SUBSTITUTE, 'utf-8')
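
A short sketch of why the extra arguments matter, assuming a UTF-8 page; the sample inputs are invented for illustration. PHP 8.1 changed the default flags to ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401, so on older versions only the explicit form gives you this behaviour.

```php
<?php
$value = "x' onclick='do_evil_thing";

// ENT_COMPAT (the default before PHP 8.1) leaves single quotes alone,
// which is unsafe inside a single-quoted attribute.
$compat = htmlspecialchars($value, ENT_COMPAT, 'UTF-8');

// ENT_QUOTES encodes both double and single quotes.
$quotes = htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

echo $compat, PHP_EOL; // x' onclick='do_evil_thing
echo $quotes, PHP_EOL; // x&#039; onclick=&#039;do_evil_thing

// Invalid UTF-8 input: without ENT_SUBSTITUTE the result is an empty string;
// with it, the invalid byte is substituted and the rest of the text survives.
$invalid = "a\xFFb";
$withoutSubstitute = htmlspecialchars($invalid, ENT_QUOTES, 'UTF-8');
$withSubstitute = htmlspecialchars($invalid, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($withoutSubstitute === ''); // bool(true)
var_dump($withSubstitute !== '');    // bool(true)
```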

PS: Have a look at adding a CSP (Content Security Policy), and set it so that it does not allow unsafe-inline for scripts or styles. This will probably require you to make some changes, but it adds a second line of defence against this problem, where any attributes like onclick would be blocked by the browser.
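
As a rough sketch of sending such a policy from PHP: the directives below are assumptions chosen for illustration, and a real site will need its own list of sources. The point is only that neither script-src nor style-src contains 'unsafe-inline'.

```php
<?php
// Illustrative directives only; adjust the sources to what your site loads.
$directives = [
    'default-src' => "'none'",
    'script-src'  => "'self'",
    'style-src'   => "'self'",
    'img-src'     => "'self'",
];

$parts = [];
foreach ($directives as $name => $sources) {
    $parts[] = $name . ' ' . $sources;
}
$policy = implode('; ', $parts);

// Headers must be sent before any output; headers_sent() guards against
// the "headers already sent" warning.
if (!headers_sent()) {
    header('Content-Security-Policy: ' . $policy);
}
```

With a policy like this, a browser that supports CSP refuses to run inline event handlers such as an injected onclick.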

@froschdesign
Member

@craigfrancis
Thanks for your explanation! I think this could improve the documentation.

@weierophinney
Member

This repository has been closed and moved to laminas/laminas-escaper; a new issue has been opened at laminas/laminas-escaper#3.
