This repository has been archived by the owner on Oct 23, 2019. It is now read-only.

Implementation of 'htmlspecialchars' is not complete #69

Open
linsmod opened this issue Oct 14, 2016 · 8 comments


@linsmod

linsmod commented Oct 14, 2016

htmlspecialchars will not translate any of the entities in this list:
"&amp;", "&quot;", "&#039;", "&lt;", "&gt;"

but HtmlSpecialCharsEncode does not implement this logic, so if the input contains any of the entities in the list above, the output will be unexpected.
e.g. &amp; will be translated to &amp;amp;

Found the issue when working with WordPress 4.6.1.

@linsmod
Author

linsmod commented Oct 14, 2016

And the function should not translate the symbol & to &amp; when it's not an attribute.
This is my personal fix:
```csharp
// Requires: System, System.Collections.Generic, System.Diagnostics, System.Linq, System.Text
internal static string HtmlSpecialCharsEncode(string str, int index, int length, QuoteStyle quoteStyle, string charSet)
{
    if (str == null) return String.Empty;

    Debug.Assert(index + length <= str.Length);

    StringBuilder result = new StringBuilder(length);

    // quote style is anded to emulate PHP behavior (any value is allowed):
    string single_quote = (quoteStyle & QuoteStyle.SingleQuotes) != 0 ? "&#039;" : "'";
    string double_quote = (quoteStyle & QuoteStyle.DoubleQuotes) != 0 ? "&quot;" : "\"";

    // Split on '&' so that ampersands are never encoded; the final Join puts them back verbatim.
    var strArray = new string(str.Skip(index).Take(length).ToArray()).Split('&');
    var strList = new List<string>();
    foreach (var item in strArray)
    {
        for (int i = 0; i < item.Length; i++)
        {
            char c = item[i];
            switch (c)
            {
                case '&': // unreachable: Split('&') has already removed every '&'
                    result.Append("&amp;"); break;
                case '"':
                    result.Append(double_quote); break;
                case '\'':
                    result.Append(single_quote); break;
                case '<':
                    result.Append("&lt;"); break;
                case '>':
                    result.Append("&gt;"); break;
                default:
                    result.Append(c); break;
            }
        }
        strList.Add(result.ToString());
        result.Clear();
    }
    return string.Join("&", strList);
}
```

@lucyllewy

Does the behaviour of Phalanger differ from PHP?

It appears you are stating that Phalanger escapes &amp; to &amp;amp; when run through htmlspecialchars(). If that is your intended message, then this behaviour is consistent with the PHP implementation and thus is not a bug and will not be changed.
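
For reference, a minimal sketch of the default behaviour described here, assuming `htmlspecialchars()` is called with its default `double_encode` argument. The input string is made up for illustration and is not taken from the report.

```php
<?php
// Encode a string containing a bare ampersand, then encode the result again.
$once  = htmlspecialchars('c=0&dir=ltr');  // first pass escapes the bare '&'
$twice = htmlspecialchars($once);          // second pass escapes the '&' of '&amp;' again

echo "once:  $once\n";   // once:  c=0&amp;dir=ltr
echo "twice: $twice\n";  // twice: c=0&amp;amp;dir=ltr
```

With the default arguments, each call escapes every `&` it sees, including the one that starts an existing entity.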

@jakubmisek
Member

jakubmisek commented Oct 16, 2016

Right, if you have a small test case in PHP, please run it with both Phalanger and legacy PHP first to see whether the output differs.

@linsmod linsmod closed this as completed Oct 17, 2016
@linsmod
Author

linsmod commented Oct 17, 2016

Code:
echo 'quote_style:'.$quote_style;
echo 'charset:'.$charset;
echo 'double_encode:'.$double_encode;
echo $string;
echo htmlspecialchars($string);
$string = @htmlspecialchars( $string, $quote_style, $charset, $double_encode );
die($string);

Official PHP output:
quote_style:3charset:UTF-8double_encode:http://localhost:8000/wp-admin/load-styles.php?c=0&amp;dir=ltr&amp;load%5B%5D=dashicons,buttons,forms,l10n,login&amp;ver=4.6.1http://localhost:8000/wp-admin/load-styles.php?c=0&amp;amp;dir=ltr&amp;amp;load%5B%5D=dashicons,buttons,forms,l10n,login&amp;amp;ver=4.6.1http://localhost:8000/wp-admin/load-styles.php?c=0&amp;dir=ltr&amp;load%5B%5D=dashicons,buttons,forms,l10n,login&amp;ver=4.6.1

Phalanger output:
quote_style:3charset:UTF-8double_encode:http://localhost:8000/wp-admin/load-styles.php?c=0&amp;dir=ltr&amp;load%5B%5D=dashicons,buttons,forms,l10n,login&amp;ver=4.6.1http://localhost:8000/wp-admin/load-styles.php?c=0&amp;amp;dir=ltr&amp;amp;load%5B%5D=dashicons,buttons,forms,l10n,login&amp;amp;ver=4.6.1http://localhost:8000/wp-admin/load-styles.php?c=0&amp;amp;dir=ltr&amp;amp;load%5B%5D=dashicons,buttons,forms,l10n,login&amp;amp;ver=4.6.1

@linsmod linsmod reopened this Oct 17, 2016
@linsmod
Author

linsmod commented Oct 17, 2016

When htmlspecialchars( $string ) is invoked twice, official PHP and Phalanger produce the same result.
However, if the second invocation passes the extra parameters:
htmlspecialchars( $string );
htmlspecialchars( $string, $quote_style, $charset, $double_encode );

the results are different.

Official PHP does not seem to produce &amp;amp; in that case, but Phalanger does.

@lucyllewy

Please can you tidy your test case to separate the outputs so that we can see what is output by which part of the test. It is still not clear what you are actually stating is the problem, i.e. which circumstance causes the issue you perceive. As an example, it is not clear what value $double_encode has in your example: is it true, false, empty string, null, ....?
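
A tidied test case along these lines might look like the sketch below. `quote_style` 3 (`ENT_QUOTES`) and the `UTF-8` charset are taken from the output posted above; the value `false` for `$double_encode` is an assumption, since the thread never states it, and the URL is a shortened form of the one in the report.

```php
<?php
// Values reported in the thread's output: quote_style 3 (ENT_QUOTES), charset UTF-8.
$quote_style = ENT_QUOTES;
$charset = 'UTF-8';
// ASSUMPTION: false. The real value is not stated anywhere in the thread.
$double_encode = false;
// Shortened form of the already-encoded URL from the report.
$string = 'load-styles.php?c=0&amp;dir=ltr&amp;ver=4.6.1';

$results = array(
    'input'          => $string,
    'one argument'   => htmlspecialchars($string),
    'four arguments' => htmlspecialchars($string, $quote_style, $charset, $double_encode),
);

// Print one labelled line per call so the outputs cannot run together.
foreach ($results as $label => $value) {
    echo str_pad($label . ':', 16), $value, PHP_EOL;
}
```

On official PHP with `$double_encode` set to `false`, the "four arguments" line should match the input, because existing entities are left alone, while the "one argument" line contains `&amp;amp;`. Running the same script under Phalanger and comparing the three lines would show exactly which call differs.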

@linsmod
Author

linsmod commented Oct 18, 2016

I know little about PHP. How can I tell whether $double_encode is true, false, an empty string, null, or something else?
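
One way to check this, as a sketch: `echo` prints `false`, `null`, and the empty string all as nothing, so they cannot be told apart in the output above, whereas `var_dump()` and `var_export()` show the type as well. The value assigned to `$double_encode` below is hypothetical and only there to make the snippet runnable.

```php
<?php
$double_encode = false; // hypothetical value, for illustration only

// var_dump() prints the type together with the value.
var_dump($double_encode); // bool(false)

// var_export() with true as the second argument returns the representation
// as a string, which fits into an existing echo line.
echo 'double_encode:' . var_export($double_encode, true) . "\n"; // double_encode:false
```

In the original test, replacing `echo 'double_encode:'.$double_encode;` with the `var_export()` form would make the value visible.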

@hlizard

hlizard commented Oct 23, 2016

I also ran into this problem when trying WordPress 4.6.1 with Phalanger.
hlizard/WpDotNet@72a0595

What a surprise, a fellow countryman! I don't know PHP either. What a coincidence.
