Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Character encoding bugs in RSS feed #2313

Closed
JohnONolan opened this issue Mar 2, 2014 · 12 comments · Fixed by #2318
Closed

Character encoding bugs in RSS feed #2313

JohnONolan opened this issue Mar 2, 2014 · 12 comments · Fixed by #2318
Labels
bug [triage] something behaving unexpectedly
Milestone

Comments

@JohnONolan
Copy link
Member

You've got some character encoding bugs in your feed's xml. Apostrophes are being incorrectly encoded:

<title></title>

feed

@JohnONolan JohnONolan added this to the 0.5 milestone Mar 2, 2014
@JohnONolan JohnONolan added the bug label Mar 2, 2014
@knunery
Copy link
Contributor

knunery commented Mar 2, 2014

I'll take this one.

@knunery
Copy link
Contributor

knunery commented Mar 2, 2014

In core/server/controllers/frontend.js I see the following code title: _.escape(post.title),. So the encoding bug shows up because of this code.

When I look up the documentation for the escape function for underscore I see that it encodes &, <, >, ", and '. From this I can conclude that double quotes and apostrophes are being encoded when they shouldn't be.

Let me do some digging to see if there is a builtin utility function to just escape the &, <, and > characters.

@javorszky
Copy link
Contributor

We've replaced underscore with lodash, although the .escape behaves exactly the same as far as I can tell from the docs. http://lodash.com/docs#escape

That's just a wrapper for .replace (https://github.com/lodash/lodash/blob/2.4.1/dist/lodash.compat.js#L6194), which is using this array: https://github.com/lodash/lodash/blob/2.4.1/dist/lodash.compat.js#L2004.

@knunery
Copy link
Contributor

knunery commented Mar 2, 2014

I should've noticed it was lodash 😄 and not underscore.

So it's easy enough to write a custom function to do this (based on lo-dash's approach which looks almost identical to underscores approach).

function escapeXmlElement(string) {
    var reUnescapedXml = /[&<>]/g;

    var xmlElementEscapes = {
        '&': '&amp;',
        '<': '&lt;',
        '>': '&gt;'
      };

    function escapeXmlChar(match) {
        return xmlElementEscapes[match];
    }

    return string == null ? '' : String(string).replace(reUnescapedXml, escapeXmlChar);
}

Trying to figure out a good place to put this function.

@ErisDS
Copy link
Member

ErisDS commented Mar 4, 2014

I believe that this is actually a double encoding issue.

It seems that _.escape was added to the RSS feed in #767 in order to fix #715, but node-rss relies on node-xml which should escape all data for us.

I think we need to remove the _.escape all together, and if #715 is still a problem, we should raise that with node-rss.

@knunery
Copy link
Contributor

knunery commented Mar 4, 2014

If we remove the _.escape all together then #715 will be an issue again. #715 was an issue because of a bug in node-rss. Node-rss shoves the title into a CDATA xml section without any encoding. The bug in #715 was due to the ]]> in the example title of <[[ my post title ]]>. A CDATA section cannot contain ]]>. In short Node-rss should do something with ]]> when it sees that for a title or description. I will check if this is a bug in the latest version of node-rss and if so raise it up as a bug as you have suggested 😄. A long winded way of saying, "Yes I agree".

Just as a note... it seems using CDATA for RSS feeds is not standard. Not necessarily bad, but not standard. Most RSS feeds seem to use no CDATA sections and standard xml encoding practice( encode &, <, and > for data in an xml element).

@ErisDS
Copy link
Member

ErisDS commented Mar 5, 2014

I think that #2263 is likely to turn up a lot of oddness with RSS - which we can report to node-rss, but perhaps looking for a different solution will be in order.

@halfdan
Copy link
Contributor

halfdan commented Mar 5, 2014

I'm very positive on replacing node-rss. One of the app ideas is to add additional fields to the RSS (e.g. for Podcasts), but node-rss only allows a predefined set of fields.

The project doesn't seem to be very active from what I see in https://github.com/dylang/node-rss/issues. There was someone two years ago trying to allow custom fields (dylang/node-rss#6), but that never landed.

@ErisDS ErisDS modified the milestones: 0.4.2, 0.5 Mar 10, 2014
@ErisDS
Copy link
Member

ErisDS commented Mar 10, 2014

@knunery Any chance we can get an updated PR to fix the problem as described? I realise this is symptomatic of a larger issue, but we can deal with that separately. Would like to get this in 😺

@knunery
Copy link
Contributor

knunery commented Mar 10, 2014

I made a pull request that only removes the lodash _.escape around the title. This will cause #715 to be an issue again but is an acceptable short term solution. Let me know if there is anything remaining on this issue.

@halfdan
Copy link
Contributor

halfdan commented Mar 10, 2014

I just read up on the issue: _.escape shouldn't be used at all. The title seems to be properly escaped but it should then not be wrapped in a CDATA. Was anyone able to figure out where the CDATA comes from?

@knunery: I'm sorry I'm that picky but: should've not should of

@knunery
Copy link
Contributor

knunery commented Mar 10, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug [triage] something behaving unexpectedly
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants