Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change escape function to fix bug with default parse5 parser #85

Closed
wants to merge 2 commits into from
Closed

Change escape function to fix bug with default parse5 parser #85

wants to merge 2 commits into from

Conversation

AlynxZhou
Copy link

Cheerio 1.0.0-RC3 uses parse5 instead of htmlparser2 for HTML parsing, but parse5 decode all entities (include <, >) no matter decodeEntities is true or false, so we cannot use this to decide whether a re-encode is needed. Plus users who using non-ASCII chars may not want their chars becomes XML entities.

After this PR, if we are using htmlparser2, it will work as old times. And when we are using parse5 (by default), it will only escape HTML chars like <>&"' and won't encode user's chars.

Also normalizeWhitespace is not supported by parse5 either so I removed related test cases.

Solves cheeriojs/cheerio#1198. I suggest you to read my comments in this issue to get a clear view of what the problem is and how it happens.

@curbengh
Copy link

Can you help me clarify something,
in your previous PR #80, I need to enable decodeEntities to have valid render like

<!-- some mandarin comment 取值范围 -->
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:orientation="horizontal"
    >

If I disable decodeEntities, it will be rendered to

&amp;amp;amp;lt;!-- some mandarin comment 取值范围 --&amp;amp;amp;gt;
&amp;amp;amp;lt;LinearLayout xmlns:android=&amp;amp;quot;http://schemas.android.com/apk/res/android&amp;amp;quot;
    android:layout_width=&amp;amp;quot;match_parent&amp;amp;quot;
    android:layout_height=&amp;amp;quot;match_parent&amp;amp;quot;
    android:orientation=&amp;amp;quot;horizontal&amp;amp;quot;
    &amp;amp;amp;gt;

In this PR, since decodeEntities is not supported in the default parse5, enable/disable the option has no effect, both did result in valid rendered file.

So, my understanding is that decodeEntities is only relevant when either xmlMode or _useHtmlParser2 is true, is that correct?

@AlynxZhou
Copy link
Author

AlynxZhou commented Aug 15, 2019 via email

curbengh added a commit to curbengh/hexo-nofollow that referenced this pull request Aug 15, 2019
it is not supported by the default parse5
cheeriojs/dom-serializer#85
@AlynxZhou
Copy link
Author

To people who need this PR, I find that https://github.com/cheeriojs/cheerio/blob/c635bea08f72f6670977674d40d44a5edd7f4a31/lib/static.js#L106 in cheerio v1.0.0 fixed this problem in a better way by using parse5.serialize() instead of dom-serializer, so this PR is useless now, we can wait for cheerio v1.0.0 to release.

I will close it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants