WebUtility.HtmlDecode does not decode all HTML5 character entities #19103

Closed
devatwork opened this issue Oct 26, 2016 · 5 comments
Labels
area-System.Net design-discussion Ongoing discussion about design without consensus enhancement Product code improvement that does NOT require public API changes/additions help wanted [up-for-grabs] Good issue for external contributors
Milestone

Comments

@devatwork

The current character entity replace dictionary used inside WebUtility uses the entity set as defined in HTML4, see https://github.com/dotnet/corefx/blob/master/src/System.Runtime.Extensions/src/System/Net/WebUtility.cs#L757.

However, the HTML5 spec defines additional entities, see https://www.w3.org/TR/html5/syntax.html#named-character-references.

I'm willing to send a PR to include the new named character references, if that would be an acceptable change to the behavior of WebUtility.HtmlDecode. Is there any specific guidance for making this change? Are there specific areas that need to be addressed?

@karelz
Member

karelz commented Oct 26, 2016

Sounds like a reasonable thing to do. @davidsh @CIPop any suggestions?

@karelz
Member

karelz commented Oct 26, 2016

Feel free to submit a PR.
Guidance:

  • We should avoid breaking compat -- i.e. do not change existing 'wrong' code (if you find any). Such changes should be discussed explicitly upfront.
  • Add test coverage

devatwork referenced this issue in devatwork/corefx Oct 29, 2016
Note: I had to change the lookup from a Dictionary<string,char> to a Dictionary<string,string> in order to account for HTML entities that represent multiple Unicode characters, like &acE; for example.

Closes #13036
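
The note above about switching the lookup's value type can be illustrated with a minimal sketch. It is written in Python rather than C# and is not the WebUtility code; the `ENTITIES` table and the `decode_entity` helper are hypothetical names introduced only for this example. The point it shows is that some HTML5 named references, such as `&acE;`, expand to two code points (U+223E followed by U+0333), so a single-character value type cannot represent them.

```python
import html

# Hypothetical miniature lookup table for illustration only; this is NOT
# the table used by WebUtility. Values are strings rather than single
# characters because some HTML5 references expand to two code points.
ENTITIES = {
    "amp": "&",
    "lt": "<",
    "acE": "\u223e\u0333",  # U+223E followed by combining U+0333
}


def decode_entity(name):
    """Return the replacement text for a named entity.

    Unknown names are returned as the original reference, unchanged.
    """
    return ENTITIES.get(name, "&" + name + ";")


# The multi-code-point entity needs a string-valued table.
assert len(decode_entity("acE")) == 2
# Unknown references pass through untouched.
assert decode_entity("unknown") == "&unknown;"
# Cross-check against Python's standard-library HTML5 entity handling.
assert html.unescape("&acE;") == decode_entity("acE")
```

Returning unknown references unchanged is one way to respect the compatibility guidance given earlier in the thread, since input that was left alone before stays untouched.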
@karelz
Member

karelz commented Nov 21, 2016

Continued design discussion from the PR dotnet/corefx#13152 (comment):
@DamianEdwards do you have a suggestion for how to proceed from here?

The safe route is to introduce yet another class, for HTML5 only.

@karelz
Member

karelz commented Nov 23, 2016

@DamianEdwards ping?

@karelz
Member

karelz commented Oct 2, 2019

Triage: Very little interest in the last 3 years (just 2 upvotes). Let's close it. If demand increases, we can reconsider.

@karelz karelz closed this as completed Oct 2, 2019
@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 28, 2020

3 participants