Skip to content
This repository has been archived by the owner on Apr 12, 2024. It is now read-only.

fix(ngSanitize): encode surrogate pair properly #6911

Closed
wants to merge 1 commit into from
Closed

fix(ngSanitize): encode surrogate pair properly #6911

wants to merge 1 commit into from

Conversation

memolog
Copy link
Contributor

@memolog memolog commented Mar 29, 2014

The encodeEndities function encode non-alphanumeric characters to entities with charCodeAt.
charCodeAt does not return one value when their unicode codeponts is higher than 65,356.
It returns surrogate pair, and this is why the Emoji which has higher codepoints is garbled.
We need to handle them properly.

Closes #5088

The encodeEndities function encode non-alphanumeric characters to entities with charCodeAt.
charCodeAt does not return one value when their unicode codeponts is higher than 65,356.
It returns surrogate pair, and this is why the Emoji which has higher codepoints is garbled.
We need to handle them properly.

Closes #5088
@mary-poppins
Copy link

Thanks for the PR! Please check the items below to help us merge this faster. See the contributing docs for more information.

  • Uses the issue template (#6911)

If you need to make changes to your pull request, you can update the commit with git commit --amend.
Then, update the pull request with git push -f.

Thanks again for your help!

@caitp
Copy link
Contributor

caitp commented Mar 29, 2014

I like this.

But one thing (since I've only ever dealt with these encoding issues in low level languages), do we have to worry about the byte order of each surrogate, and potentially handle cases where it's swapped, or is the browser going to deal with that for us so that this is handled consistently.

I assume that this is good enough, but admittedly I'm not actually positive about this!

In any case, I think we can probably ship this on monday or tuesday.

@mary-poppins
Copy link

I'm sorry, but I wasn't able to verify your Contributor License Agreement (CLA) signature. CLA signature is required for any code contributions to AngularJS.

Please sign our CLA and ensure that the CLA signature email address and the email address in this PR's commits match.

If you signed the CLA as a corporation, please let us know the company's name.

Thanks a bunch!

PS: If you signed the CLA in the past then most likely the email addresses don't match. Please sign the CLA again or update the email address in the commit of this PR.
PS2: If you are a Googler, please sign the CLA as well to simplify the CLA verification process.

@memolog memolog added cla: no and removed cla: yes labels Mar 30, 2014
@memolog
Copy link
Contributor Author

memolog commented Mar 30, 2014

@mary-poppins I sent the CLA before push request, but my email address is my personal one, yutaka.yamaguchi@gmail.com which is not public address in github. So, I just submitted the CLA again, which email address is my public address, yyamaguchi@sixapart.com. I signed it as a personal.

Could you take a look? If you don't find my CLA, please let me know again.

thanks!

@memolog
Copy link
Contributor Author

memolog commented Mar 30, 2014

@caitp Your comment is including the great and interesting insight for me since I've never considered about the byte order. Actually, I didn't find the strong evidence that the browser always return the surrogate with the consistent order. So, your concern is clearly worth considering.

However, I guess this fix works fine in ordinary cases :)

@caitp
Copy link
Contributor

caitp commented Mar 30, 2014

so the last thing is, it might be worthwhile to generate a test suite which covers all of the supplementary characters, just to make sure there aren't any hard to find bugs. This could possibly make the test run take a good while longer, though, but I think it's probably worth it to prevent hard to find bugs. When you think about it, there aren't really all that many of them, so it probably wouldn't hurt too bad.

Unfortunately it's hard to write a complete test suite for this because Karma thinks the browser is timing out while generating those specs, testing locally. So maybe that's not so doable.

@mary-poppins
Copy link

CLA signature verified! Thank you!

Someone from the team will now triage your PR and it will be processed based on the determined priority (doc updates and fixes with tests are prioritized over other changes).

@memolog memolog added cla: yes and removed cla: no labels Mar 31, 2014
@justindujardin
Copy link

Any updates? I would really like to see this make its way into a release version of Angular.

I ran into this issue while trying to use the linky filter on arbitrary chat messages that contain emoji characters as well.

Here's the workaround filter I came up with to make Mac users happy again: http://jsfiddle.net/RvZ6W/

@caitp caitp added this to the 1.3.0-beta.8 milestone May 2, 2014
@caitp
Copy link
Contributor

caitp commented May 2, 2014

Sorry, I sorta forgot about this PR and it never was fully triaged... I brought it up in the dev chat, so we might be able to check it in, or it might have to wait a few weeks --- personally, I'd prefer if the test suite were more robust, but that doesn't seem to be feasible really, so it probably is ready for check-in

@btford
Copy link
Contributor

btford commented May 2, 2014

it never was fully triaged

I noticed this happens from time to time. When you comment on an issue, @caitp, can you also please triage it? Thanks!

@caitp
Copy link
Contributor

caitp commented May 2, 2014

It's one of the downsides of constant context-switching, sometimes some of these slip through =( But yeah I'm trying to do that

caitp pushed a commit that referenced this pull request May 2, 2014
The encodeEndities function encode non-alphanumeric characters to entities with charCodeAt.
charCodeAt does not return one value when their unicode codeponts is higher than 65,356.
It returns surrogate pair, and this is why the Emoji which has higher codepoints is garbled.
We need to handle them properly.

Closes #5088
Closes #6911
@caitp caitp closed this in 627b035 May 2, 2014
@caitp
Copy link
Contributor

caitp commented May 2, 2014

okay, well, hopefully it doesn't introduce any new problems, but as far as I'm aware it shouldn't... and if it does, it should be pretty low-impact for affecting supplemental codepoints. So we'll see

@btford
Copy link
Contributor

btford commented May 2, 2014

Yep, happens to me too. @_@

@justindujardin
Copy link

ALL the updates. Thanks! 👍

@memolog memolog deleted the address-surrogate-pair branch June 8, 2014 13:47
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ngSanitize cant escape all unicode characters
5 participants