-
Notifications
You must be signed in to change notification settings - Fork 27.5k
fix(ngSanitize): encode surrogate pair properly #6911
Conversation
The encodeEndities function encode non-alphanumeric characters to entities with charCodeAt. charCodeAt does not return one value when their unicode codeponts is higher than 65,356. It returns surrogate pair, and this is why the Emoji which has higher codepoints is garbled. We need to handle them properly. Closes #5088
Thanks for the PR! Please check the items below to help us merge this faster. See the contributing docs for more information.
If you need to make changes to your pull request, you can update the commit with Thanks again for your help! |
I like this. But one thing (since I've only ever dealt with these encoding issues in low level languages), do we have to worry about the byte order of each surrogate, and potentially handle cases where it's swapped, or is the browser going to deal with that for us so that this is handled consistently. I assume that this is good enough, but admittedly I'm not actually positive about this! In any case, I think we can probably ship this on monday or tuesday. |
I'm sorry, but I wasn't able to verify your Contributor License Agreement (CLA) signature. CLA signature is required for any code contributions to AngularJS. Please sign our CLA and ensure that the CLA signature email address and the email address in this PR's commits match. If you signed the CLA as a corporation, please let us know the company's name. Thanks a bunch! PS: If you signed the CLA in the past then most likely the email addresses don't match. Please sign the CLA again or update the email address in the commit of this PR. |
@mary-poppins I sent the CLA before push request, but my email address is my personal one, yutaka.yamaguchi@gmail.com which is not public address in github. So, I just submitted the CLA again, which email address is my public address, yyamaguchi@sixapart.com. I signed it as a personal. Could you take a look? If you don't find my CLA, please let me know again. thanks! |
@caitp Your comment is including the great and interesting insight for me since I've never considered about the byte order. Actually, I didn't find the strong evidence that the browser always return the surrogate with the consistent order. So, your concern is clearly worth considering. However, I guess this fix works fine in ordinary cases :) |
so the last thing is, it might be worthwhile to generate a test suite which covers all of the supplementary characters, just to make sure there aren't any hard to find bugs. This could possibly make the test run take a good while longer, though, but I think it's probably worth it to prevent hard to find bugs. When you think about it, there aren't really all that many of them, so it probably wouldn't hurt too bad. Unfortunately it's hard to write a complete test suite for this because Karma thinks the browser is timing out while generating those specs, testing locally. So maybe that's not so doable. |
CLA signature verified! Thank you! Someone from the team will now triage your PR and it will be processed based on the determined priority (doc updates and fixes with tests are prioritized over other changes). |
Any updates? I would really like to see this make its way into a release version of Angular. I ran into this issue while trying to use the Here's the workaround filter I came up with to make Mac users happy again: http://jsfiddle.net/RvZ6W/ |
Sorry, I sorta forgot about this PR and it never was fully triaged... I brought it up in the dev chat, so we might be able to check it in, or it might have to wait a few weeks --- personally, I'd prefer if the test suite were more robust, but that doesn't seem to be feasible really, so it probably is ready for check-in |
I noticed this happens from time to time. When you comment on an issue, @caitp, can you also please triage it? Thanks! |
It's one of the downsides of constant context-switching, sometimes some of these slip through =( But yeah I'm trying to do that |
The encodeEndities function encode non-alphanumeric characters to entities with charCodeAt. charCodeAt does not return one value when their unicode codeponts is higher than 65,356. It returns surrogate pair, and this is why the Emoji which has higher codepoints is garbled. We need to handle them properly. Closes #5088 Closes #6911
okay, well, hopefully it doesn't introduce any new problems, but as far as I'm aware it shouldn't... and if it does, it should be pretty low-impact for affecting supplemental codepoints. So we'll see |
Yep, happens to me too. @_@ |
ALL the updates. Thanks! 👍 |
The encodeEndities function encode non-alphanumeric characters to entities with charCodeAt.
charCodeAt does not return one value when their unicode codeponts is higher than 65,356.
It returns surrogate pair, and this is why the Emoji which has higher codepoints is garbled.
We need to handle them properly.
Closes #5088