cssSelector doesn't handle combining characters correctly #1984
Comments
samshutchins changed the title on Aug 4, 2023: "cssSelector doesn't handle combinding characters correctly" → "cssSelector doesn't handle combining characters correctly"
Current jsoup: I don't think it's incorrect to emit it as a run of characters, and the selector does work in jsoup. We could improve this by escaping the combining form as a \u escape character, as Chrome does.
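For illustration only, here is a minimal sketch of what escaping per code point (rather than per Java `char`) could look like. It assumes the standard CSS escape form of a backslash, the code point in hex, and a terminating space. `CssEscapeSketch` and `escapeNonAscii` are hypothetical names, not jsoup API, and the sketch deliberately ignores the ASCII characters that also need escaping in a real identifier.

```java
class CssEscapeSketch {
    // Hypothetical helper, NOT part of jsoup: passes ASCII through unchanged and
    // writes every non-ASCII code point as a CSS hex escape (backslash, hex, space).
    static String escapeNonAscii(String ident) {
        StringBuilder out = new StringBuilder();
        // codePoints() yields whole code points, so a surrogate pair is never split
        ident.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                out.append('\\').append(Integer.toHexString(cp)).append(' ');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // "e" followed by U+0301 COMBINING ACUTE ACCENT
        System.out.println(escapeNonAscii("e\u0301"));
    }
}
```

Walking the string with `codePoints()` instead of `charAt` is the design choice that matters here: the two halves of a surrogate pair are escaped together as one code point.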
The example above uses combining characters to create an é. Emoji make heavy use of combining characters (👨👨👧👧 is made up of 11 characters: \uD83D\uDC68\u200D\uD83D\uDC68\u200D\uD83D\uDC67\u200D\uD83D\uDC67). I have seen emoji used as CSS class names in the wild, and I think the character escaping code is doing the wrong thing when calling cssSelector: it looks like it's escaping every character individually, which breaks things with these combining characters.
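To make the character counts concrete, here is a small JDK-only sketch (no jsoup involved; the class and method names are made up for illustration). It shows that the family emoji is 11 UTF-16 code units but only 7 code points, and that an "e" plus a combining acute accent is two chars until it is NFC-normalized.

```java
import java.text.Normalizer;

class CombiningDemo {
    // "e" followed by U+0301 COMBINING ACUTE ACCENT; renders as a single accented e
    static final String E_ACUTE = "e\u0301";
    // Family emoji from the issue: four emoji joined by U+200D ZERO WIDTH JOINER
    static final String FAMILY =
        "\uD83D\uDC68\u200D\uD83D\uDC68\u200D\uD83D\uDC67\u200D\uD83D\uDC67";

    // Number of Java chars (UTF-16 code units)
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points (a surrogate pair counts once)
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(codeUnits(FAMILY));   // 11
        System.out.println(codePoints(FAMILY));  // 7
        System.out.println(codeUnits(E_ACUTE));  // 2
        // NFC composes the base letter and the accent into the single char U+00E9
        String composed = Normalizer.normalize(E_ACUTE, Normalizer.Form.NFC);
        System.out.println(codeUnits(composed)); // 1
    }
}
```

Any code that escapes one `char` at a time will therefore treat each half of a surrogate pair, and each joiner or combining mark, as a separate character.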