Unicode breaks xml serialization #1496
Comments
Hi, we are a student group and we would like to take a crack at this. We can't guarantee that we'll be able to complete it to a high enough standard, but we'd like to try.
Hello! I don't think there is an error with document.outputSettings().charset("ASCII"); If you paste "\u226F\u0322\u0329\u032B\u0320\u0309\u030A" into an online Unicode translator, you can see that it does translate to "≯̢̩̫̠̉̊". By the way, a Unicode character like "\u226F" has no corresponding ASCII character.
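To make the point about "\u226F" concrete, here is a minimal sketch using only the Java standard library. It checks whether each code point can be encoded in US-ASCII and, if not, writes it as a numeric character reference. The class name AsciiEscapeSketch and the helper escapeUnencodable are hypothetical names made up for this illustration. This is not jsoup's actual implementation, and it deliberately ignores the escaping of markup characters such as & and <.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class AsciiEscapeSketch {
    // Hypothetical helper (not part of jsoup): replaces every code point that
    // the given encoder cannot represent with a hexadecimal numeric character
    // reference. Markup characters such as '&' and '<' are NOT escaped here.
    static String escapeUnencodable(String text, CharsetEncoder encoder) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            String s = new String(Character.toChars(cp));
            if (encoder.canEncode(s)) {
                out.append(s);
            } else {
                out.append("&#x").append(Integer.toHexString(cp)).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        CharsetEncoder ascii = StandardCharsets.US_ASCII.newEncoder();
        // U+226F and the combining mark U+0322 have no ASCII counterpart.
        String input = "a\u226F\u0322b";
        System.out.println(escapeUnencodable(input, ascii));
    }
}
```

Whether the references produced this way form valid XML still depends on the code points themselves, which is the question the issue raises about the re-serialized output.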
The parsed HTML is clearly weird and broken, but my assumption is that the output, after re-serializing it, should still be valid.
document.outputSettings().charset("ASCII");
Version: 1.13.1
output: