Wrong handling of UTF-8 text in the XML module #13703
Comments
I can reproduce this behaviour with libxml2 2.11.4, but 2.10.4 still produces correct output.
Maybe that broke something? This seems to appear only with
I have filed an upstream issue: https://gitlab.gnome.org/GNOME/libxml2/-/issues/570
According to the upstream maintainer, this is caused by the bug fix for
Thanks for the prompt resolution. I am confident the issue is solved.
I am experiencing issues when parsing HTML containing non-ASCII characters. For example:
The expected output is "České psaní", but I get "Äeské psanÃ", which looks as if the correct UTF-8 bytes were re-interpreted as ISO-8859-1 (or a similar encoding) and then converted into UTF-8 again.
The issue appeared all of a sudden in a binary which had previously worked correctly, i.e. without recompilation of my program. Therefore, I suspect an external library is to blame, perhaps libxml2, but I'm posting here because I am not sure, and also because I do not know how to test libxml2 directly.
Can anybody confirm the issue? Is this a Crystal issue or not?
Thanks in advance.
My specs are:
Operating system: Manjaro (Linux 5.15.120-1)
Crystal version:
libxml2 version: 2.11.4-1
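The re-encoding hypothesis from the report (UTF-8 bytes read as ISO-8859-1, then written out as UTF-8 again) can be illustrated with a minimal sketch. It is written in Python only because the mechanism is language-independent; it does not call Crystal or libxml2, and the helper names `double_encode` and `undo_double_encode` are hypothetical. Note that a full double encoding would also garble "é", which appears intact in the reported output, so this shows the general effect rather than reproducing the bug itself.

```python
def double_encode(text: str) -> str:
    """Treat each UTF-8 byte of `text` as one ISO-8859-1 character.

    Hypothetical helper: it models the suspected mis-decoding step and
    is not part of any library mentioned in this issue.
    """
    return text.encode("utf-8").decode("latin-1")


def undo_double_encode(garbled: str) -> str:
    """Reverse the step above: map characters back to bytes, decode as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


expected = "České psaní"
garbled = double_encode(expected)

# "Č" is the two bytes C4 8C in UTF-8, so it becomes "Ä" followed by an
# invisible control character; "í" (C3 AD) becomes "Ã" plus a soft hyphen.
print(ascii(garbled))

# The transformation is lossless, so the original text can be recovered.
print(undo_double_encode(garbled) == expected)  # True
```

If garbled output from a real program round-trips back to the expected text this way, that is a reasonable hint that a decoding step somewhere assumed ISO-8859-1 instead of UTF-8.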