Unicode issue: Not able to serialize properly some character like "式,进" #756
Comments
Well, try to debug it. What does the result of If you are able to fix the bug and submit a PR, we'd be happy to merge it.
This is an incomplete repro and it's hard to help you. "I checked UnicodeToUTF8 and UTF8ToUnicode works fine." How do you know that? What do they produce? Why not include them? Repro should start and end with the utf8, so we aren't relying on functions we don't have access to. You also just said that your assert fails, but not how. That doesn't really help. Where's the body of 'parse'? We need exact stimulus and response involving only the jsoncpp library if possible. I tried to reproduce this and got the styledString:
If I specify the unicode directly in the C++, with either:
const std::string uni = "\xe5\xbc\x8f" "\xef\xbc\x8c" "\xe8\xbf\x9b";
const std::string uni = u8"式,进";
The problem MUST be in the UnicodeToUTF8 and UTF8ToUnicode helpers that were omitted from the bug report. I think we have to close this. I am adding a test that I believe shows this not to be a problem; please modify the test if you review it and find that I've missed something.
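For anyone checking the input side of such a repro without the omitted helpers, here is a minimal standard-library sketch that decodes the escaped bytes quoted above back into code points. The name `decode_utf8` is hypothetical and is not part of jsoncpp; the decoder rejects truncated input and bad continuation bytes but, as a simplification, does not check for overlong encodings or surrogates.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not a jsoncpp API): decode UTF-8 bytes into code points.
std::vector<std::uint32_t> decode_utf8(const std::string& s) {
    std::vector<std::uint32_t> out;
    for (std::size_t i = 0; i < s.size();) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        // The lead byte determines the sequence length and the first payload bits.
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + len > s.size()) throw std::runtime_error("truncated UTF-8 sequence");
        // Each continuation byte has the form 10xxxxxx and adds six payload bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) throw std::runtime_error("invalid continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

Calling `decode_utf8("\xe5\xbc\x8f" "\xef\xbc\x8c" "\xe8\xbf\x9b")` should yield three code points, U+5F0F, U+FF0C (the fullwidth comma) and U+8FDB, from nine bytes. Running the same check on the string read back after parsing would show whether the bytes survived the round trip.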
In the main.cpp, about const std::string uni = u8"式,进"; // "\u5f0f\uff0c\u8fdb" to const std::string uni = "\u5f0f\u77db\u839e"; // "式茅莞" and it works.
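The `\u` escapes in the comment above name code points, and the bytes that end up in a non-`u8` string literal depend on the compiler's execution character set. As a hedged illustration, this standard-library sketch builds the UTF-8 bytes from code points explicitly, so the result does not depend on source or execution encoding. The name `encode_utf8` is hypothetical and is not a jsoncpp function.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not a jsoncpp API): encode code points as UTF-8 bytes.
std::string encode_utf8(const std::vector<std::uint32_t>& code_points) {
    std::string out;
    for (std::uint32_t cp : code_points) {
        if (cp < 0x80) {
            // ASCII: a single byte.
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            // Two bytes: 110xxxxx 10xxxxxx.
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            // Surrogates are not valid Unicode scalar values.
            if (cp >= 0xD800 && cp <= 0xDFFF)
                throw std::invalid_argument("surrogate code point");
            // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx.
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp <= 0x10FFFF) {
            // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            throw std::invalid_argument("code point out of range");
        }
    }
    return out;
}
```

With this sketch, `encode_utf8({0x5F0F, 0xFF0C, 0x8FDB})` should produce the same nine bytes as the escaped literal `"\xe5\xbc\x8f" "\xef\xbc\x8c" "\xe8\xbf\x9b"` quoted earlier in the thread.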
When I build a JSON object containing the UTF-8 encoded value of the string "式,进",
serialize this JSON object to a string,
and parse the string again into another JSON object and read the value:
Result: the string is lost.
Here is sample pseudo code:
I checked that UnicodeToUTF8 and UTF8ToUnicode work fine. The only problem I see is in the toStyledString() API, which makes the character encoding weird in the output.