Unicode issue: Unable to properly serialize some characters like "式,进" #756

Closed
afrozm opened this issue Mar 20, 2018 · 3 comments

afrozm commented Mar 20, 2018

Steps to reproduce:

1. Build a JSON object containing the UTF-8 encoded value of the string "式,进".
2. Serialize this JSON object to a string.
3. Parse the string again into another JSON object and read the value.

Result: the string is lost.
Here is sample pseudocode:

Json::Value json;
json["abc"] = UnicodeToUTF8(L"式,进"); // wrapper over MultiByteToWideChar API
// serialize json to string
std::string jsonString = json.toStyledString();
// now parse again
Json::Value json2 = parse(jsonString); // parse using Json::CharReader
std::wstring unicodeString = UTF8ToUnicode(json2["abc"].asString()); // wrapper over MultiByteToWideChar API
assert(unicodeString == L"式,进"); // fails

I checked UnicodeToUTF8 and UTF8ToUnicode works fine.
The only problem I see is in the toStyledString() API, which makes the character encoding look odd in the output.
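Since the two helpers are not shown in the report, one way to take them out of the picture is a portable conversion that does not depend on the Windows API. The following is only a sketch: `utf16_to_utf8` is a hypothetical name, not the reporter's `UnicodeToUTF8`, and it assumes the input is UTF-16 (as `wchar_t` strings are on Windows).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the omitted UnicodeToUTF8 helper.
// Converts UTF-16 code units to UTF-8 bytes; throws on unpaired surrogates.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF) {
                throw std::invalid_argument("unpaired high surrogate");
            }
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }

        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

If a helper returns anything other than these bytes for the same input, the value stored in the JSON object was already wrong before serialization.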

@cdunn2001
Contributor

Well, try to debug it. What does the result of toStyledString() look like? I think it should be "\u5f0f\uff0c\u8fdb".

If you are able to fix the bug and submit a PR, we'd be happy to merge it.
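For reference, the expected escaped form can be derived from the UTF-8 bytes without jsoncpp. This is a minimal sketch of the general technique, not jsoncpp's actual writer code; `escape_non_ascii` is a hypothetical name, and the function only handles non-ASCII characters (it does not escape quotes, backslashes, or control characters).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

namespace {
// Appends one JSON u-escape (backslash, 'u', four lowercase hex digits).
void append_u_escape(std::string& out, unsigned unit) {
    char buf[7];
    std::snprintf(buf, sizeof buf, "\\u%04x", unit);
    out += buf;
}
}  // namespace

// Decodes UTF-8 and writes every non-ASCII code point as a JSON u-escape.
// Code points above U+FFFF become a surrogate pair.
std::string escape_non_ascii(const std::string& utf8) {
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        if (lead < 0x80) {
            cp = lead;
            len = 1;
        } else if ((lead & 0xE0) == 0xC0) {
            cp = lead & 0x1F;
            len = 2;
        } else if ((lead & 0xF0) == 0xE0) {
            cp = lead & 0x0F;
            len = 3;
        } else if ((lead & 0xF8) == 0xF0) {
            cp = lead & 0x07;
            len = 4;
        } else {
            throw std::invalid_argument("invalid UTF-8 lead byte");
        }
        if (i + len > utf8.size()) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80) {
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        i += len;

        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x10000) {
            append_u_escape(out, static_cast<unsigned>(cp));
        } else {
            cp -= 0x10000;
            append_u_escape(out, static_cast<unsigned>(0xD800 + (cp >> 10)));
            append_u_escape(out, static_cast<unsigned>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Feeding it the nine UTF-8 bytes of the string from the report yields the six-character escapes quoted above, so a styled string containing those escapes is consistent with correct input.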

@BillyDonahue
Contributor

BillyDonahue commented May 28, 2018

This is an incomplete repro and it's hard to help you.

"I checked UnicodeToUTF8 and UTF8ToUnicode works fine." How do you know that? What do they produce? Why not include them? The repro should start and end with UTF-8, so that we aren't relying on functions we don't have access to.

You also just said that your assert fails, but not how. That doesn't really help.

Where's the body of 'parse'?

We need exact stimulus and response involving only the jsoncpp library if possible.
There are too many moving parts in your example.
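One low-effort way to show what a helper produces is to dump its output as hex and paste that into the report. A small sketch follows; `hex_bytes` is a hypothetical name, not part of jsoncpp.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Returns the bytes of s as lowercase hex pairs separated by spaces,
// so the exact encoding can be compared against the expected UTF-8.
std::string hex_bytes(const std::string& s) {
    std::string out;
    char buf[3];
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (i != 0) {
            out += ' ';
        }
        const unsigned byte = static_cast<unsigned char>(s[i]);
        std::snprintf(buf, sizeof buf, "%02x", byte);
        out += buf;
    }
    return out;
}
```

For correctly encoded UTF-8 of the three characters in question, the dump should read `e5 bc 8f ef bc 8c e8 bf 9b`. Anything else points at the conversion step rather than at the serializer.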

I tried to reproduce this and got the styledString:

styled: {
	"abc" : "\u5f0f\uff0c\u8fdb"
}

That is the output I get if I specify the Unicode string directly in the C++, with either of:

const std::string uni = "\xe5\xbc\x8f" "\xef\xbc\x8c" "\xe8\xbf\x9b";
const std::string uni = u8"式,进";

The problem MUST be in the UnicodeToUTF8 and UTF8ToUnicode helpers that were omitted from the bug report. I think we have to close this. I am adding a test that I believe shows this not to be a problem. Please modify the test if you review it and find that I've missed something.
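The other half of the round trip can be checked by hand in the same spirit: decode the escaped form from the styled output and compare the result with the byte literal above. The sketch below is an illustration only, not jsoncpp's reader; `unescape_u` is a hypothetical name, and it handles only u-escapes, copying every other character through unchanged.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

namespace {
// Parses four hex digits starting at pos.
unsigned parse_hex4(const std::string& s, std::size_t pos) {
    if (pos + 4 > s.size()) {
        throw std::invalid_argument("truncated escape");
    }
    unsigned v = 0;
    for (std::size_t k = 0; k < 4; ++k) {
        const char c = s[pos + k];
        v <<= 4;
        if (c >= '0' && c <= '9') {
            v |= static_cast<unsigned>(c - '0');
        } else if (c >= 'a' && c <= 'f') {
            v |= static_cast<unsigned>(c - 'a' + 10);
        } else if (c >= 'A' && c <= 'F') {
            v |= static_cast<unsigned>(c - 'A' + 10);
        } else {
            throw std::invalid_argument("bad hex digit in escape");
        }
    }
    return v;
}

// Appends the UTF-8 encoding of one code point.
void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

bool starts_u_escape(const std::string& s, std::size_t i) {
    return i + 1 < s.size() && s[i] == '\\' && s[i + 1] == 'u';
}
}  // namespace

// Replaces JSON u-escapes with UTF-8 bytes, joining surrogate pairs.
std::string unescape_u(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (!starts_u_escape(in, i)) {
            out += in[i++];
            continue;
        }
        std::uint32_t cp = parse_hex4(in, i + 2);
        i += 6;
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            if (!starts_u_escape(in, i)) {
                throw std::invalid_argument("unpaired high surrogate");
            }
            const unsigned lo = parse_hex4(in, i + 2);
            if (lo < 0xDC00 || lo > 0xDFFF) {
                throw std::invalid_argument("invalid low surrogate");
            }
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            i += 6;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        append_utf8(out, cp);
    }
    return out;
}
```

If decoding the escapes gives back the same nine bytes, the serialized text carries the original string intact, and any loss has to happen outside the serialize and parse steps.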

BillyDonahue added a commit to BillyDonahue/jsoncpp that referenced this issue May 28, 2018
baylesj pushed a commit that referenced this issue Jun 25, 2019
res2k pushed a commit to res2k/jsoncpp that referenced this issue Aug 21, 2019
@SuperBlc

SuperBlc commented Sep 17, 2019

In main.cpp, around line 1651, I changed

const std::string uni = u8"式,进"; // "\u5f0f\uff0c\u8fdb"

to

const std::string uni = "\u5f0f\u77db\u839e"; // "式茅莞"

and it works.

dawesc pushed a commit to EFTlab/jsoncpp that referenced this issue Sep 25, 2019