Issues parsing a previously encoded binary (non-UTF8) string. #1211
The JSON text in your initial program needs to be properly escaped. Here is a version with a raw string literal that works:

```cpp
#include <string>
#include "json.hpp"

using json = nlohmann::json;

int main(int argc, const char **argv) {
    std::string encoded(R"("\u0006\u00e4\u00b4\u00d8\u008d")");
    auto j3 = json::parse(encoded);
}
```
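A side note on why the raw string literal matters: in an ordinary literal the C++ compiler consumes the `\uXXXX` escapes itself, so the parser would receive the decoded bytes rather than the six-character JSON escape sequences. A minimal sketch of the difference (byte counts assume a UTF-8 execution character set):

```cpp
#include <string>

int main() {
    // Ordinary literal: the compiler itself decodes \u00e4 to U+00E4 and
    // embeds its UTF-8 encoding (0xC3 0xA4); the JSON parser never sees
    // a backslash.
    std::string cooked = "\"\u00e4\"";   // 4 bytes: '"' 0xC3 0xA4 '"'

    // Raw literal: the six characters \u00e4 survive verbatim, so the
    // JSON parser performs the unescaping instead of the compiler.
    std::string raw = R"("\u00e4")";     // 8 bytes: '"' '\' 'u' '0' '0' 'e' '4' '"'
}
```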
---

I did not know that. Unfortunately, an issue still remains with your library, though fortunately the solution is clearer now.

More details are in the original post about how other languages handle this. I have a proposed solution, though I'm not sure I will be able to whip up a PR for it; I haven't checked what you use for Unicode translation. Essentially, there seem to be two open paths: JavaScript (well, V8 anyway) passes invalid UTF-8 binary sequences through as-is, while Python escapes at the byte level when the UTF-8 encoding is not valid. Your library will decode the Python version, but not the JavaScript version, which can be tested as sketched below.
This time the …
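A minimal sketch of such a test, assuming the V8-style input is the raw bytes 0x06 0xE4 0xB4 0xD8 0x8D and the Python-style input is the fully escaped form from the earlier example:

```cpp
#include <cstdio>
#include <string>
#include "json.hpp"

using json = nlohmann::json;

int main() {
    // Python-style input: every byte escaped as \u00XX. This parses fine,
    // because each escape names a valid code point.
    json j1 = json::parse(R"("\u0006\u00e4\u00b4\u00d8\u008d")");

    // V8-style input: the raw bytes passed through as-is. This is rejected:
    // 0x06 is an unescaped control character, and 0xE4 0xB4 0xD8 is not a
    // valid UTF-8 sequence, so parse() throws a parse_error.
    std::string raw = "\"\x06\xe4\xb4\xd8\x8d\"";
    try {
        json j2 = json::parse(raw);
    } catch (const json::parse_error &e) {
        std::fprintf(stderr, "%s\n", e.what());
    }
}
```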
---

Related: #1198
---

Quite. As someone who has always treated JSON as an alternative to …, I'll take a look at the code now.
---

@nlohmann This is quite interesting. A small alteration to my original test code, to properly encapsulate the string inside a key/value object in JSON format, and the library seems able to decode, re-encode, and dump without apparent issue. Is this all just a storm in a teacup? Do you already have the functionality in place to properly handle binary data, and it simply doesn't come into play when using the library for simple string encoding?
```cpp
#include <cstdio>
#include <string>
#include "json320.hpp"

using json = nlohmann::json;

// Run a statement sequence and report any json::exception it throws.
#define JSON_TRY(...)                                                      \
    try {                                                                  \
        __VA_ARGS__;                                                       \
    }                                                                      \
    catch (const json::exception &e)                                       \
    {                                                                      \
        fprintf(stderr, "%s: JSON Exception: %s", __FUNCTION__, e.what()); \
        return 1;                                                          \
    }

int main(int argc, const char **argv) {
    json j;
    std::string encoded1(R"({"test":"\u0006\u00e4\u00b4\u00d8\u008d\""})");
    printf("encoded string: %s\n", encoded1.c_str()); fflush(stdout);

    JSON_TRY(
        j = json::parse(encoded1);
        printf("decoding complete.\n"); fflush(stdout);
    )
    JSON_TRY(
        printf("dumping: %s\n", j.dump().c_str()); fflush(stdout);
    )

    std::string encoded;
    JSON_TRY(
        encoded = j.dump();
        printf("encoding complete.\n"); fflush(stdout);
    )
    JSON_TRY(
        printf("dumping: %s\n", j.dump().c_str()); fflush(stdout);
    )

    json j2;
    JSON_TRY(
        j2 = json::parse(encoded);
        printf("re-encoding complete.\n"); fflush(stdout);
    )
    JSON_TRY(
        printf("dumping: %s\n", j2.dump().c_str()); fflush(stdout);
    )
}
```
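A likely explanation for why this round-trips cleanly: the `\u00XX` escapes decode to valid code points, so the string the library stores is valid UTF-8 and `dump()` has nothing to object to. A string holding genuinely raw invalid bytes is a different matter; `dump()` rejects it with `type_error.316`. For that case, later releases (3.4.0 onward, so newer than the json320.hpp above) added an `error_handler` parameter to `dump()`. A sketch, assuming one of those newer headers:

```cpp
#include <string>
#include "json.hpp"

using json = nlohmann::json;

int main() {
    // A JSON string value holding raw bytes that are not valid UTF-8.
    json j = std::string("\x06\xe4\xb4\xd8\x8d", 5);

    // Plain j.dump() would throw type_error.316 here. With an error
    // handler (available since 3.4.0), invalid bytes are replaced by
    // U+FFFD instead of aborting the serialization.
    std::string replaced = j.dump(-1, ' ', false, json::error_handler_t::replace);

    // error_handler_t::ignore silently drops the offending bytes instead.
    std::string ignored = j.dump(-1, ' ', false, json::error_handler_t::ignore);
}
```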
---

This is a bit of an edge case, but it seems to be handled on other platforms (to varying degrees).
The simplest explanation I can give you is just to show you this code: …

Which produced this exception: …
Note: the same error occurs when nlohmann has done the previous encoding of the string (as shown below).
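A minimal sketch of code that triggers this class of error, assuming the raw non-UTF-8 bytes are stored in a JSON value and then serialized:

```cpp
#include <cstdio>
#include <string>
#include "json.hpp"

using json = nlohmann::json;

int main() {
    // Five raw bytes; 0xE4 opens a three-byte UTF-8 sequence that
    // 0xB4 0xD8 does not complete, so the string is not valid UTF-8.
    json j = std::string("\x06\xe4\xb4\xd8\x8d", 5);

    // Construction succeeds -- UTF-8 is only validated on serialization --
    // but dump() throws, with a message along the lines of:
    //   [json.exception.type_error.316] invalid UTF-8 byte at index ...
    try {
        std::string out = j.dump();
    } catch (const json::type_error &e) {
        std::fprintf(stderr, "%s\n", e.what());
    }
}
```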
Whilst I fully understand the reason for the error, I don't believe it is valid to assume that all strings will be valid UTF-8, merely because the JSON spec includes `\u0000`-style Unicode escape sequences. I might be wrong; I haven't read the RFC lately.

Ignore everything below this line, as it's just more details. :)
Expanded tests
Some tests with other platforms showed that `php` will not encode a raw non-UTF8 binary string, JavaScript (v8) had no issues, and `python` actually produced the "every character escaped" version I used in the initial example.

PHP test: …