Allow to set error handler for decoding errors #1314

nlohmann · 2018-10-23T12:38:38Z

Proof of concept; currently only as parameter to the internal dump_escaped function; that is, not yet exposed to the dump function.

Test every prefix of Unicode sequences against the different dump functions.
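One reading of "every prefix" is sketched below: for a multi-byte UTF-8 sequence, each proper prefix is an incomplete (ill-formed) input, and only the full sequence is valid. The helper names `encode_utf8` and `all_prefixes` are hypothetical and are not taken from the library or its test suite.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: encode a Unicode scalar value as UTF-8.
// Assumes cp <= 0x10FFFF and that cp is not a surrogate.
std::string encode_utf8(char32_t cp)
{
    std::string s;
    if (cp < 0x80)
    {
        s += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        s += static_cast<char>(0xC0 | (cp >> 6));
        s += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        s += static_cast<char>(0xE0 | (cp >> 12));
        s += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        s += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        s += static_cast<char>(0xF0 | (cp >> 18));
        s += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        s += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        s += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return s;
}

// Hypothetical helper: return every non-empty prefix of s, shortest first.
// The last element is s itself; all earlier ones are truncated sequences.
std::vector<std::string> all_prefixes(const std::string& s)
{
    std::vector<std::string> result;
    for (std::size_t n = 1; n <= s.size(); ++n)
    {
        result.push_back(s.substr(0, n));
    }
    return result;
}
```

For U+20AC, for example, `all_prefixes(encode_utf8(0x20AC))` yields the two truncated inputs `"\xE2"` and `"\xE2\x82"` followed by the complete sequence `"\xE2\x82\xAC"`; each of these could then be passed to the dump function with each error handler.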

abolz · 2018-10-23T14:21:34Z

Looks great!!

Out of curiosity, I have added some tests for the "correct" number of replacement characters (as in Unicode 11, Section 3.9 -- U+FFFD Substitution of Maximal Subparts).

All tests pass. Great work!

SECTION("U+FFFD Substitution of Maximal Subparts")
{
    // Some tests (mostly) from
    // https://www.unicode.org/versions/Unicode11.0.0/ch03.pdf
    // Section 3.9 -- U+FFFD Substitution of Maximal Subparts

    auto test = [&](std::string const& input, std::string const& expected) {
        json j = input;
        CHECK(j.dump(-1, ' ', true, json::error_handler_t::replace) == "\"" + expected + "\"");
    };

    test("\xC2", "\\ufffd");
    test("\xC2\x41\x42", "\\ufffd" "\x41" "\x42");
    test("\xC2\xF4", "\\ufffd" "\\ufffd");

    test("\xF0\x80\x80\x41", "\\ufffd" "\\ufffd" "\\ufffd" "\x41");
    test("\xF1\x80\x80\x41", "\\ufffd" "\x41");
    test("\xF2\x80\x80\x41", "\\ufffd" "\x41");
    test("\xF3\x80\x80\x41", "\\ufffd" "\x41");
    test("\xF4\x80\x80\x41", "\\ufffd" "\x41");
    test("\xF5\x80\x80\x41", "\\ufffd" "\\ufffd" "\\ufffd" "\x41");

    test("\xF0\x90\x80\x41", "\\ufffd" "\x41");
    test("\xF1\x90\x80\x41", "\\ufffd" "\x41");
    test("\xF2\x90\x80\x41", "\\ufffd" "\x41");
    test("\xF3\x90\x80\x41", "\\ufffd" "\x41");
    test("\xF4\x90\x80\x41", "\\ufffd" "\\ufffd" "\\ufffd" "\x41");
    test("\xF5\x90\x80\x41", "\\ufffd" "\\ufffd" "\\ufffd" "\x41");

    test("\xC0\xAF\xE0\x80\xBF\xF0\x81\x82\x41", "\\ufffd" "\\ufffd" "\\ufffd" "\\ufffd" "\\ufffd" "\\ufffd" "\\ufffd" "\\ufffd" "\x41");
    test("\xED\xA0\x80\xED\xBF\xBF\xED\xAF\x41", "\\ufffd" "\\ufffd" "\\ufffd" "\\ufffd" "\\ufffd" "\\ufffd" "\\ufffd" "\\ufffd" "\x41");
    test("\xF4\x91\x92\x93\xFF\x41\x80\xBF\x42", "\\ufffd" "\\ufffd" "\\ufffd" "\\ufffd" "\\ufffd" "\x41" "\\ufffd" "\\ufffd" "\x42");
    test("\xE1\x80\xE2\xF0\x91\x92\xF1\xBF\x41", "\\ufffd" "\\ufffd" "\\ufffd" "\\ufffd" "\x41");
}
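
The replacement counts expected above follow from the "maximal subpart" rule: a lead byte together with the longest run of continuation bytes that could still start a well-formed sequence is replaced by a single U+FFFD, and decoding resumes at the first offending byte. The following is a minimal standalone sketch of that rule, not the library's serializer code. The function name `replace_ill_formed` is hypothetical, and as a simplification it copies well-formed sequences through unchanged instead of escaping them as `ensure_ascii` would.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: replace each maximal ill-formed subpart of a UTF-8
// byte string with the six-character text \ufffd. Well-formed sequences are
// copied unchanged (no ensure_ascii escaping in this sketch).
std::string replace_ill_formed(const std::string& in)
{
    static const std::string marker = "\\ufffd";
    std::string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t trail = 0;               // continuation bytes required
        unsigned char lo = 0x80, hi = 0xBF;  // range for the next continuation byte

        if (lead <= 0x7F)
        {
            out += in[i++];                  // ASCII passes through
            continue;
        }
        else if (lead >= 0xC2 && lead <= 0xDF)
        {
            trail = 1;
        }
        else if (lead >= 0xE0 && lead <= 0xEF)
        {
            trail = 2;
            if (lead == 0xE0) { lo = 0xA0; } // excludes overlong encodings
            if (lead == 0xED) { hi = 0x9F; } // excludes surrogates
        }
        else if (lead >= 0xF0 && lead <= 0xF4)
        {
            trail = 3;
            if (lead == 0xF0) { lo = 0x90; } // excludes overlong encodings
            if (lead == 0xF4) { hi = 0x8F; } // excludes values above U+10FFFF
        }
        else
        {
            out += marker;                   // 80..C1 and F5..FF never start a sequence
            ++i;
            continue;
        }

        // Consume continuation bytes for as long as the sequence stays viable.
        std::size_t j = i + 1;
        std::size_t seen = 0;
        while (seen < trail && j < in.size())
        {
            const unsigned char b = static_cast<unsigned char>(in[j]);
            if (b < lo || b > hi) { break; }
            ++j;
            ++seen;
            lo = 0x80;                       // only the first continuation byte is restricted
            hi = 0xBF;
        }

        if (seen == trail) { out.append(in, i, j - i); }  // complete sequence
        else               { out += marker; }             // one marker per maximal subpart
        i = j;                                            // resume at the offending byte
    }
    return out;
}
```

Under these assumptions, `"\xF1\x80\x80\x41"` produces one marker followed by `A`, because `F1 80 80` is a viable but truncated prefix, whereas `"\xF0\x80\x80\x41"` produces three markers, because `80` is not a valid second byte after `F0` and each stray byte is then replaced on its own.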

nlohmann · 2018-10-23T15:20:04Z

@abolz Thanks for the tests, I shall add them to the test suite.

FYI: I used the library's Unicode test suite to systematically create 7.5 million valid and invalid byte sequences and compared the dump outputs with those of Python. Good to know that I also covered those from the Unicode spec.

Thanks @abolz!

coveralls · 2018-10-23T15:49:02Z

Coverage remained the same at 100.0% when pulling 87ef3f2 on feature/codec_errors into 9294e25 on develop.

niklas88

LGTM apart from a typo. I must say, however, that I'm not really familiar with the test macros used, e.g., in unit-unicode.cpp (in particular CAPTURE()).

include/nlohmann/detail/output/serializer.hpp

nlohmann · 2018-10-23T17:26:24Z

CAPTURE is a macro from Catch that displays the values of certain expressions when an assertion fails. In the context of these tests, you can treat it as a no-op.

nlohmann · 2018-10-24T06:41:08Z

Thanks everyone!

nlohmann added 6 commits October 16, 2018 20:38

🚧 proposal for different error handlers #1198

0671e92

Proof of concept; currently only as parameter to the internal dump_escaped function; that is, not yet exposed to the dump function.

🚧 overworked error handlers #1198

c5821d9

💚 added tests #1198

e5dce64

Test every prefix of Unicode sequences against the different dump functions.

🚧 respect ensure_ascii parameter #1198

c7af027

🚧 fixed an issue with ensure_ascii #1198

c51b1e6

🚧 fixed test cases #1198

951a7a6

nlohmann added the release item: ✨ new feature label Oct 23, 2018

nlohmann added this to the Release 3.3.1 milestone Oct 23, 2018

nlohmann self-assigned this Oct 23, 2018

nlohmann mentioned this pull request Oct 23, 2018

Soften the landing when dumping non-UTF8 strings (type_error.316 exception) #1198

Closed

💚 additional tests from the Unicode spec #1198

2343d9c

Thanks @abolz!

niklas88 approved these changes Oct 23, 2018

include/nlohmann/detail/output/serializer.hpp (outdated)

✏️ fixed a typo #1314

87ef3f2

nlohmann merged commit 7b501de into develop Oct 24, 2018

nlohmann deleted the feature/codec_errors branch October 24, 2018 06:41

nlohmann added a commit that referenced this pull request Oct 24, 2018

📝 updated documentation #1314

f102df3

nlohmann mentioned this pull request Oct 26, 2018

WIP/RFC: Flexible pretty printing formatting #1121

Closed

4 tasks
