Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to set error handler for decoding errors #1314

Merged
merged 8 commits into from
Oct 24, 2018
Merged

Conversation

nlohmann
Copy link
Owner

Fixes #1198

Proof of concept; currently only as parameter to the internal dump_escaped function; that is, not yet exposed to the dump function.
Test every prefix of Unicode sequences against the different dump functions.
@abolz
Copy link
Contributor

abolz commented Oct 23, 2018

Looks great!!

Out of curiosity, I have added some tests for the "correct" number of replacement characters (as in Unicode 11, Section 3.9 -- U+FFFD Substitution of Maximal Subparts).

All tests pass. Great work!

SECTION("U+FFFD Substitution of Maximal Subparts")
{
    // Some tests (mostly) from
    // https://www.unicode.org/versions/Unicode11.0.0/ch03.pdf
    // Section 3.9 -- U+FFFD Substitution of Maximal Subparts

    auto test = [&](std::string const& input, std::string const& expected) {
        json j = input;
        CHECK(j.dump(-1, ' ', true, json::error_handler_t::replace) == "\"" + expected + "\"");
    };

    test("\xC2", "\\ufffd");
    test("\xC2\x41\x42", "\\ufffd" "\x41" "\x42");
    test("\xC2\xF4", "\\ufffd" "\\ufffd");

    test("\xF0\x80\x80\x41", "\\ufffd" "\\ufffd" "\\ufffd" "\x41");
    test("\xF1\x80\x80\x41", "\\ufffd" "\x41");
    test("\xF2\x80\x80\x41", "\\ufffd" "\x41");
    test("\xF3\x80\x80\x41", "\\ufffd" "\x41");
    test("\xF4\x80\x80\x41", "\\ufffd" "\x41");
    test("\xF5\x80\x80\x41", "\\ufffd" "\\ufffd" "\\ufffd" "\x41");

    test("\xF0\x90\x80\x41", "\\ufffd" "\x41");
    test("\xF1\x90\x80\x41", "\\ufffd" "\x41");
    test("\xF2\x90\x80\x41", "\\ufffd" "\x41");
    test("\xF3\x90\x80\x41", "\\ufffd" "\x41");
    test("\xF4\x90\x80\x41", "\\ufffd" "\\ufffd" "\\ufffd" "\x41");
    test("\xF5\x90\x80\x41", "\\ufffd" "\\ufffd" "\\ufffd" "\x41");

    test("\xC0\xAF\xE0\x80\xBF\xF0\x81\x82\x41", "\\ufffd" "\\ufffd" "\\ufffd" "\\ufffd" "\\ufffd" "\\ufffd" "\\ufffd" "\\ufffd" "\x41");
    test("\xED\xA0\x80\xED\xBF\xBF\xED\xAF\x41", "\\ufffd" "\\ufffd" "\\ufffd" "\\ufffd" "\\ufffd" "\\ufffd" "\\ufffd" "\\ufffd" "\x41");
    test("\xF4\x91\x92\x93\xFF\x41\x80\xBF\x42", "\\ufffd" "\\ufffd" "\\ufffd" "\\ufffd" "\\ufffd" "\x41" "\\ufffd""\\ufffd" "\x42");
    test("\xE1\x80\xE2\xF0\x91\x92\xF1\xBF\x41", "\\ufffd" "\\ufffd" "\\ufffd" "\\ufffd" "\x41");
}

@nlohmann
Copy link
Owner Author

@abolz Thanks for the tests, I shall add them to the test suite.

FYI: I used the library's Unicode test suite to systematically create a 7.5 million valid and invalid byte sequences and compared the dump outputs with those of Python. Good to know that I also covered those from the Unicode spec.

@coveralls
Copy link

coveralls commented Oct 23, 2018

Coverage Status

Coverage remained the same at 100.0% when pulling 87ef3f2 on feature/codec_errors into 9294e25 on develop.

Copy link

@niklas88 niklas88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM apart from a typo. I must say however that I'm not really familiar mit the test macros used e.g. in the unit-unicode.cpp (in particular CAPTURE())

include/nlohmann/detail/output/serializer.hpp Outdated Show resolved Hide resolved
@nlohmann
Copy link
Owner Author

CAPTURE is a command from Catch to display the value of certain expressions in case of an error. In the context of the tests you can treat it as noop.

@nlohmann nlohmann merged commit 7b501de into develop Oct 24, 2018
@nlohmann nlohmann deleted the feature/codec_errors branch October 24, 2018 06:41
@nlohmann
Copy link
Owner Author

Thanks everyone!

nlohmann added a commit that referenced this pull request Oct 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants