
Non-unique keys in objects. #375

Closed

TurpentineDistillery opened this issue Dec 3, 2016 · 20 comments

Comments

@TurpentineDistillery

The JSON RFC allows (quite unfortunately, IMHO) non-unique keys in objects.
The current implementation only keeps the last one. I know this is mentioned in the docs, in the example where a json is constructed from a multimap; in that scenario, if the user puts non-unique keys in there, that is their own fault.

In the general case, however, when we are deserializing unknown input, silently discarding duplicate keys does not seem like the most correct thing to do; it looks more like a silent failure. Perhaps this should throw instead?
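A minimal illustration of the behavior being described (a sketch; the include path and which of the two values survives depend on the library version, as discussed further down in this thread):

#include <iostream>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

int main()
{
    // the duplicate key is accepted without any error or warning;
    // only one of the two values ends up in the parsed object
    json j = json::parse(R"({"key": 1, "key": 2})");
    std::cout << j.dump() << std::endl;  // e.g. {"key":2}
}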

@nlohmann
Owner

nlohmann commented Dec 3, 2016

The standard says (emphasis added):

When the names within an object are not unique, the behavior of software that receives such an object is unpredictable. Many implementations report the last name/value pair only. Other implementations report an error or fail to parse the object, and some implementations report all of the name/value pairs, including duplicates.

The standard leaves it open how implementations should react. It explicitly lists several approaches, and this library chose the first one. This is also mentioned in the documentation.

I don't think that throwing here makes much sense. It would also change the public API and may break more code than it helps.

@nlohmann
Owner

nlohmann commented Dec 3, 2016

(Duplicate keys are also not an issue in https://github.com/miloyip/nativejson-benchmark and http://seriot.ch/parsing_json.php, which cover a lot of JSON compliance issues.)

@TurpentineDistillery
Author

Sounds reasonable. In terms of documentation, the multimap example is indicative enough of how this library treats duplicate keys, so this can be closed.

@intmainreturn00

intmainreturn00 commented Dec 26, 2017

Still, it seems strange that there is no way to find out about such a situation. In many cases it means that whoever produced such JSON missed something or made a mistake, and the current treatment leads to effectively undefined behavior as to which value ends up being parsed. So maybe it would be a good idea to add some way to signal this? Is that somehow possible?

As far as I understand, the library uses std::map for the underlying object representation and, probably for performance reasons, doesn't check before inserting a new value. Perhaps this could be controlled by some flag? Or what is the correct way to handle such cases?

@nlohmann
Owner

It is possible to pass a callback function to the parser which can check whether a parsed key has already been stored.

@intmainreturn00

intmainreturn00 commented Dec 26, 2017

Thanks a lot for such a quick answer.
Could you please provide an example, or point me to where the callback is documented?

Possibly related to parser_callback_t...

@intmainreturn00

thanks!

@intmainreturn00

intmainreturn00 commented Dec 26, 2017

What should I pass from client code as the BasicJsonType& parsed argument of the parser callback?
I can't use the example like this:
json::parser_callback_t cb = [&](int d, json::parse_event_t event, json& parsed) -> bool { ; };

I receive the compile error:
Error:(166, 37) error: no viable conversion from '(lambda at /Users/owl/Desktop/SDKBackend/app/src/main/cpp/tests/ConfigTests.cpp:166:42)' to 'json::parser_callback_t' (aka 'function<bool (int, nlohmann::detail::parser<nlohmann::basic_json<std::map, std::vector, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, bool, long, unsigned long, double, std::allocator, adl_serializer> >::parse_event_t, nlohmann::basic_json<std::map, std::vector, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, bool, long, unsigned long, double, std::allocator, adl_serializer> &)>')

@nlohmann
Owner

The third argument is a reference to the recently parsed JSON value (or object key).

@nlohmann
Owner

I just tried to come up with an example, but I realized that we provide too little information to the callback function to properly keep track of the object keys. Here was my attempt:

json::parser_callback_t cb = [](int depth, json::parse_event_t event, json & parsed)
{
    // keys seen so far; note this set is static and therefore shared
    // across all objects, which breaks down for nested objects (see below)
    static std::set<std::string> keys;

    if (event == json::parse_event_t::object_end)
    {
        keys.clear();
    }

    if (event == json::parse_event_t::key)
    {
        auto p = keys.insert(parsed.get<std::string>());
        if (not p.second)
        {
            std::cerr << "key " << parsed << " was already seen in this object!" << std::endl;
            return false;
        }
    }

    // yes, store parsed value
    return true;
};

Unfortunately, this function does not work with nested objects. I figured that we are missing a way to access the object we are adding to. :-/

@intmainreturn00

Yes, I came up with almost the same code. In my JSON format I do have nested objects, but I know at what depth I need to check keys for uniqueness. For arbitrary nesting levels, however, it currently seems impossible...

@nlohmann
Owner

I have always wanted to rework the callback mechanism anyway, so I will keep this use case in mind.

@jeremysingy

Hi,

I think this behavior has been broken since 85c7680.

I tried updating our codebase from 2.1.1 to 3.1.0, and one of our unit tests failed: using emplace instead of operator[] on the map does not update an already existing entry (illustrated below), so the first occurrence of a duplicate key now wins instead of the last.

Thus either this commit should be reverted, or the documentation fixed to reflect the new behavior:

When the names within an object are not unique, later stored name/value pairs overwrite previously stored name/value pairs, leaving the used names unique. For instance, {"key": 1} and {"key": 2, "key": 1} will be treated as equal and both stored as {"key": 1}.
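A standalone sketch of the underlying std::map behavior (plain standard-library code, not nlohmann/json itself) that explains the difference:

#include <iostream>
#include <map>
#include <string>

int main()
{
    std::map<std::string, int> m;

    // operator[] plus assignment overwrites: the later value wins
    m["key"] = 1;
    m["key"] = 2;
    std::cout << m["key"] << std::endl;  // prints 2

    // emplace is a no-op if the key already exists: the first value wins
    m.clear();
    m.emplace("key", 1);
    m.emplace("key", 2);
    std::cout << m["key"] << std::endl;  // prints 1
}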

@nlohmann
Owner

nlohmann commented Feb 7, 2018

@jeremysingy You are right. This subtlety has been overlooked. May I ask you to open a separate issue, so we can discuss this independently of this issue?

@jayeshbadwaik

Would it be possible to at least have a build option (when building json) which allows throwing on duplicate keys? Either by setting a macro before including the single-include file, or by passing an option to the CMake configuration?

@milasudril

Well, libjansson has this option

JSON_REJECT_DUPLICATES

Issue a decoding error if any JSON object in the input text contains duplicate keys. Without this flag, the value of the last occurrence of each key ends up in the result. Key equivalence is checked byte-by-byte, without special Unicode comparison algorithms.

New in version 2.1.

I think it would be nice to have that option in nlohmann too, and with a similar API for simplicity.
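For reference, this is roughly how the flag is used with jansson (a sketch assuming jansson 2.1 or later and its json_loads entry point):

#include <iostream>
#include <jansson.h>

int main()
{
    const char* text = R"({"key": 1, "key": 2})";
    json_error_t error;

    // with JSON_REJECT_DUPLICATES, the duplicate key makes decoding fail
    json_t* root = json_loads(text, JSON_REJECT_DUPLICATES, &error);
    if (!root)
    {
        std::cerr << "decoding error: " << error.text << std::endl;
        return 1;
    }

    json_decref(root);
    return 0;
}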

@nlohmann
Owner

nlohmann commented Feb 3, 2021

PRs welcome ;)

@nlohmann
Owner

nlohmann commented Feb 3, 2021

In fact, this feature could probably be built with the current API using a parser callback function.

@jwestell

jwestell commented Jun 1, 2021

In fact, this feature could probably be built with the current API using a parser callback function.

Is something like this what you had in mind? Or is there a better way to leverage what has already been parsed to check against existing keys?

  std::stack<std::set<json>> parse_stack;
  json::parser_callback_t check_duplicate_keys =
      [&](int /* depth */, json::parse_event_t event, json &parsed) {
        switch (event) {
          case json::parse_event_t::object_start: parse_stack.push(std::set<json>()); break;
          case json::parse_event_t::object_end: parse_stack.pop(); break;
          case json::parse_event_t::key: {
            // one set of already-seen keys per nesting level
            const auto result = parse_stack.top().insert(parsed);
            if (!result.second) {
              std::cerr << "key " << parsed << " was already seen in this object!" << std::endl;
              return false;
            }
            break;
          }
          default: break;
        }

        // yes, store parsed value
        return true;
      };

Edit: this is very similar to your first attempt above, but should now take care of nested objects via the stack of sets.
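A usage sketch for completeness, assuming the JSON text is in a std::string named text (note: if I read the callback semantics correctly, returning false only discards the current element rather than aborting the parse; to reject the whole document one could throw from the callback instead):

std::string text = R"({"outer": {"key": 1, "key": 2}})";
json j = json::parse(text, check_duplicate_keys);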
