
Duplicate keys #531

Closed
cydparser opened this issue Mar 27, 2017 · 15 comments · Fixed by #714

Comments

@cydparser

The behavior of parsing duplicate keys has changed from "last key wins" to "first key wins". I do not see any mention of this change in the changelog -- was it intentional? Is there a way to force the previous behavior?

E.g. With aeson-0.11.2.1:

$ stack --resolver lts-7.7 ghci --package aeson
:set -XOverloadedStrings
import Data.Aeson
decode "{\"a\": false, \"a\": true}" :: Maybe Object
# Just (fromList [("a",Bool True)])

Version 1.0.2.1 results in Just (fromList [("a",Bool False)]).

@Lysxia
Collaborator

Lysxia commented Mar 28, 2017

This comes from here (PR #452) and looks unintentional. The key/value pairs are indeed accumulated in reverse order compared to before.

@bergmark
Collaborator

Thank you for your report. This definitely doesn't look intentional; thanks for pinpointing it, @Lysxia! Unfortunately, I don't think there is a workaround.

I'd be inclined to say that you should not depend on this behavior as it's not in the JSON spec. The fact that we allowed this to begin with might have just been an accidental result caused by how HashMap.insert works. But if people feel strongly about it I wouldn't mind switching back to the old behavior, or will others come along and want the opposite?

@winterland1989 do you remember if there was a reason this changed? Performance?

@winterland1989
Contributor

Yep, fromList uses mutation behind the scenes to make HAMT building a lot faster, but I hadn't noticed that the insertion results differ. I'll have to dig into the unordered-containers code to see if we have a workaround. Maybe we should first discuss whether we want to provide any guarantee on this.

@cydparser
Author

Thanks for the responses. IMHO, it would be convenient if aeson followed ECMAScript's specification of JSON.parse, which pins down behavior the JSON spec leaves open:

In the case where there are duplicate name Strings within an object, lexically preceding
values for the same key shall be overwritten.

At the very least, the current behavior should be considered part of the API. Thanks for your consideration and for all of your hard work on this package.

@winterland1989
Contributor

OK, if we choose to follow that convention, I believe we should just use stack space to build the list directly (which may bring some performance improvement):

objectValues :: Parser Text -> Parser Value -> Parser (H.HashMap Text Value)
objectValues str val = do
  skipSpace
  w <- A.peekWord8'
  if w == CLOSE_CURLY
    then A.anyWord8 >> return H.empty
    else H.fromList <$> loop  -- fromList is pure, so fmap rather than bind
 where
  loop = do
    k <- str <* skipSpace <* char ':'
    v <- val <* skipSpace
    ch <- A.satisfy $ \w -> w == COMMA || w == CLOSE_CURLY
    if ch == COMMA
      then do
        skipSpace
        kvs <- loop
        return ((k, v) : kvs)
      else return [(k, v)]  -- keep the final pair as well

@Lysxia
Collaborator

Lysxia commented Mar 30, 2017

@winterland1989 Have you considered only replacing H.fromList with H.fromListWith const, keeping the current structure of loop?
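
For context (hedged against the unordered-containers documentation, not from the thread): fromList keeps the later list entry on duplicates, and fromListWith applies its combining function as f new_value old_value, so which JSON occurrence wins depends on both the combining function and the order in which the parser accumulates the pairs:

```haskell
import qualified Data.HashMap.Strict as H

main :: IO ()
main = do
  -- fromList: the later list entry takes precedence on duplicates
  print (H.lookup "a" (H.fromList [("a", False), ("a", True)]))
  -- Just True

  -- fromListWith calls its combining function as f new old, so
  -- (\_new old -> old) keeps the earlier list entry instead
  print (H.lookup "a" (H.fromListWith (\_new old -> old)
                                      [("a", False), ("a", True)]))
  -- Just False
```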

@winterland1989
Contributor

Good idea! I suppose that would work too, and it would also keep the constant-stack behavior.

@yav

yav commented Apr 4, 2017

Since Aeson uses a HashMap to represent objects, I think it should fail to parse JSON that has duplicate keys, rather than silently ignoring some of the input.

With the current behavior (no matter if it picks the first or last occurrence) there is no way for the application to know that part of the input was ignored, and take action (e.g., warn the user, or reject the input as invalid).

@bergmark
Collaborator

I was going to argue that allowing duplicates is in line with the "tolerant reader" approach aeson uses elsewhere, such as ignoring extraneous fields, but I wasn't able to turn this into a proper argument :)

I don't think the current internal representation should influence this decision. But the alternative would be to change it to HashMap Text [Value], and I don't think I want to go down that road.

@cydparser would turning duplicate keys into an error work in your case or do you actually need this behavior? The purist in me says that this should indeed be a parse error.

@cydparser
Author

@bergmark I needed the treatment of duplicate keys to match ECMAScript -- throwing an error would not have helped. I am validating a JSON configuration file that is preprocessed (custom template rules and string interpolation) and parsed by Node.js. It is the Node.js app that ultimately relies on the behavior of duplicate keys, so the Haskell code must allow them too. I have a workaround in place.

I would argue that most uses of JSON involve parsing with ECMAScript at some point, and that deviating from its behavior too much violates the principle of least surprise. It is reasonable to expect that parsing JSON directly with Haskell and parsing JSON posted from a web service that does nothing but JSON.stringify(JSON.parse(s)) would result in an equivalent value or error.

@yav

yav commented Apr 29, 2017

We could parametrize the behavior of the parser, e.g., using something like:

   data DuplicateKeys = RejectDuplicates | UseLaterEntry
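
A minimal sketch of what such a knob could look like, assuming the key/value pairs are first collected in source order. The DuplicateKeys type is from the proposal above; resolve and its error message are hypothetical names for illustration, not aeson API:

```haskell
import Data.List (foldl')
import qualified Data.HashMap.Strict as H

-- Policy type from the proposal above.
data DuplicateKeys = RejectDuplicates | UseLaterEntry

-- Hypothetical helper: resolve key/value pairs (in source order)
-- into a map according to the chosen policy.
resolve :: DuplicateKeys -> [(String, v)] -> Either String (H.HashMap String v)
resolve RejectDuplicates = foldl' step (Right H.empty)
  where
    step acc (k, v) = do
      m <- acc
      if k `H.member` m
        then Left ("duplicate key: " ++ k)  -- surface the duplicate
        else Right (H.insert k v m)
resolve UseLaterEntry = Right . H.fromList  -- later entries win

main :: IO ()
main = do
  print (resolve RejectDuplicates [("a", 1 :: Int), ("a", 2)])
  -- Left "duplicate key: a"
  print (resolve UseLaterEntry [("a", 1 :: Int), ("a", 2)])
  -- Right (fromList [("a",2)])
```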

@hvr
Member

hvr commented May 21, 2017

I'd just like to point out that RFC 7159 (which I consider the most accurate specification of JSON currently) has the following to say about duplicates:

The names within an object SHOULD be unique.

An object whose names are all unique is interoperable in the sense
that all software implementations receiving that object will agree on
the name-value mappings. When the names within an object are not
unique, the behavior of software that receives such an object is
unpredictable. Many implementations report the last name/value pair
only. Other implementations report an error or fail to parse the
object, and some implementations report all of the name/value pairs,
including duplicates.

JSON parsing libraries have been observed to differ as to whether or
not they make the ordering of object members visible to calling
software. Implementations whose behavior does not depend on member
ordering will be interoperable in the sense that they will not be
affected by these differences.

@llelf
Member

llelf commented Aug 2, 2018

It's RFC 8259 now, but the message still holds. On the other hand, the ability to know there were duplicates (or a stricter parser) would be welcome too, I think.

@nalchevanidze

nalchevanidze commented Jun 21, 2019

Hello @bergmark,

I use aeson for variable parsing in my GraphQL library and want to report an error if variable names are not unique. It would be great if aeson could give me a way to do this.

The proposal from @yav could help here:

   data DuplicateKeys = RejectDuplicates | UseLaterEntry

Lysxia added a commit to Lysxia/aeson that referenced this issue Jun 22, 2019
- This allows explicit handling of duplicate keys.
- Also remove an outdated comment about the difference between the
  parsers value and json (they have been the same for a while).
@bergmark
Collaborator

@Lysxia kindly added configurable support for this. Please try out the 1.4.4.0 release.
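
For readers arriving later, a hedged sketch of the new knob: the 1.4.4.0 changelog lists jsonWith, jsonLast, jsonAccum, and jsonNoDup in Data.Aeson.Parser (added by PR #714). Rejecting duplicates with jsonNoDup, assuming attoparsec's parseOnly (the exact error text is not guaranteed here):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson.Parser (jsonNoDup)
import Data.Attoparsec.ByteString (parseOnly)

main :: IO ()
main = do
  -- jsonNoDup fails the parse when an object has duplicate keys
  print (parseOnly jsonNoDup "{\"a\": false, \"a\": true}")
  -- prints a Left value (parse error)
  -- objects with unique keys parse as usual
  print (parseOnly jsonNoDup "{\"a\": true}")
```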
