
Duplicate keys #531

Closed
cydparser opened this issue Mar 27, 2017 · 15 comments · Fixed by #714

Comments

@cydparser

The behavior of parsing duplicate keys has changed from "last key wins" to "first key wins". I do not see any mention of this change in the changelog -- was it intentional? Is there a way to force the previous behavior?

E.g. With aeson-0.11.2.1:

$ stack --resolver lts-7.7 ghci --package aeson
:set -XOverloadedStrings
import Data.Aeson
decode "{\"a\": false, \"a\": true}" :: Maybe Object
# Just (fromList [("a",Bool True)])

Version 1.0.2.1 results in Just (fromList [("a",Bool False)]).

@Lysxia
Collaborator

Lysxia commented Mar 28, 2017

This comes from here (PR #452) and looks unintentional. The key/value pairs are indeed accumulated in reverse order compared to before.

@bergmark
Collaborator

Thank you for your report. This definitely doesn't look intentional; thanks for pinpointing it, @Lysxia! Unfortunately, I don't think there is a workaround.

I'd be inclined to say that you should not depend on this behavior as it's not in the JSON spec. The fact that we allowed this to begin with might have just been an accidental result caused by how HashMap.insert works. But if people feel strongly about it I wouldn't mind switching back to the old behavior, or will others come along and want the opposite?

@winterland1989 do you remember if there was a reason this changed? Performance?

@winterland1989
Contributor

Yep, fromList uses mutation behind the scenes to make HAMT building a lot faster, but I hadn't noticed that the insertion results differ. I'll have to dig into the unordered-containers code to see if we have a workaround. Maybe we should first discuss whether we want to provide any guarantee on this.

@cydparser
Author

Thanks for the responses. IMHO, it would be convenient if aeson followed ECMAScript's specification of JSON.parse, which pins down behavior the JSON spec leaves open:

In the case where there are duplicate name Strings within an object, lexically preceding
values for the same key shall be overwritten.

At the very least, the current behavior should be considered part of the API. Thanks for your consideration and for all of your hard work on this package.

@winterland1989
Contributor

OK, if we choose to follow that convention, I believe we should just use stack space to build the list directly (which may bring some performance improvement):

objectValues :: Parser Text -> Parser Value -> Parser (H.HashMap Text Value)
objectValues str val = do
  skipSpace
  w <- A.peekWord8'
  if w == CLOSE_CURLY
    then A.anyWord8 >> return H.empty
    else H.fromList <$> loop  -- fromList is pure, so fmap rather than bind
 where
  loop = do
    k <- str <* skipSpace <* char ':'
    v <- val <* skipSpace
    ch <- A.satisfy $ \w -> w == COMMA || w == CLOSE_CURLY
    if ch == COMMA
      then do
        skipSpace
        kvs <- loop
        return ((k, v) : kvs)
      else return [(k, v)]  -- keep the final pair as well

@Lysxia
Collaborator

Lysxia commented Mar 30, 2017

@winterland1989 Have you considered only replacing H.fromList with H.fromListWith const, keeping the current structure of loop?
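
For context (hedged against the unordered-containers documentation, not from the thread): fromList keeps the later list entry on duplicates, and fromListWith applies its combining function as f new_value old_value, so which JSON occurrence wins depends on both the combining function and the order in which the parser accumulates the pairs:

```haskell
import qualified Data.HashMap.Strict as H

main :: IO ()
main = do
  -- fromList: the later list entry takes precedence on duplicates
  print (H.lookup "a" (H.fromList [("a", False), ("a", True)]))
  -- Just True

  -- fromListWith calls its combining function as f new old, so
  -- (\_new old -> old) keeps the earlier list entry instead
  print (H.lookup "a" (H.fromListWith (\_new old -> old)
                                      [("a", False), ("a", True)]))
  -- Just False
```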

@winterland1989
Contributor

Good idea! I suppose that would work too, and it would also keep the constant-stack behavior.

@yav

yav commented Apr 4, 2017

Since Aeson uses a HashMap to represent objects, I think it should fail to parse JSON that has duplicate keys, rather than silently ignoring some of the input.

With the current behavior (no matter if it picks the first or last occurrence) there is no way for the application to know that part of the input was ignored, and take action (e.g., warn the user, or reject the input as invalid).

@bergmark
Collaborator

I was going to argue that allowing duplicates is in line with the "tolerant reader" approach aeson uses elsewhere, such as ignoring extraneous fields, but I wasn't able to turn this into a proper argument :)

I don't think the current internal representation should influence this decision. But the alternative would be to change it to HashMap Text [Value], and I don't think I want to go down that road.

@cydparser would turning duplicate keys into an error work in your case or do you actually need this behavior? The purist in me says that this should indeed be a parse error.

@cydparser
Author

@bergmark I needed the treatment of duplicate keys to match ECMAScript -- throwing an error would not have helped. I am validating a JSON configuration file that is preprocessed (custom template rules and string interpolation) and parsed by Node.js. It is the Node.js app that ultimately relies on the behavior of duplicate keys, so the Haskell code must allow them too. I have a workaround in place.

I would argue that most uses of JSON involve parsing with ECMAScript at some point, and that deviating from its behavior too much violates the principle of least surprise. It is reasonable to expect that parsing JSON directly with Haskell and parsing JSON posted from a web service that does nothing but JSON.stringify(JSON.parse(s)) would result in an equivalent value or error.

@yav

yav commented Apr 29, 2017

We could parametrize the behavior of the parser, e.g., using something like:

   data DuplicateKeys = RejectDuplicates | UseLaterEntry
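
A minimal sketch of what such a knob could look like, assuming the key/value pairs are first collected in source order. The DuplicateKeys type is from the proposal above; resolve and its error message are hypothetical names for illustration, not aeson API:

```haskell
import Data.List (foldl')
import qualified Data.HashMap.Strict as H

-- Policy type from the proposal above.
data DuplicateKeys = RejectDuplicates | UseLaterEntry

-- Hypothetical helper: resolve key/value pairs (in source order)
-- into a map according to the chosen policy.
resolve :: DuplicateKeys -> [(String, v)] -> Either String (H.HashMap String v)
resolve RejectDuplicates = foldl' step (Right H.empty)
  where
    step acc (k, v) = do
      m <- acc
      if k `H.member` m
        then Left ("duplicate key: " ++ k)  -- surface the duplicate
        else Right (H.insert k v m)
resolve UseLaterEntry = Right . H.fromList  -- later entries win

main :: IO ()
main = do
  print (resolve RejectDuplicates [("a", 1 :: Int), ("a", 2)])
  -- Left "duplicate key: a"
  print (resolve UseLaterEntry [("a", 1 :: Int), ("a", 2)])
  -- Right (fromList [("a",2)])
```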

@hvr
Member

hvr commented May 21, 2017

I'd just like to point out that RFC 7159 (which I consider the most accurate specification of JSON currently) has the following to say about duplicates:

The names within an object SHOULD be unique.

An object whose names are all unique is interoperable in the sense
that all software implementations receiving that object will agree on
the name-value mappings. When the names within an object are not
unique, the behavior of software that receives such an object is
unpredictable. Many implementations report the last name/value pair
only. Other implementations report an error or fail to parse the
object, and some implementations report all of the name/value pairs,
including duplicates.

JSON parsing libraries have been observed to differ as to whether or
not they make the ordering of object members visible to calling
software. Implementations whose behavior does not depend on member
ordering will be interoperable in the sense that they will not be
affected by these differences.

@llelf
Member

llelf commented Aug 2, 2018

It's RFC 8259 now, but the message still holds. On the other hand, the ability to know there were duplicates (or a stricter parser) would be welcome too, I think.

@nalchevanidze

nalchevanidze commented Jun 21, 2019

Hello @bergmark,

I use aeson for variable parsing in my GraphQL library and want to report an error if variable names are not unique. It would be great if aeson could give me a way to do this.

The proposal from @yav could help here:

   data DuplicateKeys = RejectDuplicates | UseLaterEntry

Lysxia added a commit to Lysxia/aeson that referenced this issue Jun 22, 2019
- This allows explicit handling of duplicate keys.
- Also remove an outdated comment about the difference between the
  parsers value and json (they have been the same for a while).
@bergmark
Collaborator

@Lysxia kindly added configurable support for this. Please try out the 1.4.4.0 release.
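
For readers arriving later, a hedged sketch of the new knob: the 1.4.4.0 changelog lists jsonWith, jsonLast, jsonAccum, and jsonNoDup in Data.Aeson.Parser (added by PR #714). Rejecting duplicates with jsonNoDup, assuming attoparsec's parseOnly (the exact error text is not guaranteed here):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson.Parser (jsonNoDup)
import Data.Attoparsec.ByteString (parseOnly)

main :: IO ()
main = do
  -- jsonNoDup fails the parse when an object has duplicate keys
  print (parseOnly jsonNoDup "{\"a\": false, \"a\": true}")
  -- prints a Left value (parse error)
  -- objects with unique keys parse as usual
  print (parseOnly jsonNoDup "{\"a\": true}")
```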
