
order in dict is not preserved #110

Closed
MazokuMaxy opened this issue Dec 12, 2017 · 34 comments

@MazokuMaxy

MazokuMaxy commented Dec 12, 2017

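Python 3.6.3

```python
import yaml

document = """
b:
  c: 3
  d: 4
a: 1
"""
# safe_load: in PyYAML 6+, yaml.load() without a Loader argument raises TypeError
print(yaml.dump(yaml.safe_load(document), default_flow_style=False))
```

Output:

```
a: 1
b:
  c: 3
  d: 4
```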

@sigmavirus24
Contributor

This is a property of Python, not PyYAML. Python does not preserve the order of dictionaries and so we cannot either. To do so, you'd have to yaml.load into an OrderedDict (a dictionary implementation that preserves order). Cheers!
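A minimal sketch of the OrderedDict approach, assuming PyYAML's SafeLoader: a loader subclass registers a constructor for plain YAML mappings that collects the key/value pairs in document order. `OrderedSafeLoader` and `_construct_ordered_mapping` are illustrative names, not part of PyYAML.

```python
from collections import OrderedDict

import yaml


class OrderedSafeLoader(yaml.SafeLoader):
    """Illustrative SafeLoader subclass that builds OrderedDicts for mappings."""


def _construct_ordered_mapping(loader, node):
    # Resolve merge keys ("<<") the way SafeLoader does, then keep the
    # (key, value) pairs in the order they appear in the document.
    loader.flatten_mapping(node)
    return OrderedDict(loader.construct_pairs(node))


# Registered on the subclass only; yaml.SafeLoader itself is unchanged.
OrderedSafeLoader.add_constructor(
    yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, _construct_ordered_mapping
)

document = """
b:
  c: 3
  d: 4
a: 1
"""

data = yaml.load(document, Loader=OrderedSafeLoader)
print(list(data))       # ['b', 'a']
print(list(data["b"]))  # ['c', 'd']
```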

@shoogle

shoogle commented Mar 12, 2018

@sigmavirus24, that's not completely true.

```python
import yaml
document = """
b:
c: 3
d: 4
a: 1
"""
dictionary = yaml.safe_load(document)

print(dictionary)
print(yaml.dump(dictionary))
```

Output:

```
{'b': None, 'c': 3, 'd': 4, 'a': 1}
a: 1
b: null
c: 3
d: 4
```

So it would seem that Python does preserve the order of the dictionary. In fact, the sorting is done by PyYAML during yaml.dump(). This line in representer.py appears to be the culprit.

A whole bunch of people have created forks/extensions to PyYAML specifically to get around this issue, so it would be nice if it was fixed in PyYAML itself.

Source: https://stackoverflow.com/a/45984742

@perlpunk
Member

perlpunk commented Mar 12, 2018

It seems Python only guarantees that dicts keep insertion order since 3.7:
https://stackoverflow.com/questions/39980323/are-dictionaries-ordered-in-python-3-6
I think a sort option (default=true) would make sense. Setting it to false would then keep the original order (with python >= 3.7).

@perlpunk perlpunk reopened this Mar 12, 2018
@shoogle

shoogle commented Mar 12, 2018

An option to not sort would be great, thanks, and fine for my intended use case, but if Python doesn't sort by default then perhaps PyYAML shouldn't sort by default either?

@perlpunk
Member

@shoogle I guess both defaults can make sense. Important for me would be backwards compatibility.

Pull requests are welcome. I don't know when I can implement it: I'm busy for a couple of weeks, and additionally I only just started to learn Python ;-)
It shouldn't be that hard, but the classes and objects in dumper/representer.py are confusing me right now.

@perlpunk
Member

I created a PR #143
Maybe @sigmavirus24 or @ingydotnet can have a look at it

@jonas-eschle

jonas-eschle commented May 21, 2018

I think a fundamental problem is that the yaml-specs do not guarantee an order. As PyYAML is a yaml parser, guaranteeing order seems like a slight breach of the yaml specs.

That's why there is Phynix/yamlloader, which is based on PyYAML but extends the functionality by explicitly keeping the order of OrderedDicts (and dicts for Python 3.7+). I want to stress, though, that this actually breaks the yaml specifications! But it is still useful...

My proposition would be not to guarantee that behavior directly in pyyaml and rely on extensions like yamlloader. Or, in other words, how far should pyyaml deviate from the yaml-specs?

Any thoughts on that?

(This of course is not a vote against the ordered flag, just against giving the guarantee here)

@wimglenn

wimglenn commented May 24, 2018

I wrote a drop-in replacement to address this problem: https://github.com/wimglenn/oyaml
You may import oyaml as yaml and use as usual.

@shoogle

shoogle commented May 26, 2018

> the yaml-specs do not guarantee an order.

Correct.

> As PyYAML is a yaml parser, guaranteeing order seems like a slight breach of the yaml specs.

Wrong. Since the spec doesn't guarantee an order, that means any order is valid. PyYAML could return dict keys in any arbitrary order (alphabetical, reverse-alphabetical, shortest first, random, order of creation, etc.) and it would still be perfectly consistent with the YAML specification.

In practice, the only ordering that makes any sense is the order in which they were created, because if they are returned in a different order then the information about which was created first is lost forever. If the user requires any other form of ordering (alphabetical, etc.), they can sort the dict themselves after it has been returned in creation order. However, if the dict is not returned in creation order, the user can never put it back in creation order (except by a lucky guess).
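The asymmetry can be seen with plain dicts (Python 3.7+, no PyYAML involved); the keys and values below are arbitrary examples.

```python
# Keys in the order they were written (creation order).
created = {"b": 2, "c": 3, "a": 1}

# Any other ordering can be derived from creation order...
alphabetical = dict(sorted(created.items()))
by_value_desc = dict(sorted(created.items(), key=lambda kv: kv[1], reverse=True))

# ...but a dict that only ever existed in sorted form carries no record
# that "b" was written first, so creation order cannot be rebuilt from it.
print(list(created))        # ['b', 'c', 'a']
print(list(alphabetical))   # ['a', 'b', 'c']
print(list(by_value_desc))  # ['c', 'b', 'a']
```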

It is for this very reason that, since Python 3.7, dictionaries are ordered by default as a feature of the language (and not just as an implementation detail as they were in 3.6).

This is why I think returning in creation order should be the default in PyYAML (at least for Python >= 3.7) and not just an option, though I understand the desire to ensure backwards compatibility. (It should be noted, however, that nobody complained when Python dicts became ordered by default, even though it could be seen as a backwards-incompatible change.)

@perlpunk
Member

I agree with @shoogle that, while the Spec does not guarantee order, it's not a requirement to return keys in random order.

Regarding backwards compatibility, people might rely on the current behaviour that keys are sorted.
That's a bit different to the change in python 3.6/3.7, where the keys were in random order previously.
Changing the sort_keys option in my PR #143 to have a different default depending on the python version might be a good compromise, OTOH it could also be confusing.

@jonas-eschle

jonas-eschle commented May 27, 2018

@shoogle you are right, I've formulated things wrong: guaranteeing an order does not break the yaml specs of course, but extends them. And while there is definitely more use in returning insertion-ordered dicts, the question is whether this should be guaranteed inside the basic converter PyYAML or be "sold as an extension in an extension".

I think the question really boils down to the "problem" of backwards compatibility: I could not find it, but does PyYAML guarantee somewhere that the dumping will be sorted?

Otherwise: Python never guaranteed an order/sorting of the dicts (up to 3.7) and neither does yaml (or PyYAML, if the sorting was not a guaranteed feature). So no one actually could have relied on any kind of sorting.

I guess anyone who really relied on insertion order, or any other kind of ordering, used OrderedDict together with one of the extensions that explicitly deal with this problem (like yamlloader), which try to keep up their compatibility differently (but that's also a different use case). If, as a user, you are concerned about backwards compatibility now (say you write for 3.7, use a dict and rely on its order), you have to use OrderedDict anyway (and therefore again rely on extensions).

So: yes, I am in favor of it, let's extend the yaml specs in order to stick closely to the python specs and guarantee the order preserving behavior in PyYAML for Python 3.7+. For < 3.7, the order should not matter (assuming it was never guaranteed to be sorted).

@shoogle

shoogle commented May 27, 2018

@mayou36, if insertion-ordering is optional, as it is in PR #143, then there is no problem with backwards compatibility. Furthermore, as you say, PyYAML made no guarantees about ordering anyway, so it would not be breaking the API to change to insertion-ordering by default. I'm not saying it should happen right away, but maybe after one or two releases where it was provided as an option.

@jonas-eschle

@shoogle I fully agree, insertion-order could even be the default and sorting has to be set.

What I meant was: if you want to write code that is compatible with all of 3.x (and not just 3.7+ or 3.6+) and relies on any kind of dict ordering, OrderedDict has to be used anyway.
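For code that must also run before 3.7, a sketch of dumping OrderedDict as a plain YAML map, assuming PyYAML's SafeDumper (which has no representer for OrderedDict out of the box). `OrderedSafeDumper` and `_represent_ordered_dict` are illustrative names; the trick relies on `represent_mapping` only sorting arguments that have an `.items()` method, as in the representer.py code quoted in this thread.

```python
from collections import OrderedDict

import yaml


class OrderedSafeDumper(yaml.SafeDumper):
    """Illustrative SafeDumper subclass that emits OrderedDicts as plain maps."""


def _represent_ordered_dict(dumper, data):
    # data.items() is a view without an .items() method of its own, so
    # represent_mapping iterates it as-is instead of sorting it.
    return dumper.represent_mapping(
        yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, data.items()
    )


# Registered on the subclass only; yaml.SafeDumper itself is unchanged.
OrderedSafeDumper.add_representer(OrderedDict, _represent_ordered_dict)

data = OrderedDict([("b", OrderedDict([("c", 3), ("d", 4)])), ("a", 1)])
text = yaml.dump(data, Dumper=OrderedSafeDumper, default_flow_style=False)
print(text)
```

The output is an ordinary block-style map with `b` before `a`, so any YAML reader can load it back.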

@rasmusagren

@shoogle I also agree. Over time I think the path of least resistance would be to adhere as closely as possible to how Python does it. That would mean ordered by creation by default for Python >=3.6.

@jonas-eschle

I would actually stick to > 3.6, not >=. In 3.6, insertion order is mentioned as a CPython implementation detail, not as a language feature. It doesn't matter much if the feature isn't available yet in 3.6, but if someone relies on it with 3.6 on an implementation that doesn't keep insertion order (probably a rare case), I think that is the bigger issue.
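Application code that follows the stricter "> 3.6" rule could pick its mapping type from the interpreter version; a sketch, where `MappingType` and `config` are illustrative names.

```python
import sys
from collections import OrderedDict

# Insertion order is a language guarantee only from 3.7 onward; on 3.6 it
# is a CPython implementation detail, so fall back to OrderedDict there.
MappingType = dict if sys.version_info >= (3, 7) else OrderedDict

config = MappingType()
config["b"] = 2
config["a"] = 1
print(list(config))  # ['b', 'a'] on either branch
```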

@ingydotnet
Member

YAML dumpers can (and probably should) dump mappings with their keys sorted (by default) in environments where insertion order is not preserved. PyYAML sorts keys and doesn't have an option not to. Having keys in a deterministic order is generally more useful than not.

The most correct and useful thing to do here is to provide a sort_keys option to dump that defaults to True. Setting it to False will get you whatever key order the python implementation being used provides natively.

```shell
python -c 'import yaml; print(yaml.dump({"c": 3, "b": 2, "a": 1}, default_flow_style=False, sort_keys=False))'
```

It looks like @perlpunk++'s #143 does this. I'll try to get it released soon.

@kaidokert

I made a different thing for myself:

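```python
# Modified sorting step inside represent_mapping (representer.py):
# use the Dumper's dict_sortkey when it has one, else the default sort.
if hasattr(mapping, 'items'):
    mapping = list(mapping.items())
    try:
        if hasattr(self, 'dict_sortkey'):
            mapping = sorted(mapping, key=self.dict_sortkey)
        else:
            mapping = sorted(mapping)
    except TypeError:
        # Unorderable keys: keep the original order.
        pass
```

calling with

```python
dmp = Dumper  # alias, not a copy: this modifies the Dumper class itself
dmp.dict_sortkey = lambda self, y: sort_function(y)  # bound as a method
dumped = yaml.dump(dats, allow_unicode=True, Dumper=dmp)
```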

Admittedly ugly, but allows whatever sort order is desired
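A variation on the same idea that avoids patching representer.py: a hypothetical SafeDumper subclass that re-registers the dict representer and pre-sorts the items with its own key function. `KeySortedDumper` is an illustrative name, `dict_sortkey` is borrowed from the snippet above, and the shortest-key-first order is just an example.

```python
import yaml


class KeySortedDumper(yaml.SafeDumper):
    """Hypothetical dumper that orders mapping keys with a custom key function."""

    @staticmethod
    def dict_sortkey(item):
        # Example order only: shortest key first, ties broken alphabetically.
        key, _value = item
        return (len(str(key)), str(key))

    def represent_dict(self, data):
        items = sorted(data.items(), key=self.dict_sortkey)
        # A list has no .items() method, so represent_mapping keeps this order.
        return self.represent_mapping("tag:yaml.org,2002:map", items)


# Overriding the method is not enough: the dict representer has to be
# re-registered on the subclass so the new method is actually used.
KeySortedDumper.add_representer(dict, KeySortedDumper.represent_dict)

data = {"ccc": 1, "a": 2, "bb": 3, "b": 4}
text = yaml.dump(data, Dumper=KeySortedDumper, default_flow_style=False)
print(text)
```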

@TimothyBramlett

@wimglenn Thanks! Worked great!

@NoahCardoza

If it is any consolation, Python's JSON module preserves order when dumping. Since oyaml now exists, it's not that big of an issue but I just thought I'd throw it out there.

@jasweet

jasweet commented Aug 15, 2018

I battled this last year and 'solved' it as indicated above.

So, a very hackish 'solution' is to just comment out the try block in representer.py:

```python
try:
    mapping = sorted(mapping)
except TypeError:
    pass
```

This should be a feature of yaml.dump, same as json.dumps(foo, sort_keys=False)

@feluxe

feluxe commented Sep 30, 2018

> The most correct and useful thing to do here is to provide a sort_keys option to dump that defaults to True.

sort_keys = True is nothing I would expect from a dumper by default, especially not from a YAML dumper. A dumper should convert data from one format into another, there is no reason for it to sort the data. YAML is a human readable format and used for user configs a lot. If you run a key sort on a user config you end up with a mess. It already happened to me a couple of times working with python/pyyaml. It's annoying and it shouldn't be. Just my 2 cents. :)

@perlpunk
Member

@feluxe if the only alternative is a random key order (like for python < 3.7 and many other languages), sorted keys sounds like a pretty useful default ;-)

@feluxe

feluxe commented Sep 30, 2018

@perlpunk Before 3.7 you could use OrderedDict.

@jasweet

jasweet commented Sep 30, 2018

Agree with both of you. Like I mentioned above, json.dumps uses sort keys and defaults to true but can be set to false. That functionality should be added to PyYaml for sure. Simply need a parameter based conditional around the try block I posted above.
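The parameter-based conditional around that try block could look roughly like this. `mapping_items` is a hypothetical stand-alone helper, not PyYAML's code, although the `sort_keys` option that shipped in PyYAML 5.1 wraps the same try block in a similar `if`.

```python
def mapping_items(mapping, sort_keys=True):
    """Hypothetical helper: the sort step from represent_mapping in
    representer.py, wrapped in a sort_keys conditional."""
    items = list(mapping.items())
    if sort_keys:
        try:
            items = sorted(items)
        except TypeError:
            # Keys that cannot be compared with each other (e.g. int and
            # str) make sorted() fail; the items then keep their original order.
            pass
    return items


print(mapping_items({"b": 2, "a": 1}))                   # [('a', 1), ('b', 2)]
print(mapping_items({"b": 2, "a": 1}, sort_keys=False))  # [('b', 2), ('a', 1)]
print(mapping_items({"b": 2, "a": 1, 3: "three"}))       # [('b', 2), ('a', 1), (3, 'three')]
```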

@gatopeich

@perlpunk, the original order is not "random" order, please stop calling it that.
The only random thing here was the random idea of applying a dictionary sort when the spec does not require it.

@stefanoborini

stefanoborini commented Nov 20, 2018

@perlpunk Just as a quip and curiosity note, the order is not random. It's "arbitrary", which means it's consistent but not to be relied upon before 3.7. You can however make it truly random by either specifying PYTHONHASHSEED=random or -R when you invoke python.

@gatopeich

> json.dumps uses sort keys and defaults to true

@jasweet, you are actually wrong, sort_keys has been defaulting to False ever since Python 2.7: https://docs.python.org/2.7/library/json.html#json.dump
Now try to bury this fact with "unlikes" ;-)
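Checking the `json` default with the standard library (Python 3.7+ assumed, so the unsorted output follows insertion order):

```python
import json

data = {"c": 3, "b": 2, "a": 1}

# sort_keys defaults to False: keys come out in insertion order.
print(json.dumps(data))                  # {"c": 3, "b": 2, "a": 1}
# Opting in sorts them.
print(json.dumps(data, sort_keys=True))  # {"a": 1, "b": 2, "c": 3}
```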

@sjktje

sjktje commented Jan 14, 2019

So, I have to use yet another package (oyaml), or is this going to be fixed anytime soon? :-)

@perlpunk
Member

Fixed by #254

@orodbhen

The referenced fix is for the dumper. Has the loader also been fixed?

@shoogle

shoogle commented Jun 11, 2019

@orodbhen, I believe it was only the dumper that was broken. As long as you are using a version of Python >= 3.6, if you print a YAML dictionary (rather than dump it) then it prints in insertion order regardless of PyYAML version.

@orodbhen

I just installed 5.1.1 using pip, and it does look like this has all been fixed for both the loader and the dumper. The loader now preserves the order by default, whereas the dumper requires setting sort_keys=False. Thanks!
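A round trip of the document from the original report, assuming PyYAML 5.1 or later on Python 3.7+: `safe_load` keeps the document's key order, while `dump` sorts unless `sort_keys=False` is passed.

```python
import yaml

document = """
b:
  c: 3
  d: 4
a: 1
"""

data = yaml.safe_load(document)
print(list(data))  # ['b', 'a']: the loader keeps the document's key order

# dump() still sorts by default; sort_keys=False keeps the loaded order.
sorted_text = yaml.dump(data, default_flow_style=False)
unsorted_text = yaml.dump(data, default_flow_style=False, sort_keys=False)
print(sorted_text)
print(unsorted_text)
```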

@brezniczky

I wonder whether this should also be featured, and perhaps illustrated with an example, in the documentation; the length of the discussion above may seem overly daunting to the casual user (myself included).
If you agree, should I open an issue (although it's such a minor thing), or is this something that will get done soon anyway?

@chiboreache

btw: pprint is sorting too! -___-

```python
import pprint

pprint.pprint({}, sort_dicts=False)  # sort_dicts requires Python 3.8+
```
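A quick comparison of the two `pprint` behaviours (Python 3.8+), using `pformat` so the result can be inspected as a string:

```python
import pprint

data = {"c": 3, "b": 2, "a": 1}

print(pprint.pformat(data))                    # {'a': 1, 'b': 2, 'c': 3}
print(pprint.pformat(data, sort_dicts=False))  # {'c': 3, 'b': 2, 'a': 1}
```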
