Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multidicts does not preserve order on update #68

Closed
lysyloren opened this issue Apr 29, 2017 · 7 comments
Closed

Multidicts does not preserve order on update #68

lysyloren opened this issue Apr 29, 2017 · 7 comments

Comments

@lysyloren
Copy link

When updating existing key in multidict, it is moved to the end (original order is not preserved):

>>> d = CIMultiDict()
>>> d['1'] = '2'
>>> d['3'] = '4'
>>> d
<CIMultiDict('1': '2', '3': '4')>
>>> d['1'] = '10'
>>> d
<CIMultiDict('3': '4', '1': '10')>

In above example, guarantee of preserving insertion order should leave existing key '1' on its original first position.

@asvetlov
Copy link
Member

I don't think so.
Updating for key '1' means deleting existing key-value pair from multidict and adding a new pair '1': '10'.

Thus the behavior is totally correct.

@lysyloren
Copy link
Author

I disagree. Updating for key '1' means updating existing key-value pair (not deleting it and adding new one). Please note, that this is not implementation detail point of view, but it is how end user see and use it.

Please also compare multidict ordering with: collections.OrderedDict and new Python 3.6 "compact" dict implementation. Both preserve insertion key order on update.

Now multidict behavior is last update preserving order not insertion preserving order and if you decide to leave this behavior, this difference should be clearly stated documentation. Now it is confusing for user familiar with OrderedDict or "compact" dict, because multidict ordering behavior is different from ordering behavior in these objects from standard library.

@asvetlov
Copy link
Member

Th problem is that multidict allows multiple values for the same key.
For d = MultiDict([(1, 'a'), (2, 'b'), (1, 'c')]) the order is (1, 'a'), (2, 'b'), (1, 'c').
Please note that key 1 is appearing twice.

What order should be after d[1] = 'd' operation?

I guess (2, 'b'), (1, 'd). The ordering is clear if we use delete-and-insert operation but is not clean for updating.

Update first occurrence and delete all others? I have no intuitive clean solution.

@asvetlov asvetlov reopened this May 21, 2017
@asvetlov
Copy link
Member

After talking with people at US PyCon I've changed my mind.
Yes, the order should be preserved and behavior must be close to OrderedDict as much as possible.
It means we should replace the last key-value pair and delete all others.
I believe the change will not break any existing code except some unit tests which are explicitly depends on the order in their assertions.

@lysyloren would you make a Pull Request for it?

@lysyloren
Copy link
Author

@lysyloren would you make a Pull Request for it?

Sorry, I have couple of urgent projects and no time for making this Pull Request for now.

Beside of this, the implementation of multidict is now based on list with O(n) complexity. I think, that faster and more reliable implementation may now be based on "compact" dict on Python 3.6 (and OrderedDict on Python < 3.6 for backward compatibility). I realize, that it will require rewrite most of the code, but new code should be simpler, probably Cython version will not be required (because of speed of build in dict), and the most important: complexicity will be O(1).

@asvetlov
Copy link
Member

Hmm, it's complicated question.

At first we should support Python 3.4 and 3.5. We'll drop 3.4 this Autumn or Winter most likely but 3.5 remains the showstopper.
Sure, I can just grab an implementation from 3.6 but you know -- supporting C code is hard and potentially error-prone. For me much easier to keep on Cython level.

At second we use multidicts mostly for relative small dictionaries: HTTP headers and data provided by GET and POST requests.
For this cases O(N) is not worse than O(1) assuming the dict size is less than 30 elements usually.
But I've learned from Raymond's awesome talk (http://pyvideo.org/pycon-us-2017/modern-python-dictionaries-a-confluence-of-a-dozen-great-ideas.html) that keeping hash values in the list explicitly could improve performance very well.
Also we could get rid of using list internally but provide a custom implementation with delaying list shrinking until an amount of deleted items doesn't hit a limit.

All these approaches make sense to be applied but we need to test them and measure performance carefully.

@asvetlov
Copy link
Member

asvetlov commented Jun 8, 2017

Fixed by #87

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants