Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should the keys in result.Counts be ordered ? #4985

Closed
jlapeyre opened this issue Aug 26, 2020 · 10 comments
Closed

Should the keys in result.Counts be ordered ? #4985

jlapeyre opened this issue Aug 26, 2020 · 10 comments
Labels
status: needs information Further information is requested type: enhancement It's working, but needs polishing

Comments

@jlapeyre
Copy link
Contributor

What is the expected enhancement?

result.Counts would be more convenient if it were a subclass of OrderedDict. The keys are bit strings, which have a natural order, so it would be useful to preserve the order. For example, currently, when displaying the counts interactively, the keys (bit strings) are not ordered.
Is there a reason this choice was made ?
If ordering the counts is not desirable (too expensive or not necessary for most uses, etc.) an alternative is a function that returns the an ordered dict.

@jlapeyre jlapeyre added the type: enhancement It's working, but needs polishing label Aug 26, 2020
@mtreinish
Copy link
Member

mtreinish commented Aug 26, 2020

It was mostly done this way for compatibility with the dict that was there before see #4501 I didn't really think about it except to make the new Counts class subclass the same type as what was there before. That being said I think preserving insertion ordering is fine. But, I don't think we'll need to do anything here. After #4926 merges python will be preserving insertion ordering. In python3.6 that was added because of performance optimizations in that release. but it wasn't explicitly guaranteed yet. In 3.7 it was added as a guarantee moving forward, so we really only need to use OrderedDict while we still support Python3.5.

@jlapeyre
Copy link
Contributor Author

But, I don't think we'll need to do anything here

This makes sense since python dicts are now ordered.

But, in applications I am playing with now, under python 3.8, the strings are not ordered. Since insertion order is preserved, this tells me that the strings are not ordered when the source dict is created. They also don't appear to be inserted in the order in which the results are obtained (which should be random) because many strings with very low counts come before strings with very high counts.

@mtreinish
Copy link
Member

Well we can look at sorting them before they're added to the Counts object here:

https://github.com/Qiskit/qiskit-terra/blob/master/qiskit/result/result.py#L250

change that to be something like

dict_list.append(Counts(sorted(self.data(key)['counts']), **counts_header))

but before we open a PR doing this we should benchmark it to see what the practical overhead of sorting is. If it's minimal lets go for it, but if it adds a noticeable amount of overhead I'd say lets just add a helper method (or kwargs to existing ones) to counts() to get a sorted iter of the counts dict and leave it up to the user.

@jlapeyre
Copy link
Contributor Author

result = ...
raw_dict = result.data(0)['counts']
counts = result.get_counts()
ms = counts.memory_slots

def makeCounts(d, memory_slots):
    sk = sorted(d)  # returns sorted keys
    sd = {k: d[k] for k in sk}
    return Counts(sd, memory_slots=memory_slots)

Then

In [1]: len(raw_dict)                                                                                                                              
Out[1]: 25

In [2]: %timeit Counts(raw_dict, memory_slots=ms)                                                                                                  
38.1 µs ± 230 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [3]: %timeit makeCounts(raw_dict, memory_slots=ms)                                                                                              
40.4 µs ± 94.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [2]: len(raw_dict)                                                                                                                              
Out[2]: 100

In [3]: %timeit Counts(raw_dict, memory_slots=ms)                                                                                                  
118 µs ± 408 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [4]: %timeit makeCounts(raw_dict, memory_slots=ms)                                                                                              
123 µs ± 422 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  • almost all the extra time is due not to sorting, but to building the intermediate dict.
  • adding a helper method would be the least disruptive in the sense that we are sure it would not affect existing code. But, it adds to the users' cognitive/documentation-digging burden. I can't imagine a situation in which a user would rather have the bitstrings in a random order than in numerical order.

@jlapeyre jlapeyre changed the title Should result.Counts be an OrderedDict ? Should the keys in result.Counts be ordered ? Aug 26, 2020
@1ucian0
Copy link
Member

1ucian0 commented Oct 24, 2020

I can imagine situation where a user wants the counts ordered by value instead, as she might be interested in the most statistically significant results only...

@Avhijit-Nair
Copy link

This issue was recently added in the "To Do" list of Backends, Qobj, and Result . Requires more information on the status.

@1ucian0
Copy link
Member

1ucian0 commented Dec 17, 2021

I can imagine situation where a user wants the counts ordered by value instead, as she might be interested in the most statistically significant results only...

this use cases is covered with the Count.most_frequent() method.

@1ucian0
Copy link
Member

1ucian0 commented Dec 17, 2021

@mtreinish, is the performance analysis done by @jlapeyre enough to make a decision about this issue?

@1ucian0 1ucian0 added the status: needs information Further information is requested label Dec 17, 2021
@1ucian0
Copy link
Member

1ucian0 commented Dec 17, 2021

To reproduce the issue, notice that the output of the following code result in an unsorted key:

from qiskit import Aer, QuantumCircuit

circuit = QuantumCircuit(2)
circuit.h(0)
circuit.cx(0, 1)
circuit.measure_all()

sim = Aer.get_backend('aer_simulator')

result = sim.run(circuit, seed_simulator=10).result()
counts = result.get_counts()
print(counts)
{'11': 505, '00': 519}

@jakelishman
Copy link
Member

In the interests of trimming old issues, I'll close this as stale now. There is more than one ordering that might be useful to users, and "arbitrary" is the cheapest to construct, which is convenient for performance-focussed users who want to do their own post-processing. If there's more to discuss, feel free to re-open.

fwiw, sorting this particular object in Python in the order requested is as simple as Counts(sorted(counts.items())), so imo it's something we should reasonably expect a user to do themselves if they want to choose the performance hit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: needs information Further information is requested type: enhancement It's working, but needs polishing
Projects
None yet
Development

No branches or pull requests

5 participants