[WIP] Improve speed of `sample_counts` from O(N) to O(1) #8547

jlapeyre · 2022-08-15T20:12:22Z

Use numpy.random.multinomial with parameter N rather than actually generating N samples in QuantumState.sample_counts. This method is $O(1)$ in the number of shots $N$, whereas the current one is $O(N)$.

I have added the tests to cover my changes.
(NA, no API change) I have updated the documentation accordingly.
I have read the CONTRIBUTING document.
reno release note

Summary

Use a more efficient algorithm for QuantumState.sample_counts.
Fixes #8535.

Details and comments

In #8535 a solution using numpy.random.multinomial was proposed. This PR implements it.

The test has been updated. It now uses $10^7$ shots rather than $2000$, and a tolerance of $0.001$.
Furthermore, we run the tests for 7 different seeds, rather than just one. The time to run all tests has
decreased from $>6$ ms to $<6$ ms (on one machine).
In the following, I use a plain-python generator
```
    return Counts(
        (labels[i], counts_array[i]) for i in range(len(counts_array)) if counts_array[i] > 0
    )
```
Both labels and counts_array are numpy arrays, so it makes sense to instead filter the indices and then build new arrays
by indexing into the two arrays. The latter is slower if the vector of probabilities is not too long, but it would probably be faster for
large arrays. I suppose optimizing for the larger arrays is best. We could do both with a length cutoff, but that adds
complexity, and I think it's premature optimization at this point.
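
To make the two filtering options concrete, here is a minimal sketch. It is an illustration rather than the PR's code: it returns a plain `dict` as a stand-in for `Counts` so that it runs without Qiskit, and the function names are hypothetical.

```python
import numpy as np


def nonzero_counts_generator(labels, counts_array):
    """Plain-Python filtering, element by element (the approach in the snippet above)."""
    return {
        str(labels[i]): int(counts_array[i])
        for i in range(len(counts_array))
        if counts_array[i] > 0
    }


def nonzero_counts_indexed(labels, counts_array):
    """NumPy filtering: find the nonzero indices once, then index both arrays."""
    # Counts are non-negative, so "nonzero" is the same as "> 0" here.
    inds = np.flatnonzero(counts_array)
    return dict(zip(labels[inds].tolist(), counts_array[inds].tolist()))


labels = np.array(["00", "01", "10", "11"])
counts_array = np.array([3, 0, 5, 0])
print(nonzero_counts_generator(labels, counts_array))
print(nonzero_counts_indexed(labels, counts_array))
```

Both functions return the same mapping. Which one is faster depends on the array length, as discussed above, so a real change would need to be benchmarked.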

Use multinomial with parameter N rather than actually generating N counts.

Sampling is now faster. We run the test with more seeds. We still save about 1ms in test time. Furthermore, we increase the number of samples greatly.

coveralls · 2022-08-15T23:27:05Z

Pull Request Test Coverage Report for Build 2864795941

4 of 4 (100.0%) changed or added relevant lines in 1 file are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.003%) to 84.053%

Totals
Change from base Build 2862656180:	0.003%
Covered Lines:	56318
Relevant Lines:	67003

💛 - Coveralls

jlapeyre · 2022-08-16T22:18:52Z

For more guidance on how to evaluate this, the introduction of the wikipedia page for multinomial distribution says

> The Bernoulli distribution models the outcome of a single Bernoulli trial. In other words, it models whether flipping a (possibly biased) coin one time will result in either a success (obtaining a head) or failure (obtaining a tail). The binomial distribution generalizes this to the number of heads from performing n independent flips (Bernoulli trials) of the same coin. The multinomial distribution models the outcome of n experiments, where the outcome of each trial has a categorical distribution, such as rolling a k-sided die n times.

Note that distributions over basis states are "categorical distribution"s.

This is the same thing I said in #8535. But, here it is stated succinctly with the full authority of Wikipedia(!)

jlapeyre · 2022-08-17T16:00:47Z

Another thing: We should check other places in Qiskit (maybe Aer?) where we are effectively sampling from a multinomial distribution. For example, noise modeled by perturbing the circuit randomly with each shot could not be handled as in this PR.

yaelbh · 2022-08-18T05:37:44Z

We've had extensive discussions in the past about it in Aer, in both state vector and MPS simulators. Maybe @chriseclectic or Merav remember the conclusions. This came up also recently with the mock backends of qiskit-experiments (Itamar is the contact point).

I'll try to recall the discussion myself and find the relevant issues and pull requests. For now I only remember that multinomial was not so magical, but I'll have to recall why. In the case of Aer, part of the story was that Aer is already written in C++, so the question became a comparison between different algorithms, without the aspect of using numpy to speed up a program.

yaelbh · 2022-08-18T09:09:28Z

I'm sorry that I can't help much more than referring to a search of the word "multinomial" in the Aer repository... https://github.com/Qiskit/qiskit-aer/issues?q=multinomial
I doubt if it's of much help, but check out at least Qiskit/qiskit-aer#831, which reminds me that @hhorii was very much involved in this, and is probably the first person to talk to.

Similarly, in qiskit-experiments: https://github.com/Qiskit/qiskit-experiments/issues?q=multinomial

merav-aharoni · 2022-08-18T13:56:00Z

Referring to @yaelbh's comment above, there was some work in MPS to determine the fastest algorithm for sample_measure. We discussed three algorithms, two of which are specific to MPS and so are not relevant here. The only relevant one is Algorithm 1 in Qiskit/qiskit-aer#1377 (comment), where we create the accumulated probability vector, generate all the random numbers at once, sort the random numbers, and then move up the probability vector, generating a count for every probability hit.
Since Aer is in C++, it might be worth understanding what is done in the numpy.random.multinomial function, and implementing it in Aer.
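
For readers who have not followed the linked discussion, the sorted-random-numbers idea described above can be sketched in Python roughly as follows. This is written from the description in this comment, not from Aer's C++ source; the function name and the clamping against round-off are assumptions.

```python
import numpy as np


def counts_by_sorted_uniforms(probs, shots, rng):
    """Count samples by walking sorted uniform draws up the cumulative vector."""
    cumulative = np.cumsum(probs)          # accumulated probability vector
    draws = np.sort(rng.random(shots))     # all random numbers at once, sorted
    counts = np.zeros(len(probs), dtype=np.int64)
    last = len(probs) - 1
    k = 0
    for r in draws:
        # Move up to the first bin whose cumulative probability exceeds r.
        # Clamping at the last bin guards against round-off in the cumulative sum.
        while k < last and r >= cumulative[k]:
            k += 1
        counts[k] += 1
    return counts


rng = np.random.default_rng(1234)
counts = counts_by_sorted_uniforms(np.array([0.5, 0.25, 0.25]), 1000, rng)
print(counts, counts.sum())
```

Because the draws are sorted, the index `k` only ever moves forward, so the walk costs one pass over the draws plus one pass over the probabilities, after the sort.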

jlapeyre · 2022-08-19T03:04:07Z

Thanks @yaelbh and @merav-aharoni for pointing me to those issues.

After reading these, it's clear that this PR needs a bit more work.

To be clear, there are two related tasks.

The sample task. Draw and return a list of $N$ samples from the categorical distribution with $n$ probabilities $\mathbf{p}=(p_1,\ldots,p_n)$.
The counts task. Draw $N$ samples from $\mathbf{p}$. Then make a count map. That is, return a list $(c_1,\ldots, c_n)$ where $c_i$ is the number of samples equal to category $i$.

You can perform the counts task by actually generating $N$ samples and binning them. How best to sample depends on the problem parameters, that is $N$, $n$, etc. Also on the computer language.

You can also perform the counts task by generating the counts directly from multinomial distribution, without drawing $N$ samples. This is what is done in numpy.random.multinomial. But, again, whether this is best depends on details of the task.
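
The two ways of performing the counts task can be put side by side in a short sketch. It uses NumPy's `Generator` interface rather than the legacy `numpy.random.multinomial` call named above, and the function names are hypothetical, not Qiskit's.

```python
import numpy as np


def counts_by_sampling(probs, shots, rng):
    """Counts task done by drawing all N samples and binning them."""
    samples = rng.choice(len(probs), size=shots, p=probs)
    return np.bincount(samples, minlength=len(probs))


def counts_by_multinomial(probs, shots, rng):
    """Counts task done directly, without materializing N samples."""
    return rng.multinomial(shots, probs)


probs = np.array([0.5, 0.25, 0.25])
rng = np.random.default_rng(42)
print(counts_by_sampling(probs, 1000, rng))
print(counts_by_multinomial(probs, 1000, rng))
```

Both return one count per category, summing to the number of shots. They consume the random stream differently, so the same seed does not produce the same counts from the two functions.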

You can also perform the sample task by first doing the counts task using numpy.random.multinomial and then generating the samples in accordance with the results. That is, after generating the counts, make a list of length $N$ whose first $c_1$ elements are $1$, the next $c_2$ elements are $2$, etc. You could also randomly permute the results. I had not thought of this before reading the investigation that @lbishop related. It seems that for some parameter regimes, it outperforms naive sampling.
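
A minimal sketch of this counts-then-expand approach, assuming NumPy's `Generator` interface and zero-based category indices (the text above numbers categories from $1$); the function name is hypothetical.

```python
import numpy as np


def samples_from_counts(probs, shots, rng, shuffle=True):
    """Sample task done via the counts task: draw counts, then expand them."""
    counts = rng.multinomial(shots, probs)
    # The first counts[0] entries are 0, the next counts[1] entries are 1, etc.
    samples = np.repeat(np.arange(len(probs)), counts)
    if shuffle:
        # Optional random permutation, so the order looks like sequential draws.
        rng.shuffle(samples)
    return samples


rng = np.random.default_rng(7)
samples = samples_from_counts(np.array([0.2, 0.3, 0.5]), 100, rng)
print(samples[:10], np.bincount(samples, minlength=3))
```

Skipping the shuffle is cheaper and is enough when only the multiset of outcomes matters. The shuffle is needed if downstream code relies on the order of the shots.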

ikkoham · 2022-08-24T04:57:40Z

LGTM, but my only concern is it breaks the API, that is it returns different counts even if the seed is the same.

jlapeyre · 2022-08-24T18:25:17Z

Thanks for looking at this @ikkoham !

> LGTM, but my only concern is it breaks the API, that is it returns different counts even if the seed is the same.

Oh, yes, this needs to be fixed.

However, I realized after reading comments from @yaelbh and @merav-aharoni that the method in this PR is in practice not always better. For example, if I have a vector of $10^6$ probabilities and I ask for one sample, using the previous implementation may be faster.
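
One way to check such a regime on your own machine is a small timing harness like the sketch below. It is only illustrative: `Generator.choice` followed by `np.bincount` is a stand-in for sample-then-bin, not necessarily what the previous `sample_counts` implementation did, and the function name is hypothetical. No timings are quoted here because they depend on the machine and the NumPy version.

```python
import time

import numpy as np


def time_counts_methods(num_probs, shots, seed=0):
    """Time sample-then-bin against a direct multinomial draw, in seconds."""
    rng = np.random.default_rng(seed)
    probs = rng.random(num_probs)
    probs /= probs.sum()  # normalize to a probability vector

    start = time.perf_counter()
    samples = rng.choice(num_probs, size=shots, p=probs)
    np.bincount(samples, minlength=num_probs)
    t_sampling = time.perf_counter() - start

    start = time.perf_counter()
    rng.multinomial(shots, probs)
    t_multinomial = time.perf_counter() - start

    return {"sample_and_bin": t_sampling, "multinomial": t_multinomial}


# The regime mentioned above: many probabilities, a single shot.
print(time_counts_methods(10**6, 1))
```

A single run like this is noisy, so repeating the measurement and sweeping both parameters gives a more reliable picture.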

EDIT: A plot of these experiments is given in #8618
I am running experiments now, to present probably as an issue rather than a PR. I will close this PR in favor of the other issue when it is ready.

EDIT: I noticed that sampling from multinomial is done in #8137. That could possibly also be done conditionally. Although, I suppose the safest thing is to use multinomial.

jlapeyre · 2024-09-13T21:20:05Z

With the move to Rust, this is obsolete.

Use more efficient method for sample_counts

96a44df

Use multinomial with parameter N rather than actually generating N counts.

jlapeyre requested review from a team and ikkoham as code owners August 15, 2022 20:12

Add reno note for multinomial-in-sample-counts

bfe3d0a

jlapeyre changed the title ~~[WIP] Use more efficient method for sample_counts~~ Use more efficient method for sample_counts Aug 15, 2022

jlapeyre added 3 commits August 15, 2022 17:58

Replace list alloc with generator

639dfae

Update test for sample_counts

6ac8653

Sampling is now faster. We run the test with more seeds. We still save about 1ms in test time. Furthermore, we increase the number of samples greatly.

More updates to tests for sample_counts

6e6df9b

Merge branch 'main' into efficient-prob-sample

e02c5d8

jlapeyre changed the title ~~Use more efficient method for sample_counts~~ Improve speed of sample_counts from O(N) to O(1) Use more efficient method for sample_counts Aug 18, 2022

jlapeyre changed the title ~~Improve speed of sample_counts from O(N) to O(1) Use more efficient method for sample_counts~~ Improve speed of sample_counts from $O(N)$ to $O(1)$ Aug 18, 2022

jlapeyre changed the title ~~Improve speed of sample_counts from $O(N)$ to $O(1)$~~ Improve speed of sample_counts from O(N) to O(1) Aug 18, 2022

jlapeyre changed the title ~~Improve speed of sample_counts from O(N) to O(1)~~ [WIP] Improve speed of sample_counts from O(N) to O(1) Aug 19, 2022

jlapeyre marked this pull request as draft August 24, 2022 18:25

jlapeyre mentioned this pull request Aug 26, 2022

What is the best method for sampling shots? #8618

Open

jlapeyre closed this Sep 13, 2024
