
Make QuantumState.sample_counts faster, O(1) rather than O(N) #8535

Open

jlapeyre opened this issue Aug 14, 2022 · 1 comment
jlapeyre commented Aug 14, 2022

The current implementation of sampling $N$ shot "counts" from QuantumState is $O(N)$ in the number of shots, but there is a method that is $O(1)$.

EDIT: See #8618 for a more informed take on this issue.
EDIT: I didn't realize when writing this that it had already been discussed in various places in Qiskit, and even implemented (elsewhere, not in QuantumState); see for example #8137.

The problem

Specifically, one should sample directly from the distribution of the number of counts for each index, rather than sampling from the distribution of indices and then building a count map. The current method is $O(N)$ in both time and space (although the latter is a constraint due to Python, not the algorithm itself). The method proposed here is $O(1)$ in both time and space.

Currently, to build a count map of $N$ samples of measuring the state vector, we do the following (a Python sketch of this appears after the list):

  1. Compute the probabilities $p(i) = |c_i|^2$ of each computational basis state from the instance of Statevector. (This step varies depending on the subclass of QuantumState.)
  2. Collect $N$ samples from this discrete distribution $p(i)$. This is (in essence) a list of integer indices $i$, each representing a basis state.
  3. Build and return a count map. That is, a dictionary counts where each key is an integer index $i$, and counts[i] is the number of times $i$ occurred in the list generated in step 2.
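
For concreteness, here is a minimal Python sketch of the current approach. The function name is made up, and numpy's Generator.choice stands in for however the indices are actually drawn; this is not Qiskit's code.

from collections import Counter

import numpy as np

def sample_counts_current(probs, n, rng=None):
    """O(n) approach: draw n basis-state indices, then tally them."""
    if rng is None:
        rng = np.random.default_rng()
    # Step 2: n index samples -- O(n) in both time and space.
    indices = rng.choice(len(probs), size=n, p=probs)
    # Step 3: build and return the count map.
    return Counter(indices.tolist())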

The solution

EDIT: See the next comment for a numpy built-in function that implements this solution.

A similar but simpler problem is sampling the number of heads in $N$ coin tosses. You could simulate $N$ coin tosses and count the number of heads, but it is faster to instead sample once from the binomial distribution.
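
In numpy terms, the two approaches to the coin example look like this (a sketch, assuming a fair coin):

import numpy as np

rng = np.random.default_rng()
N = 10**12

# O(N): simulate every toss and count the heads (infeasible at this N).
# heads = int(np.sum(rng.integers(0, 2, size=N)))

# O(1): draw the number of heads directly from Binomial(N, 1/2).
heads = rng.binomial(N, 0.5)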

For our sampling problem:

  1. Obtain a sample $\tilde{k}_1$ of the R.V. $C_1 \sim B(N, p(1))$, the number of samples equal to the first index $1$, where $B(\cdot, \cdot)$ is the binomial distribution.
  2. Use the fact that, in general, to sample from $\Pr(X=x \text{ and } Y=y)$, we can first obtain a sample $\tilde{x}$ from $\Pr(X=x)$ and then sample from $\Pr(Y=y \mid X = \tilde{x})$. In the case at hand, you sample from $\Pr(C_2 = k_2 \mid C_1 = \tilde{k}_1)$, which is the density of $B(N-\tilde{k}_1,\, p(2) / (1 - p(1)))$.
  3. Repeat step 2 for $C_3$, and so on. More precisely, set $X = (C_1, C_2)$ and $Y=C_3$.

I didn't code this in Python, but the Julia code below is easy to translate; a Python translation follows it.

julia> const pd =  [.1, .2, .4, .3]; # A probability distribution

julia> find_samps(pd, 10^12)  # N = 10^12
4-element Vector{Int64}:
 100000104386
 200000077220
 399999709325
 300000109069

julia> @btime find_samps($pd, 10^12);  #  This is like %timeit
  224.429 ns (2 allocations: 144 bytes)  # Time to run `find_samps` once.

Here is the code:

using Distributions: Binomial

"""
    find_samps(probs, N)

Sample from the distribution of counts obtained by drawing `N` samples from discrete
distribution `probs`.
"""
function find_samps(probs, N)
    sum(probs) ≈ 1 || error("probs is not a normalized probability distribution")
    Nrem = N   # N - Nrem is the number of samples collected so far.
    counts = Int[] # samples of counts.
    q = one(eltype(probs)) # unnormalized probability mass of the remaining indices
    for p in @view probs[begin:end-1] # @view just avoids copying
        nsamp = rand(Binomial(Nrem, p / q))
        push!(counts, nsamp)
        Nrem -= nsamp
        q -= p
    end
    push!(counts, Nrem) # All the remaining counts go with last i with prob 1.
    return counts
end
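
And a direct Python translation of the sketch above. The name find_samps is carried over from the Julia code; numpy's Generator.binomial plays the role of rand(Binomial(...)):

import numpy as np

def find_samps(probs, n, rng=None):
    """Sample the counts obtained by drawing n samples from the discrete
    distribution probs, without materializing the individual samples."""
    if rng is None:
        rng = np.random.default_rng()
    if not np.isclose(np.sum(probs), 1.0):
        raise ValueError("probs is not a normalized probability distribution")
    n_rem = n    # samples not yet assigned to an index
    q = 1.0      # unnormalized probability mass of the remaining indices
    counts = np.empty(len(probs), dtype=np.int64)
    for i, p in enumerate(probs[:-1]):
        k = rng.binomial(n_rem, min(p / q, 1.0))  # clip guards against rounding drift
        counts[i] = k
        n_rem -= k
        q -= p
    counts[-1] = n_rem  # all remaining samples go to the last index
    return counts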

jlapeyre commented Aug 15, 2022

TL;DR There is a single numpy function that does what we want. We should use it.

A discrete distribution defined by density $(p_1, \ldots, p_n)$ is sometimes called a categorical distribution. According to Wikipedia it is also called a generalized Bernoulli distribution or multinoulli distribution, names I like because they are descriptive.
Poking around online, including Wikipedia and Stack Exchange, I don't see a method as efficient as the one above for drawing many samples from this distribution. This is odd, because it's not exactly rocket science.

However, just as drawing many samples from a Bernoulli distribution is equivalent to drawing one sample from the associated binomial distribution (i.e. the coin-flipping example mentioned above), drawing many samples from a generalized Bernoulli distribution is equivalent to drawing a single sample from a multinomial distribution. numpy has a function for this. I didn't look at the code, but it is easy to see from the timing that it is using an $O(1)$ algorithm, probably equivalent to the one presented above (and the Julia code). (It is about 4 times slower than the Julia code, which makes me think it might not be implemented purely in C.)

Here is the same example I gave above:

In [1]: from numpy.random import default_rng
In [2]: import numpy as np
In [3]: probs = np.array([.1, .2, .4, .3])
In [4]: rng = default_rng()
In [5]: rng.multinomial(100_000_000_000, probs)
Out[5]: array([10000046586, 20000067875, 40000025644, 29999859895])
In [6]: %timeit rng.multinomial(100_000_000_000, probs)
992 ns ± 10.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
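
As a sketch of how this could back a sample_counts-style method (the function name and integer-index keys are illustrative; Qiskit's actual counts use outcome labels as keys):

import numpy as np

def sample_counts_multinomial(probs, shots, rng=None):
    """O(1)-in-shots count map: one multinomial draw replaces shots index draws."""
    if rng is None:
        rng = np.random.default_rng()
    counts = rng.multinomial(shots, probs)
    # Keep only outcomes that actually occurred, keyed by basis-state index.
    return {i: int(c) for i, c in enumerate(counts) if c > 0}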

EDIT: It is exactly the same algorithm as the one I gave above. It is pure C, but there is a bit of overhead somewhere (or maybe the rng implementation is the limiting factor).

void random_multinomial(bitgen_t *bitgen_state, RAND_INT_TYPE n,
                        RAND_INT_TYPE *mnix, double *pix, npy_intp d,
                        binomial_t *binomial) {
  double remaining_p = 1.0;
  npy_intp j;
  RAND_INT_TYPE dn = n;
  for (j = 0; j < (d - 1); j++) {
    mnix[j] = random_binomial(bitgen_state, pix[j] / remaining_p, dn, binomial);
    dn = dn - mnix[j];
    if (dn <= 0) {
      break;
    }
    remaining_p -= pix[j];
  }
  if (dn > 0) {
      mnix[d - 1] = dn;
  }
}

Note the check for whether any samples remain to be drawn; I neglected to do that in the Julia version.
For completeness: Julia of course has an equivalent function:

julia> const multd = Multinomial(10^12, [0.1, 0.2, 0.4, 0.3]);
julia> print(rand(multd))
[99999679158, 200000111995, 400000751567, 299999457280]
julia> @btime rand(multd)
  186.036 ns (1 allocation: 96 bytes)

numpy uses PCG64 by default and Julia uses Xoshiro256++. Casual comments online indicate that the latter is faster; it's not yet available in numpy.

A test with a probability vector of length 100 rather than 4 shows numpy only 20% slower than Julia. This scaling of numpy performance is very common; it usually indicates that the worse performance at small array sizes is due to overhead in the Python/numpy interface.
