Redundant Statevector Conversions During Validation #3601

brandhsn · 2019-12-11T15:45:01Z

Information

Qiskit Terra version: 0.10.0
Python version: 3.6
Operating system: All

What is the current behavior?

When the C++ state vector simulation is run, the state vector is returned as a JSON string to qiskit. There, the string is parsed using a JSONEncoder and validated against a schema using marshmallow (_deserialize in https://github.com/Qiskit/qiskit-terra/blob/f17f8f073e1d76d4b3573878c58415e6f9e46542/qiskit/validation/fields/custom.py#L49 ). When it is returned using Result.get_statevector(), the state vector is serialized again in ( https://github.com/Qiskit/qiskit-terra/blob/f17f8f073e1d76d4b3573878c58415e6f9e46542/qiskit/validation/fields/custom.py#L43 ) and formatted in https://github.com/Qiskit/qiskit-terra/blob/c3cdd355d3314889a1cff6597f0b5509a537d309/qiskit/result/postprocess.py#L176 .
In summary, the state vector goes through following transformations: JSON string -> JSON -> python list of complex numbers -> python list of 2-valued lists -> numpy array of complex numbers.

Steps to reproduce the problem

What is the expected behavior?

These transformations have a significant impact on the runtime when many qubits are simulated (23+). Result.get_statevector() should return a numpy array directly instead of going through _serialize.

Suggested solutions

It seems that some of the transformations are not necessary. The deserialization in custom.py should return a numpy array and the serialization should be skipped when calling Result.get_statevector(). In addition, the deserialization is slow for large state vectors as marshmallow does not reserve memory for the list containing the state vector prior to appending the variables returned by _deserialize. This can be fixed by loading the JSON string using an object_hook ( https://docs.python.org/3/library/json.html ), i.e. directly creating a numpy array while parsing the JSON string instead of relying on marshmallow.

mtreinish · 2019-12-11T15:56:21Z

There are several efforts in progress on this front. The first is just a band-aid to make this use case work: #3181 the second is: #3383 which is the start of reworking the current provider interface to decouple json/ibmq api from the providers interface. The third effort I don't have a link for because I'm still drafting an rfc for the effort, but is a v2 providers interface that doesn't use qobj and the ibmq api semantics at all as the interface to providers, but instead is just python objects.

ghost · 2019-12-11T17:10:14Z

Coming over from the aer issue - here's a quick test to demonstrate the problem (and its severity). TL;DR is that a temporary workaround could be replacing raw_result.get_statevector(i) with direct access np.asarray(raw_result.results[i].data.statevector). It appears that Result.get_statevector is essentially equivalent to running Result.to_dict and then accessing that dict manually (that is to get one statevector result - so if there were a loop over circuits and get_statevector is used within the loop it is REALLY bad).

I was using master branch for these tests.

Results on my machine:

Num Qubits           Sim                  naive to_dict        direct access        get_statevector
5                    0.01202155           0.00018297           0.00000976           0.00006117
7                    0.01114277           0.00023326           0.00001117           0.00011174
9                    0.01502806           0.00045379           0.00002248           0.00031649
11                   0.02526170           0.00163007           0.00011159           0.00149228
13                   0.05523828           0.00749241           0.00030521           0.00974752
15                   0.15380105           0.03816904           0.00121548           0.01784142
17                   0.59182173           0.13023160           0.00485045           0.13570576
19                   2.37321197           0.56625554           0.02343114           0.49493971
21                   9.65335323           2.23207012           0.12038622           2.08556945

Test code:

import time
import random
from pprint import pprint
import numpy as np

from qiskit import QuantumCircuit, QuantumRegister, ClassicalRegister
from qiskit import Aer
from qiskit.compiler import transpile, assemble

def random_circuit(num_qubits, num_ops=10):
    circ = QuantumCircuit(num_qubits, num_qubits)
    circ.h(range(num_qubits))
    qubit_indices = [i for i in range(num_qubits)]
    for i in range(num_ops):
        control, target, t = random.sample(qubit_indices, 3)
        circ.cx(control, target)
        circ.t(t)
    return circ

if __name__ == '__main__':
    max_qs = 22
    num_samples = 10

    BACKEND_OPTS = {"method":"statevector"}
    sv_sim = Aer.get_backend('statevector_simulator')

    print("{: <20} {: <20} {: <20} {: <20} {: <20}".format("Num Qubits", "Sim", "naive to_dict", "direct access", "get_statevector"))

    for num_qs in range(5, max_qs, 2):
        vals = [num_qs, 0, 0, 0, 0]

        for _ in range(num_samples):
            strt = time.perf_counter()
            raw_result = sv_sim.run(assemble(random_circuit(num_qs, 10), sv_sim), backend_options=BACKEND_OPTS).result()
            vals[1]+=time.perf_counter()-strt

            strt = time.perf_counter()
            dict_result = raw_result.to_dict()
            sv_naive = np.asarray(dict_result['results'][0]['data']['statevector'])
            vals[2]+=time.perf_counter()-strt

            strt = time.perf_counter()
            sv_direct = np.asarray(raw_result.results[0].data.statevector)
            vals[3]+=time.perf_counter()-strt

            strt = time.perf_counter()
            sv_get = raw_result.get_statevector()
            vals[4]+=time.perf_counter()-strt

        for i in range(1, len(vals)):
            vals[i] = vals[i] / num_samples
        print("{: <20} {:<20.8f} {: <20.8f} {: <20.8f} {: <20.8f}".format(*vals))

    print(type(sv_naive), type(sv_direct), type(sv_get))
    print("Are we sure? ", np.array_equal(sv_naive, sv_get), np.array_equal(sv_direct, sv_get))

ghost · 2019-12-24T06:34:10Z

Confirmed the problem is in Result.data() - removing the to_dict() call improves the performance of get_statevector immensely (will likely improve get_unitary(...) even more, and get_counts/memory(...) equivalently but I haven't tested).

Removing the to_dict() from data(...) and updating the rest of the Result module to account for the change should fix the performance issues of the get_statevector/unitary(...) methods. So just change self.data(...)['counts'] to self.data(...).counts and self.data(...)['statevector'] to self.data(...).statevector (and so on...).

Problems will arise though if client-side code is relying on the Result.data(...) method to return a dict (even though the client could just call to_dict() on the ExperimentResult(Data) themselves) so there could be issues there. Still definitely solutions but they may be slightly more complicated.

mtreinish · 2021-04-23T21:06:53Z

This has been fixed now. Since we moved to avoiding marshmallow/forcing serialization and using python objects for the back and forth between terra and backends, aer has been updated to just move the output into a numpy array and return that object in the results.

brandhsn mentioned this issue Dec 11, 2019

Excessive Runtime with Statevector Simulator Due to Aer-Terra Interface Qiskit/qiskit-aer#482

Closed

1ucian0 added the performance label Dec 21, 2019

mtreinish closed this as completed Apr 23, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Redundant Statevector Conversions During Validation #3601

Redundant Statevector Conversions During Validation #3601

brandhsn commented Dec 11, 2019

mtreinish commented Dec 11, 2019

ghost commented Dec 11, 2019 •

edited by ghost

Loading

ghost commented Dec 24, 2019

mtreinish commented Apr 23, 2021

Redundant Statevector Conversions During Validation #3601

Redundant Statevector Conversions During Validation #3601

Comments

brandhsn commented Dec 11, 2019

Information

What is the current behavior?

Steps to reproduce the problem

What is the expected behavior?

Suggested solutions

mtreinish commented Dec 11, 2019

ghost commented Dec 11, 2019 • edited by ghost Loading

ghost commented Dec 24, 2019

mtreinish commented Apr 23, 2021

ghost commented Dec 11, 2019 •

edited by ghost

Loading