Cirq 1.0 statevector return type requirements for simulators consume more RAM after a qsim run #6107

rht · 2023-05-25T09:11:42Z

Description of the issue
First reported in quantumlib/qsim#612. Cirq ~=1.0 requires qsimcirq to return its simulation output as a cirq.StateVectorTrialResult. In the current implementation, this causes an OOM when running a 32-qubit circuit on an a2-highgpu-1g, which has 85 GB of RAM. This was not the case in qsimcirq 0.13.

In the specific case where the statevector is final (no further operations on the statevector are needed after the simulation), this construction is expensive because it requires, at one point, 3x-4x more RAM than is necessary. The allocations are:

1. The C++ buffer of the statevector in the qsim layer. Is it scratch?
2. The Python buffer of the statevector in the Cirq layer
3. ~~The simulation output viewed as an array of np.complex64~~ This view was removed by @NoureldinYosri (https://github.com/quantumlib/qsim/blob/7b921299e53073e1f4e35c9b349dcf9655d76b63/qsimcirq/qsim_simulator.py#L561 in quantumlib/qsim@0009bc4). It shouldn't use any extra RAM anyway, because it's just a view.
4. The copy of the simulation output
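To put rough numbers on the list above, here is a back-of-the-envelope sketch. It assumes a dense statevector of np.complex64 amplitudes (8 bytes each); `statevector_bytes` is a name made up for this illustration, not part of Cirq or qsim.

```python
GIB = 2 ** 30


def statevector_bytes(num_qubits: int, bytes_per_amplitude: int = 8) -> int:
    """Size of a dense statevector; np.complex64 takes 8 bytes per amplitude."""
    return (2 ** num_qubits) * bytes_per_amplitude


# One full copy of a 32-qubit statevector, in GiB.
one_copy_gib = statevector_bytes(32) / GIB

# Print the footprint for 1 to 4 simultaneous full-size allocations.
for copies in (1, 2, 3, 4):
    print(f"{copies} allocation(s): {copies * one_copy_gib:.0f} GiB")
```

Under that assumption, a single 32-qubit buffer is 32 GiB (about 34 GB), so two full-size allocations fit in 85 GB but three do not.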

A quick modification on a live Cirq 1.1.0 install, where I removed the state_vector = state_vector.copy() line, made the OOM error go away. But it seems that the extra RAM consumption could be reduced further.

How to reproduce the issue
Steps to reproduce and the output can be found in quantumlib/qsim#612 (comment).

Cirq version
~= 1.0

cc: @daxfohl @95-martin-orion @sergeisakov


rht · 2023-10-15T04:37:01Z

Solving this piecewise:

Allocation 4 could be removed if cirq.StateVectorSimulationState had an extra argument, inplace=True, for the simulation output that skips the copy operation when enabled.
Allocation 3 could be removed with qsim_state.astype(np.complex64, copy=False)

Ideally, the Python buffer would be gone too, leaving only one buffer in C++, but that doesn't block the quick fixes for allocations 3 and 4. Those might be sufficient for our use case.
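A minimal sketch of what the proposed inplace switch would change, using plain NumPy. `wrap_simulation_output` and its `inplace` parameter are hypothetical stand-ins for this illustration; they are not an existing Cirq API.

```python
import numpy as np


def wrap_simulation_output(state_vector: np.ndarray, inplace: bool = False) -> np.ndarray:
    """Hypothetical helper: reuse the caller's buffer when inplace=True, else copy it."""
    return state_vector if inplace else state_vector.copy()


raw = np.zeros(2 ** 10, dtype=np.complex64)

copied = wrap_simulation_output(raw)                # second full-size allocation
shared = wrap_simulation_output(raw, inplace=True)  # no new allocation

print(np.shares_memory(raw, copied))  # False
print(np.shares_memory(raw, shared))  # True
```

The trade-off is the usual one: with inplace=True the caller must not reuse or mutate the buffer afterwards, which is presumably why the copy is made by default.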

rht · 2023-10-15T06:07:40Z

On 30 qubits, for this circuit, removing the copy operation reduces both the peak memory and the elapsed time:

from memory_profiler import memory_usage
import time

import cirq
import qsimcirq

def f():
    num_qubits = 30
    qc_cirq = cirq.Circuit()
    qubits = cirq.LineQubit.range(num_qubits)
    for i in range(num_qubits):
        qc_cirq.append(cirq.H(qubits[i]))
    sim = qsimcirq.QSimSimulator()
    tic = time.time()
    # sim = cirq.Simulator()
    sim.simulate(qc_cirq)
    print("Elapsed", time.time() - tic)
print("Max memory", max(memory_usage(f)))

# Before
Max memory 17241.0859375 MiB, elapsed 11.7 s
# After
Max memory 9045.91015625 MiB, elapsed 7.9 s

qsim_state.astype(np.complex64, copy=False) doesn't work, because qsim_state is an ndarray of floats that is supposed to be reinterpreted as an ndarray of complex numbers, not converted element by element. I suppose view only creates a view of the existing memory, and doesn't take any significant space.
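The difference can be seen on a tiny array. This sketch assumes the float buffer holds interleaved (real, imaginary) float32 pairs, which is what reinterpreting it with view implies; the values are made up.

```python
import numpy as np

# Stand-in for the float buffer: two complex amplitudes stored as four float32 values.
floats = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)

# view reinterprets the same bytes: each pair of floats becomes one complex number.
as_view = floats.view(np.complex64)

# astype converts each float to its own complex number. copy=False cannot avoid
# the allocation here because the dtype changes.
as_cast = floats.astype(np.complex64, copy=False)

print(as_view.shape, np.shares_memory(floats, as_view))  # (2,) True
print(as_cast.shape, np.shares_memory(floats, as_cast))  # (4,) False
```

So astype both gives the wrong values (every imaginary part is zero) and allocates a new array, while view does neither.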

9 GiB is about 1 GiB more than an array of 2^30 np.complex64 values (8 GiB). This likely means that allocations 1 and 2 don't exist at the same time during the run.

Edit: benchmark was run on cuQuantum Appliance 23.06 (Cirq 1.1.0, qsimcirq 0.15.0)

rht · 2023-10-15T06:34:36Z

The benchmark on cuQuantum Appliance 22.11 (Cirq 0.14.1, qsimcirq 0.12.1), before the large memory usage was introduced:

Max memory 8943.50390625 MiB, elapsed 7.8 s

rht · 2024-02-07T05:45:32Z

Point no. 3 in the original post is not an allocation. It's just a view, and it has already been removed by @NoureldinYosri in quantumlib/qsim@0009bc4.

NoureldinYosri · 2024-02-07T22:50:20Z

The return type of the qsim simulator is StateVectorTrialResult, which indeed creates an extra buffer in order to speed up operations on the resulting statevector.

If you just want the state vector, you can use simulate_into_1d_array, which returns the state vector as a 1D np.array without any extra buffers. This should fix your memory issues. However, you will lose the operations that StateVectorTrialResult implements, or you will need to implement them yourself if they are not supported by the Cirq routines.

rht · 2024-02-07T22:54:53Z

Yeah, I'm aware of simulate_into_1d_array for my use case. The separate question is whether it is a long-term general solution, once NumPy removes its 32-dimension limit.

NoureldinYosri · 2024-02-07T23:00:50Z

I suppose the question is whether we want to create a version of StateVectorTrialResult that doesn't use buffers. This would reduce its memory footprint, but at the cost of performance: StateVectorTrialResult was written with performance in mind, so a version of it that uses less memory will be slower.

Feel free to create a feature request for the unbuffered version of StateVectorTrialResult and we can discuss it there.

rht added the kind/bug-report Something doesn't seem to work. label May 25, 2023

rht changed the title ~~Cirq 1.0 statevector return type requirements for simulators consume more RAM in qsim~~ Cirq 1.0 statevector return type requirements for simulators consume more RAM after a qsim run May 25, 2023

tanujkhattar added triage/discuss Needs decision / discussion, bring these up during Cirq Cynque status/needs-agreed-design We want to do this, but it needs an agreed upon design before implementation labels Jun 7, 2023

verult added triage/accepted A consensus emerged that this bug report, feature request, or other action should be worked on and removed triage/discuss Needs decision / discussion, bring these up during Cirq Cynque labels Jun 21, 2023
