Explore possible regression in local simulation performance #1631

Closed
ihincks opened this issue Apr 19, 2024 · 4 comments · Fixed by #1700

ihincks commented Apr 19, 2024

Describe the bug

A user has noticed a dramatic slowdown in SamplerV2 with FakeSherbrooke:

A demo of ~150 4-qubit circuits has gone from taking a few seconds to over 3 minutes on their machine.
Switching to 5-qubit FakeManila speeds this back up.

A quick timing profile shows multi-threading-related calls taking a long time. Could this be transpiler- or Aer-related?

Steps to reproduce

@caleb-johnson may be able to provide a reproducer. Otherwise, try a setup like the one described above.

Expected behavior

Small circuits on a large device are just as fast as those same circuits on a small device.

ihincks added the bug label Apr 19, 2024

garrison commented Apr 19, 2024

Steps to reproduce

This issue is related to my comment at Qiskit/qiskit-addon-cutting#552 (comment) and the following discussion.

To reproduce, just run any of our CKT tutorials with a fake backend that has >100 qubits.

t-imamichi commented

I identified the root cause of the performance regression. We updated Primitives V2 so that SamplerV2 and EstimatorV2 can handle a different number of shots or a different precision for each pub.
To support this, I implemented BackendSamplerV2 and BackendEstimatorV2 to call backend.run once per pub, so a job with N pubs triggers N backend.run calls.
In contrast, BackendEstimatorV1 and BackendSamplerV1 assume the same number of shots for the whole run call, so they pass all circuits to backend.run in a single call.

As a potential solution, I am considering revising BackendSamplerV2 and BackendEstimatorV2 to combine pubs with identical shot or precision settings.
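The grouping idea can be illustrated with a minimal, library-free sketch. It is not the actual BackendSamplerV2 code: the `(label, shots)` tuple standing in for a pub, the `run_grouped` helper, and the `fake_run` stub standing in for backend.run are all hypothetical names used only for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical stand-in for a pub: (circuit label, number of shots).
Pub = Tuple[str, int]


def run_grouped(
    pubs: Sequence[Pub],
    run_batch: Callable[[List[str], int], List[str]],
) -> List[Optional[str]]:
    """Call run_batch once per distinct shots value; keep the original pub order."""
    # Map each shots value to the positions of the pubs that use it.
    groups: Dict[int, List[int]] = defaultdict(list)
    for index, (_, shots) in enumerate(pubs):
        groups[shots].append(index)

    results: List[Optional[str]] = [None] * len(pubs)
    for shots, indices in groups.items():
        batch = [pubs[i][0] for i in indices]
        # One call per group instead of one call per pub.
        for i, result in zip(indices, run_batch(batch, shots)):
            results[i] = result
    return results


calls = []


def fake_run(circuits: List[str], shots: int) -> List[str]:
    """Stub in place of backend.run; records each call so the count can be checked."""
    calls.append((tuple(circuits), shots))
    return [f"{circuit}@{shots}" for circuit in circuits]


pubs = [("a", 100), ("b", 200), ("c", 100)]
out = run_grouped(pubs, fake_run)
print(out)
print(f"{len(calls)} run calls for {len(pubs)} pubs")
```

With three pubs and two distinct shot settings, the stub is called twice rather than three times. When every pub shares the same shots, this collapses to a single call, matching the V1 behavior described above.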

Here is a script that demonstrates the performance regression:

from timeit import timeit

from qiskit_aer import AerSimulator
from qiskit_ibm_runtime.fake_provider import FakeSherbrooke

from qiskit import QuantumCircuit
from qiskit.primitives import BackendSampler, BackendSamplerV2
from qiskit.transpiler.preset_passmanagers import generate_preset_pass_manager

backend = AerSimulator.from_backend(FakeSherbrooke())
shots = 10000
num_copies = 10


def gen_circuit(num_qubits: int, reps: int):
    qc = QuantumCircuit(num_qubits)
    for _ in range(reps):
        qc.h(range(num_qubits))
        for i in range(0, num_qubits - 1, 2):
            qc.cx(i, i + 1)
    qc.measure_all()
    return qc


def bench_sampler_v1(qc: QuantumCircuit):
    print("\nBackendSamplerV1")
    sampler = BackendSampler(backend)
    print(f"{timeit(lambda: sampler.run([qc] * num_copies, shots=shots).result(), number=1)} sec")


def bench_sampler_v2(qc: QuantumCircuit):
    print("\nBackendSamplerV2")
    sampler = BackendSamplerV2(backend=backend)
    print(f"{timeit(lambda: sampler.run([qc] * num_copies, shots=shots).result(), number=1)} sec")


qc = gen_circuit(5, 5)
pm = generate_preset_pass_manager(optimization_level=2, backend=backend)
qc2 = pm.run(qc)
bench_sampler_v1(qc2)
bench_sampler_v2(qc2)

Output (main branch of Qiskit):

BackendSamplerV1
0.7136987919984676 sec

BackendSamplerV2
9.465235792002204 sec
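For a rough sense of scale, the two timings above can be compared directly. This is only arithmetic on the reported numbers; the per-call figure assumes one backend.run call per pub, as described earlier, and the variable names are illustrative.

```python
# Timings copied from the output above, in seconds.
v1_seconds = 0.7136987919984676
v2_seconds = 9.465235792002204
num_copies = 10  # same value as in the benchmark script

# Overall slowdown of BackendSamplerV2 relative to BackendSamplerV1.
slowdown = v2_seconds / v1_seconds

# Average V2 time per pub, assuming one backend.run call per pub.
per_call_v2 = v2_seconds / num_copies

print(f"V2 / V1 slowdown: {slowdown:.1f}x")
print(f"V2 time per pub: {per_call_v2:.2f} sec")
```

The average V2 time per pub comes out larger than the entire V1 run, which is consistent with a fixed per-call cost in backend.run dominating the runtime for small circuits on a large device.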


t-imamichi commented Apr 22, 2024

I'm working on a PR to address this issue: Qiskit/qiskit#12291

t-imamichi commented

Now that Qiskit/qiskit#12291 has been merged, we need to port it here too.
