Hardware inverse transformer faster #6173

AlMrvn · 2023-06-28T22:45:37Z

Summarize the task
Transformers implemented naively can be quite slow. In the snippet I am giving with a inversion of a circuit using the fact that the inverse of the Sycamore gate can be decomposed into single qubits gates + sycamore gate take 100x longer than doing the cirq.inverse function on the circuit.
If there was a way to make this better, in cirq directly, or maybe give good practice on how to make these transfomer fast. it would be nice.

Acceptance criteria - when is the task considered done?
reasonable time for the specific transformer up to ~150ish qubits.

** Python Script **

from typing import Mapping

import cirq
import cirq_google
import numpy as np
import time

from cirq.experiments import random_quantum_circuit_generation as rqcg

TEST_SEED: int = 2023
PRNG = cirq.value.parse_random_state(TEST_SEED)


@cirq.transformer
class InverseSycamoreCircuit:
    """Transformer to inverse a circuit with sycamore/fsim(theta=pi/2, phi) gates.

    This transformer will unroll the circuit.
    This transformer operate in the following steps:
        1) inverse the circuit order
        2) inverse the singles qubits gates
        3) inverse the Sycamore gate using the single qubit gate + sycamore inversion:
                (0, 0): ───Syc─── ^-1      ───X───Syc──Z(phi)──
                            │          =           |
                (0, 1): ───Syc───          ───────Syc────X──────
        4) merge the singles qubits moments that are adjacents.

    Example:
        >>> gate = cirq_google.SYC
        >>> pair = cirq.GridQubit.rect(1, 2)
        >>> a,b = pair
        >>> phi_dict = {pair: np.pi/6}
        >>> circuit = cirq.Circuit.from_moments(cirq.X(a), gate.on(*pair), cirq.Y(b))
        >>> transformer = InverseSycamoreCircuit(gate = gate, phi = phi_dict)
        >>> inverse_circuit = transformer(circuit)
    """

    def __init__(
        self,
        gate: cirq.Gate,
        phi_mapping: Mapping[tuple[cirq.Qid, ...], float],
        merge_sq_moments: bool = True,
    ):
        """

        Args:
            gate: the gate to inverse. Usually a iSWAP-like gate/sycamore gate,
            phi_mapping: dictionnary of the parameter phi per pair,
            merge_sq_moment: the transformed circuit will have merged sq gates.
        """
        self.gate = gate
        self.phi_mapping = phi_mapping
        self.merge_sq_moments = merge_sq_moments

    def single_qubit_map_func(self, op: cirq.Operation, index: int) -> cirq.OP_TREE:
        if len(op.qubits) == 1:
            yield cirq.inverse(op, default=op).with_tags(*op.tags)
        else:
            yield op

    def sycamore_inverse_map_func(self, op: cirq.Operation, index: int) -> cirq.OP_TREE:
        """Constructing the inverse of the sycamore from the sycamore gate."""
        if op.gate == self.gate:
            z = self.phi_mapping[op.qubits] / np.pi
            yield cirq.X(op.qubits[0])
            yield self.gate(*op.qubits)
            yield cirq.X(op.qubits[1])
            yield (cirq.Z**z).on(op.qubits[0])
        else:
            yield op

    def __call__(self, circuit: cirq.AbstractCircuit, *, context=None) -> cirq.AbstractCircuit:
        # reverse the circuit order and inverse the single qubit gates
        circuit_rev = cirq.map_operations_and_unroll(circuit[::-1], self.single_qubit_map_func)

        # invert the 2-qubits gates:
        inverse_circuit = cirq.map_operations_and_unroll(
            circuit_rev, self.sycamore_inverse_map_func
        )

        # merge the singles qubits moments
        if self.merge_sq_moments:
            return cirq.transformers.merge_single_qubit_moments_to_phxz(inverse_circuit)
        else:
            return inverse_circuit


if __name__ == "__main__":
    phi = np.pi / 6
    gate = cirq_google.SYC
    pattern = tuple(
        rqcg.GridInteractionLayer(col_offset=k % 2, vertical=False, stagger=False) for k in range(2)
    )

    qubits = cirq.GridQubit.rect(1, 75)

    circuit = rqcg.random_rotations_between_grid_interaction_layers_circuit(
        qubits,
        depth=30,
        two_qubit_op_factory=lambda a, b, _: gate.on(a, b),
        pattern=pattern,
        seed=PRNG,
    )

    phi_dict: Mapping[tuple[cirq.Qid, ...], float] = {
        (qubits[k], qubits[k + 1]): phi for k in range(len(qubits) - 1)
    }
    inversing_syc = InverseSycamoreCircuit(phi_mapping=phi_dict, gate=gate, merge_sq_moments=True)

    # inversing through the transformer
    start = time.time()
    inversing_syc(circuit)
    end = time.time()
    print(f"transformer for hardware inverse: {end - start}")

    # using cirq inverse for comparison
    start = time.time()
    cirq.inverse(circuit)
    end = time.time()
    print(f"'cirq.inverse': {end - start}")

tanujkhattar · 2023-06-28T23:30:17Z

I have a quick fix - #6174, that reduces the time to half, but I think it's still not good enough.

Timing before my PR:

'circuit_rev': 0.31624889373779297
'inverse_circuit': 0.5580418109893799
'Merged circuit': 2.517353057861328
transformer for hardware inverse: 3.393893241882324
'cirq.inverse': 0.03910088539123535

Timing after my PR:

'circuit_rev': 0.31432485580444336
'inverse_circuit': 0.5529079437255859
'Merged circuit': 0.6783010959625244
transformer for hardware inverse: 1.5484230518341064
'cirq.inverse': 0.036971092224121094

Now, each of the transformer steps takes roughly ~0.5 seconds and speeding this up further would probably involve speeding up the map_operations / map_moments transformer primitives in the first place.

tanujkhattar · 2023-06-28T23:40:36Z

The example code for 150 qubits takes the following time:

On master

'circuit_rev': 0.6249358654022217
'inverse_circuit': 1.1757490634918213
'Merged circuit': 4.6269519329071045
transformer for hardware inverse: 6.431884765625
'cirq.inverse': 0.07868170738220215

On #6174

'circuit_rev': 0.6742651462554932
'inverse_circuit': 1.035278081893921
'Merged circuit': 1.5612859725952148
transformer for hardware inverse: 3.2750561237335205
'cirq.inverse': 0.07976198196411133

@AlMrvn What should be "a reasonable time" for the transformer up to ~150ish qubits? Is <1 second a good target to aim for?

AlMrvn · 2023-06-29T00:08:15Z

@tanujkhattar that a decent increase! factor 2 already! Thanks a lot! I am already happy =D

for a what would be reasonable, I would say that these circuit take few microsecond to be run on the hardware for 1 shot. Let's say 4us per shot with some overhead. it will need a lot of shot, so let's say 1 million shots. this give us a run-time on hardware of ~4s. ideally the circuit creation would be small compared to this. so yeah 1s or 0.5s seems good.

for order of magnitude, I usually precompute ~100 of these circuits

tanujkhattar · 2023-08-16T16:13:37Z

xref #6097

tanujkhattar · 2023-08-17T07:16:26Z

@AlMrvn @dstrain115
Tl;Dr - #6250 significantly speeds up the transformer primitives so that the original transformer written using convenient transformer primitives is now very close to a faster optimal implementation and takes < 1sec for the 75 qubit circuit.

After my last set of improvements as discussed in the comment above, the original transformer written by you using transformer primitives in the original issue was taking ~3.3 seconds for 150 qubits case. A rerun gives numbers as follows (pasted again for convenience)

circuit_rev: 0.691652774810791
inverse_circuit: 1.1098392009735107
merged_circuit: 1.6418278217315674
transformer for hardware inverse: 3.4486749172210693
'cirq.inverse': 0.09459590911865234

We also discussed that we can significantly speed up the transformer by completely rewriting it and not using any of the built-in transformer primitives. The implementation we came up with (offline) was as follows (given in the details block).

def merge_func(m1: 'cirq.Moment', m2: 'cirq.Moment') -> cirq.Moment:
    """Assumes both m1 and m2 are moments with single qubit gates."""
    ret_ops = []
    for q in m1.qubits | m2.qubits:
        op1, op2 = m1.operation_at(q), m2.operation_at(q)
        if op1 and op2:
            mat = cirq.unitary(op2) @ cirq.unitary(op1)
            gate = cirq.single_qubit_matrix_to_phxz(mat, atol=1e-8)
            if gate:
                ret_ops.append(gate(q))
        else:
            op = op1 or op2
            assert op is not None
            if isinstance(op.gate, cirq.PhasedXZGate):
                ret_ops.append(op)
            else:
                gate = cirq.single_qubit_matrix_to_phxz(cirq.unitary(op), atol=1e-8)
                if gate:
                    ret_ops.append(gate(q))
    return cirq.Moment(ret_ops)

def is_single_ops_moment(m: cirq.Moment):
    return len(m) == len(m.qubits)

@cirq.transformer
class InverseSycamoreCircuitFast:
    """Transformer to inverse a circuit with sycamore/fsim(theta=pi/2, phi) gates.

    This transformer will unroll the circuit.
    This transformer operate in the following steps:
        1) inverse the circuit order
        2) inverse the singles qubits gates
        3) inverse the Sycamore gate using the single qubit gate + sycamore inversion:
                (0, 0): ───Syc─── ^-1      ───X───Syc──Z(phi)──
                            │          =           |
                (0, 1): ───Syc───          ───────Syc────X──────
        4) merge the singles qubits moments that are adjacents.

    Example:
        >>> gate = cirq_google.SYC
        >>> pair = cirq.GridQubit.rect(1, 2)
        >>> a,b = pair
        >>> phi_dict = {pair: np.pi/6}
        >>> circuit = cirq.Circuit.from_moments(cirq.X(a), gate.on(*pair), cirq.Y(b))
        >>> transformer = InverseSycamoreCircuit(gate = gate, phi = phi_dict)
        >>> inverse_circuit = transformer(circuit)
    """

    def __init__(
        self,
        gate: cirq.Gate,
        phi_mapping: Mapping[tuple[cirq.Qid, ...], float],
        merge_sq_moments: bool = True,
    ):
        """

        Args:
            gate: the gate to inverse. Usually a iSWAP-like gate/sycamore gate,
            phi_mapping: dictionnary of the parameter phi per pair,
            merge_sq_moment: the transformed circuit will have merged sq gates.
        """
        self.gate = gate
        self.phi_mapping = phi_mapping

    def sycamore_inverse_map_func(self, op: cirq.Operation) -> cirq.OP_TREE:
        """Constructing the inverse of the sycamore from the sycamore gate."""
        assert op.gate == self.gate, f"circuit should follow a brickwall pattern with all 2q gates same as {self.gate}"
        z = self.phi_mapping[op.qubits] / np.pi
        yield cirq.Moment(cirq.X(op.qubits[0]))
        yield cirq.Moment(self.gate(*op.qubits))
        yield cirq.Moment(cirq.X(op.qubits[1]), (cirq.Z**z).on(op.qubits[0]))

    def __call__(self, circuit: cirq.AbstractCircuit, *, context=None) -> cirq.AbstractCircuit:
        new_moments = []
        for m in circuit[::-1]:
            if is_single_ops_moment(m):
                # m acts only on single qubit gates
                if new_moments and is_single_ops_moment(new_moments[-1]):
                    new_moments[-1] = merge_func(new_moments[-1], cirq.inverse(m))
                else:
                    new_moments.append(cirq.inverse(m))
            else:
                inv_circuits = [cirq.Circuit(self.sycamore_inverse_map_func(op)) for op in m]
                inv_moments = cirq.Circuit.zip(*inv_circuits).moments
                new_moments[-1] = merge_func(new_moments[-1], inv_moments[0])
                new_moments.extend(inv_moments[1:])
        return cirq.Circuit(new_moments)

The fast version has a runtime of ~1.7 seconds, i.e. running

start = time.time()
inversing_syc_fast(circuit)
end = time.time()
print(f"transformer for hardware inverse fast: {end - start}")

gives

transformer for hardware inverse fast: 1.765840768814087

The ~1.7 second time gets rid of most of the overhead introduced by the built-in circuit transformer routines, like cirq.map_operations, and mostly comes from repeated computations of the unitary matrices for phased XZ gates that need to be merged. For example, here are the top 10 lines from snakeviz profiling output

Now, as part #6250, I've improved the speed of built-in cirq.map_operations transformer primitive such that now, if you run your original implementation of the transformer, that uses the built-in transformer primitives, the runtime is much closer to the optimal runtime of the fast version. For example, running the original transformer on 150 qubits now gives

circuit_rev: 0.13894176483154297
inverse_circuit: 0.21164369583129883
merged_circuit: 1.618177890777588
transformer for hardware inverse: 1.9741430282592773

You can see that the inverse_circuit and circuit_rev, both the stages use the cirq.map_operations_and_unroll primitive and their runtime reduced ~5x each while the runtime of merged_circuit stage remained roughly the same and is same to the optimal runtime of the fast transformer since this is the minimal cost paid due to computing unitaries and merging them.

Note that all of these numbers are for 150 qubit circuits. For 75 qubits, the total runtime < 1sec.

I think we can safely close this issue now once the linked PR is merged?

AlMrvn added the kind/task A task that's part of a larger effort label Jun 28, 2023

tanujkhattar added triage/accepted A consensus emerged that this bug report, feature request, or other action should be worked on area/performance area/transformers labels Jun 28, 2023

tanujkhattar mentioned this issue Jun 28, 2023

Speed up execution time of merge_single_qubit_moments_to_phxz transformer by avoiding redundant calls to unitary protocol #6174

Merged

This was referenced Aug 16, 2023

Cirq Performance Improvement Roadmap Item #6097

Closed

Speed up cirq.map_operations and cirq.map_operations_and_unroll #6250

Merged

tanujkhattar self-assigned this Aug 17, 2023

tanujkhattar closed this as completed in #6250 Aug 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hardware inverse transformer faster #6173

Hardware inverse transformer faster #6173

AlMrvn commented Jun 28, 2023

tanujkhattar commented Jun 28, 2023

tanujkhattar commented Jun 28, 2023

AlMrvn commented Jun 29, 2023

tanujkhattar commented Aug 16, 2023

tanujkhattar commented Aug 17, 2023 •

edited

Loading

Hardware inverse transformer faster #6173

Hardware inverse transformer faster #6173

Comments

AlMrvn commented Jun 28, 2023

tanujkhattar commented Jun 28, 2023

tanujkhattar commented Jun 28, 2023

AlMrvn commented Jun 29, 2023

tanujkhattar commented Aug 16, 2023

tanujkhattar commented Aug 17, 2023 • edited Loading

tanujkhattar commented Aug 17, 2023 •

edited

Loading