Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hardware inverse transformer faster #6173

Closed
Tracked by #6097
AlMrvn opened this issue Jun 28, 2023 · 5 comments · Fixed by #6250
Closed
Tracked by #6097

Hardware inverse transformer faster #6173

AlMrvn opened this issue Jun 28, 2023 · 5 comments · Fixed by #6250
Assignees
Labels
area/performance area/transformers kind/task A task that's part of a larger effort triage/accepted A consensus emerged that this bug report, feature request, or other action should be worked on

Comments

@AlMrvn
Copy link

AlMrvn commented Jun 28, 2023

Summarize the task
Transformers implemented naively can be quite slow. In the snippet I am giving with a inversion of a circuit using the fact that the inverse of the Sycamore gate can be decomposed into single qubits gates + sycamore gate take 100x longer than doing the cirq.inverse function on the circuit.
If there was a way to make this better, in cirq directly, or maybe give good practice on how to make these transfomer fast. it would be nice.

Acceptance criteria - when is the task considered done?
reasonable time for the specific transformer up to ~150ish qubits.

** Python Script **

from typing import Mapping

import cirq
import cirq_google
import numpy as np
import time

from cirq.experiments import random_quantum_circuit_generation as rqcg

TEST_SEED: int = 2023
PRNG = cirq.value.parse_random_state(TEST_SEED)


@cirq.transformer
class InverseSycamoreCircuit:
    """Transformer to inverse a circuit with sycamore/fsim(theta=pi/2, phi) gates.

    This transformer will unroll the circuit.
    This transformer operate in the following steps:
        1) inverse the circuit order
        2) inverse the singles qubits gates
        3) inverse the Sycamore gate using the single qubit gate + sycamore inversion:
                (0, 0): ───Syc─── ^-1      ───X───Syc──Z(phi)──
                            │          =           |
                (0, 1): ───Syc───          ───────Syc────X──────
        4) merge the singles qubits moments that are adjacents.

    Example:
        >>> gate = cirq_google.SYC
        >>> pair = cirq.GridQubit.rect(1, 2)
        >>> a,b = pair
        >>> phi_dict = {pair: np.pi/6}
        >>> circuit = cirq.Circuit.from_moments(cirq.X(a), gate.on(*pair), cirq.Y(b))
        >>> transformer = InverseSycamoreCircuit(gate = gate, phi = phi_dict)
        >>> inverse_circuit = transformer(circuit)
    """

    def __init__(
        self,
        gate: cirq.Gate,
        phi_mapping: Mapping[tuple[cirq.Qid, ...], float],
        merge_sq_moments: bool = True,
    ):
        """

        Args:
            gate: the gate to inverse. Usually a iSWAP-like gate/sycamore gate,
            phi_mapping: dictionnary of the parameter phi per pair,
            merge_sq_moment: the transformed circuit will have merged sq gates.
        """
        self.gate = gate
        self.phi_mapping = phi_mapping
        self.merge_sq_moments = merge_sq_moments

    def single_qubit_map_func(self, op: cirq.Operation, index: int) -> cirq.OP_TREE:
        if len(op.qubits) == 1:
            yield cirq.inverse(op, default=op).with_tags(*op.tags)
        else:
            yield op

    def sycamore_inverse_map_func(self, op: cirq.Operation, index: int) -> cirq.OP_TREE:
        """Constructing the inverse of the sycamore from the sycamore gate."""
        if op.gate == self.gate:
            z = self.phi_mapping[op.qubits] / np.pi
            yield cirq.X(op.qubits[0])
            yield self.gate(*op.qubits)
            yield cirq.X(op.qubits[1])
            yield (cirq.Z**z).on(op.qubits[0])
        else:
            yield op

    def __call__(self, circuit: cirq.AbstractCircuit, *, context=None) -> cirq.AbstractCircuit:
        # reverse the circuit order and inverse the single qubit gates
        circuit_rev = cirq.map_operations_and_unroll(circuit[::-1], self.single_qubit_map_func)

        # invert the 2-qubits gates:
        inverse_circuit = cirq.map_operations_and_unroll(
            circuit_rev, self.sycamore_inverse_map_func
        )

        # merge the singles qubits moments
        if self.merge_sq_moments:
            return cirq.transformers.merge_single_qubit_moments_to_phxz(inverse_circuit)
        else:
            return inverse_circuit


if __name__ == "__main__":
    phi = np.pi / 6
    gate = cirq_google.SYC
    pattern = tuple(
        rqcg.GridInteractionLayer(col_offset=k % 2, vertical=False, stagger=False) for k in range(2)
    )

    qubits = cirq.GridQubit.rect(1, 75)

    circuit = rqcg.random_rotations_between_grid_interaction_layers_circuit(
        qubits,
        depth=30,
        two_qubit_op_factory=lambda a, b, _: gate.on(a, b),
        pattern=pattern,
        seed=PRNG,
    )

    phi_dict: Mapping[tuple[cirq.Qid, ...], float] = {
        (qubits[k], qubits[k + 1]): phi for k in range(len(qubits) - 1)
    }
    inversing_syc = InverseSycamoreCircuit(phi_mapping=phi_dict, gate=gate, merge_sq_moments=True)

    # inversing through the transformer
    start = time.time()
    inversing_syc(circuit)
    end = time.time()
    print(f"transformer for hardware inverse: {end - start}")

    # using cirq inverse for comparison
    start = time.time()
    cirq.inverse(circuit)
    end = time.time()
    print(f"'cirq.inverse': {end - start}")
@AlMrvn AlMrvn added the kind/task A task that's part of a larger effort label Jun 28, 2023
@tanujkhattar tanujkhattar added triage/accepted A consensus emerged that this bug report, feature request, or other action should be worked on area/performance area/transformers labels Jun 28, 2023
@tanujkhattar
Copy link
Collaborator

I have a quick fix - #6174, that reduces the time to half, but I think it's still not good enough.

Timing before my PR:

'circuit_rev': 0.31624889373779297
'inverse_circuit': 0.5580418109893799
'Merged circuit': 2.517353057861328
transformer for hardware inverse: 3.393893241882324
'cirq.inverse': 0.03910088539123535

Timing after my PR:

'circuit_rev': 0.31432485580444336
'inverse_circuit': 0.5529079437255859
'Merged circuit': 0.6783010959625244
transformer for hardware inverse: 1.5484230518341064
'cirq.inverse': 0.036971092224121094

Now, each of the transformer steps takes roughly ~0.5 seconds and speeding this up further would probably involve speeding up the map_operations / map_moments transformer primitives in the first place.

@tanujkhattar
Copy link
Collaborator

The example code for 150 qubits takes the following time:

On master

'circuit_rev': 0.6249358654022217
'inverse_circuit': 1.1757490634918213
'Merged circuit': 4.6269519329071045
transformer for hardware inverse: 6.431884765625
'cirq.inverse': 0.07868170738220215

On #6174

'circuit_rev': 0.6742651462554932
'inverse_circuit': 1.035278081893921
'Merged circuit': 1.5612859725952148
transformer for hardware inverse: 3.2750561237335205
'cirq.inverse': 0.07976198196411133

@AlMrvn What should be "a reasonable time" for the transformer up to ~150ish qubits? Is <1 second a good target to aim for?

@AlMrvn
Copy link
Author

AlMrvn commented Jun 29, 2023

@tanujkhattar that a decent increase! factor 2 already! Thanks a lot! I am already happy =D

for a what would be reasonable, I would say that these circuit take few microsecond to be run on the hardware for 1 shot. Let's say 4us per shot with some overhead. it will need a lot of shot, so let's say 1 million shots. this give us a run-time on hardware of ~4s. ideally the circuit creation would be small compared to this. so yeah 1s or 0.5s seems good.

for order of magnitude, I usually precompute ~100 of these circuits

@tanujkhattar
Copy link
Collaborator

xref #6097

@tanujkhattar
Copy link
Collaborator

tanujkhattar commented Aug 17, 2023

@AlMrvn @dstrain115
Tl;Dr - #6250 significantly speeds up the transformer primitives so that the original transformer written using convenient transformer primitives is now very close to a faster optimal implementation and takes < 1sec for the 75 qubit circuit.

After my last set of improvements as discussed in the comment above, the original transformer written by you using transformer primitives in the original issue was taking ~3.3 seconds for 150 qubits case. A rerun gives numbers as follows (pasted again for convenience)

circuit_rev: 0.691652774810791
inverse_circuit: 1.1098392009735107
merged_circuit: 1.6418278217315674
transformer for hardware inverse: 3.4486749172210693
'cirq.inverse': 0.09459590911865234

We also discussed that we can significantly speed up the transformer by completely rewriting it and not using any of the built-in transformer primitives. The implementation we came up with (offline) was as follows (given in the details block).

def merge_func(m1: 'cirq.Moment', m2: 'cirq.Moment') -> cirq.Moment:
    """Assumes both m1 and m2 are moments with single qubit gates."""
    ret_ops = []
    for q in m1.qubits | m2.qubits:
        op1, op2 = m1.operation_at(q), m2.operation_at(q)
        if op1 and op2:
            mat = cirq.unitary(op2) @ cirq.unitary(op1)
            gate = cirq.single_qubit_matrix_to_phxz(mat, atol=1e-8)
            if gate:
                ret_ops.append(gate(q))
        else:
            op = op1 or op2
            assert op is not None
            if isinstance(op.gate, cirq.PhasedXZGate):
                ret_ops.append(op)
            else:
                gate = cirq.single_qubit_matrix_to_phxz(cirq.unitary(op), atol=1e-8)
                if gate:
                    ret_ops.append(gate(q))
    return cirq.Moment(ret_ops)

def is_single_ops_moment(m: cirq.Moment):
    return len(m) == len(m.qubits)

@cirq.transformer
class InverseSycamoreCircuitFast:
    """Transformer to inverse a circuit with sycamore/fsim(theta=pi/2, phi) gates.

    This transformer will unroll the circuit.
    This transformer operate in the following steps:
        1) inverse the circuit order
        2) inverse the singles qubits gates
        3) inverse the Sycamore gate using the single qubit gate + sycamore inversion:
                (0, 0): ───Syc─── ^-1      ───X───Syc──Z(phi)──
                            │          =           |
                (0, 1): ───Syc───          ───────Syc────X──────
        4) merge the singles qubits moments that are adjacents.

    Example:
        >>> gate = cirq_google.SYC
        >>> pair = cirq.GridQubit.rect(1, 2)
        >>> a,b = pair
        >>> phi_dict = {pair: np.pi/6}
        >>> circuit = cirq.Circuit.from_moments(cirq.X(a), gate.on(*pair), cirq.Y(b))
        >>> transformer = InverseSycamoreCircuit(gate = gate, phi = phi_dict)
        >>> inverse_circuit = transformer(circuit)
    """

    def __init__(
        self,
        gate: cirq.Gate,
        phi_mapping: Mapping[tuple[cirq.Qid, ...], float],
        merge_sq_moments: bool = True,
    ):
        """

        Args:
            gate: the gate to inverse. Usually a iSWAP-like gate/sycamore gate,
            phi_mapping: dictionnary of the parameter phi per pair,
            merge_sq_moment: the transformed circuit will have merged sq gates.
        """
        self.gate = gate
        self.phi_mapping = phi_mapping

    def sycamore_inverse_map_func(self, op: cirq.Operation) -> cirq.OP_TREE:
        """Constructing the inverse of the sycamore from the sycamore gate."""
        assert op.gate == self.gate, f"circuit should follow a brickwall pattern with all 2q gates same as {self.gate}"
        z = self.phi_mapping[op.qubits] / np.pi
        yield cirq.Moment(cirq.X(op.qubits[0]))
        yield cirq.Moment(self.gate(*op.qubits))
        yield cirq.Moment(cirq.X(op.qubits[1]), (cirq.Z**z).on(op.qubits[0]))

    def __call__(self, circuit: cirq.AbstractCircuit, *, context=None) -> cirq.AbstractCircuit:
        new_moments = []
        for m in circuit[::-1]:
            if is_single_ops_moment(m):
                # m acts only on single qubit gates
                if new_moments and is_single_ops_moment(new_moments[-1]):
                    new_moments[-1] = merge_func(new_moments[-1], cirq.inverse(m))
                else:
                    new_moments.append(cirq.inverse(m))
            else:
                inv_circuits = [cirq.Circuit(self.sycamore_inverse_map_func(op)) for op in m]
                inv_moments = cirq.Circuit.zip(*inv_circuits).moments
                new_moments[-1] = merge_func(new_moments[-1], inv_moments[0])
                new_moments.extend(inv_moments[1:])
        return cirq.Circuit(new_moments)

The fast version has a runtime of ~1.7 seconds, i.e. running

start = time.time()
inversing_syc_fast(circuit)
end = time.time()
print(f"transformer for hardware inverse fast: {end - start}")

gives

transformer for hardware inverse fast: 1.765840768814087

The ~1.7 second time gets rid of most of the overhead introduced by the built-in circuit transformer routines, like cirq.map_operations, and mostly comes from repeated computations of the unitary matrices for phased XZ gates that need to be merged. For example, here are the top 10 lines from snakeviz profiling output

image

Now, as part #6250, I've improved the speed of built-in cirq.map_operations transformer primitive such that now, if you run your original implementation of the transformer, that uses the built-in transformer primitives, the runtime is much closer to the optimal runtime of the fast version. For example, running the original transformer on 150 qubits now gives

circuit_rev: 0.13894176483154297
inverse_circuit: 0.21164369583129883
merged_circuit: 1.618177890777588
transformer for hardware inverse: 1.9741430282592773

You can see that the inverse_circuit and circuit_rev, both the stages use the cirq.map_operations_and_unroll primitive and their runtime reduced ~5x each while the runtime of merged_circuit stage remained roughly the same and is same to the optimal runtime of the fast transformer since this is the minimal cost paid due to computing unitaries and merging them.

Note that all of these numbers are for 150 qubit circuits. For 75 qubits, the total runtime < 1sec.

I think we can safely close this issue now once the linked PR is merged?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/performance area/transformers kind/task A task that's part of a larger effort triage/accepted A consensus emerged that this bug report, feature request, or other action should be worked on
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants