Skip to content

Commit

Permalink
Rebalance CircuitInstruction and PackedInstruction (Qiskit#12730)
Browse files Browse the repository at this point in the history
* Rebalance `CircuitInstruction` and `PackedInstruction`

This is a large overhaul of how circuit instructions are both stored in
Rust (`PackedInstruction`) and how they are presented to Python
(`CircuitInstruction`).  In summary:

* The old `OperationType` enum is now collapsed into a manually managed
  `PackedOperation`.  This is logically equivalent, but stores a
  `PyGate`/`PyInstruction`/`PyOperation` indirectly through a boxed
  pointer, and stores a `StandardGate` inline.  As we expect the vast
  majority of gates to be standard, this hugely reduces the memory
  usage.  The enumeration is manually compressed to a single pointer,
  hiding the discriminant in the low, alignment-required bytes of the
  pointer.

* `PackedOperation::view()` unpacks the operation into a proper
  reference-like enumeration `OperationRef<'a>`, which implements
  `Operation` (though there is also a `try_standard_gate` method to get
  the gate without unpacking the whole enumeration).

* Both `PackedInstruction` and `CircuitInstruction` use this
  `PackedOperation` as the operation storage.

* `PackedInstruction` is now completely the Rust-space format for data,
  and `CircuitInstruction` is purely for communication with Python.

On my machine, this commit brings the utility-scale benchmarks to within
10% of the runtime of 1.1.0 (and some to parity), despite all the
additional overhead.

Changes to accepting and building Python objects
------------------------------------------------

* A `PackedInstruction` is created by copy constructor from a
  `CircuitInstruction` by `CircuitData::pack`.  There is no `pack_owned`
  (really, there never was - the previous method didn't take ownership)
  because there's never owned `CircuitInstruction`s coming in; they're
  Python-space interop, so we never own them (unless we clone them)
  other than when we're unpacking them.

* `PackedInstruction` is currently just created manually when not coming
  from a `CircuitInstruction`.  It's not hard, and makes it easier to
  re-use known intern indices than to waste time re-interning them.
  There is no need to go via `CircuitInstruction`.

* `CircuitInstruction` now has two separated Python-space constructors:
  the old one, which is the default and takes `(operation, qubits,
  clbits)` (and extracts the information), and a new fast-path
  `from_standard` which asks only for the standard gate, qubits and
  params, avoiding operator construction.

* To accept a Python-space operation, extract a Python object to
  `OperationFromPython`.  This extracts the components that are separate
  in Rust space, but joined in Python space (the operation, params and
  extra attributes).  This replaces `OperationInput` and
  `OperationTypeConstruct`, being more efficient at the extraction,
  including providing the data in the formats needed for
  `PackedInstruction` or `CircuitInstruction`.

* To retrieve the Python-space operation, use
  `CircuitInstruction::get_operation` or
  `PackedInstruction::unpack_py_op` as appropriate.  Both will
  cache and reuse the op, if `cache_pygates` is active.  (Though note
  that if the op is created by `CircuitInstruction`, it will not
  propagate back to a `PackedInstruction`.)

Avoiding operation creation
---------------------------

The `_raw_op` field of `CircuitInstruction` is gone, because `PyGate`,
`PyInstruction` and `PyOperation` are no longer pyclasses and no longer
exposed to Python.  Instead, we avoid operation creation by:

* having an internal `DAGNode::_to_circuit_instruction`, which returns a
  copy of the internal `CircuitInstruction`, which can then be used with
  `CircuitInstruction.replace`, etc.

* having `CircuitInstruction::is_standard_gate` to query from Python
  space if we should bother to create the operator.

* changing `CircuitData::map_ops` to `map_nonstandard_ops`, and having
  it only call the Python callback function if the operation is not an
  unconditional standard gate.

Memory usage
------------

Given the very simple example construction script:

```python
from qiskit.circuit import QuantumCircuit

qc = QuantumCircuit(1_000)
for _ in range(3_000):
    for q in qc.qubits:
        qc.rz(0.0, q)
    for q in qc.qubits:
        qc.rx(0.0, q)
    for q in qc.qubits:
        qc.rz(0.0, q)
    for a, b in zip(qc.qubits[:-1], qc.qubits[1:]):
        qc.cx(a, b)
```

This uses 1.5GB in max resident set size on my Macbook (note that it's
about 12 million gates) on both 1.1.0 and with this commit, so we've
undone our memory losses.  The parent of this commit uses 2GB.

However, we're in a strong position to beat 1.1.0 in the future now;
there are two obvious large remaining costs:

* There are 16 bytes per `PackedInstruction` for the Python-operation
  caching (worth about 180MB in this benchmark, since no Python
  operations are actually created).

* There is also significant memory wastage in the current
  `SmallVec<[Param; 3]>` storage of the parameters; for all standard
  gates, we know statically how many parameters are / should be stored,
  and we never need to increase the capacity.  Further, the `Param` enum
  is 16 bytes wide per parameter, of which nearly 8 bytes is padding,
  but for all our current use cases, we only care if _all_ the
  parameters or floats (for everything else, we're going to have to
  defer to Python).  We could move the discriminant out to the level of
  the parameters structure, and save a large amount of padding.

Further work
------------

There's still performance left on the table here:

* We still copy-in and copy-out of `CircuitInstruction` too much right
  now; we might want to make all the `CircuitInstruction` fields
  nullable and have `CircuitData::append` take them by _move_ rather
  than by copy.

* The qubits/clbits interner requires owned arrays going in, but most
  interning should return an existing entry.  We probably want to switch
  to have the interner take references/iterators by default, and clone
  when necessary.  We could have a small circuit optimisation where the
  intern contexts reserve the first n entries to use for an all-to-all
  connectivity interning for up to (say) 8 qubits, since the transpiler
  will want to create a lot of ephemeral small circuits.

* The `Param` vectors are too heavy at the moment; `SmallVec<[Param;
  3]>` is 56 bytes wide, despite the vast majority of gates we care
  about having at most one single float (8 bytes).  Dead padding is a
  large chunk of the memory use currently.

* Fix clippy in no-gate-cache mode

* Fix pylint unused-import complaints

* Fix broken assumptions around the gate model

The `compose` test had a now-broken assumption, because the Python-space
`is` check is no longer expected to return an identical object when a
standard gate is moved from one circuit to another and has its
components remapped as part of the `compose` operation.  This doesn't
constitute the unpleasant deep-copy that that test is preventing. A
custom gate still satisfies that, however, so we can just change the
test.

`DAGNode::set_name` could cause problems if it was called for the first
time on a `CircuitInstruction` that was for a standard gate; these would
be created as immutable instances.  Given the changes in operator
extraction to Rust space, it can now be the case that a standard gate
that comes in as mutable is unpacked into Rust space, the cache is some
time later invalidated, and then the operation is recreated immutably.

* Fix lint

* Fix minor documentation
  • Loading branch information
jakelishman authored and Procatv committed Aug 1, 2024
1 parent 60f05b4 commit 7adc18c
Show file tree
Hide file tree
Showing 25 changed files with 1,371 additions and 1,473 deletions.
5 changes: 3 additions & 2 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ license = "Apache-2.0"
#
# Each crate can add on specific features freely as it inherits.
[workspace.dependencies]
bytemuck = "1.16"
indexmap.version = "2.2.6"
hashbrown.version = "0.14.0"
num-complex = "0.4"
Expand Down
75 changes: 30 additions & 45 deletions crates/accelerate/src/convert_2q_block_matrix.rs
Original file line number Diff line number Diff line change
Expand Up @@ -23,33 +23,32 @@ use numpy::{IntoPyArray, PyArray2, PyReadonlyArray2};
use smallvec::SmallVec;

use qiskit_circuit::bit_data::BitData;
use qiskit_circuit::circuit_instruction::{operation_type_to_py, CircuitInstruction};
use qiskit_circuit::circuit_instruction::CircuitInstruction;
use qiskit_circuit::dag_node::DAGOpNode;
use qiskit_circuit::gate_matrix::ONE_QUBIT_IDENTITY;
use qiskit_circuit::imports::QI_OPERATOR;
use qiskit_circuit::operations::{Operation, OperationType};
use qiskit_circuit::operations::{Operation, OperationRef};

use crate::QiskitError;

fn get_matrix_from_inst<'py>(
py: Python<'py>,
inst: &'py CircuitInstruction,
) -> PyResult<Array2<Complex64>> {
match inst.operation.matrix(&inst.params) {
Some(mat) => Ok(mat),
None => match inst.operation {
OperationType::Standard(_) => Err(QiskitError::new_err(
"Parameterized gates can't be consolidated",
)),
OperationType::Gate(_) => Ok(QI_OPERATOR
.get_bound(py)
.call1((operation_type_to_py(py, inst)?,))?
.getattr(intern!(py, "data"))?
.extract::<PyReadonlyArray2<Complex64>>()?
.as_array()
.to_owned()),
_ => unreachable!("Only called for unitary ops"),
},
if let Some(mat) = inst.op().matrix(&inst.params) {
Ok(mat)
} else if inst.operation.try_standard_gate().is_some() {
Err(QiskitError::new_err(
"Parameterized gates can't be consolidated",
))
} else {
Ok(QI_OPERATOR
.get_bound(py)
.call1((inst.get_operation(py)?,))?
.getattr(intern!(py, "data"))?
.extract::<PyReadonlyArray2<Complex64>>()?
.as_array()
.to_owned())
}
}

Expand Down Expand Up @@ -127,34 +126,20 @@ pub fn change_basis(matrix: ArrayView2<Complex64>) -> Array2<Complex64> {

#[pyfunction]
pub fn collect_2q_blocks_filter(node: &Bound<PyAny>) -> Option<bool> {
match node.downcast::<DAGOpNode>() {
Ok(bound_node) => {
let node = bound_node.borrow();
match &node.instruction.operation {
OperationType::Standard(gate) => Some(
gate.num_qubits() <= 2
&& node
.instruction
.extra_attrs
.as_ref()
.and_then(|attrs| attrs.condition.as_ref())
.is_none()
&& !node.is_parameterized(),
),
OperationType::Gate(gate) => Some(
gate.num_qubits() <= 2
&& node
.instruction
.extra_attrs
.as_ref()
.and_then(|attrs| attrs.condition.as_ref())
.is_none()
&& !node.is_parameterized(),
),
_ => Some(false),
}
}
Err(_) => None,
let Ok(node) = node.downcast::<DAGOpNode>() else { return None };
let node = node.borrow();
match node.instruction.op() {
gate @ (OperationRef::Standard(_) | OperationRef::Gate(_)) => Some(
gate.num_qubits() <= 2
&& node
.instruction
.extra_attrs
.as_ref()
.and_then(|attrs| attrs.condition.as_ref())
.is_none()
&& !node.is_parameterized(),
),
_ => Some(false),
}
}

Expand Down
33 changes: 12 additions & 21 deletions crates/accelerate/src/euler_one_qubit_decomposer.rs
Original file line number Diff line number Diff line change
Expand Up @@ -743,7 +743,7 @@ pub fn compute_error_list(
.iter()
.map(|node| {
(
node.instruction.operation.name().to_string(),
node.instruction.op().name().to_string(),
smallvec![], // Params not needed in this path
)
})
Expand Down Expand Up @@ -988,11 +988,10 @@ pub fn optimize_1q_gates_decomposition(
.iter()
.map(|node| {
if let Some(err_map) = error_map {
error *=
compute_error_term(node.instruction.operation.name(), err_map, qubit)
error *= compute_error_term(node.instruction.op().name(), err_map, qubit)
}
node.instruction
.operation
.op()
.matrix(&node.instruction.params)
.expect("No matrix defined for operation")
})
Expand Down Expand Up @@ -1046,24 +1045,16 @@ fn matmul_1q(operator: &mut [[Complex64; 2]; 2], other: Array2<Complex64>) {

#[pyfunction]
pub fn collect_1q_runs_filter(node: &Bound<PyAny>) -> bool {
let op_node = node.downcast::<DAGOpNode>();
match op_node {
Ok(bound_node) => {
let node = bound_node.borrow();
node.instruction.operation.num_qubits() == 1
&& node.instruction.operation.num_clbits() == 0
&& node
.instruction
.operation
.matrix(&node.instruction.params)
.is_some()
&& match &node.instruction.extra_attrs {
None => true,
Some(attrs) => attrs.condition.is_none(),
}
let Ok(node) = node.downcast::<DAGOpNode>() else { return false };
let node = node.borrow();
let op = node.instruction.op();
op.num_qubits() == 1
&& op.num_clbits() == 0
&& op.matrix(&node.instruction.params).is_some()
&& match &node.instruction.extra_attrs {
None => true,
Some(attrs) => attrs.condition.is_none(),
}
Err(_) => false,
}
}

#[pymodule]
Expand Down
1 change: 1 addition & 0 deletions crates/circuit/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ name = "qiskit_circuit"
doctest = false

[dependencies]
bytemuck.workspace = true
hashbrown.workspace = true
num-complex.workspace = true
ndarray.workspace = true
Expand Down
Loading

0 comments on commit 7adc18c

Please sign in to comment.