Support control flow in `SabreSwap` and `SabreLayout` (#10366)
Previously, we were using original DAG node IDs to look up nodes in the expanded nested DAGs, but these IDs are of course not the same.
Force-pushed from f07ca45 to dc03e56.
This looks great, thank you!!
In testing: the current tests have a lot of surface area, but they mostly give the inner blocks the same qubits as the outer blocks. It might be good to include a couple of unit tests that explicitly attempt to route circuits with control-flow ops that appear in the same layer and are defined over qubits that aren't in the outer circuit (and maybe some that are in the outer circuit, but bound in different orders). I think all the pieces are in place in your code for it to work out of the box, and I think the random and disjoint tests touch on them a little bit, but it'd be good to have one or two more targeted tests, since this is our premier routing and layout algorithm.

Also, ought we to have some tests of `SabreLayout` as well?
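For illustration, a hypothetical test along those lines (not one of the tests in this PR) might route a control-flow op that shares a layer with another two-qubit gate and binds its block's qubits to the outer circuit in a non-trivial order; the pass-manager plumbing below is the standard Qiskit API:

```python
from qiskit import QuantumCircuit, QuantumRegister, ClassicalRegister
from qiskit.transpiler import CouplingMap, PassManager
from qiskit.transpiler.passes import SabreSwap

qr = QuantumRegister(5, "q")
cr = ClassicalRegister(1, "c")
outer = QuantumCircuit(qr, cr)

body = QuantumCircuit(2)  # block defined on its own qubits, not the outer register
body.cx(0, 1)

outer.h(0)
outer.measure(0, 0)
# The if_test and the bare CX act on disjoint qubits, so they sit in the same
# layer; the block's two qubits are bound to the outer circuit in reversed order.
outer.if_test((cr, 1), body, [qr[4], qr[1]], [])
outer.cx(2, 3)

coupling = CouplingMap.from_line(5)
routed = PassManager([SabreSwap(coupling, heuristic="decay", seed=2023)]).run(outer)
```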
```rust
fn swap_epilogue(&self, py: Python) -> PyObject {
    self.swap_epilogue
        .iter()
        .map(|x| x.into_py(py))
        .collect::<Vec<_>>()
        .into_pyarray(py)
        .into()
}
```
It feels weird that we can't have a direct `impl` for `IntoPyArray` to `ToPyArray`, since `Vec<[usize; N]>` should point to the same memory layout as a 2D Numpy array of `usize`. I guess that's something we can take up with the `numpy` crate, though - we couldn't write the `impl` locally because of the coherence rules.
```rust
match dag.node_blocks.get(py_node) {
    Some(blocks) => {
        // Control flow op. Route all blocks for current layout.
        let mut block_results: Vec<BlockResult> = Vec::with_capacity(blocks.len());
        for inner_dag in blocks {
            let (inner_dag_routed, inner_final_layout) =
                route_block_dag(inner_dag, layout.copy());

            // For now, we always append a swap circuit that gets the inner block
            // back to the parent's layout.
            let swap_epilogue =
                gen_swap_epilogue(coupling, inner_final_layout, layout, seed);
            let block_result = BlockResult {
                result: inner_dag_routed,
                swap_epilogue,
            };
            block_results.push(block_result);
        }
        node_block_results.insert_unique_unchecked(*py_node, block_results);
    }
```
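For readers who don't follow Rust, a rough Python rendering of the same logic. This is a hedged sketch: `route_block_dag` and `gen_swap_epilogue` are stand-ins for the Rust helpers (not public Qiskit APIs) and are passed in as callables so the sketch stays self-contained.

```python
def route_control_flow_node(
    node_id, blocks, layout, coupling, seed, route_block_dag, gen_swap_epilogue
):
    block_results = []
    for inner_dag in blocks:
        # Each block is routed starting from a copy of the parent's current layout...
        inner_routed, inner_final_layout = route_block_dag(inner_dag, layout.copy())
        # ...then an epilogue of swaps maps it back to the parent's layout, so the
        # control-flow op as a whole never permutes the outer layout.
        epilogue = gen_swap_epilogue(coupling, inner_final_layout, layout, seed)
        block_results.append({"result": inner_routed, "swap_epilogue": epilogue})
    return {node_id: block_results}
```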
The ordering of the iteration means that control-flow ops are routed as they happen to be visited - it's not quite ASAP and definitely not ALAP, but it's definitely there. I think that's totally fine for now, I just wanted to flag that there is an implicit decision point in the routing here. It's the same "decision" we make in `StochasticSwap`, though.
In the future, we might want to investigate if it'd be better to shift these to be precisely ASAP or ALAP routing.
Yes absolutely, good point - there's certainly an implicit decision here. I thought about this and figured that it shouldn't matter for this implementation, since placement of a control flow op doesn't permute the layout. If it did, I imagine it'd be best to go for ALAP, i.e. first place all gate sequences that do not have a data dependency on the control flow op in question (since the lookahead would have been driving towards making those gates as close to placeable as possible), and only then place the control flow op.
I've got some notes set aside from when I was planning the work in this PR that might come in handy when we revisit this to look for optimizations.
```python
def empty_dag(node):
    out = DAGCircuit()
    for qreg in mapped_dag.qregs.values():
        out.add_qreg(qreg)
    out.add_clbits(node.cargs)
    return out
```
I'm concerned that some inner blocks may have things like global phases or classical registers (for example, if a control-flow block itself contains a control-flow block with a `ClassicalRegister` condition), which this conversion will miss.
Ah good to know. I was under the impression that inner blocks shouldn't have their own global properties or registers. I suppose we can add the current block as a second arg and copy these things from that.
Blocks need to be able to contain registers, but you can safely assume that any registers they do have will also be in any circuits that contain that block. Blocks can technically have global phases (though that's impossible to achieve with the builder interface), but honestly, I don't think we've ever considered that before, so I wouldn't be surprised if we already have global-phase bugs in other parts of the transpiler.
Ah alright, so the biggest issue here is that I'm not adding the `cregs` from the `mapped_dag`, correct? I can also copy in any global phases from the current block as well.
Yeah, I think `cregs` is the big one. We should probably also include a test that a nested block containing a classical register works correctly - if you need a comparison, you can use the control-flow builder interface, make sure that the inner blocks use register conditions rather than bit conditions, and it should create control-flow ops whose blocks involve registers as well.
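A hypothetical shape for such a test (assuming the standard control-flow builder API; the PR's actual tests may differ): the inner `if_test` condition uses a `ClassicalRegister`, so the nested block carries the register itself.

```python
from qiskit import QuantumCircuit, QuantumRegister, ClassicalRegister
from qiskit.transpiler import CouplingMap, PassManager
from qiskit.transpiler.passes import SabreSwap

qr = QuantumRegister(4, "q")
cr_outer = ClassicalRegister(1, "a")
cr_inner = ClassicalRegister(2, "b")
qc = QuantumCircuit(qr, cr_outer, cr_inner)

qc.h(0)
qc.measure(0, cr_outer[0])
with qc.if_test((cr_outer, 1)):
    qc.cx(1, 3)                      # non-adjacent on a line map: needs routing
    with qc.if_test((cr_inner, 2)):  # register condition -> block keeps the creg
        qc.cx(0, 3)

routed = PassManager(
    [SabreSwap(CouplingMap.from_line(4), heuristic="decay", seed=42)]
).run(qc)
```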
Fixed in 386197d.
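For context, a hedged sketch of the kind of change being discussed (the actual fix is whatever landed in 386197d and may differ in detail); `mapped_dag` is the enclosing DAG, `block` stands for the control-flow block circuit.

```python
def empty_dag(block, node):
    # Sketch only - not necessarily the real diff in 386197d.
    out = DAGCircuit()
    for qreg in mapped_dag.qregs.values():
        out.add_qreg(qreg)
    out.add_clbits(node.cargs)
    # Classical registers (e.g. ones used by nested register conditions) and the
    # block's global phase were previously dropped.
    for creg in mapped_dag.cregs.values():
        out.add_creg(creg)
    out.global_phase = block.global_phase
    return out
```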
- Makes it more explicit that the `num_qubits` used in a `SabreDAG` should be the number of physical qubits on the device.
- We no longer pass clbit indices because they are always local to the block, i.e. there's no current reason for them to be consistent across the root DAG and inner blocks, like we must do for qubits.
Should be ready to go.
This looks good to me, and the new tests with the nested registers look good as well, thanks. I'm super happy with this work, and I'm really pleased to be bringing Sabre back online with control-flow support.
I left one very minor comment, but it's probably me just being paranoid.
We decided in the Qiskit developers' meeting against Kevin spending time right now to add tests of `SabreLayout`, when we're relatively confident that it all already works because of the integration testing that's enabled by #10372, and we need all the review effort we can get.
This LGTM, thanks for all the updates and the attention to detail on this PR. Sabre is a key part of the transpiler so that was very important.
(Kevin: feel free to dismiss my comment and just enable automerge if you think I'm being too paranoid)
lol, good timing
In Qiskit#10366 the SabreLayout and SabreSwap passes were refactored to support nested control flow blocks. As part of that refactor a new struct, SabreResult, was created to store the nested results for each block. This new class, however, resulted in PyO3 cloning the full swap map on every access. Since we have at least one access per gate in the circuit (and another for each swap), this extra clone() adds a lot of overhead for deep circuits; in extreme cases the regression could be quite severe.

To address this, the result format is changed back to a tuple (as it was originally). While this is less ergonomic, the advantage it provides is that for nested objects the Rust object is moved across the PyO3 interface, so we avoid a copy because Python owns the object on the return. However, for control flow blocks we're still using the SabreResult class, as it simplifies the internal implementation (which is why Qiskit#10366 introduced the struct). The potential impact of this is mitigated by switching to only a single clone per control flow block, by accessing each SabreResult attribute only once per recursive call. If it becomes an issue in the future, we can work on flattening the nested structure before returning it to Python to avoid any clones.

Fixes Qiskit#10650

A follow-up commit in the same change simplifies the recursive call logic in _apply_sabre_result to always use a tuple, so we avoid an isinstance check.
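A toy analogy of the access pattern described here (illustrative only; the class, names, and sizes are made up, and the real objects are PyO3 classes rather than pure Python): treating each attribute access as a full conversion of the underlying map, the cost scales with the number of accesses times the size of the map, so hoisting the access out of the per-node loop — or returning a plain tuple — removes the repeated copies.

```python
class SabreResultLike:
    """Stands in for a PyO3 #[pyclass]; every `swap_map` access hands back a copy."""

    def __init__(self, swap_map):
        self._swap_map = swap_map

    @property
    def swap_map(self):
        return dict(self._swap_map)  # models the per-access clone/conversion


result = SabreResultLike({node: [(0, 1)] for node in range(50_000)})

# Before: one clone of the whole map per routed node.
slow = [result.swap_map.get(node, []) for node in range(50_000)]

# After: convert once (as the tuple-returning fix does), then do cheap lookups.
swap_map = result.swap_map
fast = [swap_map.get(node, []) for node in range(50_000)]
```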
Summary
Adds support for circuits with control flow operations to the Sabre layout and routing passes.
Details and comments
SabreSwap

The approach taken in this PR aims to add support for control flow with minimal complexity. The most important decision this informed was the choice to unconditionally route all control flow blocks back to their starting layout. As described in #9419, only "non-exhaustive" control flow operations technically require this. However, by doing it always, we get the following properties:

- It keeps the `SabreDAG` data structure relatively simple.
- It simplifies `SabreLayout`, since placement of control flow ops doesn't permute the layout.

As a consequence, unnecessary swaps may be inserted in the case of exhaustive control flow (namely `if_else`s and exhaustive `switch`es). In the future, we may want to investigate alternative approaches that take advantage of exhaustive control flow. These would likely require more research, since they'd require modifications to the core Sabre algorithm. For example, we might consider mutably sharing the extended set of the outer DAG while routing child blocks, with some sort of weighting (e.g. based on branch prediction) so that routing works in the best interest of the overall circuit.

The core logic of this implementation has been done in Rust. To make this work, `SabreDAG` was (recursively) extended to include a new field `node_blocks`, which maps node IDs of the original DAG to a vector of `SabreDAG`s, which are the blocks of the corresponding control flow node. If `node_blocks` contains an entry for a given node, then the Sabre swap Rust code treats the node as a control flow operation, i.e. it'll recursively route its blocks and place it in the gate order immediately.
The Rust code now returns a `SabreResult` struct which contains the preexisting swap map (which, as a refresher, maps node IDs from the original DAG to a list of logical swaps that must be performed prior to the corresponding node's placement), the gate order as before, and an additional field `node_block_results`, which maps node IDs to a list of `BlockResult` structs for the corresponding control flow node's blocks. Each `BlockResult` contains two fields: `result`, which is a `SabreResult` encapsulating the now-routed block, and `swap_epilogue`, which is a list of logical swaps that must be applied to the block by the caller to bring it to the proper layout (its starting layout, for now).
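To make the shape of the result concrete, here is a hedged Python mirror of the structure described above (illustration only: the real types are Rust structs exposed through PyO3, and field names like `swap_map` and `node_order` are chosen here for readability, not taken from the code).

```python
from dataclasses import dataclass, field


@dataclass
class BlockResultLike:
    result: "SabreResultLike"                  # the routed block
    swap_epilogue: list[tuple[int, int]]       # swaps restoring the parent's layout


@dataclass
class SabreResultLike:
    swap_map: dict[int, list[tuple[int, int]]]                 # node id -> swaps before it
    node_order: list[int]                                       # gate order
    node_block_results: dict[int, list[BlockResultLike]] = field(default_factory=dict)


def walk(result: SabreResultLike, depth: int = 0) -> None:
    """Show how a caller consumes the structure: swaps, then the node, recursing into blocks."""
    for node_id in result.node_order:
        for swap in result.swap_map.get(node_id, []):
            print("  " * depth, "swap", swap)
        print("  " * depth, "apply node", node_id)
        for block in result.node_block_results.get(node_id, []):
            walk(block.result, depth + 1)
            for swap in block.swap_epilogue:
                print("  " * (depth + 1), "epilogue swap", swap)
```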
SabreLayout

In theory, the `routing_pass` argument for `SabreLayout` should support any routing pass that supports control flow, since the reverse circuit constructed by the Python-side code uses `QuantumCircuit.reverse_ops`, which will recursively call `reverse_ops` on each instruction. When unspecified, the Rust-side Sabre swap is used, which does not permute the layout when placing control flow operations, and thus does not need to recursively reverse nested circuits.
Notes

- The version of `hashbrown` we depend on in Rust had to be downgraded explicitly in the `Cargo.lock`. This is because Rustworkx uses its `HashMap` type in the method `rustworkx_core::token_swapper::token_swapper`, and Rustworkx is currently using `hashbrown` at `0.12.3`. See "Consider replacing hashbrown use in core API" (rustworkx#911) for more details.
Testing

Relevant control flow tests from `StochasticSwap` are being ported. I've also locally enabled Sabre layout and routing for circuits with control flow in `level1.py` and run through the `test_transpiler.py` tests, which are passing.
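As a hedged usage sketch of what this enables (assuming a Qiskit version that includes this PR; exact pass defaults may differ), `SabreLayout` with its built-in Rust routing can lay out and route a control-flow circuit directly, without going through the preset pipelines:

```python
from qiskit import QuantumCircuit, QuantumRegister, ClassicalRegister
from qiskit.transpiler import CouplingMap, PassManager
from qiskit.transpiler.passes import SabreLayout

qr = QuantumRegister(4, "q")
cr = ClassicalRegister(1, "c")
qc = QuantumCircuit(qr, cr)
qc.h(0)
qc.measure(0, 0)
with qc.for_loop(range(3)):
    qc.cx(0, 3)  # non-adjacent on a line coupling map: needs swaps inside the block

# SabreLayout with no explicit routing_pass both lays out and routes the circuit.
pm = PassManager([SabreLayout(CouplingMap.from_line(4), seed=7)])
physical = pm.run(qc)
```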
To do

Resolves #9419
Resolves #9421