Oxidize UnitarySynthesis pass #13141
Pull Request Test Coverage Report for Build 11409692028 (Coveralls)
Warning: This coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Performance is starting to look promising:
Benchmarking status as of eb815d1 (more benchmarks show improvement but the improvement looks slightly smaller).
I took a first pass through this, thanks for doing this @ElePT! I left some inline comments. Two high-level points stuck out the most from my first pass. The first is that this PR makes a lot of things public. Some of them make sense, like the DAGCircuit methods, but for others, such as the internal attributes of some structures, I'm skeptical that we need to reuse or access them outside of their definition.
The other thing, which I think is potentially a bigger issue, especially for performance, is around some of the typing choices and how they lead to a lot of conversions. The most concrete example is that the output of all the different decomposers is normalized to a DAGCircuit, which means constructing an expensive object when it would be far more efficient to iterate over the synthesized sequence directly and add the gates directly to the dag.
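To make the conversion concern concrete, here is a minimal sketch of the two routes. `Gate`, `ToyDag`, and both functions are hypothetical stand-ins written for illustration, not Qiskit's types or the code in this PR; the only point is that the indirect route builds and then discards a temporary container per synthesized sequence.

```rust
// Hypothetical stand-ins for illustration only; these are not Qiskit's types.
#[derive(Debug, Clone, PartialEq)]
struct Gate {
    name: &'static str,
    params: Vec<f64>,
    qubits: Vec<u32>,
}

/// Toy output container standing in for the DAG being built.
#[derive(Debug, Default)]
struct ToyDag {
    ops: Vec<Gate>,
}

impl ToyDag {
    fn apply_operation_back(&mut self, gate: Gate) {
        self.ops.push(gate);
    }
}

/// Indirect route: wrap the sequence in a temporary container first,
/// then copy every operation over to the real output.
fn append_via_intermediate(out: &mut ToyDag, sequence: &[Gate]) {
    let mut tmp = ToyDag::default();
    for gate in sequence {
        tmp.apply_operation_back(gate.clone());
    }
    for gate in tmp.ops {
        out.apply_operation_back(gate);
    }
}

/// Direct route: push each synthesized gate straight onto the output.
fn append_directly(out: &mut ToyDag, sequence: Vec<Gate>) {
    for gate in sequence {
        out.apply_operation_back(gate);
    }
}
```

Both functions leave the same operations in `out`; the direct one skips the temporary allocation and the extra clone of every gate.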
let circuit_to_dag = imports::CIRCUIT_TO_DAG.get_bound(py);
let dag_to_circuit = imports::DAG_TO_CIRCUIT.get_bound(py);
These exist in Rust now, see: https://github.com/Qiskit/qiskit/blob/main/crates/circuit/src/converters.rs so I'd suggest calling them directly. In particular, for circuit_to_dag you can call the inner DAGCircuit constructor and avoid the Python overhead when you've built a CircuitData directly in Rust (not that I think that's the case here).
Any updates on when we will start using these here? I understand that you may not want to use dag_to_circuit in Rust to avoid using CircuitData, but circuit_to_dag takes a QuantumCircuit instance, and should work the same way.
I updated the code to use circuit_to_dag from the new converters crate in a912248. I am hesitant about dag_to_circuit because to use it I'd have to add a few extra conversion steps.
Co-authored-by: Matthew Treinish <mtreinish@kortar.org>
This is the latest benchmarking output after pulling from main. Again, fewer tests show an improvement, but the improvement itself seems larger.
I also ran the transpiler benchmarks for QV, which might give a better idea of the speedup than the utility scale ones, and got the following numbers on the tests that use the rust path (the rust path is only followed if
I think this is about ready. We'll see about the panicking and error handling in the synth_sequences based on the comments I just left. I'd appreciate it if @ShellyGarion or @alexanderivrii could take a look at the arithmetic, or if someone else on the team could give it a second look.
    ));
}

let target_basis_set = get_target_basis_set(target, qubits[0]);
Should something like this be part of #13308? Since we're moving towards a more Target-centric transpiler, being able to extract the basis gates automatically would be useful. I see this being done in many transpiler passes whenever basis gates are not provided.
Yes, absolutely, I would include this.
let mut embodiment =
    xx_embodiments.get_item(op.to_object(py).getattr(py, "base_class")?)?;

if embodiment.getattr("parameters")?.len()? == 1 {
I see, I wanted to ask since CircuitData has a version of assign_parameters that can be worked entirely through Rust.
let synth = if let DecomposerType::TwoQubitBasisDecomposer(decomp) = &decomposer_2q.decomposer {
    decomp.call_inner(su4_mat.view(), None, is_approximate, None)?
} else {
    panic!("synth_su4_sequence should only be called for TwoQubitBasisDecomposer.")
If it's unreachable code you should use unreachable!(). The only downside is that these exceptions cannot really be caught from Python code. However, I think it is better for debugging purposes.
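For reference, the trade-off between the two styles can be sketched in plain Rust. The enum below is a hypothetical, simplified stand-in for the decomposer type in this PR (the payloads are placeholder numbers), and returning a `Result` is shown only as the alternative to panicking, not as what the PR does.

```rust
// Hypothetical enum mirroring the shape of the match discussed above;
// the variants carry placeholder payloads, not Qiskit's real decomposers.
#[derive(Debug)]
enum DecomposerType {
    TwoQubitBasis(f64),
    XX(f64),
}

/// Option 1: document the invariant with `unreachable!()`.
/// Calling this with the wrong variant panics with the given message.
fn basis_fidelity_or_panic(decomposer: &DecomposerType) -> f64 {
    match decomposer {
        DecomposerType::TwoQubitBasis(fidelity) => *fidelity,
        DecomposerType::XX(_) => {
            unreachable!("only called for the TwoQubitBasis variant")
        }
    }
}

/// Option 2: surface the mismatch as an error value the caller can handle.
fn basis_fidelity_checked(decomposer: &DecomposerType) -> Result<f64, String> {
    match decomposer {
        DecomposerType::TwoQubitBasis(fidelity) => Ok(*fidelity),
        other => Err(format!("expected TwoQubitBasis, got {:?}", other)),
    }
}
```

`unreachable!()` states the intent more clearly than a bare `panic!` when the branch is a broken invariant rather than a user error; the `Result` form is the one to reach for if the caller is expected to recover.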
The general framework looks good to me.
I haven't made it through all of the Rust code yet, but from what I've seen this is looking good enough to merge (once the build failure is fixed). I think there are some potential optimizations and improvements we'll want to make down the road, but this is large enough that we could nitpick forever on particular pieces of it and never merge it, when we can make incremental improvements after it merges. So if I don't circle back to this in a timely manner, don't let that stop someone else from feeling empowered to approve and merge it.
// no need to bother trying the XXDecomposer.
static GOODBYE_SET: [&str; 3] = ["cx", "cz", "ecr"];

fn get_target_basis_set(target: &Target, qubit: PhysicalQubit) -> EulerBasisSet {
I wonder if there is value in eventually consolidating this with the similar logic in euler_one_qubit_decomposer. It's different enough that it's not worth it right now, but I think there is a path to unifying the functions, probably in a follow-up PR.
    target: &Target,
) -> f64 {
    let mut gate_fidelities = Vec::new();
    let mut score_instruction =
Just curious, why did you end up going for a closure here? I probably would have just put this in the loop body directly. It doesn't really matter, so no reason to change it, I was just wondering.
I think I was originally inlining it in the code, switched a couple of times between that and the function, and it ended up like this, but no proper reason for it.
let node_ids: Vec<NodeIndex> = dag.op_nodes(false).collect();
for node in node_ids {
Is it necessary to do this as a separate loop over all the nodes? I feel like we should be able to integrate this into the main loop over the nodes in topological order below, so we don't need to iterate over the dag twice.
If not, we can do a simple check up front and call dag.has_control_flow() to see whether there is any control flow in the dag at all, and if there isn't we don't have to bother iterating.
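Both suggestions can be sketched on a toy structure. Everything below is hypothetical and written for illustration: `Node` and `ToyDag` are not the DAGCircuit API, and the counter is just one assumed way a check like `has_control_flow()` could be cheap.

```rust
// Toy node type for illustration; the real pass works on DAGCircuit nodes.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Gate(&'static str),
    ControlFlow(Vec<Node>),
}

/// Toy container that counts control-flow ops as they are added, so the
/// up-front check does not need to walk the nodes (an assumption of this sketch).
#[derive(Debug, Default)]
struct ToyDag {
    nodes: Vec<Node>,
    control_flow_ops: usize,
}

impl ToyDag {
    fn push(&mut self, node: Node) {
        if matches!(node, Node::ControlFlow(_)) {
            self.control_flow_ops += 1;
        }
        self.nodes.push(node);
    }

    /// Cheap check that lets a caller skip a control-flow pre-pass entirely.
    fn has_control_flow(&self) -> bool {
        self.control_flow_ops > 0
    }
}

/// Single pass: control-flow blocks are handled in the same loop that
/// visits ordinary gates, recursing into each block as it is encountered.
fn visit(nodes: &[Node], visited: &mut Vec<&'static str>) {
    for node in nodes {
        match node {
            Node::ControlFlow(block) => visit(block, visited),
            Node::Gate(name) => visited.push(*name),
        }
    }
}
```

With the single-pass form the guard is not needed at all; the guard only matters if the separate pre-pass over the nodes is kept.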
let new_ids = dag.get_qargs(inst.qubits).iter().map(|qarg| {
    qubit_indices
        .get_item(qarg.0 as usize)
        .expect("Unexpected index error in DAG")
});
Is there a reason we need to be using a PyList for this? I feel like using a Vec here makes more sense. There is a conversion cost coming from Python, but if we're calling this function recursively from Rust or repeatedly accessing it, optimizing for that will end up being a lot more efficient.
Co-authored-by: Matthew Treinish <mtreinish@kortar.org>
Currently reviewing... Any small changes I'll commit myself and merge
LGTM! Thank you for your hard work on this. Hopefully, we can make the interners private again in the future (hopefully #13335 is sufficient). I am excited about all the speedups this PR introduces :)
    let res = py_run_main_loop(
        py,
        &mut circuit_to_dag(
            py,
            QuantumCircuitData::extract_bound(&raw_block?)?,
            false,
            None,
            None,
        )?,
        new_ids,
        min_qubits,
        target,
        coupling_edges,
        approximation_degree,
        natural_direction,
    )?;
    new_blocks.push(dag_to_circuit.call1((res,))?);
}
We can do this in a follow-up, but we should consider adding a variant of this that also works with QuantumCircuit, or preferably CircuitData, so that we avoid these two conversions.
out_dag.apply_operation_back(
    py,
    gate.into(),
    &[qubit],
    &[],
    Some(new_params),
    ExtraInstructionAttributes::default(),
    #[cfg(feature = "cache_pygates")]
    None,
)?;
For the follow-up we should consider using the stuff that will be added by #13335
- Leverage usage of new methods in `UnitarySynthesis` after Qiskit#13141 merged.
This commit fixes a performance regression that was introduced in PR Qiskit#13141. When the pass looks up the preferred synthesis direction for a unitary based on the connectivity constraints, the connectivity was being provided as a PyList. To look up an edge in the connectivity set, we needed to iterate over the list and then create a set in which Rust could check whether it contains an edge or its reverse. This has significant overhead because it iterates via Python and does so once per decomposition. This commit addresses this by changing the input type to a HashSet, so PyO3 converts a Python set directly to a HashSet once at call time, and that set is used by reference for lookups instead of iterating over the list each time.
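The lookup described above can be sketched in a few lines of standard-library Rust. The function name, the `[u32; 2]` edge representation, and the `Option<bool>` return convention are assumptions made for this sketch, not the signature used in the pass.

```rust
use std::collections::HashSet;

/// Decide the preferred direction for a two-qubit gate on `(q0, q1)`.
///
/// Returns `Some(true)` if only the forward edge `[q0, q1]` is in the
/// coupling set, `Some(false)` if only the reverse edge is, and `None`
/// if both or neither are present. (Hypothetical convention for this sketch.)
fn preferred_direction(
    coupling_edges: &HashSet<[u32; 2]>,
    q0: u32,
    q1: u32,
) -> Option<bool> {
    // Two hash lookups against a set built once, instead of
    // scanning a list (and rebuilding a set) for every decomposition.
    let forward = coupling_edges.contains(&[q0, q1]);
    let reverse = coupling_edges.contains(&[q1, q0]);
    match (forward, reverse) {
        (true, false) => Some(true),
        (false, true) => Some(false),
        _ => None,
    }
}
```

Because the set is passed by reference, the cost of building it is paid once per call into the pass rather than once per synthesized unitary.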
* Fix performance regression in UnitarySynthesis
This commit fixes a performance regression that was introduced in PR #13141. When the pass looks up the preferred synthesis direction for a unitary based on the connectivity constraints, the connectivity was being provided as a PyList. To look up an edge in the connectivity set, we needed to iterate over the list and then create a set in which Rust could check whether it contains an edge or its reverse. This has significant overhead because it iterates via Python and does so once per decomposition. This commit addresses this by changing the input type to a HashSet, so PyO3 converts a Python set directly to a HashSet once at call time, and that set is used by reference for lookups instead of iterating over the list each time.
* Avoid constructing plugin data views with default plugin
If we're using the default plugin, the execution will happen in Rust and we don't need to build the plugin data views that are defined in the plugin interface. Profiling the benchpress test using the hamlib Hamiltonian ham_graph-2D-grid-nonpbc-qubitnodes_Lx-5_Ly-186_h-0.5-all-to-all, which originally caught this regression, showed an inordinate amount of time being spent in `_build_gate_lengths_by_qubit` and `_build_gate_errors_by_qubit`, whose results aren't used because this now happens internally in Rust. To mitigate this overhead, this commit moves the sections computing these values into the code branch that uses them, out of the default plugin branch that uses Rust.
Summary
Closes #12210. This is an initial effort to port UnitarySynthesis to Rust, currently limited to the target + default plugin path (custom plugins will run in Python).
Details and comments
TODO:
- dag_to_circuit
- (not reflected in unit tests)
- dag.push_back and see if it can be done with the new dag.apply_operation_back method