Oxidize two qubit basis decomposer #12010
Conversation
One or more of the following people are requested to review this:
For some initial numbers, I ran the same asv benchmarks that I did in #11946 (comment) to compare this PR against the current state of main (which doesn't have #11946 yet).
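For anyone who wants a rough local sanity check without re-running the asv suite, a hypothetical micro-benchmark along these lines (standard public API only) gives a feel for the decomposer's throughput:

```python
# A rough, hypothetical local sanity check (not the asv benchmarks referenced
# above): time a batch of 2q decompositions of random unitaries.
import time

from qiskit.circuit.library import CXGate
from qiskit.quantum_info import random_unitary
from qiskit.synthesis import TwoQubitBasisDecomposer

decomposer = TwoQubitBasisDecomposer(CXGate())
unitaries = [random_unitary(4, seed=i) for i in range(1000)]

start = time.perf_counter()
circuits = [decomposer(u) for u in unitaries]
print(f"{len(circuits)} decompositions in {time.perf_counter() - start:.3f}s")
```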
This commit is the second part of migrating the default 2q unitary synthesis method to leverage parallel rust as described in Qiskit#8774. The eventual goal is to be able to run unitary synthesis in parallel for all the unitary matrices in a single call from the `UnitarySynthesis` pass. The TwoQubitBasisDecomposer class is one of the default decomposers used by the unitary synthesis plugin. After this we can build an interface that will run the decomposition in parallel for a given decomposer. This commit re-implements the TwoQubitBasisDecomposer class in rust. It keeps the same algorithm from the previous python version but implements it in rust. This builds off of Qiskit#11946; for the operation of the decomposer class, the TwoQubitWeylDecomposition class is used solely through rust. This commit depends on Qiskit#11946 and will need to be rebased after Qiskit#11946 is merged. Fixes Qiskit#12004
627269b to ad70122
One or more of the following people are requested to review this:
Pull Request Test Coverage Report for Build 8425179161
💛 - Coveralls
This should be ready to go now. There might be some more performance tuning and optimization we can do to the code (please feel free to leave any suggestions inline), but the performance is looking really good in its current form. I reran the benchmarks from above against current main (which includes #11946 now) and it resulted in:
This looks mostly good to me, at least from a Rust perspective.
The performance improvements are fantastic! 😄
As keen as I am to see this merge, I've pulled it from the merge queue just to give @levbishop a chance to review it too.
This is generally fine as a translation of the python.
I do worry that it further locks in some possibly non-optimal choices from the python. In particular, forcing an explicit 1q decomposition at the 2q decomposition level is probably not the best overall strategy for the future (since it will almost certainly be immediately converted back to a unitary matrix and collected and resynthesized by the 1q optimizations).
Tracking the `num_basis_gates()` flow through the code is a bit of a maze: Python `num_basis_gates` calls rust `num_basis_gates`, which calls `_num_basis_gates`, which calls `__num_basis_gates`.
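Schematically, the layering being described looks something like the following (hypothetical, simplified stand-in names; the real chain crosses the Python/Rust boundary rather than living in one Python file):

```python
# Hypothetical, simplified illustration of the layered forwarding described
# above; the names are made up and do not mirror the actual Qiskit internals.
class _InnerDecomposer:
    """Stand-in for the Rust-backed object exposed through pyo3."""

    def num_basis_gates(self, unitary):
        # Entry point on the inner object just forwards...
        return self._num_basis_gates(unitary)

    def _num_basis_gates(self, unitary):
        # ...to another thin layer, which forwards again...
        return self.__num_basis_gates(unitary)

    def __num_basis_gates(self, unitary):
        # ...to the routine that would actually count basis gates.
        return 0  # placeholder for the real computation


class PublicTwoQubitBasisDecomposer:
    """Stand-in for the Python-facing class that wraps the inner object."""

    def __init__(self):
        self._inner = _InnerDecomposer()

    def num_basis_gates(self, unitary):
        # The Python layer delegates straight through to the wrapped object.
        return self._inner.num_basis_gates(unitary)
```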
I'm interested to try removing the layer of wrapping of the rust object inside a python object, but I think we can explore that in future PRs without breaking any API or anything.
I do very much like the perf delta from this work!
Yeah, I think there is a round of cleanup we'll want to do to simplify things. As you say, I think that'll be fairly straightforward once this merges, without any API implications. Porting the algorithm was a large and kind of unwieldy job, so I wasn't super concerned with the exact interface, thinking we could iterate on it later, especially because the rust code is explicitly private.
* Increase heuristic effort for optimization level 2

  This commit tweaks the heuristic effort in optimization level 2 to be more of a middle ground between levels 1 and 3, with a better balance between output quality and runtime. This positions it as a better default for the pass manager we use if one isn't specified. The tradeoff here is that the vf2layout and vf2postlayout search space is reduced to be the same as level 1. There are diminishing margins of return on the vf2 layout search, especially for cases when there are a large number of qubit permutations for the mapping found. Then the number of sabre trials is brought up to the same level as optimization level 3, as this can have a significant impact on output and the extra runtime cost is minimal. The larger change is adopting the optimization passes from level 3, which ends up mainly being 2q peephole optimization. With the performance improvements from #12010 and #11946 and all the follow-on PRs, this is now fast enough to rely on in optimization level 2.

* Add test workaround from level 3 to level 2 too

* Expand vf2 call limit on VF2Layout

  For the initial VF2Layout call this commit expands the vf2 call limit back to the previous level instead of reducing it to the same as level 1. The idea behind making this change is that spending up to 10s to find a perfect layout is a worthwhile tradeoff, as that will greatly improve the result from execution. But scoring multiple layouts to find the lowest error rate subgraph has a diminishing margin of return in most cases, as there typically aren't thousands of unique subgraphs, and often when we hit the scoring limit it's just permuting the qubits inside a subgraph, which doesn't provide the most value. For VF2PostLayout the lower call limits from level 1 are still used, because the search for isomorphic subgraphs is typically much shorter with the vf2++ node ordering heuristic, so we don't need to spend as much time looking for alternative subgraphs.

* Move 2q peephole outside of optimization loop in O2

  Due to potential instability in the 2q peephole optimization we run, we were using the `MinimumPoint` pass to provide backtracking when we reach a local minimum. However, this pass adds a significant amount of overhead because it deep copies the circuit at every iteration of the optimization loop that improves the output quality. This commit tweaks the O2 pass manager construction to only run 2q peephole once, and then updates the optimization loop to be what the previous O2 optimization loop was.
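For context, the level being retuned above is selected through the standard public transpile API; a minimal sketch, with GenericBackendV2 as a stand-in target (not taken from the commit):

```python
# Minimal sketch of opting in to optimization level 2 (standard public API;
# GenericBackendV2 here is just a stand-in backend, not from the commit).
from qiskit import QuantumCircuit, transpile
from qiskit.providers.fake_provider import GenericBackendV2

backend = GenericBackendV2(num_qubits=5)

qc = QuantumCircuit(3)
qc.h(0)
qc.cx(0, 1)
qc.cx(1, 2)

transpiled = transpile(qc, backend=backend, optimization_level=2)
print(transpiled.count_ops())
```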
Summary
This commit is the second part of migrating the default 2q unitary synthesis method to leverage parallel rust as described in #8774. The eventual goal is to be able to run unitary synthesis in parallel for all the unitary matrices in a single call from the `UnitarySynthesis` pass. The TwoQubitBasisDecomposer class is one of the default decomposers used by the unitary synthesis plugin. After this we can build an interface that will run the decomposition in parallel for a given decomposer.

This commit re-implements the TwoQubitBasisDecomposer class in rust. It keeps the same algorithm from the previous python version but implements it in rust. This builds off of #11946; for the operation of the decomposer class, the TwoQubitWeylDecomposition class is used solely through rust.

This commit depends on #11946 and will need to be rebased after #11946 is merged.
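For reference, the Python-facing entry points stay the same; a minimal usage sketch of the existing public API (not taken from the diff):

```python
# Illustrative use of the existing Python-facing API (unchanged by this PR).
from qiskit.circuit.library import CXGate
from qiskit.quantum_info import random_unitary
from qiskit.synthesis import TwoQubitBasisDecomposer

decomposer = TwoQubitBasisDecomposer(CXGate())
unitary = random_unitary(4, seed=1234)

# Number of applications of the basis gate needed (0-3 for a generic SU(4)).
print(decomposer.num_basis_gates(unitary))

# Synthesize an equivalent circuit in terms of CX plus 1q gates.
circuit = decomposer(unitary)
print(circuit.count_ops())
```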
Details and comments
Fixes #12004
TODO: