-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ConsolidateBlocks
does not have a good logic for heterogeneous gates
#11659
Comments
I like this idea in general. I'm thinking of how it relates to #8774 (specifically the #12007 sub-task) and we can sidestep the need to add a batch mode to the unitary synthesis plugin interface by doing this all at once in multithreaded rust in a new pass. The only question I have though is in evaluating the error for the original 2q block. I agree that we should use an estimated error heuristic to evaluate a potential decompositions and select one based on that instead of the number of gates (which is just being used a proxy for estimated error rate). But prior to synthesis there isn't a guarantee that the gates in a 2q block are in target instructions that we can query error rates on. How were you thinking we'd evaluate the block in these cases? Because I was reading this is as we we compare the error estimates for the original circuit against all the possible decompositions and pick the one which results in the lower error. I guess the answer is if the block isn't in target native instructions we always need to synthesize so in those cases we pick the lowest error decomposition? |
This commit adds a new transpiler pass for 2q peephole optimization that is designed to replace the use of `Collect2qBlocks`, `ConsolidateBlocks`, and `UnitarySynthesis` in the optimization loop of the transpiler with a new optimized pass Optimize2qBlocks that performs the same basic functionality. The goal of this new pass is to be more efficient in runtime and also enable better quality output. The runtime improvements are achieved by only crossing the python<->rust boundary once and doing all the heavy lifting in rust and then just returning a list of circuit sequences for all 2q blocks and then performing inline substitution for all of those circuits. The actual computation is then potentially executed in parallel using rust multithreading. The potential quality improvement is caused by changing the decomposition selection to be based on projected error rates instead of an estimated number of 2q basis gates from the decomposition. In the previous triplet we skipped synthesis if the estimated number of 2q gates from the default decomposer was greater than or equal to the 2q gates in the block which was an attempt to estimate the error rate. In this new pass we compare the estimated fidelity of all the provided synthesis methods and select the lowest noise decomposition. Fixes: Qiskit#11659 Fixes: Qiskit#12007
This commit adds a new transpiler pass for 2q peephole optimization that is designed to replace the use of `Collect2qBlocks`, `ConsolidateBlocks`, and `UnitarySynthesis` in the optimization loop of the transpiler with a new optimized pass Optimize2qBlocks that performs the same basic functionality. The goal of this new pass is to be more efficient in runtime and also enable better quality output. The runtime improvements are achieved by only crossing the python<->rust boundary once and doing all the heavy lifting in rust and then just returning a list of circuit sequences for all 2q blocks and then performing inline substitution for all of those circuits. The actual computation is then potentially executed in parallel using rust multithreading. The potential quality improvement is caused by changing the decomposition selection to be based on projected error rates instead of an estimated number of 2q basis gates from the decomposition. In the previous triplet we skipped synthesis if the estimated number of 2q gates from the default decomposer was greater than or equal to the 2q gates in the block which was an attempt to estimate the error rate. In this new pass we compare the estimated fidelity of all the provided synthesis methods and select the lowest noise decomposition. Fixes: Qiskit#11659 Fixes: Qiskit#12007
This commit adds a new transpiler pass for physical optimization, TwoQubitPeepholeOptimization. This replaces the use of Collect2qBlocks, ConsolidateBlocks, and UnitarySynthesis in the optimization stage for a default pass manager setup. The pass logically works the same way where it analyzes the dag to get a list of 2q runs, calculates the matrix of each run, and then synthesizes the matrix and substitutes it inplace. The distinction this pass makes though is it does this all in a single pass and also parallelizes the matrix calculation and synthesis steps because there is no data dependency there. This new pass is not meant to fully replace the Collect2qBlocks, ConsolidateBlocks, or UnitarySynthesis passes as those also run in contexts where we don't have a physical circuit. This is meant instead to replace their usage in the optimization stage only. Accordingly this new pass also changes the logic on how we select the synthesis to use and when to make a substituion. Previously this logic was primarily done via the ConsolidateBlocks pass by only consolidating to a UnitaryGate if the number of basis gates needed based on the weyl chamber coordinates was less than the number of 2q gates in the block (see Qiskit#11659 for discussion on this). Since this new pass skips the explicit consolidation stage we go ahead and try all the available synthesizers Right now this commit has a number of limitations, the largest are: - Doesn't support builds with the py-cache feature (`OnceCell` for the cache can't be used across threads) - Only supports the target - It doesn't support any synthesizers besides the TwoQubitBasisDecomposer, because it's the only one in rust currently. For plugin handling I left the logic as running the three pass series, but I'm not sure this is the behavior we want. We could say keep the synthesis plugins for `UnitarySynthesis` only and then rely on our built-in methods for physical optimiztion only. But this also seems less than ideal because the plugin mechanism is how we support synthesizing to custom basis gates, and also more advanced approximate synthesis methods. Both of those are things we need to do as part of the synthesis here. Additionally, this is currently missing tests and documentation and while running it manually "works" as in it returns a circuit that looks valid, I've not done any validation yet. This also likely will need several rounds of performance optimization and tuning. t this point this is just a rough proof of concept and will need a lof refinement along with larger changes to Qiskit's rust code before this is ready to merge. Fixes Qiskit#12007 Fixes Qiskit#11659
Right now when we're running the consolidate blocks pass during the init stage we were not consolidating all 2q blocks found. This was because the pass was not being called with the target's information and it wasn't aware when the gates in a block were outside of the target. To avoid potentially unecessary work the ConsolidateBlocks pass has some logic to avoid collecting into a block if it doesn't think that the unitary decomposition can decrease the 2q gate count, basically trying to avoid the case where the synthesis output would be more expensive than original circuit gates. However if the gates are outside of the target it can't estimate the synthesis cost and should be consolidating the block. The logic for this exists in the pass, but it only works if you specify the target so it knows that there are gates in the block outside the target. As part of this a bug around handling gates outside basis in the pass was fixed. If gates were encountered that had no matrix defined the pass would previously have errored trying to create a unitary gate for the block in some cases. In general the logic around this pass is lacking in several cases (see issue Qiskit#11659) and it would be better to estimate the error of the block and determine if the estimated error from the output of different synthesis alternatives is better or not. This is easy to do if we do consolidation and synthesis in a single step. But for this situation of running consolidation during init stage and synthesis much later during the translation stage this is hard to accomplish because we lose the context of the original block if we consolidate. That is why we should consider changing the init stage's use of `ConsolidateBlocks` to set `force_consolidate=True` which will create a `UnitaryGate` out of all 2q blocks found regardless of the estimated number of KAK gates needed based on the unitary matrix's coordinates in the weyl chamber. Doing this would fix issues like Qiskit#13344 because we'll synthesisze the unitary in that case. Where it's less clear we want to do this though is cases where the target has different 2q gates than the KAK gate synthesis knows how to work with. Take the example of Qiskit#13344, if the target had global supported gates of `[cp, u, cx]` or something like that and the angle was not zero then forcing consolidation would change a single cp into possibly 3 cx gates. So this commit does not set `force_consolidate=True` and we can discuss the best way to handle that in a separate commit (either on this PR or a new one).
This commit ports the consolidate blocks pass to rust. The logic remains the same and this is just a straight porting. One optimization is that to remove the amount of python processing the Collect2qBlocks pass is no longer run as part of the preset pass managers and this is called directly in rust. This speeds up the pass because it avoids 3 crossing of the language boundary and also the intermediate creation of DAGNode python objects. The pass still supports running with Collect2qBlocks for backwards compatibility and will skip running the pass equivalent internally the field is present in the property set. There are potential improvements that can be investigated here such as avoiding in place dag contraction and moving to rebuilding the dag iteratively. Also changing the logic around estimated error (see Qiskit#11659) to be more robust. But these can be left for follow up PRs as they change the logic. Realistically we should look at combining ConsolidateBlocks for it's current two usages with Split2qUnitaries and UnitarySynthesis into those passes for more efficiency. We can improve the performance and logic as part of that refactor. See Qiskit#12007 for more details on this for UnitarySynthesis. Closes Qiskit#12250
* Oxidize the ConsolidateBlocks pass This commit ports the consolidate blocks pass to rust. The logic remains the same and this is just a straight porting. One optimization is that to remove the amount of python processing the Collect2qBlocks pass is no longer run as part of the preset pass managers and this is called directly in rust. This speeds up the pass because it avoids 3 crossing of the language boundary and also the intermediate creation of DAGNode python objects. The pass still supports running with Collect2qBlocks for backwards compatibility and will skip running the pass equivalent internally the field is present in the property set. There are potential improvements that can be investigated here such as avoiding in place dag contraction and moving to rebuilding the dag iteratively. Also changing the logic around estimated error (see #11659) to be more robust. But these can be left for follow up PRs as they change the logic. Realistically we should look at combining ConsolidateBlocks for it's current two usages with Split2qUnitaries and UnitarySynthesis into those passes for more efficiency. We can improve the performance and logic as part of that refactor. See #12007 for more details on this for UnitarySynthesis. Closes #12250 * Update test to count consolidate_blocks instead of collect_2q_blocks * Fix lint * Fix solovay kitaev test * Add release note * Restore 2q block collection for synthesis translation plugin * Add rust native substitute method * Fix final test failures * Remove release note and test change The test failure fixed by a test change was incorrect and masked a logic bug that was fixed in a subsequent commit. This commit reverts that change to the test and removes the release note attempting to document a fix for a bug that only existed during development of this PR. * Fix comment leftover from rust-analyzer * Remove unused code * Simplify control flow handling * Remove unnecessary clone from substitute_node * Preallocate block op names in replace_block_with_py_op * Remove more unused imports * Optimize linalg in block collection This commit reworks the logic to reduce the number of Kronecker products and 2q matrix multiplications we do as part of computing the unitary of the block. It now computes the 1q components individually with 1q matrix multiplications and only calls kron() and a 2q matmul when a 2q gate is encountered. This reduces the number of more expensive operations we need to perform and replaces them with a much faster 1q matmul. * Use static one qubit identity matrix * Remove unnecessary lifetime annotations * Add missing docstring to new rust method * Apply suggestions from code review Co-authored-by: Kevin Hartman <kevin@hart.mn> * Fix lint * Add comment for MAX_2Q_DEPTH constant * Reuse block_qargs for each block Co-authored-by: Henry Zou <87874865+henryzou50@users.noreply.github.com> --------- Co-authored-by: Kevin Hartman <kevin@hart.mn> Co-authored-by: Henry Zou <87874865+henryzou50@users.noreply.github.com>
This commit adds a new transpiler pass for physical optimization, TwoQubitPeepholeOptimization. This replaces the use of Collect2qBlocks, ConsolidateBlocks, and UnitarySynthesis in the optimization stage for a default pass manager setup. The pass logically works the same way where it analyzes the dag to get a list of 2q runs, calculates the matrix of each run, and then synthesizes the matrix and substitutes it inplace. The distinction this pass makes though is it does this all in a single pass and also parallelizes the matrix calculation and synthesis steps because there is no data dependency there. This new pass is not meant to fully replace the Collect2qBlocks, ConsolidateBlocks, or UnitarySynthesis passes as those also run in contexts where we don't have a physical circuit. This is meant instead to replace their usage in the optimization stage only. Accordingly this new pass also changes the logic on how we select the synthesis to use and when to make a substituion. Previously this logic was primarily done via the ConsolidateBlocks pass by only consolidating to a UnitaryGate if the number of basis gates needed based on the weyl chamber coordinates was less than the number of 2q gates in the block (see Qiskit#11659 for discussion on this). Since this new pass skips the explicit consolidation stage we go ahead and try all the available synthesizers Right now this commit has a number of limitations, the largest are: - Only supports the target - It doesn't support any synthesizers besides the TwoQubitBasisDecomposer, because it's the only one in rust currently. For plugin handling I left the logic as running the three pass series, but I'm not sure this is the behavior we want. We could say keep the synthesis plugins for `UnitarySynthesis` only and then rely on our built-in methods for physical optimiztion only. But this also seems less than ideal because the plugin mechanism is how we support synthesizing to custom basis gates, and also more advanced approximate synthesis methods. Both of those are things we need to do as part of the synthesis here. Additionally, this is currently missing tests and documentation and while running it manually "works" as in it returns a circuit that looks valid, I've not done any validation yet. This also likely will need several rounds of performance optimization and tuning. t this point this is just a rough proof of concept and will need a lof refinement along with larger changes to Qiskit's rust code before this is ready to merge. Fixes Qiskit#12007 Fixes Qiskit#11659
This commit adds a new transpiler pass for physical optimization, TwoQubitPeepholeOptimization. This replaces the use of Collect2qBlocks, ConsolidateBlocks, and UnitarySynthesis in the optimization stage for a default pass manager setup. The pass logically works the same way where it analyzes the dag to get a list of 2q runs, calculates the matrix of each run, and then synthesizes the matrix and substitutes it inplace. The distinction this pass makes though is it does this all in a single pass and also parallelizes the matrix calculation and synthesis steps because there is no data dependency there. This new pass is not meant to fully replace the Collect2qBlocks, ConsolidateBlocks, or UnitarySynthesis passes as those also run in contexts where we don't have a physical circuit. This is meant instead to replace their usage in the optimization stage only. Accordingly this new pass also changes the logic on how we select the synthesis to use and when to make a substituion. Previously this logic was primarily done via the ConsolidateBlocks pass by only consolidating to a UnitaryGate if the number of basis gates needed based on the weyl chamber coordinates was less than the number of 2q gates in the block (see Qiskit#11659 for discussion on this). Since this new pass skips the explicit consolidation stage we go ahead and try all the available synthesizers Right now this commit has a number of limitations, the largest are: - Only supports the target - It doesn't support any synthesizers besides the TwoQubitBasisDecomposer, because it's the only one in rust currently. For plugin handling I left the logic as running the three pass series, but I'm not sure this is the behavior we want. We could say keep the synthesis plugins for `UnitarySynthesis` only and then rely on our built-in methods for physical optimiztion only. But this also seems less than ideal because the plugin mechanism is how we support synthesizing to custom basis gates, and also more advanced approximate synthesis methods. Both of those are things we need to do as part of the synthesis here. Additionally, this is currently missing tests and documentation and while running it manually "works" as in it returns a circuit that looks valid, I've not done any validation yet. This also likely will need several rounds of performance optimization and tuning. t this point this is just a rough proof of concept and will need a lof refinement along with larger changes to Qiskit's rust code before this is ready to merge. Fixes Qiskit#12007 Fixes Qiskit#11659
What should we add?
ConsolidateBlocks
has some logic for choosing whether to collapse some blocks into a UnitaryGate. But this is pretty outdated by now. It basically checks whether number of gates in the decomposition improves. First, number of gates is not necessarily important, but rather the error is. Second, it does not currently deal with multiple (heterogeneous) possible decompositions.But all of this is implemented correctly in UnitarySynthesis (at least for 2q blocks). So ConsolidateBlocks should just defer to UnitarySynthesis for when and how to resynthesize a sequence of 2q gates. All of its decomposition considerations should come from UnitarySynthesis.
I think it is better to write a new pass
PeepholeUnitaryResynthesis
, which does all 3 of these actions:Collect2QBlocks
,ConsolidateBlocks
,UnitarySynthesis
. The logic must be consistent, so there's no point splitting these 3 stages.I believe this can replace the
UnitarySynthesis
pass because any Unitary can be considered a simple peephole unitary.(** note: currently if the user knows that there's a good chance that UnitarySynthesis improves the circuit, they can force it to occur by adding
[Collect2QBlocks(target=target), ConsolidateBlocks(force_consolidate=True), UnitarySynthesis(target=target)]
to the passmanager, so it is possible to customize this by a user who knows how to use the passmanager)The text was updated successfully, but these errors were encountered: