-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of ConsolidateBlocks pass #7185
Conversation
This commit improves the ConsolidateBlocks pass which is designed to replace a labeled block from Collect2QRuns or CollectMultiQBlocks with a UnitaryGate node representing the unitary of that block. Previously the pass was iterating over the DAG in topological order and tracking every node and looking for predecessors for each blcok as it went and then would assemble a new dag using the data from that initial loop but when it encountered a block it would replace that with a unitary (if it was warranted). However, this was unecessary as the graph structure lets us remove the block and replace it with a node in place. This commit makes this change and simplifies the code significantly and improves the performance of the pass. To accomplish this a new method replace_block_with_op() is added to the DAGCircuit class to perform this inplace replacement. One places we can look at making future performance improvements to this pass are eliminating the use of an intermediate circuit and build an operator directly with a raw numpy array. Fixes Qiskit#7177
The test failures here look like a bug in Collect2QBlocks, it's not collecting a circuit of:
into 3 single gate blocks so when we call consolidate blocks there isn't anything to do. This actually worked before through a side effect/bug of the previous implementation but making it more efficient "fixed" the implementation so it wasn't treating standalone gates as a block by mistake anymore. I'm not sure how best to fix this right now especially because the collect 2q blocks implementation is basically just calling retworkx now. |
In optimization level 0 of the preset pass managers the tests that use the synthesis translation method were failing because lone 1q gates out of basis aren't getting collected into a block by DAGCircuit.collect_2q_runs(). Since none of the implementations we've had for collect_2q_runs() correctly collected a lone 1q gate on a wire into a block this commit adds a new pass Collect1qruns which doesn't have this limitation and use it for level 0 synthesis translation method. Other optimization levels don't have this issue because the optimize 1q decomposition pass will handle the basis translation implicity as part of it's optimization.
I worked around the 1q synthesis translator method issue by adding a new pass collect 1q runs to duplicate the behavior that optimize1qdecomposition is doing for the higher optimization levels. This enables optimization level 0 with the synthesis translation method to handle lone 1q gates which that regression test is checking for to consolidate the 1q gates into a unitary and resynthesize them in the correct basis. It's not the most elegant solution but it was the simplest way I could come up with. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Collect1qRuns also seems like it has some overlap with optimize_1q_gates. Can it be used there as well?
There's an if statement in |
That was kind of intentional on higher optimization levels the optimize 1q gates pass handles the case this pass was introduced for. I could change the optimization 1q decomposition pass to look for the property set and add collect1qruns before it everywhere, but I'm not sure we want to do that here though (or in general). We really need to refactor how this kind of optimization works and unify it since the 1q and 2q cases are essentially the same just with different decomposers. There is already a tracking issue on this: #5775 |
Yeah, that's to match the behavior of the algorithm that has existed historical in terra for the collect2qblocks pass. That kind of makes sense and this pass is kind of doing split duty. For using collect2qblock, consolidate blocks, unitary synthesis as an optimization routine we don't really want the single node blocks because synthesis won't do a better job than a single node (unless it's an identity matrix) but when using the collect2qblock, consolidate blocks, unitary synthesis chain for basis translation we do need this so that we translate any gates out of the basis set. On the retworkx side I'm not sure it's something we can/want to change the default though because it's potentially a breaking change. I think the best path there might be to add a flag that enables collecting single color blocks then we can expose that as a config option in collect2qblocks and use it for the synthesis translation variant. That would eliminate the need for a collect1qruns pass. But we won't be able to do that in the 0.19.0 release timeframe because it will require a retworkx release. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty much all sensible to me. I'm starting to get the feeling we could just search the codebase for the list.index
method - I've never seen a successful use of it that didn't become accidentally quadratic almost immediately!
Collect2qBlocks(), | ||
Collect1qRuns(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come we call the 1q one "runs" but the 2q one "blocks"? That seems a little inconsistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dunno it is the terminology that was here when I started hacking on Qiskit. My guess is it's because the 1q case is a series while the 2q can have parallel gates but that's just a guess. It does seem like a weird distinction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can call them all blocks
Heh, yeah that's not a bad idea. I think that there is probably a better alternative to using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good now.
qiskit/dagcircuit/dagcircuit.py
Outdated
@@ -977,6 +977,57 @@ def topological_op_nodes(self, key=None): | |||
""" | |||
return (nd for nd in self.topological_nodes(key) if isinstance(nd, DAGOpNode)) | |||
|
|||
def replace_block_with_op(self, node_block, op, wire_pos_map): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this introduce the possibility of creating cycles in the DAG? Let me see if I can draw up an example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
>>> import qiskit as qk
>>> qc = qk.QuantumCircuit(2)
>>> qc.cx(0,1)
>>> qc.h(1)
>>> qc.cx(0,1)
>>> dag = qk.converters.circuit_to_dag(qc)
>>> nodes = list(dag.topological_op_nodes())
>>> dag.replace_block_with_op(
[nodes[0], nodes[-1]],
qk.circuit.library.CZGate(),
{bit: idx for (idx, bit) in enumerate(dag.qubits)})
>>> dag.draw()
This came up in a similar but long defunct PR (#3503) which might have some relevant tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, let me fix this and I'll probably steal the tests from #3503
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When this came up part of #3503, we didn't look into it in great depth, but we didn't find a straightforward (performant) solution to the "don't make a cycle while contracting" problem. That's not to say there isn't one (maybe someone with stronger DAG background can comment), but more that "check+raise" style fix isn't necessarily required. e.g. Warning in the docs that the supplied nodes must form a contiguous block, and that you can inadvertently increase the depth of your circuit may be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When this came up part of #3503, we didn't look into it in great depth, but we didn't find a straightforward (performant) solution to the "don't make a cycle while contracting" problem. That's not to say there isn't one (maybe someone with stronger DAG background can comment), but more that "check+raise" style fix isn't necessarily required. e.g. Warning in the docs that the supplied nodes must form a contiguous block, and that you can inadvertently increase the depth of your circuit may be sufficient.
Not proposing a solution but just warning: if there is a cycle it is no longer a DAG, and retworkx will not be able to find a topological sort. Collect2qBlocks and Collect1qRuns will exit with an error if you give an input like that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Contraction of nodes u
, v
in a DAG is not valid iff it exists a path of length > 1 connecting u
, v
. This is a simple check to do, but i agree with @kdk's last point: here we can skip it and just leave a comment in the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah looking at this locally doing the checking can hurt performance. I tested both the path length check and just adding is_directed_acyclic_graph()
after the operations. Adding a cycle check at the end makes this ~10x slower in my testing while doing the distance approach is ~100x slower (there might be room for improvement here because I just called dijkstra's in for loop). The tradeoff is that the distance approach is much cleaner code as it's an upfront check on the inputs.
After all this I agree with @georgios-ts and @kdk and think that documenting this is probably enough. I'm thinking for this case though making the check an opt-out documenting the tradeoff between performance and validation and then for consolidate blocks we explicitly opt-out to avoid the overhead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in: aea6545
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A plain dijkstra in the original DAG would not actually work here because if we contract a sequence of nodes u_i
we should not follow edges with both endpoints in the sequence while searching for a path. As an improved solution, I can't help but advertise the pending BFS/DFS PR's in retworkx that might come in handy here and avoid the need to modify the original DAG. In any case, let's stick with is_directed_acyclic_graph()
check for now and we can revisit this.
This commit adds opt-out cycle validation to the replace_block_with_op method. By default the method will check the input node_block if contracting the dag over those nodes will introduce a cycle. At the same time a new kwarg, cycle_check, is added to the replace_block_with_op method which can be set to False to disable this check (as it ends up being quite expensive in runtime). The consolidate blocks pass is then updated to set this new kwarg to false to avoid the overhead as the check isn't necessary there as the block collection won't introduce a cycle when the block is consolidated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates here, one comment but this LGTM otherwise.
Co-authored-by: Kevin Krsulich <kevin@krsulich.net>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates.
Summary
This commit improves the ConsolidateBlocks pass which is designed to
replace a labeled block from Collect2QRuns or CollectMultiQBlocks with a
UnitaryGate node representing the unitary of that block. Previously the
pass was iterating over the DAG in topological order and tracking every
node and looking for predecessors for each blcok as it went and then
would assemble a new dag using the data from that initial loop but when
it encountered a block it would replace that with a unitary (if it was
warranted). However, this was unnecessary as the graph structure lets us
remove the block and replace it with a node in place. This commit makes
this change and simplifies the code significantly and improves the
performance of the pass. To accomplish this a new method
replace_block_with_op() is added to the DAGCircuit class to perform this
inplace replacement.
One places we can look at making future performance improvements to
this pass are eliminating the use of an intermediate circuit and build
an operator directly with a raw numpy array.
Details and comments
Fixes #7177