Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of ConsolidateBlocks pass #7185

Merged
merged 14 commits into from
Nov 1, 2021

Conversation

mtreinish
Copy link
Member

Summary

This commit improves the ConsolidateBlocks pass which is designed to
replace a labeled block from Collect2QRuns or CollectMultiQBlocks with a
UnitaryGate node representing the unitary of that block. Previously the
pass was iterating over the DAG in topological order and tracking every
node and looking for predecessors for each blcok as it went and then
would assemble a new dag using the data from that initial loop but when
it encountered a block it would replace that with a unitary (if it was
warranted). However, this was unnecessary as the graph structure lets us
remove the block and replace it with a node in place. This commit makes
this change and simplifies the code significantly and improves the
performance of the pass. To accomplish this a new method
replace_block_with_op() is added to the DAGCircuit class to perform this
inplace replacement.

One places we can look at making future performance improvements to
this pass are eliminating the use of an intermediate circuit and build
an operator directly with a raw numpy array.

Details and comments

Fixes #7177

This commit improves the ConsolidateBlocks pass which is designed to
replace a labeled block from Collect2QRuns or CollectMultiQBlocks with a
UnitaryGate node representing the unitary of that block. Previously the
pass was iterating over the DAG in topological order and tracking every
node and looking for predecessors for each blcok as it went and then
would assemble a new dag using the data from that initial loop but when
it encountered a block it would replace that with a unitary (if it was
warranted). However, this was unecessary as the graph structure lets us
remove the block and replace it with a node in place. This commit makes
this change and simplifies the code significantly and improves the
performance of the pass. To accomplish this a new method
replace_block_with_op() is added to the DAGCircuit class to perform this
inplace replacement.

One places we can look at making future performance improvements to
this pass are eliminating the use of an intermediate circuit and build
an operator directly with a raw numpy array.

Fixes Qiskit#7177
@mtreinish mtreinish added performance Changelog: New Feature Include in the "Added" section of the changelog labels Oct 26, 2021
@mtreinish mtreinish requested a review from ewinston October 26, 2021 11:47
@mtreinish mtreinish requested a review from a team as a code owner October 26, 2021 11:47
@ewinston
Copy link
Contributor

The scaling looks good now,

image

@mtreinish
Copy link
Member Author

The test failures here look like a bug in Collect2QBlocks, it's not collecting a circuit of:

     ┌───┐
q_0: ┤ H ├
     ├───┤
q_1: ┤ H ├
     ├───┤
q_2: ┤ H ├
     └───┘

into 3 single gate blocks so when we call consolidate blocks there isn't anything to do. This actually worked before through a side effect/bug of the previous implementation but making it more efficient "fixed" the implementation so it wasn't treating standalone gates as a block by mistake anymore. I'm not sure how best to fix this right now especially because the collect 2q blocks implementation is basically just calling retworkx now.

In optimization level 0 of the preset pass managers the tests that
use the synthesis translation method were failing because lone 1q gates
out of basis aren't getting collected into a block by
DAGCircuit.collect_2q_runs(). Since none of the implementations we've
had for collect_2q_runs() correctly collected a lone 1q gate on a wire
into a block this commit adds a new pass Collect1qruns which doesn't
have this limitation and use it for level 0 synthesis translation
method. Other optimization levels don't have this issue because the
optimize 1q decomposition pass will handle the basis translation
implicity as part of it's optimization.
@mtreinish
Copy link
Member Author

I worked around the 1q synthesis translator method issue by adding a new pass collect 1q runs to duplicate the behavior that optimize1qdecomposition is doing for the higher optimization levels. This enables optimization level 0 with the synthesis translation method to handle lone 1q gates which that regression test is checking for to consolidate the 1q gates into a unitary and resynthesize them in the correct basis. It's not the most elegant solution but it was the simplest way I could come up with.

Copy link
Contributor

@ewinston ewinston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Collect1qRuns also seems like it has some overlap with optimize_1q_gates. Can it be used there as well?

@IvanIsCoding
Copy link
Contributor

The test failures here look like a bug in Collect2QBlocks, it's not collecting a circuit of:

     ┌───┐
q_0: ┤ H ├
     ├───┤
q_1: ┤ H ├
     ├───┤
q_2: ┤ H ├
     └───┘

into 3 single gate blocks so when we call consolidate blocks there isn't anything to do. This actually worked before through a side effect/bug of the previous implementation but making it more efficient "fixed" the implementation so it wasn't treating standalone gates as a block by mistake anymore. I'm not sure how best to fix this right now especially because the collect 2q blocks implementation is basically just calling retworkx now.

There's an if statement in retworkx that removes single color blocks, that is why. We can try removing the if, but in the past there were multiple tests in terra failing if we did that.

@mtreinish
Copy link
Member Author

Collect1qRuns also seems like it has some overlap with optimize_1q_gates. Can it be used there as well?

That was kind of intentional on higher optimization levels the optimize 1q gates pass handles the case this pass was introduced for. I could change the optimization 1q decomposition pass to look for the property set and add collect1qruns before it everywhere, but I'm not sure we want to do that here though (or in general). We really need to refactor how this kind of optimization works and unify it since the 1q and 2q cases are essentially the same just with different decomposers. There is already a tracking issue on this: #5775

@mtreinish
Copy link
Member Author

The test failures here look like a bug in Collect2QBlocks, it's not collecting a circuit of:

     ┌───┐
q_0: ┤ H ├
     ├───┤
q_1: ┤ H ├
     ├───┤
q_2: ┤ H ├
     └───┘

into 3 single gate blocks so when we call consolidate blocks there isn't anything to do. This actually worked before through a side effect/bug of the previous implementation but making it more efficient "fixed" the implementation so it wasn't treating standalone gates as a block by mistake anymore. I'm not sure how best to fix this right now especially because the collect 2q blocks implementation is basically just calling retworkx now.

There's an if statement in retworkx that removes single color blocks, that is why. We can try removing the if, but in the past there were multiple tests in terra failing if we did that.

Yeah, that's to match the behavior of the algorithm that has existed historical in terra for the collect2qblocks pass. That kind of makes sense and this pass is kind of doing split duty. For using collect2qblock, consolidate blocks, unitary synthesis as an optimization routine we don't really want the single node blocks because synthesis won't do a better job than a single node (unless it's an identity matrix) but when using the collect2qblock, consolidate blocks, unitary synthesis chain for basis translation we do need this so that we translate any gates out of the basis set. On the retworkx side I'm not sure it's something we can/want to change the default though because it's potentially a breaking change.

I think the best path there might be to add a flag that enables collecting single color blocks then we can expose that as a config option in collect2qblocks and use it for the synthesis translation variant. That would eliminate the need for a collect1qruns pass. But we won't be able to do that in the 0.19.0 release timeframe because it will require a retworkx release.

ewinston
ewinston previously approved these changes Oct 27, 2021
Copy link
Member

@jakelishman jakelishman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty much all sensible to me. I'm starting to get the feeling we could just search the codebase for the list.index method - I've never seen a successful use of it that didn't become accidentally quadratic almost immediately!

Comment on lines 184 to +185
Collect2qBlocks(),
Collect1qRuns(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come we call the 1q one "runs" but the 2q one "blocks"? That seems a little inconsistent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dunno it is the terminology that was here when I started hacking on Qiskit. My guess is it's because the 1q case is a series while the 2q can have parallel gates but that's just a guess. It does seem like a weird distinction.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can call them all blocks

@mtreinish
Copy link
Member Author

This looks pretty much all sensible to me. I'm starting to get the feeling we could just search the codebase for the list.index method - I've never seen a successful use of it that didn't become accidentally quadratic almost immediately!

Heh, yeah that's not a bad idea. I think that there is probably a better alternative to using list.index every place we encounter it in the code.

jakelishman
jakelishman previously approved these changes Oct 28, 2021
Copy link
Member

@jakelishman jakelishman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now.

@@ -977,6 +977,57 @@ def topological_op_nodes(self, key=None):
"""
return (nd for nd in self.topological_nodes(key) if isinstance(nd, DAGOpNode))

def replace_block_with_op(self, node_block, op, wire_pos_map):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this introduce the possibility of creating cycles in the DAG? Let me see if I can draw up an example.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

>>> import qiskit as qk
>>> qc = qk.QuantumCircuit(2)
>>> qc.cx(0,1)
>>> qc.h(1)
>>> qc.cx(0,1)
>>> dag = qk.converters.circuit_to_dag(qc)
>>> nodes = list(dag.topological_op_nodes())
>>> dag.replace_block_with_op(
    [nodes[0], nodes[-1]],
    qk.circuit.library.CZGate(),
    {bit: idx for (idx, bit) in enumerate(dag.qubits)})
>>> dag.draw()

image

This came up in a similar but long defunct PR (#3503) which might have some relevant tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, let me fix this and I'll probably steal the tests from #3503

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When this came up part of #3503, we didn't look into it in great depth, but we didn't find a straightforward (performant) solution to the "don't make a cycle while contracting" problem. That's not to say there isn't one (maybe someone with stronger DAG background can comment), but more that "check+raise" style fix isn't necessarily required. e.g. Warning in the docs that the supplied nodes must form a contiguous block, and that you can inadvertently increase the depth of your circuit may be sufficient.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When this came up part of #3503, we didn't look into it in great depth, but we didn't find a straightforward (performant) solution to the "don't make a cycle while contracting" problem. That's not to say there isn't one (maybe someone with stronger DAG background can comment), but more that "check+raise" style fix isn't necessarily required. e.g. Warning in the docs that the supplied nodes must form a contiguous block, and that you can inadvertently increase the depth of your circuit may be sufficient.

Not proposing a solution but just warning: if there is a cycle it is no longer a DAG, and retworkx will not be able to find a topological sort. Collect2qBlocks and Collect1qRuns will exit with an error if you give an input like that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contraction of nodes u, v in a DAG is not valid iff it exists a path of length > 1 connecting u, v. This is a simple check to do, but i agree with @kdk's last point: here we can skip it and just leave a comment in the docs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah looking at this locally doing the checking can hurt performance. I tested both the path length check and just adding is_directed_acyclic_graph() after the operations. Adding a cycle check at the end makes this ~10x slower in my testing while doing the distance approach is ~100x slower (there might be room for improvement here because I just called dijkstra's in for loop). The tradeoff is that the distance approach is much cleaner code as it's an upfront check on the inputs.

After all this I agree with @georgios-ts and @kdk and think that documenting this is probably enough. I'm thinking for this case though making the check an opt-out documenting the tradeoff between performance and validation and then for consolidate blocks we explicitly opt-out to avoid the overhead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in: aea6545

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A plain dijkstra in the original DAG would not actually work here because if we contract a sequence of nodes u_i we should not follow edges with both endpoints in the sequence while searching for a path. As an improved solution, I can't help but advertise the pending BFS/DFS PR's in retworkx that might come in handy here and avoid the need to modify the original DAG. In any case, let's stick with is_directed_acyclic_graph() check for now and we can revisit this.

This commit adds opt-out cycle validation to the replace_block_with_op
method. By default the method will check the input node_block if
contracting the dag over those nodes will introduce a cycle. At the same
time a new kwarg, cycle_check, is added to the replace_block_with_op
method which can be set to False to disable this check (as it ends up
being quite expensive in runtime). The consolidate blocks pass is then
updated to set this new kwarg to false to avoid the overhead as the
check isn't necessary there as the block collection won't introduce a
cycle when the block is consolidated.
Copy link
Member

@kdk kdk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates here, one comment but this LGTM otherwise.

qiskit/dagcircuit/dagcircuit.py Show resolved Hide resolved
Co-authored-by: Kevin Krsulich <kevin@krsulich.net>
@mtreinish mtreinish requested a review from kdk November 1, 2021 19:52
Copy link
Member

@kdk kdk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates.

@kdk kdk added this to the 0.19 milestone Nov 1, 2021
@mergify mergify bot merged commit 4cf3491 into Qiskit:main Nov 1, 2021
@mtreinish mtreinish deleted the consolidate-perf branch November 1, 2021 23:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: New Feature Include in the "Added" section of the changelog performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ConsolidateBlocks scales superlinearly
7 participants