Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speedup constant factors in LookaheadSwap #8068

Merged
merged 4 commits into from
Jun 21, 2022

Conversation

jakelishman
Copy link
Member

Summary

This picks some of the low-hanging fruit in LookaheadSwap, avoiding
recalculating various properties and entities that are already known,
and making some access patterns more efficient. It does not change the
complexity properties of the algorithm, which will still cause its
runtime to be excessive for large circuits.

Details and comments

This is worth about a 3x speedup in the pass itself, but it doesn't address any of the scaling issues that cause #5198, for example.

Using a circuit based on the one in #5198:

In [1]: from qiskit import QuantumCircuit
   ...: from qiskit.circuit.library import QuantumVolume
   ...: from qiskit.test.mock import FakeManhattan
   ...: from qiskit.converters import circuit_to_dag
   ...: from qiskit.transpiler import CouplingMap
   ...: from qiskit.transpiler.passes import LookaheadSwap
   ...:
   ...: dag = circuit_to_dag(
   ...:     QuantumCircuit(65).compose(QuantumVolume(4, seed=13).decompose(), [32, 33, 34, 35])
   ...: )
   ...: cm = CouplingMap(FakeManhattan().configuration().coupling_map)
   ...: swap = LookaheadSwap(cm, 5, 5)
   ...:
   ...: %timeit swap.run(dag)

gave 6.04(6)s before this commit, and 1.69(1)s after.

This pass isn't our preferred routing method any more, but I was looking at it since the randomised tests are spotting failures in it at the moment. This commit won't address the failures; the meaningful logic is unchanged, this commit just does more caching of intermediate values and uses more efficient paths to calculate things. I don't know the cause of the failures right now, but if I find it, I'll make a new PR.

It's possible we could consider dropping this routing pass in favour of one of the other, faster methods if its scaling problems are inherent.

This picks some of the low-hanging fruit in `LookaheadSwap`, avoiding
recalculating various properties and entities that are already known,
and making some access patterns more efficient.  It does not change the
complexity properties of the algorithm, which will still cause its
runtime to be excessive for large circuits.
@jakelishman jakelishman added performance Changelog: New Feature Include in the "Added" section of the changelog labels May 16, 2022
@jakelishman jakelishman added this to the 0.21 milestone May 16, 2022
@jakelishman jakelishman requested a review from a team as a code owner May 16, 2022 13:26
@qiskit-bot
Copy link
Collaborator

Thank you for opening a new pull request.

Before your PR can be merged it will first need to pass continuous integration tests and be reviewed. Sometimes the review process can be slow, so please be patient.

While you're waiting, please feel free to review other open PRs. While only a subset of people are authorized to approve pull requests for merging, everyone is encouraged to review open pull requests. Doing reviews helps reduce the burden on the core team and helps make the project's code better for everyone.

One or more of the the following people are requested to review this:

  • @Qiskit/terra-core

@coveralls
Copy link

coveralls commented May 16, 2022

Pull Request Test Coverage Report for Build 2535942293

  • 60 of 61 (98.36%) changed or added relevant lines in 1 file are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.0005%) to 84.282%

Changes Missing Coverage Covered Lines Changed/Added Lines %
qiskit/transpiler/passes/routing/lookahead_swap.py 60 61 98.36%
Files with Coverage Reduction New Missed Lines %
qiskit/transpiler/passes/routing/lookahead_swap.py 2 93.98%
Totals Coverage Status
Change from base Build 2535468241: 0.0005%
Covered Lines: 54909
Relevant Lines: 65149

💛 - Coveralls

@mtreinish mtreinish self-assigned this Jun 14, 2022
Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one question inline about the _first_op_node() method before I tag as automerge.

I do think we should port this to rust after #8133 is fixed. This refactor makes it trivial to port as the structure is much closer to how it would be written in rust.

Comment on lines +367 to +371
def _first_op_node(dag):
"""Get the first op node from a DAG."""
# This doesn't use `DAGCircuit.op_nodes` because that function always consumes the entire
# iterator to create a list, whereas we only need the first element.
return next(node for node in dag.nodes() if isinstance(node, DAGOpNode))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading through the code I don't think this matters but this is the first op node based on insertion order not necessarily topological order was that your intent or did you want the first node from a topological ordering of the dag?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a direct port of the existing code - it previously did this exact thing but constructed the complete list just to take the first element. I was mostly trying to ensure the iteration order wasn't affect while just trying to speed it up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that's what it looked like to me too but I wanted to confirm. The code just read a little weird to me because the "first" node could actually be something in the middle of the dag or at the end as it's dependent on insertion order.

continue
qubits = gate["partition"][0]
if len(qubits) == 2:
out += state.coupling_map.distance(layout_map[qubits[0]], layout_map[qubits[1]])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really something we should change here, but I typically avoid the distance() method and just access the inner distance matrix directly because besides the function call overhead you also have bounds checking which for something like this you know you're layout isn't going to return bounds outside the matrix. You could probably also use numpy's sum to speed this up (although when I've tried that in the past it wasn't really a noticeable difference).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking back, I think I was just doing a direct port of the code here - it previously used coupling_map.distance, just with an iterable unpacking of a generator expression. I unrolled the generator, but left the function call.

Copy link
Member Author

@jakelishman jakelishman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pass is so slow, and (I think) pretty much completely superseded by SabreSwap, so I'm not 100% sure it's worth rewriting it in Rust. I think there's some extant bugs as well - it was originally randomised testing failures that caused me to look at the pass.

It's not entirely a coincidence the new form looks more Rust-like - I've always preferred pure functions that pass "state" objects around rather than making objects that are naturally pure API-implementation collections (transpiler passes) one-time-use by making them stateful of themselves.

continue
qubits = gate["partition"][0]
if len(qubits) == 2:
out += state.coupling_map.distance(layout_map[qubits[0]], layout_map[qubits[1]])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking back, I think I was just doing a direct port of the code here - it previously used coupling_map.distance, just with an iterable unpacking of a generator expression. I unrolled the generator, but left the function call.

Comment on lines +367 to +371
def _first_op_node(dag):
"""Get the first op node from a DAG."""
# This doesn't use `DAGCircuit.op_nodes` because that function always consumes the entire
# iterator to create a list, whereas we only need the first element.
return next(node for node in dag.nodes() if isinstance(node, DAGOpNode))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a direct port of the existing code - it previously did this exact thing but constructed the complete list just to take the first element. I was mostly trying to ensure the iteration order wasn't affect while just trying to speed it up.

@mergify mergify bot merged commit 0e3e68d into Qiskit:main Jun 21, 2022
@jakelishman jakelishman deleted the speedup-lookahead-swap branch June 22, 2022 21:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: New Feature Include in the "Added" section of the changelog performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants