Improve performance of VF2 scoring and add support for scoring passes #9026

mtreinish · 2022-10-28T18:58:46Z

Summary

This commit makes 2 key changes to the VF2 layout passes. The first is that it migrates the scoring routine to Rust. When running VF2Layout and VF2PostLayout we're bottlenecked by the performance of scoring a layout, since in practice scoring a large circuit ends up taking more time than the vf2_mapping() function. To address this, the scoring function is migrated to Rust, where the iteration will be much faster. To enable this migration, the average error map is made into an ErrorMap class which can be accessed efficiently by reference from Rust. The second is that this also provides a convenient interface for future expansion of the VF2 layout passes. The VF2Layout and VF2PostLayout passes will now both look for a "vf2_avg_error_map" entry in the property set which contains an ErrorMap used for scoring. If present, that ErrorMap will be used for scoring instead of computing one from the target's error rates. This will enable custom analysis passes to be run pre-layout to compute or inject a custom scoring heuristic.
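A minimal sketch of the lookup order described above, assuming plain dicts stand in for both the property set and the ErrorMap. The names `resolve_error_map` and `build_from_target` are hypothetical and only illustrate the "use the injected map if present, otherwise compute one" behavior; they are not functions in Qiskit.

```python
# Sketch only: plain dicts stand in for the pass's property set and the ErrorMap.
def build_from_target(target_errors):
    """Hypothetical default: copy the average error rates taken from the target."""
    return dict(target_errors)


def resolve_error_map(property_set, target_errors):
    """Prefer a user-injected map; otherwise fall back to the target's rates."""
    custom = property_set.get("vf2_avg_error_map")
    if custom is not None:
        return custom
    return build_from_target(target_errors)


target_errors = {(0, 0): 0.001, (1, 1): 0.002, (0, 1): 0.01}

# No custom entry: scoring uses the rates derived from the target.
default_map = resolve_error_map({}, target_errors)

# A hypothetical pre-layout analysis pass stored its own heuristic.
injected = {(0, 1): 0.5}
custom_map = resolve_error_map({"vf2_avg_error_map": injected}, target_errors)
```

In the real passes the injected object would be an ErrorMap rather than a dict; the sketch only shows the precedence between the two sources.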

Details and comments

TODO:

Fix test failures. Fails with faulty qubits (I assume the backend properties mapping is broken and only worked by chance before this) and fake melbourne (which had a weird broken qubit and I bet the stored payload is broken)
Benchmark and potentially tune performance

This commit makes 2 key changes to the VF2 layout passes. The first is that it migrates the scoring routine to Rust. When running VF2Layout and VF2PostLayout we're bottlenecked by the performance of scoring a layout, since in practice scoring a large circuit ends up taking more time than the vf2_mapping() function. To address this, the scoring function is migrated to Rust, where the iteration will be much faster. To enable this migration, the average error map is made into a 2D numpy array which can be accessed efficiently by reference from Rust. The second is that this also provides a convenient interface for future expansion of the VF2 layout passes. The VF2Layout and VF2PostLayout passes will now both look for a "vf2_avg_error_map" entry in the property set which contains a 2D array used for scoring. If present, that array will be used for scoring instead of computing one from the target's error rates. This will enable custom analysis passes to be run pre-layout to compute or inject a custom scoring heuristic.

qiskit-bot · 2022-10-28T18:58:49Z

Thank you for opening a new pull request.

Before your PR can be merged it will first need to pass continuous integration tests and be reviewed. Sometimes the review process can be slow, so please be patient.

While you're waiting, please feel free to review other open PRs. While only a subset of people are authorized to approve pull requests for merging, everyone is encouraged to review open pull requests. Doing reviews helps reduce the burden on the core team and helps make the project's code better for everyone.

One or more of the following people are requested to review this:

@Qiskit/terra-core
@kevinhartman
@mtreinish

This commit fixes a few copy-paste errors and other errors in the docstring for the VF2PostLayout pass. It also adds a link to the paper for the pass. This was originally part of Qiskit#9026, as these fixes were part of modifying the docstring to document the new feature being added in that PR. This commit just extracts those docstring fixes from that PR.

For BackendV1-based backends it's possible for the BackendProperties object for that backend to get out of sync with the number of qubits actually available in the system. In such cases, looking up the noise characteristics can fail when building the error map, because the reported number of qubits is less than the number of qubits there are properties for. This wasn't an issue with the previous error map data structure because it was a dictionary and it would just add the error rate for the extra qubits even though it wasn't valid. However, now that we're using a numpy array with a fixed size, this isn't the case anymore and an error would be raised in these cases. To work around this issue, this commit skips any qubits outside the allowed range in the BackendProperties when building the error map, to account for this potential discrepancy. The extra properties couldn't be used anyway since they're not valid device qubits in such cases.
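A minimal sketch of the range check described above, assuming the properties payload can be reduced to a dict from qubit index to error rate. The function name `build_error_map` and the `(qubit, qubit)` key convention are assumptions made for illustration, not the code in this PR.

```python
def build_error_map(num_qubits, qubit_errors):
    """Keep only error rates for qubits inside the reported device size."""
    error_map = {}
    for qubit, error in qubit_errors.items():
        if qubit >= num_qubits:
            # Stale entry for a qubit the backend does not report; skip it.
            continue
        error_map[(qubit, qubit)] = error
    return error_map


# The properties payload lists 4 qubits, but the backend reports only 3.
stale_properties = {0: 0.01, 1: 0.02, 2: 0.03, 3: 0.5}
error_map = build_error_map(3, stale_properties)
```

With a fixed-size array the out-of-range index would raise, so filtering before the write is what keeps the build from failing.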

coveralls · 2022-10-31T18:15:55Z

Pull Request Test Coverage Report for Build 3500758612

138 of 164 (84.15%) changed or added relevant lines in 7 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage decreased (-0.04%) to 84.575%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
qiskit/transpiler/passes/layout/vf2_utils.py	39	41	95.12%
src/vf2_layout.rs	54	57	94.74%
src/error_map.rs	18	39	46.15%

Totals
Change from base Build 3500114179:	-0.04%
Covered Lines:	62629
Relevant Lines:	74051

💛 - Coveralls

This commit updates the vf2 layout scoring to work with a dictionary object instead of a Layout object. Previously we were creating a Layout object for each mapping found and passing that to scoring. However, this was unnecessary overhead, as the Layout object is slow to create and interact with. Since we only need a Layout object if we're potentially returning the layout as the best result, we can avoid this extra overhead.
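The idea can be sketched as follows, assuming a toy scoring function and a stand-in class for the costly object. `ExpensiveLayout`, `score`, and `best_layout` are hypothetical names; the real Layout class and scoring routine are not reproduced here.

```python
class ExpensiveLayout:
    """Hypothetical stand-in for a layout object that is costly to build."""

    created = 0  # counts how many instances were constructed

    def __init__(self, mapping):
        ExpensiveLayout.created += 1
        self.mapping = dict(mapping)


def score(mapping, errors):
    """Toy score: sum of the error rates of the chosen physical qubits."""
    return sum(errors[phys] for phys in mapping.values())


def best_layout(mappings, errors):
    best, best_score = None, float("inf")
    for mapping in mappings:
        current = score(mapping, errors)  # scoring works on the plain dict
        if current < best_score:
            # Only a candidate that may be returned gets the expensive object.
            best, best_score = ExpensiveLayout(mapping), current
    return best, best_score


errors = {0: 0.2, 1: 0.1, 2: 0.05}
mappings = [{"a": 0, "b": 1}, {"a": 1, "b": 2}, {"a": 0, "b": 2}]
layout, layout_score = best_layout(mappings, errors)
```

Here three mappings are scored but only two stand-in objects are built, because the third candidate never becomes the best result.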

This commit removes the lookup for the QISKIT_IN_PARALLEL env variable from the Rust code for vf2 scoring. This was adding unnecessary overhead to a frequently called function when it only needs to be computed once. This commit moves the lookup to Python, outside the for loop, and just passes the evaluated boolean to the Rust function instead.
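A rough sketch of the hoisting described above, written entirely in Python. The scorer `score_layout` is a hypothetical stand-in for the Rust function, and the assumption that the variable is compared against the string "TRUE" is made for illustration only.

```python
import os


def score_layout(mapping, errors, in_parallel_env):
    """Toy scorer that receives the pre-evaluated flag as an argument."""
    # A real scorer could use the flag to choose a serial or threaded path;
    # both would give the same result, so the sketch does not branch on it.
    return sum(errors[phys] for phys in mapping.values())


# Read the environment variable once, outside the hot loop.
in_parallel_env = os.getenv("QISKIT_IN_PARALLEL", "FALSE").upper() == "TRUE"

errors = {0: 0.5, 1: 0.25}
mappings = [{"a": 0}, {"a": 1}]
scores = [score_layout(m, errors, in_parallel_env) for m in mappings]
```

The environment lookup now happens once per pass run instead of once per scored mapping.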

jakelishman

Moving scoring to Rust generally seems sensible to me. Most of the comments below are very minor.

My two main things are:

I'm concerned we're overloading NaN in the gate-error matrix with two incompatible meanings.
I think that even for 1000q systems, the 2D matrix is probably going to be fine, but there's a possibility we might end up with better cache locality in the scoring for large systems if we considered some sparser structure, since we generally expect that most real-world systems will have limited connectivity, so this 2D matrix will in practice be very sparse. This certainly doesn't need investigating for this PR, just wondering if you'd thought anything about it?

qiskit/transpiler/passes/layout/vf2_layout.py

qiskit/transpiler/passes/layout/vf2_post_layout.py

qiskit/transpiler/passes/layout/vf2_utils.py

releasenotes/notes/vf2_custom_score_analysis-abb191d56c0c1578.yaml

jakelishman · 2022-11-01T13:37:33Z

src/vf2_layout.rs

+    edge_list: IndexMap<[usize; 2], i32>,
+    error_matrix: PyReadonlyArray2<f64>,
+    layout: &NLayout,
+    strict_direction: bool,


This flag seems unnecessary in Rust with things now represented as an error_matrix? I'd have thought that it's just built in during the construction of the matrix.

It's not necessarily reflected in the error matrix. It really depends on the backend properties, since we could end up with an error rate defined directionally in the backend, and it's hard to know until after we finish building the matrix, so we don't build the error matrix bidirectionally if strict_direction=False. It becomes more a question of scoring behavior than of representation. If strict_direction=False and err_mat[[0, 1]] is NaN, it will try err_mat[[1, 0]].
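The fallback described in this comment can be sketched in Python as follows, assuming a nested list stands in for the error matrix and NaN marks a missing entry. The function name `edge_error` is hypothetical; the actual lookup lives in the Rust scoring code.

```python
import math


def edge_error(error_matrix, a, b, strict_direction):
    """Look up the error for edge (a, b), trying (b, a) when allowed."""
    err = error_matrix[a][b]
    if math.isnan(err) and not strict_direction:
        # Direction is not enforced, so the reverse entry is acceptable.
        err = error_matrix[b][a]
    return err


nan = float("nan")
# Only the 1 -> 0 direction has a defined two-qubit error rate.
error_matrix = [
    [0.001, nan],
    [0.02, 0.002],
]

relaxed = edge_error(error_matrix, 0, 1, strict_direction=False)
strict = edge_error(error_matrix, 0, 1, strict_direction=True)
```

With strict direction the missing entry stays NaN, while the relaxed lookup picks up the rate defined for the opposite direction.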

Yeah, that's fair enough then. In all the situations I could think of, the backend would just have constructed the directional error matrix itself, but if there's a chance those are decoupled, then it's right to include the swap.

That said, it feels a bit odd in the scoring that we don't take into account both sides of the link in other cases. If it's not strict directionality, but the two directions have different average errors, it feels weird that we don't take that into account somehow? In my mental model, that situation shouldn't be possible, but if the error matrix and strict directionality aren't coupled at the level of the backend, it starts to feel possible and unclear in how it should be handled.

It's a tradeoff to find a path vs not finding a path. There are two modes of operation right now with the vf2 mapping, either we work with directed graphs and respect the direction of gates on the backend or we treat every edge as weak and work with undirected edges and then rely on GateDirection later to correct things. In the former case if the edge is defined bidirectionally with different error rates rustworkx will return a different mapping with either direction and score them differently. But for the latter case we can't really make an assertion about the order of the qubits for the qargs because it's not explicitly set.

That being said I think you're right for the undirected case maybe we should be looking at the other direction in scoring if both directions have defined error rates. We'll have to think of the best way to do this. (for this I just copied the scoring algorithm we were using before anyway)

src/vf2_layout.rs

Co-authored-by: Jake Lishman <jake@binhbar.com>

This commit deduplicates a bunch of the rust side code for scoring into 2 closures and replaces all the reduce() calls with product() to accomplish the same thing.
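As a Python analogue of that simplification (the Rust code itself is not shown in this thread), a fold written with reduce() and a direct product give the same result. The list of values is made up for illustration.

```python
import math
import operator
from functools import reduce

# Hypothetical per-gate fidelities to be multiplied together.
fidelities = [0.5, 0.5, 0.25]

# Explicit fold, comparable to a reduce() call over an iterator.
via_reduce = reduce(operator.mul, fidelities, 1.0)

# Dedicated product, comparable to calling product() on the iterator.
via_prod = math.prod(fidelities)
```

Both forms compute the same value; the product form states the intent directly and needs no explicit initial value.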

Co-authored-by: Jake Lishman <jake@binhbar.com>

In order to support large (> 1000) qubit systems efficiently, this commit pivots away from using a 2D numpy array to represent the average error rates for a target. For 1000 qubits this error matrix would take 8 MB of memory, but for 10k qubits it would take 800 MB. By their nature these error matrices should be fairly sparse, as connectivity in typical QPUs is sparse, so this was just wasted memory: we'd end up with a lot of NaN values in the array. Instead, this commit adds a new Rust struct/Python class ErrorMap which just wraps a HashMap and maps a 2-element int array to a float. This way we only store entries where there is defined connectivity and are more memory efficient.
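The memory figures follow from storing one 8-byte float per qubit pair. A small sketch of that arithmetic, alongside a dict-based sparse map for a hypothetical line-connected device (the line topology and the helper names are assumptions for illustration):

```python
BYTES_PER_F64 = 8


def dense_bytes(num_qubits):
    """Size of a dense num_qubits x num_qubits matrix of 64-bit floats."""
    return num_qubits * num_qubits * BYTES_PER_F64


dense_1k = dense_bytes(1_000)    # 8_000_000 bytes, i.e. 8 MB
dense_10k = dense_bytes(10_000)  # 800_000_000 bytes, i.e. 800 MB


def line_error_map(num_qubits, error=0.01):
    """Sparse map for a line of qubits: one entry per connected pair."""
    return {(q, q + 1): error for q in range(num_qubits - 1)}


sparse_entries = len(line_error_map(10_000))  # 9_999 stored entries
```

The dense matrix grows quadratically with the qubit count, while the sparse map grows with the number of edges, which is why a hash map keyed by qubit pair scales better for sparsely connected devices.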

mtreinish · 2022-11-02T19:54:10Z

I've updated the PR to use a custom hash map based class instead of a 2d numpy array to represent the average error map. This should both be much more memory efficient and also make the usage a bit clearer for people to interact with.

This commit moves the NLayout rust class out of the stochastic swap python module into a new standalone nlayout module in qiskit._accelerate. The NLayout class was originally added with the stochastic swap rust code, but since then it's started being used by other rust code including SabreSwap (and soon to be VF2Layout and VF2PostLayout scoring in Qiskit#9026).

mtreinish · 2022-11-16T18:20:38Z

To show the performance improvements I threw together a small test script:

import statistics
import time

from qiskit.transpiler.passes.layout import VF2Layout
from qiskit.circuit import QuantumCircuit
from qiskit.providers.fake_provider import FakeMumbaiV2
from qiskit.converters import circuit_to_dag

qc = QuantumCircuit(7)

qc.h(0)
qc.cz(0, 1)
qc.cz(1, 2)
qc.cz(2, 3)
qc.measure_all()
dag = circuit_to_dag(qc)
backend = FakeMumbaiV2()

times = []
vf2_pass = VF2Layout(target=backend.target, max_trials=-1)
for i in range(5):
    print(f"Run {i}")
    start = time.perf_counter()
    vf2_pass.run(dag)
    stop = time.perf_counter()
    times.append(stop - start)
    print(stop - start)
print(statistics.geometric_mean(times))

This script is a worst case from a scoring perspective: it maps a line with 4 nodes and 3 free nodes onto the coupling graph. This means there are a lot of possible permutations for valid isomorphic mappings, but each is trivial for rustworkx to calculate. It also turns off any limits in the pass, so it will fully iterate through all available mappings.

Running this script with this PR applied returned:

6.825820831970144

Then running it on main:

14.780817176541099

This obviously is a best case improvement. In more realistic testing with limits set I was seeing a ~10% improvement over main with this PR applied.
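For reference, the ratio between the two geometric means reported above can be worked out directly; the variable names are only for illustration.

```python
with_pr = 6.825820831970144   # seconds, geometric mean with this PR applied
on_main = 14.780817176541099  # seconds, geometric mean on main

# How many times faster the PR is on this worst-case script.
speedup = on_main / with_pr

# Fraction of the runtime removed relative to main.
reduction = 1 - with_pr / on_main
```

On this script the PR is a little over twice as fast as main, which is the best-case figure contrasted with the ~10% seen in more realistic runs.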

kevinhartman

LGTM.

Should we add some test to make sure the avg_error_map property set setting is properly loaded/used when present?

qiskit/transpiler/passes/layout/vf2_layout.py

qiskit/transpiler/passes/layout/vf2_utils.py

releasenotes/notes/vf2_custom_score_analysis-abb191d56c0c1578.yaml

src/error_map.rs

Co-authored-by: Kevin Hartman <kevin@hart.mn>

mtreinish · 2022-11-18T22:45:04Z

LGTM.

Should we add some test to make sure the avg_error_map property set setting is properly loaded/used when present?

Good call, I added a test with avg_error_map in: 50aab71

kevinhartman

LGTM!

…Qiskit#9026)

* Improve performance of VF2 scoring and add support for scoring passes

This commit makes 2 key changes to the VF2 layout passes. The first is that it migrates the scoring routine to Rust. When running VF2Layout and VF2PostLayout we're bottlenecked by the performance of scoring a layout, since in practice scoring a large circuit ends up taking more time than the vf2_mapping() function. To address this, the scoring function is migrated to Rust, where the iteration will be much faster. To enable this migration, the average error map is made into a 2D numpy array which can be accessed efficiently by reference from Rust. The second is that this also provides a convenient interface for future expansion of the VF2 layout passes. The VF2Layout and VF2PostLayout passes will now both look for a "vf2_avg_error_map" entry in the property set which contains a 2D array used for scoring. If present, that array will be used for scoring instead of computing one from the target's error rates. This will enable custom analysis passes to be run pre-layout to compute or inject a custom scoring heuristic.

* Handle missing qubits from properties Payload

For BackendV1-based backends it's possible for the BackendProperties object for that backend to get out of sync with the number of qubits actually available in the system. In such cases, looking up the noise characteristics can fail when building the error map, because the reported number of qubits is less than the number of qubits there are properties for. This wasn't an issue with the previous error map data structure because it was a dictionary and it would just add the error rate for the extra qubits even though it wasn't valid. However, now that we're using a numpy array with a fixed size, this isn't the case anymore and an error would be raised in these cases. To work around this issue, this commit skips any qubits outside the allowed range in the BackendProperties when building the error map, to account for this potential discrepancy. The extra properties couldn't be used anyway since they're not valid device qubits in such cases.

* Limit number of intermediate Layout objects created

This commit updates the vf2 layout scoring to work with a dictionary object instead of a Layout object. Previously we were creating a Layout object for each mapping found and passing that to scoring. However, this was unnecessary overhead, as the Layout object is slow to create and interact with. Since we only need a Layout object if we're potentially returning the layout as the best result, we can avoid this extra overhead.

* Move environment variable check outside loop

This commit removes the lookup for the QISKIT_IN_PARALLEL env variable from the Rust code for vf2 scoring. This was adding unnecessary overhead to a frequently called function when it only needs to be computed once. This commit moves the lookup to Python, outside the for loop, and just passes the evaluated boolean to the Rust function instead.

* Fix rust lint

* Apply suggestions from code review

Co-authored-by: Jake Lishman <jake@binhbar.com>

* Simplify duplicated rust iteration code

This commit deduplicates a bunch of the rust side code for scoring into 2 closures and replaces all the reduce() calls with product() to accomplish the same thing.

* Update qiskit/transpiler/passes/layout/vf2_layout.py

Co-authored-by: Jake Lishman <jake@binhbar.com>

* Use np.full() instead of np.empty() and np.fill()

* Pivot from 2d numpy array to a custom ErrorMap class

In order to support large (> 1000) qubit systems efficiently, this commit pivots away from using a 2D numpy array to represent the average error rates for a target. For 1000 qubits this error matrix would take 8 MB of memory, but for 10k qubits it would take 800 MB. By their nature these error matrices should be fairly sparse, as connectivity in typical QPUs is sparse, so this was just wasted memory: we'd end up with a lot of NaN values in the array. Instead, this commit adds a new Rust struct/Python class ErrorMap which just wraps a HashMap and maps a 2-element int array to a float. This way we only store entries where there is defined connectivity and are more memory efficient.

* Fix lint

* Fix import path after rebase

* Update release notes

* Apply suggestions from code review

Co-authored-by: Kevin Hartman <kevin@hart.mn>

* Build empty ErrorMap in case of no target or coupling map

* Add helper function for layout creation in VF2Layout scoring loop

* Add test with custom ErrorMap analysis pass

Co-authored-by: Jake Lishman <jake@binhbar.com>
Co-authored-by: Kevin Hartman <kevin@hart.mn>

mtreinish added the on hold, performance, Changelog: New Feature, and Rust labels Oct 28, 2022

mtreinish requested a review from a team as a code owner October 28, 2022 18:58

mtreinish mentioned this pull request Oct 28, 2022

Fix up docstring for VF2PostLayout #9027

Merged

mtreinish added 2 commits October 31, 2022 13:18

Merge remote-tracking branch 'origin/main' into vf2-rust

499641c

mtreinish changed the title ~~[WIP] Improve performance of VF2 scoring and add support for scoring passes~~ Improve performance of VF2 scoring and add support for scoring passes Oct 31, 2022

mtreinish removed the on hold label Oct 31, 2022

mtreinish added 3 commits October 31, 2022 15:24

Fix rust lint

77fa673

Merge branch 'main' into vf2-rust

dc51641

jakelishman reviewed Nov 1, 2022

mtreinish and others added 8 commits November 1, 2022 10:56

Apply suggestions from code review

808377e

Co-authored-by: Jake Lishman <jake@binhbar.com>

Simplify duplicated rust iteration code

295a829

This commit deduplicates a bunch of the rust side code for scoring into 2 closures and replaces all the reduce() calls with product() to accomplish the same thing.

Merge remote-tracking branch 'origin/main' into vf2-rust

4911a4c

Update qiskit/transpiler/passes/layout/vf2_layout.py

773c337

Co-authored-by: Jake Lishman <jake@binhbar.com>

Merge branch 'main' into vf2-rust

da8d8cd

Use np.full() instead of np.empty() and np.fill()

6f9316a

Merge remote-tracking branch 'origin/main' into vf2-rust

545df73

Fix lint

7571f09

mtreinish mentioned this pull request Nov 2, 2022

Split out NLayout rust class into standalone Python module #9064

Merged

mtreinish added 6 commits November 4, 2022 07:35

Merge remote-tracking branch 'origin/main' into vf2-rust

f8a5554

Fix import path after rebase

f01eaaf

Merge branch 'main' into vf2-rust

4ae3eee

Merge branch 'main' into vf2-rust

dc2fcd8

Merge remote-tracking branch 'origin/main' into vf2-rust

178dd48

Update release notes

23c054e

mtreinish mentioned this pull request Nov 16, 2022

Improve efficiency of vf2 pass search with free nodes #9148

Merged

kevinhartman reviewed Nov 18, 2022

mtreinish and others added 5 commits November 18, 2022 17:17

Apply suggestions from code review

a6d38c8

Co-authored-by: Kevin Hartman <kevin@hart.mn>

Merge remote-tracking branch 'origin/main' into vf2-rust

9505910

Build empty ErrorMap in case of no target or coupling map

7c38723

Add helper function for layout creation in VF2Layout scoring loop

50aab71

Add test with custom ErrorMap analysis pass

e89b8e7

mtreinish requested a review from kevinhartman November 18, 2022 22:45

kevinhartman approved these changes Nov 18, 2022

kevinhartman added the automerge label Nov 18, 2022

mergify bot merged commit 83c84b0 into Qiskit:main Nov 18, 2022

mtreinish deleted the vf2-rust branch November 19, 2022 00:13

mtreinish mentioned this pull request Nov 21, 2022

Add frequency collision analysis pass #8621

Closed

2 tasks

kevinsung mentioned this pull request Dec 17, 2022

Update test values that were broken by Terra transpilation changes qiskit-community/mapomatic#54

Closed

mtreinish mentioned this pull request Mar 21, 2023

Improve efficiency of VF2 family of layout pass scoring with disjoint circuits #9834

Closed
