Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sabre algorithm improvements in transpilation at level 3 not in effect #9090

Closed
johannesgreiner opened this issue Nov 7, 2022 · 13 comments · Fixed by #9116
Closed

sabre algorithm improvements in transpilation at level 3 not in effect #9090

johannesgreiner opened this issue Nov 7, 2022 · 13 comments · Fixed by #9116
Assignees

Comments

@johannesgreiner
Copy link
Contributor

Environment

  • Qiskit Terra version: 0.22.2 (vs. 0.21.1)
  • Python version: 3.8.2
  • Operating system: Windows

What is happening?

transpilation at optimization_level=3 does not (significantly) perform better in terra 0.22.2 than 0.21.1 despite improvements in sabre algorithm.

How can we reproduce the issue?

both in terra 0.22.2 and 0.21.1 run:

qc.x(5)
qc.h(range(6))
qc.cx(range(5),5)
qc.h(range(5))
qc.measure(range(5), range(5))
qc.draw('mpl')

# Transpile the circuit 'qc' 1000 times
trans_circ_list = transpile([qc]*1000, backend, optimization_level=3)

# get the number of cnot gates
cx_counts = np.asarray([circ.count_ops().get('cx', 0) for circ in trans_circ_list])

# create violin plot
import seaborn as sns
ax = sns.violinplot(data=[cx_counts])

What should happen?

Acc to level 3 preset , there should be 20 swap_trials performed for each iteration of transpile. So, the performance should be improved by these 20 iterations. Instead, there is no visible difference, but doing 20 iterations of transpile manually in qiskit 0.21.1 performs much better (will take about 60mins on a 4core-CPU):

global_circ_list=[]

for i in range(1000):
    # Transpile the circuit 'qc' 20 times
    trans_circ_list_loop = transpile([qc]*20, backend, optimization_level=3)

    # get the number of cnot gates
    cx_counts_loop = np.asarray([circ.count_ops().get('cx', 0) for circ in trans_circ_list_loop])
    #print('Number of cnots:', cx_counts_loop)

    # select the circuit with the lowest number
    best_idx_loop = np.argmin(cx_counts_loop)
    best_trans_qc_loop = trans_circ_list_loop[best_idx_loop]
    global_circ_list.append(best_trans_qc_loop)

cx_counts_global = np.asarray([circ.count_ops().get('cx', 0) for circ in global_circ_list]) 

combining all in the plot shows that manually pre-transpiling in the older qiskit version performs much better

pretransp violin plot qiskit cnots comparison

Any suggestions?

there might be a minimization step missing, or a mixup of max_iterations and swap_trials in
https://github.com/Qiskit/qiskit-terra/blob/90b158c7e02432db957762e58c4c2ed75d89a1ca/qiskit/transpiler/preset_passmanagers/level3.py#L140

@johannesgreiner johannesgreiner added the bug Something isn't working label Nov 7, 2022
@jakelishman
Copy link
Member

This isn't a fair comparison, which is likely where the difference is coming in. swap_trials modifies how many different attempts SabreSwap makes each time it's called, but when you rerun transpile, you get different randomisation for the initial layout chosen in SabreLayout (which isn't subject to repeats), before SabreSwap even runs for the first time. The final output of the transpilation is more susceptible to changes in that layout, but right now we're not doing multiple attempts of that component (only the internal routing done within SabreLayout).

@jakelishman
Copy link
Member

I would expect to see a similar improvement for your circuit if you tried the same repeated transpile on 0.22 as you did on 0.21, for the same reason.

@nonhermitian
Copy link
Contributor

nonhermitian commented Nov 7, 2022

I think the point is that the parallel swap trials should be picking the lowest swap count circuit, when it is clear that it is currently not. The goal was to eliminate (to 1st order) the need for users to have to do this themselves now that the swap mapper is much faster

@jakelishman
Copy link
Member

We're not finding a lower swap count and then throwing it away. The trick is that a full transpile runs the entire layout pass with new seeds, whereas we're just reseeding the internal routing passes used within the layout pass. It was much easier to get that to run in parallel than repeating the entire layout pass lots of times, because it happens in Rust space, not Python space.

We are looking to revise the defaults, and re-running the whole layout is something we're interested in, for sure. I'm just saying that this is a bug report suggesting there's an actual mistake in SabreLayout, and I don't think that's the case - what's happened here was by design.

@nonhermitian
Copy link
Contributor

Ok, then it is a bit of a misunderstanding then. We will just go back to telling people to transpile multiple times again for best results. I think this can probably be closed since this is currently expected behavior.

@jakelishman
Copy link
Member

We can leave it open to track - this is something we want to improve on.

@johannesgreiner
Copy link
Contributor Author

Thanks for clarifying. More of a feature request that you already have in mind then

@mtreinish mtreinish self-assigned this Nov 8, 2022
@mtreinish mtreinish added enhancement and removed bug Something isn't working labels Nov 8, 2022
mtreinish added a commit to mtreinish/qiskit-core that referenced this issue Nov 9, 2022
This commit modifies the SabreLayout pass when run without the
routing_pass argument to run primarily in Rust. This builds on top of
the rust version of SabreSwap previously added in Qiskit#7977, Qiskit#8388,
and Qiskit#8572. Internally, when the routing_pass argument is not set
SabreLayout will perform the full sabre algorithm both layout selection
and final swap mapping in rust and return the selected initial layout,
the final layout, the toplogical sorting used to traverse the circuit,
and a SwapMap for any swaps inserted. This is then used to build the
output circuit in place of running separate layout and routing passes.
The preset pass managers are updated to handle the new combined layout
and routing mode of operation for SabreLayout. The routing stage to the
preset pass managers remains intact, it will just operate as if a
perfect layout was selected and skip SabreSwap because the circuit is
already matching the connectivity constraints.

Besides just operating more quickly because the heavy lifting of the
algorithm operates more efficiently in a compiled language, doing this
in rust also lets change our parallelization model for running multiple
seed in Sabre. Just as in Qiskit#8572 we added support for SabreSwap to run
multiple parallel trials with different seeds this commit adds a
layout_trials argument to SabreLayout to try multiple seeds in parallel.
When this is used it parallelizes at the outer layer for each
layout/routing combination and the total minimal swap count seed is used.
So for example if you set swap_trials=5 and layout_trails=5 that will run
5 tasks in the threadpool with 5 different seeds for the outer layout run.
Inside that every time sabre swap is run (which will be multiple times
as part of layout plus the final routing run) it tries 5 different seeds
for each execution serially inside that parallel task. This should
hopefully further improve the quality of the transpiler output and better
match expectations for users who were previously calling transpile()
multiple times to emulate this behavior.

Implements Qiskit#9090
mtreinish added a commit to mtreinish/qiskit-core that referenced this issue Nov 10, 2022
This commit modifies the SabreLayout pass when run without the
routing_pass argument to run primarily in Rust. This builds on top of
the rust version of SabreSwap previously added in Qiskit#7977, Qiskit#8388,
and Qiskit#8572. Internally, when the routing_pass argument is not set
SabreLayout will perform the full sabre algorithm both layout selection
and final swap mapping in rust and return the selected initial layout,
the final layout, the toplogical sorting used to traverse the circuit,
and a SwapMap for any swaps inserted. This is then used to build the
output circuit in place of running separate layout and routing passes.
The preset pass managers are updated to handle the new combined layout
and routing mode of operation for SabreLayout. The routing stage to the
preset pass managers remains intact, it will just operate as if a
perfect layout was selected and skip SabreSwap because the circuit is
already matching the connectivity constraints.

Besides just operating more quickly because the heavy lifting of the
algorithm operates more efficiently in a compiled language, doing this
in rust also lets change our parallelization model for running multiple
seed in Sabre. Just as in Qiskit#8572 we added support for SabreSwap to run
multiple parallel trials with different seeds this commit adds a
layout_trials argument to SabreLayout to try multiple seeds in parallel.
When this is used it parallelizes at the outer layer for each
layout/routing combination and the total minimal swap count seed is used.
So for example if you set swap_trials=5 and layout_trails=5 that will run
5 tasks in the threadpool with 5 different seeds for the outer layout run.
Inside that every time sabre swap is run (which will be multiple times
as part of layout plus the final routing run) it tries 5 different seeds
for each execution serially inside that parallel task. This should
hopefully further improve the quality of the transpiler output and better
match expectations for users who were previously calling transpile()
multiple times to emulate this behavior.

Implements Qiskit#9090
mtreinish added a commit to mtreinish/qiskit-core that referenced this issue Nov 10, 2022
This commit modifies the SabreLayout pass when run without the
routing_pass argument to run primarily in Rust. This builds on top of
the rust version of SabreSwap previously added in Qiskit#7977, Qiskit#8388,
and Qiskit#8572. Internally, when the routing_pass argument is not set
SabreLayout will perform the full sabre algorithm both layout selection
and final swap mapping in rust and return the selected initial layout,
the final layout, the toplogical sorting used to traverse the circuit,
and a SwapMap for any swaps inserted. This is then used to build the
output circuit in place of running separate layout and routing passes.
The preset pass managers are updated to handle the new combined layout
and routing mode of operation for SabreLayout. The routing stage to the
preset pass managers remains intact, it will just operate as if a
perfect layout was selected and skip SabreSwap because the circuit is
already matching the connectivity constraints.

Besides just operating more quickly because the heavy lifting of the
algorithm operates more efficiently in a compiled language, doing this
in rust also lets change our parallelization model for running multiple
seed in Sabre. Just as in Qiskit#8572 we added support for SabreSwap to run
multiple parallel trials with different seeds this commit adds a
layout_trials argument to SabreLayout to try multiple seeds in parallel.
When this is used it parallelizes at the outer layer for each
layout/routing combination and the total minimal swap count seed is used.
So for example if you set swap_trials=5 and layout_trails=5 that will run
5 tasks in the threadpool with 5 different seeds for the outer layout run.
Inside that every time sabre swap is run (which will be multiple times
as part of layout plus the final routing run) it tries 5 different seeds
for each execution serially inside that parallel task. This should
hopefully further improve the quality of the transpiler output and better
match expectations for users who were previously calling transpile()
multiple times to emulate this behavior.

Implements Qiskit#9090
mtreinish added a commit to mtreinish/qiskit-core that referenced this issue Nov 10, 2022
This commit modifies the SabreLayout pass when run without the
routing_pass argument to run primarily in Rust. This builds on top of
the rust version of SabreSwap previously added in Qiskit#7977, Qiskit#8388,
and Qiskit#8572. Internally, when the routing_pass argument is not set
SabreLayout will perform the full sabre algorithm both layout selection
and final swap mapping in rust and return the selected initial layout,
the final layout, the toplogical sorting used to traverse the circuit,
and a SwapMap for any swaps inserted. This is then used to build the
output circuit in place of running separate layout and routing passes.
The preset pass managers are updated to handle the new combined layout
and routing mode of operation for SabreLayout. The routing stage to the
preset pass managers remains intact, it will just operate as if a
perfect layout was selected and skip SabreSwap because the circuit is
already matching the connectivity constraints.

Besides just operating more quickly because the heavy lifting of the
algorithm operates more efficiently in a compiled language, doing this
in rust also lets change our parallelization model for running multiple
seed in Sabre. Just as in Qiskit#8572 we added support for SabreSwap to run
multiple parallel trials with different seeds this commit adds a
layout_trials argument to SabreLayout to try multiple seeds in parallel.
When this is used it parallelizes at the outer layer for each
layout/routing combination and the total minimal swap count seed is used.
So for example if you set swap_trials=5 and layout_trails=5 that will run
5 tasks in the threadpool with 5 different seeds for the outer layout run.
Inside that every time sabre swap is run (which will be multiple times
as part of layout plus the final routing run) it tries 5 different seeds
for each execution serially inside that parallel task. This should
hopefully further improve the quality of the transpiler output and better
match expectations for users who were previously calling transpile()
multiple times to emulate this behavior.

Implements Qiskit#9090
@mtreinish
Copy link
Member

mtreinish commented Nov 11, 2022

I pushed up #9116 to do multiple seed trials for the combination of layout and routing here. I still need to tune the PR and characterize the performance (both in quality and runtime). But if you'd like to give that a try and see how it works for your use case that would be useful feedback while it's still under review/being developed.

@mtreinish
Copy link
Member

mtreinish commented Nov 11, 2022

Playing a bit locally with the bv example you were using, for the bernstein vazirani circuit the output quality is definitely primarily a function of the layout. I did a nested sweep to 250 trials of both layout and routing trials with #9116 applied. The script is below (which will only work if you do pip install git+https://github.com/mtreinish/qiskit-core@sabre-layout-rust and have a rust compiler available).

import csv
import statistics
import time

import numpy as np
import retworkx as rx

from qiskit import QuantumCircuit
from qiskit.converters import circuit_to_dag, dag_to_circuit
from qiskit.transpiler import CouplingMap
from qiskit.circuit.library import QuantumVolume, QFT
from qiskit.transpiler.passes import SabreLayout, Unroll3qOrMore
from qiskit.providers.fake_provider import FakeMumbaiV2
from qiskit.compiler import transpile


def bench_bv():
    rng = np.random.default_rng(42_11_11_2022)
    with open("bv.csv", "w", newline="") as csvfile:
        times_writer = csv.writer(csvfile)
        times_writer.writerow(["swap_trials", "layout_trials", "run_time", "swap_count", "depth"])
        basis_gates = ["cx", "x", "sx", "rz", "reset", "delay"]
        backend = FakeMumbaiV2()
        cmap = backend.coupling_map

        qc = QuantumCircuit(6, 5)
        qc.x(5)
        qc.h(range(6))
        qc.cx(range(5),5)
        qc.h(range(5))
        qc.measure(range(5), range(5))
        for seed in rng.integers(0, 4294967295, size=1, dtype=int):
            for layout_trials in sorted(set(np.linspace(1, 250, dtype=int))):
                for swap_trials in sorted(set(np.linspace(1, 250, dtype=int))):
                    print(f"Layout trials: {layout_trials}, Swap Trials: {swap_trials}")
                    super_pass = SabreLayout(coupling_map=cmap, seed=seed, swap_trials=swap_trials, layout_trials=layout_trials)
                    dag = circuit_to_dag(qc)
                    start = time.perf_counter()
                    res = super_pass.run(dag)
                    stop = time.perf_counter()
                    run_time = stop - start
                    swap_count = res.count_ops().get('swap', 0)
                    out_depth = res.depth()
                    times_writer.writerow([swap_trials, layout_trials, run_time, swap_count, out_depth])
                    print(f"Run time: {run_time}, non_local gates: {swap_count}, depth: {out_depth}")

if __name__ == "__main__":
    bench_bv()

which yielded this result:

bv.csv

The swap count and depth decrease only as we increase the number of layout trials. But I think this is something that doesn't hold true for all circuits. Like I know with qft we can get improved output quality when running with just higher swap trials

@nonhermitian
Copy link
Contributor

This is interesting. Let me play with the notebooks that I have looking at some of the other circuits.

@nonhermitian
Copy link
Contributor

One thing of interest is 5Q tests on Quito (Fake version). It seems that I consistently get a larger cx gate count then a year ago where the max was ~85 or so: https://quantum-enablement.org/posts/2021/2021-10-31-best_swap_mapper_qiskit.html#efficient-su2-full

Now I consistently get >100 (on the branch above) using:

import numpy as np
from qiskit import QuantumCircuit
from qiskit.providers.fake_provider import FakeMumbaiV2, FakeQuitoV2
from qiskit.compiler import transpile

from qiskit.circuit.library import QuantumVolume, EfficientSU2, TwoLocal, QFT

backend = FakeQuitoV2()
qc = EfficientSU2(5, entanglement='full')
trans_circs = transpile([qc]*20, backend, optimization_level=3)
[circ.count_ops().get('cx',0) for circ in trans_circs]

E.g.

[125,
 112,
 112,
 112,
 112,
 117,
 117,
 117,
 118,
 125,
 129,
 112,
 117,
 117,
 108,
 129,
 121,
 112,
 112,
 125]

I need to decouple the notebook from Tket before running everything, but what is given here is the first thing I tried

@mtreinish
Copy link
Member

mtreinish commented Nov 11, 2022

Hmm, yeah I was seeing bad performance with that branch in other benchmarks too: #9116 (comment) I'm still not super confident with it which is why it's [WIP] still, I probably broke something or am doing something else wrong in the PR.

At least running the EfficientSU2 example on main and 0.22.2 returns 68 for all the circuits.

@mtreinish
Copy link
Member

I think that I fixed the issues in #9116 that was causing poor results before. The output of running the EfficientSU2 compilation with the PR applied is now [72, 68, 68, 68, 68, 72, 68, 68, 68, 68, 72, 68, 68, 72, 68, 68, 72, 72, 68, 68] (the behavior with the BV circuits remains unchanged).

@mtreinish mtreinish linked a pull request Nov 17, 2022 that will close this issue
4 tasks
@mergify mergify bot closed this as completed in #9116 Dec 8, 2022
mergify bot added a commit that referenced this issue Dec 8, 2022
* Oxidize SabreLayout pass

This commit modifies the SabreLayout pass when run without the
routing_pass argument to run primarily in Rust. This builds on top of
the rust version of SabreSwap previously added in #7977, #8388,
and #8572. Internally, when the routing_pass argument is not set
SabreLayout will perform the full sabre algorithm both layout selection
and final swap mapping in rust and return the selected initial layout,
the final layout, the toplogical sorting used to traverse the circuit,
and a SwapMap for any swaps inserted. This is then used to build the
output circuit in place of running separate layout and routing passes.
The preset pass managers are updated to handle the new combined layout
and routing mode of operation for SabreLayout. The routing stage to the
preset pass managers remains intact, it will just operate as if a
perfect layout was selected and skip SabreSwap because the circuit is
already matching the connectivity constraints.

Besides just operating more quickly because the heavy lifting of the
algorithm operates more efficiently in a compiled language, doing this
in rust also lets change our parallelization model for running multiple
seed in Sabre. Just as in #8572 we added support for SabreSwap to run
multiple parallel trials with different seeds this commit adds a
layout_trials argument to SabreLayout to try multiple seeds in parallel.
When this is used it parallelizes at the outer layer for each
layout/routing combination and the total minimal swap count seed is used.
So for example if you set swap_trials=5 and layout_trails=5 that will run
5 tasks in the threadpool with 5 different seeds for the outer layout run.
Inside that every time sabre swap is run (which will be multiple times
as part of layout plus the final routing run) it tries 5 different seeds
for each execution serially inside that parallel task. This should
hopefully further improve the quality of the transpiler output and better
match expectations for users who were previously calling transpile()
multiple times to emulate this behavior.

Implements #9090

* Use deepcopy for coupling map copy

Previously this PR was using copy() to copy the coupling map before we
mutated it to be symmetric (a requirement for the sabre algorithm).
However, this modification of the object was leaking out causing test
failures. This commit switches it to a deepcopy to ensure there are no
shared references (and a comment added to explain it's needed).

* Fix failing unitary synthesis tests

This PR branch modifies the default behavior of the SabreLayout pass so
it is now a transformation pass that computes a layout, applies it, and
then performs routing. This means when using sabre layout in a custom
pass manager we no longer need to embed a layout after computing the
layout. The failing unitary synthesis tests were using a custom pass
manager and trying to apply the layout again after SabreLayout already
did. This commit just removes this now unecessary steps from the test
code.

* Add release note

* Run BarrierBeforeMeasurement before new SabreLayout

Now that the routing stage is integrated into the SabreLayout pass we
should be running the BarrierBeforeMeasurement pass prior to layout in
the preset pass managers instead of before routing. The goal of the pass
is to prevent the routing algorithms for accidentally reusing a qubit
after a final measurement which would be invalid by inserting a barrier
before the measurements to ensure all qubits are swap mapped prior to
adding the measurements during routing. While this might not strictly be
necessary (it didn't affect any test output) it feels like best practice
to ensure we're doing this prior to potentially routing to prevent
issues.

* Improve docstrings

* Set a fixed number of layout trials in preset pass managers

For reproducible results with a fixed seed this commit sets a fixed
number of layout_trials for the SabreLayout pass in the preset pass
managers. If we did not set a fixed value than the output of the
transpiler with a fixed seed will vary based on the number of
physical cores that is running the compilation. To start
optimization levels 0 and 1 use 5, level 2 uses 10, and level
3 uses 20 which matches the swap_trials argument we used. This is just a
starting point, we can adjust these values later if needed.

* Update tests for layout changes

This commit updates the tests which are checking exact layouts with a
fixed seed when running SabreLayout. The changes to SabreLayout breaks
exact seed reproducibility from the earlier version of the pass. So we
need to update these tests for their new layout assignment from the
improved pass. One exception is a test which was trying to assert that
transpile() preserves a swap if it's in the basis set. However, the new
layout and routing output from SabreLayout for that test was resulting
in all the swaps getting optimized away at optimization level 3
(resulting in 13 cx gates instead of ~4 cx gates and 5 swaps before,
which would be more efficient on real hardware). So the test was removed
and only run at lower optimziation levels.

* Set a fixed number of layout trials in SabreLayout tests

The dedicated tests for SabreLayout were not running a fixed number of
trials. This was causing a different layout to be returned in tests when
run across multiple systems as the number of trials defaults to the
number of physical CPUs. This commit fixes the trial count to the number
of cores on the local system where the layout was updated. This should
fix the non-determinism in the tests causing failures in CI and on
different local systems.

* Run SabreSwap in parallel if only a single layout trial

If there is only a single layout trial being run we don't have to worry
about trying to do too much work in parallel at once by parallelizing
the inner sabre swap execution. This commit updates the threading logic
to enable running the inner sabre swap trials in parallel if there is
only a single layout trial.

* Remove duplicated SabreDAG creation

* Correctly apply selected layout on dag nodes

This commit corrects a bug in the PR branch that was caused by applying
the selected initial layout in a trial to the swapped order node list.
This was causing unexpected results when applying the circuit because
the intent was to apply it only to the original input not the reversed
input.

* Remove unnecessary clone from serial layout trials

In the case we're evaluating the layout trials serially instead of in a
parallel iterator we don't need to clone the dag nodes list. This is
because nothing will be modifying it in parallel, so we don't need a
thread local copy. Each call to layout_trial() will keep the dag nodes
vector intact (see previous commit for fixing this) so it can just be
passed by reference if there are no parallel threads involved.

* Fix seed setup when no user seed specified

This commit fixes an issue prevent seed randomization when no seed is
specified. On subsequent uses of a pass SabreLayout would not randomize
the seed between runs because it was setting the seed to instance state.
This commit fixes this issue by relying on initializing the RNG from
entropy each time run() is called if no user specified seed is provided.

* Start from trivial layout for routing stage

This commit fixes the routing run to run from a trivial layout instead
of the initial layout. By the time we do final routing for a trial we've
already applied the selected initial layout to the SabreDAG. So the
correct layout to use for running final swap mapping is a trivial layout
where logical bit 0 is at physical bit 0. Using initial layout twice
means we end up mapping more than is needed resulting in incorrect
results.

* Revert "Correctly apply selected layout on dag nodes"

This change was incorrect, the output was already in the correct order
and this was causing the behavior it strived to fix. This commit reverts
the addition of the extra mem::swap() call to fix things.

This reverts commit d98ef6c.

* Deduplicate NLayout trivial layout creation

This commit deduplicates the trivial layout generation for the NLayout
class. Previously there were a few places both in rust and python that
sabre layout was manually generating a trivial NLayout object. THis
commit adds a static method to the NLayout class that allows both Python
and Rust to easily create a new trivial NLayout object instead of
manually creating the object.

* Fix fixed layout tests after updates

Since more recent commits fixed a few bugs in the behavior of the
SabreLayout pass, the previously updated fixed layout tests were no
longer correct. This commit updates the tests which were now failing
because the layout changed again after fixing bugs in the new pass code.

* Try nesting parallelism in the sabres

Looking at profiles for running the new SabreLayout pass, as expected
the runtime of the rust SabreSwap routines is dominating. This is
because we've basically serialized the sabre swap routines and are
running multiple seed trials. As an experiment this commit sets the
inner SabreSwap routines to run in parallel too. Since the rayon
algorithm uses a work stealing algorithm this hopefully shouldn't cause
too much extra overhead, especially because the layout trials are quite
fast. This ideally means we're just scheduling each sabre swap trial in
a big parallel work queue and rayon does the rest of the magic to figure
out how to execute things. Initial testing is showing an improvement for
large circuits and a more modest improvement for more modest circuits.

* Add skip_routing argument to preserve custom user provided routing

This commit adds a new argument, skip_routing, to the SabreLayout
constructor. The intent of this new option is to enable mixing custom
routing_method user arguments with SabreLayout in it's new accelerated
mode of operation. In the earlier commits no matter what users specified
the preset pass manager construction would use sabreswap for routing as
it was run internally as part of layout. This meant doing something
like:

transpile(qc, backend, routing_method='stochastic')

would really run SabreSwap which is clearly not the user intent. To
provide the layout benefits with multiple seed trials this new argument
allows disabling the application of the routing found. This comes with a
runtime penalty because effectively we end up running routing twice and
only using one of the results. But for custom user provided methods or
plugins this seems like a reasonable tradeoff.

* Fix typo in docstring

* Update random seed usage in rust code

In #9132 we updated the random seed parameters in the rust code for
sabre swap to make the seed optional and default to initializing from
entropy if it's not specified. This commit updates the usage to account
for this change on main.

* s/retworkx/rustworkx/g

* Add alternate constructor for NLayout from a logic_to_phys vec

This commit adds a new constructor method to the NLayout class that
builds an NLayout object from just a logic_to_phys Vec. This constructor
can be accessed from either rust or python (although it's not as
efficient from Python). This is used to simplify some of the SabreLayout
rust code that was doing this inline manually.

* Move layout embedding into a method

This commit moves the code the optimized SabreLayout pass was using to
embed the found layout from the Rust code into a method. This will make
it easier to refactor later if a more efficient pass manager path is
added.

* Simplify pass logic and update comments

This commit removes an unnecessary else branch in the SabreLayout.run()
code to make it slightly easier to read. At the same time some comments
are updated to better explain the logic of the code.

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Cryoris pushed a commit to Cryoris/qiskit-terra that referenced this issue Jan 12, 2023
* Oxidize SabreLayout pass

This commit modifies the SabreLayout pass when run without the
routing_pass argument to run primarily in Rust. This builds on top of
the rust version of SabreSwap previously added in Qiskit#7977, Qiskit#8388,
and Qiskit#8572. Internally, when the routing_pass argument is not set
SabreLayout will perform the full sabre algorithm both layout selection
and final swap mapping in rust and return the selected initial layout,
the final layout, the toplogical sorting used to traverse the circuit,
and a SwapMap for any swaps inserted. This is then used to build the
output circuit in place of running separate layout and routing passes.
The preset pass managers are updated to handle the new combined layout
and routing mode of operation for SabreLayout. The routing stage to the
preset pass managers remains intact, it will just operate as if a
perfect layout was selected and skip SabreSwap because the circuit is
already matching the connectivity constraints.

Besides just operating more quickly because the heavy lifting of the
algorithm operates more efficiently in a compiled language, doing this
in rust also lets change our parallelization model for running multiple
seed in Sabre. Just as in Qiskit#8572 we added support for SabreSwap to run
multiple parallel trials with different seeds this commit adds a
layout_trials argument to SabreLayout to try multiple seeds in parallel.
When this is used it parallelizes at the outer layer for each
layout/routing combination and the total minimal swap count seed is used.
So for example if you set swap_trials=5 and layout_trails=5 that will run
5 tasks in the threadpool with 5 different seeds for the outer layout run.
Inside that every time sabre swap is run (which will be multiple times
as part of layout plus the final routing run) it tries 5 different seeds
for each execution serially inside that parallel task. This should
hopefully further improve the quality of the transpiler output and better
match expectations for users who were previously calling transpile()
multiple times to emulate this behavior.

Implements Qiskit#9090

* Use deepcopy for coupling map copy

Previously this PR was using copy() to copy the coupling map before we
mutated it to be symmetric (a requirement for the sabre algorithm).
However, this modification of the object was leaking out causing test
failures. This commit switches it to a deepcopy to ensure there are no
shared references (and a comment added to explain it's needed).

* Fix failing unitary synthesis tests

This PR branch modifies the default behavior of the SabreLayout pass so
it is now a transformation pass that computes a layout, applies it, and
then performs routing. This means when using sabre layout in a custom
pass manager we no longer need to embed a layout after computing the
layout. The failing unitary synthesis tests were using a custom pass
manager and trying to apply the layout again after SabreLayout already
did. This commit just removes this now unecessary steps from the test
code.

* Add release note

* Run BarrierBeforeMeasurement before new SabreLayout

Now that the routing stage is integrated into the SabreLayout pass we
should be running the BarrierBeforeMeasurement pass prior to layout in
the preset pass managers instead of before routing. The goal of the pass
is to prevent the routing algorithms for accidentally reusing a qubit
after a final measurement which would be invalid by inserting a barrier
before the measurements to ensure all qubits are swap mapped prior to
adding the measurements during routing. While this might not strictly be
necessary (it didn't affect any test output) it feels like best practice
to ensure we're doing this prior to potentially routing to prevent
issues.

* Improve docstrings

* Set a fixed number of layout trials in preset pass managers

For reproducible results with a fixed seed this commit sets a fixed
number of layout_trials for the SabreLayout pass in the preset pass
managers. If we did not set a fixed value than the output of the
transpiler with a fixed seed will vary based on the number of
physical cores that is running the compilation. To start
optimization levels 0 and 1 use 5, level 2 uses 10, and level
3 uses 20 which matches the swap_trials argument we used. This is just a
starting point, we can adjust these values later if needed.

* Update tests for layout changes

This commit updates the tests which are checking exact layouts with a
fixed seed when running SabreLayout. The changes to SabreLayout breaks
exact seed reproducibility from the earlier version of the pass. So we
need to update these tests for their new layout assignment from the
improved pass. One exception is a test which was trying to assert that
transpile() preserves a swap if it's in the basis set. However, the new
layout and routing output from SabreLayout for that test was resulting
in all the swaps getting optimized away at optimization level 3
(resulting in 13 cx gates instead of ~4 cx gates and 5 swaps before,
which would be more efficient on real hardware). So the test was removed
and only run at lower optimziation levels.

* Set a fixed number of layout trials in SabreLayout tests

The dedicated tests for SabreLayout were not running a fixed number of
trials. This was causing a different layout to be returned in tests when
run across multiple systems as the number of trials defaults to the
number of physical CPUs. This commit fixes the trial count to the number
of cores on the local system where the layout was updated. This should
fix the non-determinism in the tests causing failures in CI and on
different local systems.

* Run SabreSwap in parallel if only a single layout trial

If there is only a single layout trial being run we don't have to worry
about trying to do too much work in parallel at once by parallelizing
the inner sabre swap execution. This commit updates the threading logic
to enable running the inner sabre swap trials in parallel if there is
only a single layout trial.

* Remove duplicated SabreDAG creation

* Correctly apply selected layout on dag nodes

This commit corrects a bug in the PR branch that was caused by applying
the selected initial layout in a trial to the swapped order node list.
This was causing unexpected results when applying the circuit because
the intent was to apply it only to the original input not the reversed
input.

* Remove unnecessary clone from serial layout trials

In the case we're evaluating the layout trials serially instead of in a
parallel iterator we don't need to clone the dag nodes list. This is
because nothing will be modifying it in parallel, so we don't need a
thread local copy. Each call to layout_trial() will keep the dag nodes
vector intact (see previous commit for fixing this) so it can just be
passed by reference if there are no parallel threads involved.

* Fix seed setup when no user seed specified

This commit fixes an issue prevent seed randomization when no seed is
specified. On subsequent uses of a pass SabreLayout would not randomize
the seed between runs because it was setting the seed to instance state.
This commit fixes this issue by relying on initializing the RNG from
entropy each time run() is called if no user specified seed is provided.

* Start from trivial layout for routing stage

This commit fixes the routing run to run from a trivial layout instead
of the initial layout. By the time we do final routing for a trial we've
already applied the selected initial layout to the SabreDAG. So the
correct layout to use for running final swap mapping is a trivial layout
where logical bit 0 is at physical bit 0. Using initial layout twice
means we end up mapping more than is needed resulting in incorrect
results.

* Revert "Correctly apply selected layout on dag nodes"

This change was incorrect, the output was already in the correct order
and this was causing the behavior it strived to fix. This commit reverts
the addition of the extra mem::swap() call to fix things.

This reverts commit d98ef6c.

* Deduplicate NLayout trivial layout creation

This commit deduplicates the trivial layout generation for the NLayout
class. Previously there were a few places both in rust and python that
sabre layout was manually generating a trivial NLayout object. THis
commit adds a static method to the NLayout class that allows both Python
and Rust to easily create a new trivial NLayout object instead of
manually creating the object.

* Fix fixed layout tests after updates

Since more recent commits fixed a few bugs in the behavior of the
SabreLayout pass, the previously updated fixed layout tests were no
longer correct. This commit updates the tests which were now failing
because the layout changed again after fixing bugs in the new pass code.

* Try nesting parallelism in the sabres

Looking at profiles for running the new SabreLayout pass, as expected
the runtime of the rust SabreSwap routines is dominating. This is
because we've basically serialized the sabre swap routines and are
running multiple seed trials. As an experiment this commit sets the
inner SabreSwap routines to run in parallel too. Since the rayon
algorithm uses a work stealing algorithm this hopefully shouldn't cause
too much extra overhead, especially because the layout trials are quite
fast. This ideally means we're just scheduling each sabre swap trial in
a big parallel work queue and rayon does the rest of the magic to figure
out how to execute things. Initial testing is showing an improvement for
large circuits and a more modest improvement for more modest circuits.

* Add skip_routing argument to preserve custom user provided routing

This commit adds a new argument, skip_routing, to the SabreLayout
constructor. The intent of this new option is to enable mixing custom
routing_method user arguments with SabreLayout in it's new accelerated
mode of operation. In the earlier commits no matter what users specified
the preset pass manager construction would use sabreswap for routing as
it was run internally as part of layout. This meant doing something
like:

transpile(qc, backend, routing_method='stochastic')

would really run SabreSwap which is clearly not the user intent. To
provide the layout benefits with multiple seed trials this new argument
allows disabling the application of the routing found. This comes with a
runtime penalty because effectively we end up running routing twice and
only using one of the results. But for custom user provided methods or
plugins this seems like a reasonable tradeoff.

* Fix typo in docstring

* Update random seed usage in rust code

In Qiskit#9132 we updated the random seed parameters in the rust code for
sabre swap to make the seed optional and default to initializing from
entropy if it's not specified. This commit updates the usage to account
for this change on main.

* s/retworkx/rustworkx/g

* Add alternate constructor for NLayout from a logic_to_phys vec

This commit adds a new constructor method to the NLayout class that
builds an NLayout object from just a logic_to_phys Vec. This constructor
can be accessed from either rust or python (although it's not as
efficient from Python). This is used to simplify some of the SabreLayout
rust code that was doing this inline manually.

* Move layout embedding into a method

This commit moves the code the optimized SabreLayout pass was using to
embed the found layout from the Rust code into a method. This will make
it easier to refactor later if a more efficient pass manager path is
added.

* Simplify pass logic and update comments

This commit removes an unnecessary else branch in the SabreLayout.run()
code to make it slightly easier to read. At the same time some comments
are updated to better explain the logic of the code.

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants