Add benchmarks for Sabre on large QFT and QV circuits #1622
Conversation
Sabre is capable of handling these large benchmarks now, and it's of interest for us to track our performance on large systems. We don't anticipate running on them yet, but we will want to know in the future when further changes to routing and memory usage improve these benchmarks.
Force-pushed from 66bf11f to 11df842
The code LGTM, but before I approve I'm going to spin up a run locally. Do you have a rough runtime estimate for these new benchmarks?
On my machine with …
Ah, I still have the output in the scrollback of my terminal (before I fixed the QFT generation): …
I did a run locally; this adds ~30 mins to local runtime (31:20.39 for my local …).
My hope is that Qiskit/qiskit#9012 will (in fairly short order) knock off a good amount of that time again, and make the cost something more like 10-15 minutes on a run.
The code here LGTM; I'm fine with merging this as-is. It'd be really great if asv let us measure multiple values in a benchmark, and also if we could do a combined timed/tracking benchmark. I did leave an inline suggestion for reducing the runtime a bit, but please feel free to ignore it and just tag this automerge if you prefer.
test/benchmarks/qft.py (Outdated)
class LargeQFTMappingBench:
    timeout = 600.0  # seconds

    heavy_hex_size = {115: 7, 409: 13, 1081: 21}
    params = ([115, 409, 1081], ["lookahead", "decay"])
    param_names = ["n_qubits", "heuristic"]

    def setup(self, n_qubits, _heuristic):
        qr = QuantumRegister(n_qubits, name="q")
        self.dag = circuit_to_dag(build_model_circuit(qr))
        self.coupling = CouplingMap.from_heavy_hex(self.heavy_hex_size[n_qubits])

    def time_sabre_swap(self, _n_qubits, heuristic):
        pass_ = SabreSwap(self.coupling, heuristic, seed=2022_10_27, trials=1)
        pass_.run(self.dag)

    def track_depth_sabre_swap(self, _n_qubits, heuristic):
        pass_ = SabreSwap(self.coupling, heuristic, seed=2022_10_27, trials=1)
        return pass_.run(self.dag).depth()

    def track_size_sabre_swap(self, _n_qubits, heuristic):
        pass_ = SabreSwap(self.coupling, heuristic, seed=2022_10_27, trials=1)
        return pass_.run(self.dag).size()
We could limit the runtime here quite a bit, I think, if we split this up into two classes: one for timed benchmarks and one for tracking benchmarks. The tracking benchmarks could call Sabre in setup and then just run depth() and size() on the output. Something like:
class LargeQFTMappingBenchTracking:
    timeout = 600.0  # seconds

    heavy_hex_size = {115: 7, 409: 13, 1081: 21}
    params = ([115, 409, 1081], ["lookahead", "decay"])
    param_names = ["n_qubits", "heuristic"]

    def setup(self, n_qubits, heuristic):
        qr = QuantumRegister(n_qubits, name="q")
        self.dag = circuit_to_dag(build_model_circuit(qr))
        self.coupling = CouplingMap.from_heavy_hex(self.heavy_hex_size[n_qubits])
        pass_ = SabreSwap(self.coupling, heuristic, seed=2022_10_27, trials=1)
        self.out_dag = pass_.run(self.dag)

    def track_depth_sabre_swap(self, _n_qubits, _heuristic):
        return self.out_dag.depth()

    def track_size_sabre_swap(self, _n_qubits, _heuristic):
        return self.out_dag.size()
That way we basically eliminate a bunch of duplicate sabre runs, but it does seem a bit hacky.
That seems fine to me as a workaround for a deficiency in asv. I'll push a commit to do it.
So on further thought, this doesn't actually help in the way we wanted. The setup method is called before every parametrised benchmark, so this doesn't reduce the number of runs. What we can do instead is to define a setup_cache function that creates all the DAGs and calculates the trackers we care about. That "state" object then gets fed into each of the parametrised benchmarks, and we just extract the value we care about to return immediately.
I've done something to this effect in e7c3df9. The result is that asv sits in the "set up" state before the benchmark for quite a long time, but then the benchmarks themselves return instantly (so it's clear that the cache is correctly being reused).
The tracking benchmarks here naively require a recomputation of the expensive swap-mapping, despite us wanting to just reuse things we already calculated during the timing phase. `asv` doesn't let us return trackers from the timing benchmarks directly, but we can still reduce one load of redundancy by pre-calculating all the tracker properties we care about only once in the cached setup method, and then just feeding that state into the actual benchmarks to retrieve the results they care about. This is rather hacky, but does successfully work around functionality we would like in `asv` to reduce runtime.
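For reference, a minimal sketch of what a setup_cache-based tracking class can look like (this is not the exact contents of e7c3df9; the class name is illustrative, and it assumes the build_model_circuit helper already defined in test/benchmarks/qft.py):

from qiskit.circuit import QuantumRegister
from qiskit.converters import circuit_to_dag
from qiskit.transpiler import CouplingMap
from qiskit.transpiler.passes import SabreSwap


class LargeQFTMappingTrackingSketch:
    timeout = 600.0  # seconds

    heavy_hex_size = {115: 7, 409: 13, 1081: 21}
    params = ([115, 409, 1081], ["lookahead", "decay"])
    param_names = ["n_qubits", "heuristic"]

    def setup_cache(self):
        # asv calls setup_cache only once (not once per parameter combination),
        # so we run the expensive Sabre routing for every combination here and
        # keep just the tracker values we want to report.
        state = {}
        for n_qubits, hex_distance in self.heavy_hex_size.items():
            qr = QuantumRegister(n_qubits, name="q")
            # build_model_circuit is the QFT construction helper defined in
            # this benchmark file.
            dag = circuit_to_dag(build_model_circuit(qr))
            coupling = CouplingMap.from_heavy_hex(hex_distance)
            for heuristic in self.params[1]:
                pass_ = SabreSwap(coupling, heuristic, seed=2022_10_27, trials=1)
                out_dag = pass_.run(dag)
                state[n_qubits, heuristic] = {
                    "depth": out_dag.depth(),
                    "size": out_dag.size(),
                }
        return state

    # The cached state is passed as the first argument, before the parameters,
    # so each tracker just looks up its precomputed value and returns it.
    def track_depth_sabre_swap(self, state, n_qubits, heuristic):
        return state[n_qubits, heuristic]["depth"]

    def track_size_sabre_swap(self, state, n_qubits, heuristic):
        return state[n_qubits, heuristic]["size"]

With this layout the routing cost is paid once in the cached setup, and the track_* benchmarks themselves return essentially instantly, which matches the behaviour described above.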
LGTM, thanks for making the update; in my local test it saved ~5 min of execution time.
…metapackage#1622) * Add benchmarks for Sabre on large QFT and QV circuits Sabre is capable of handling these large benchmarks now, and it's of interest for us to track our performance on large systems. We don't anticipate running on them yet, but we will want to know in the future when further changes to routing and memory usage improve these benchmarks. * Fix lint * Fix lint properly * Precalculate trackers to avoid recomputation The tracking benchmarks here naively require a recomputation of the expensive swap-mapping, despite us wanting to just reuse things we already calculated during the timing phase. `asv` doesn't let us return trackers from the timing benchmarks directly, but we can still reduce one load of redundancy by pre-calculating all the tracker properties we care about only once in the cached setup method, and then just feeding that state into the actual benchmarks to retrieve the results they care about. This is rather hacky, but does successfully work around functionality we would like in `asv` to reduce runtime.
Summary
Sabre is capable of handling these large benchmarks now, and it's of interest for us to track our performance on large systems. We don't anticipate running on them yet, but we will want to know in the future when further changes to routing and memory usage improve these benchmarks.
Details and comments
These could be in either mapping_passes.py
or the files I've put them in. It's not super clear where, but since I used the benchmark-internal constructors (to dodge issues with the Terra functions potentially changing in the future), it made some sense to put them in the structure-specific files.