This repository has been archived by the owner on Aug 19, 2023. It is now read-only.

Add benchmarks for Sabre on large QFT and QV circuits #1622

Merged: 4 commits, Oct 28, 2022


jakelishman
Member

Summary

Sabre is capable of handling these large benchmarks now, and it's of interest for us to track our performance on large systems. We don't anticipate running on them yet, but we will want to know in the future when further changes to routing and memory usage improve these benchmarks.

Details and comments

These could be in either mapping_passes.py or the files I've put them in. It's not super clear where, but since I used the benchmark-internal constructors (to dodge issues with the Terra functions potentially changing in the future), it made some sense to put them in the structure-specific files.

@jakelishman jakelishman force-pushed the more-sabre-benchmarks branch from 66bf11f to 11df842 Compare October 27, 2022 13:15
Member

@mtreinish mtreinish left a comment


The code LGTM, but before I approve I'm going to spin up a run locally. Do you have a rough runtime estimate for these new benchmarks?

@jakelishman
Member Author

On my machine with main Terra, the largest QFT took 55s and the longest QV took ~1m40s, I think, but I accidentally cleared the database with an overzealous git clean -fdx since I was having trouble running tox in a dirty project directory.

@jakelishman
Member Author

jakelishman commented Oct 27, 2022

Ah, I still have the output in the scrollback of my terminal (before I fixed the QFT generation):

jake@ninetales$ asv dev --python 3.10 -b 'LargeQ.*MappingBench.*time'
· Fetching recent changes
· Creating environments
· Discovering benchmarks
· Running 2 total benchmarks (1 commits * 1 environments * 2 benchmarks)
[  0.00%] · For qiskit-terra commit aab18fcd <main>:
[  0.00%] ·· Benchmarking virtualenv-py3.10
[ 25.00%] ··· Running (qft.LargeQFTMappingBench.time_sabre_swap--).
[ 50.00%] ··· Running (quantum_volume.LargeQuantumVolumeMappingBenchmark.time_sabre_swap--).
[ 75.00%] ··· qft.LargeQFTMappingBench.time_sabre_swap                                                                                                                      2/6 failed
[ 75.00%] ··· ========== ============ ============
              --                 heuristic
              ---------- -------------------------
               n_qubits   lookahead      decay
              ========== ============ ============
                 115       404±40ms     380±9ms
                 409      6.59±0.08s   6.08±0.01s
                 1081       failed       failed
              ========== ============ ============

[100.00%] ··· quantum_volume.LargeQuantumVolumeMappingBenchmark.time_sabre_swap                                                                                                     ok
[100.00%] ··· ========== ================ ============ ================= =============
              --                               depth / heuristic
              ---------- -------------------------------------------------------------
               n_qubits   10 / lookahead   10 / decay   100 / lookahead   100 / decay
              ========== ================ ============ ================= =============
                 115        104±0.9ms      98.9±0.7ms       969±10ms        973±7ms
                 409        21.3±0.4s      7.09±0.03s         n/a             n/a
                 1081       59.1±0.2s       1.53±0m           n/a             n/a
              ========== ================ ============ ================= =============

@mtreinish
Member

I did a run locally: this adds ~30 min to the local benchmark runtime for a commit (31:20.39 for my local asv run call, excluding venv and build time). That's on the long side, but considering we just removed a lot of overhead from the assemble benchmarks, and given the importance of testing Sabre at scale now, I think that's OK. The results from my local run were:

[ 58.33%] ··· qft.LargeQFTMappingBench.time_sabre_swap                        ok
[ 58.33%] ··· ========== ============ ============
              --                 heuristic        
              ---------- -------------------------
               n_qubits   lookahead      decay    
              ========== ============ ============
                 115       307±1ms     296±0.9ms  
                 409       4.93±0s     4.73±0.01s 
                 1081     43.1±0.05s   40.6±0.05s 
              ========== ============ ============

[ 66.67%] ··· qft.LargeQFTMappingBench.track_depth_sabre_swap                 ok
[ 66.67%] ··· ========== =========== ========
              --              heuristic      
              ---------- --------------------
               n_qubits   lookahead   decay  
              ========== =========== ========
                 115         3834      2980  
                 409        26641     25214  
                 1081       150404    120263 
              ========== =========== ========

[ 75.00%] ··· qft.LargeQFTMappingBench.track_size_sabre_swap                  ok
[ 75.00%] ··· ========== =========== =========
              --               heuristic      
              ---------- ---------------------
               n_qubits   lookahead    decay  
              ========== =========== =========
                 115        19670      18634  
                 409        269456     263765 
                 1081      2043833    1959761 
              ========== =========== =========

[ 83.33%] ··· ...uantumVolumeMappingBenchmark.time_sabre_swap                 ok
[ 83.33%] ··· ========== ======= ============ ============
              --                         heuristic        
              ------------------ -------------------------
               n_qubits   depth   lookahead      decay    
              ========== ======= ============ ============
                 115        10    83.5±0.9ms   80.7±0.8ms 
                 115       100     816±20ms     799±3ms   
                 409        10    18.1±0.02s   5.95±0.02s 
                 409       100       n/a          n/a     
                 1081       10    49.1±0.2s     1.28±0m   
                 1081      100       n/a          n/a     
              ========== ======= ============ ============

[ 91.67%] ··· ...olumeMappingBenchmark.track_depth_sabre_swap                 ok
[ 91.67%] ··· ========== ======= =========== =======
              --                      heuristic     
              ------------------ -------------------
               n_qubits   depth   lookahead   decay 
              ========== ======= =========== =======
                 115        10       563       506  
                 115       100       5563      5274 
                 409        10       4371      3676 
                 409       100       n/a       n/a  
                 1081       10      13996     12841 
                 1081      100       n/a       n/a  
              ========== ======= =========== =======

[100.00%] ··· ...VolumeMappingBenchmark.track_size_sabre_swap                 ok
[100.00%] ··· ========== ======= =========== ========
              --                      heuristic      
              ------------------ --------------------
               n_qubits   depth   lookahead   decay  
              ========== ======= =========== ========
                 115        10       4511      4286  
                 115       100      44748     44298  
                 409        10      72015     60308  
                 409       100       n/a       n/a   
                 1081       10      244513    296402 
                 1081      100       n/a       n/a   
              ========== ======= =========== ========

@jakelishman
Member Author

My hope is that Qiskit/qiskit#9012 will (in fairly short order) knock off a good amount of that time again, and make the cost something more like 10-15 minutes on a run.

mtreinish
mtreinish previously approved these changes Oct 27, 2022
Member

@mtreinish mtreinish left a comment


The code here LGTM, and I'm fine with merging this as is. It'd be really great if asv let us measure multiple values in a benchmark, and also if we could do a combined timing and tracking benchmark. I did leave an inline suggestion for reducing the runtime a bit, but please feel free to ignore it and just tag this automerge if you prefer.

Comment on lines 66 to 90
class LargeQFTMappingBench:
    timeout = 600.0  # seconds

    heavy_hex_size = {115: 7, 409: 13, 1081: 21}
    params = ([115, 409, 1081], ["lookahead", "decay"])
    param_names = ["n_qubits", "heuristic"]

    def setup(self, n_qubits, _heuristic):
        qr = QuantumRegister(n_qubits, name="q")
        self.dag = circuit_to_dag(build_model_circuit(qr))
        self.coupling = CouplingMap.from_heavy_hex(
            self.heavy_hex_size[n_qubits]
        )

    def time_sabre_swap(self, _n_qubits, heuristic):
        pass_ = SabreSwap(self.coupling, heuristic, seed=2022_10_27, trials=1)
        pass_.run(self.dag)

    def track_depth_sabre_swap(self, _n_qubits, heuristic):
        pass_ = SabreSwap(self.coupling, heuristic, seed=2022_10_27, trials=1)
        return pass_.run(self.dag).depth()

    def track_size_sabre_swap(self, _n_qubits, heuristic):
        pass_ = SabreSwap(self.coupling, heuristic, seed=2022_10_27, trials=1)
        return pass_.run(self.dag).size()
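
For readers wondering where the `heavy_hex_size` values come from: the mapping pairs each qubit count with a heavy-hex code distance. The sketch below checks the three entries against the node-count formula n = (5d² − 2d − 1)/2, which is the count I understand the heavy-hex graph generator to document for odd distance d. The helper name `heavy_hex_qubits` is made up for this illustration and is not part of the benchmark suite or of Terra.

```python
def heavy_hex_qubits(distance: int) -> int:
    """Node count of a heavy-hex graph with odd code distance `distance`.

    Assumption: uses n = (5d^2 - 2d - 1) / 2; this helper is hypothetical.
    """
    if distance < 1 or distance % 2 == 0:
        raise ValueError("heavy-hex distance must be a positive odd integer")
    return (5 * distance**2 - 2 * distance - 1) // 2


# Mapping copied from the benchmark class: {n_qubits: distance}.
heavy_hex_size = {115: 7, 409: 13, 1081: 21}

# Collect any entry whose qubit count disagrees with the formula.
mismatches = {
    n_qubits: distance
    for n_qubits, distance in heavy_hex_size.items()
    if heavy_hex_qubits(distance) != n_qubits
}
```

If the formula holds, `mismatches` stays empty, so the coupling map built in `setup` has exactly as many qubits as the circuit being routed.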
Member


We could limit the runtime here quite a bit I think if we split this up into two classes one for timed benchmarks and one for tracking benchmarks. The tracking benchmarks could call sabre in setup and then just run depth() and size() on the output. Something like:

class LargeQFTMappingBenchTracking:
    timeout = 600.0  # seconds

    heavy_hex_size = {115: 7, 409: 13, 1081: 21}
    params = ([115, 409, 1081], ["lookahead", "decay"])
    param_names = ["n_qubits", "heuristic"]

    def setup(self, n_qubits, heuristic):
        qr = QuantumRegister(n_qubits, name="q")
        self.dag = circuit_to_dag(build_model_circuit(qr))
        self.coupling = CouplingMap.from_heavy_hex(
            self.heavy_hex_size[n_qubits]
        )
        pass_ = SabreSwap(self.coupling, heuristic, seed=2022_10_27, trials=1)
        self.out_dag = pass_.run(self.dag)

    def track_depth_sabre_swap(self, _n_qubits, _heuristic):
        return self.out_dag.depth()

    def track_size_sabre_swap(self, _n_qubits, _heuristic):
        return self.out_dag.size()

That way we basically eliminate a bunch of duplicate sabre runs, but it does seem a bit hacky.

Member Author


That seems fine to me as a workaround for a deficiency in asv. I'll push a commit to do it.

Member Author


So on further thought, this doesn't actually help in the way we wanted. The setup method is called before every parametrised benchmark, so this doesn't reduce the number of runs. What we can do instead is to define a setup_cache function that creates all the DAGs and calculates the trackers we care about. That "state" object then gets fed into each of the parametrised benchmarks, and we just extract the value we care about to return immediately.

I've done something to this effect in e7c3df9. The result is that asv sits in the "set up" state before the benchmark for quite a long time, but then the benchmarks themselves return instantly (so it's clear that the cache is correctly being reused).
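
A minimal sketch of the `setup_cache` pattern described above, not the code from e7c3df9. It assumes asv's documented behaviour that `setup_cache` runs once per class and that its return value is passed as the first argument to each benchmark method. `expensive_route`, `ROUTING_RUNS`, `LargeMappingTrackingSketch` and `run_like_asv` are hypothetical names; `expensive_route` stands in for building the DAG and running `SabreSwap`, and it returns fake numbers.

```python
import itertools

# Illustration-only counter, so we can see how often the expensive step runs.
ROUTING_RUNS = {"count": 0}


def expensive_route(n_qubits, heuristic):
    """Hypothetical stand-in for building the DAG and running SabreSwap."""
    ROUTING_RUNS["count"] += 1
    # Fake tracker values; a real benchmark would read them off the routed DAG.
    return {"depth": n_qubits * 10, "size": n_qubits * 100}


class LargeMappingTrackingSketch:
    params = ([115, 409, 1081], ["lookahead", "decay"])
    param_names = ["n_qubits", "heuristic"]

    def setup_cache(self):
        # Called once for the whole class, so every parameter combination
        # has to be computed here and keyed by its parameters.
        state = {}
        for n_qubits, heuristic in itertools.product(*self.params):
            state[n_qubits, heuristic] = expensive_route(n_qubits, heuristic)
        return state

    def track_depth(self, state, n_qubits, heuristic):
        # The cached state arrives as the first argument; just look up the value.
        return state[n_qubits, heuristic]["depth"]

    def track_size(self, state, n_qubits, heuristic):
        return state[n_qubits, heuristic]["size"]


def run_like_asv(bench):
    """Very rough imitation of how asv would drive the class above."""
    state = bench.setup_cache()
    results = {}
    for name in ("track_depth", "track_size"):
        for combo in itertools.product(*bench.params):
            results[name, combo] = getattr(bench, name)(state, *combo)
    return results
```

Calling `run_like_asv(LargeMappingTrackingSketch())` produces twelve tracker results while the expensive step runs only six times, once per parameter combination. With the routing in a plain `setup` it would run again before each of the twelve benchmark calls.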

The tracking benchmarks here naively require a recomputation of the
expensive swap-mapping, despite us wanting to just reuse things we
already calculated during the timing phase.  `asv` doesn't let us return
trackers from the timing benchmarks directly, but we can still reduce
one load of redundancy by pre-calculating all the tracker properties we
care about only once in the cached setup method, and then just feeding
that state into the actual benchmarks to retrieve the results they care
about.

This is rather hacky, but does successfully work around functionality we
would like in `asv` to reduce runtime.
Member

@mtreinish mtreinish left a comment


LGTM, thanks for making the update. In my local test it saved ~5 min of execution time.

@mtreinish mtreinish added the automerge This PR will automatically merge once its CI has passed label Oct 28, 2022
@mergify mergify bot merged commit db24b4b into Qiskit:master Oct 28, 2022
@jakelishman jakelishman deleted the more-sabre-benchmarks branch October 28, 2022 16:17
jakelishman added a commit to jakelishman/qiskit-terra that referenced this pull request Aug 1, 2023
…metapackage#1622)

* Add benchmarks for Sabre on large QFT and QV circuits

Sabre is capable of handling these large benchmarks now, and it's of
interest for us to track our performance on large systems.  We don't
anticipate running on them yet, but we will want to know in the future
when further changes to routing and memory usage improve these
benchmarks.

* Fix lint

* Fix lint properly

* Precalculate trackers to avoid recomputation

The tracking benchmarks here naively require a recomputation of the
expensive swap-mapping, despite us wanting to just reuse things we
already calculated during the timing phase.  `asv` doesn't let us return
trackers from the timing benchmarks directly, but we can still reduce
one load of redundancy by pre-calculating all the tracker properties we
care about only once in the cached setup method, and then just feeding
that state into the actual benchmarks to retrieve the results they care
about.

This is rather hacky, but does successfully work around functionality we
would like in `asv` to reduce runtime.
jakelishman added a commit to jakelishman/qiskit-terra that referenced this pull request Aug 11, 2023
…metapackage#1622)
