Speedup the CommutationChecker and the CommutativeCancellation #12859

Merged
merged 8 commits into from
Aug 1, 2024

Cryoris
Contributor

@Cryoris Cryoris commented Jul 30, 2024

Summary

Speed up the CommutationChecker and CommutativeCancellation. Note that this does not yet add support for parametrized gates.

Details

In CommutationChecker, we skip a set of checks for a list of pre-approved gates whose commutations we know we can compute. We also add a feature to compute commutations only on a subset of gates, which CommutativeCancellation can use, as it only checks cancellations for certain gates.
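To illustrate the second idea, here is a toy sketch of a checker restricted to a subset of gates. It is not Qiskit's implementation: the class name, the `gates` argument, the `lookups` counter and the tiny lookup table are all assumptions made for this example. Gates outside the requested subset are conservatively reported as non-commuting before any expensive work happens.

```python
from typing import FrozenSet, Optional, Set, Tuple

Op = Tuple[str, Tuple[int, ...]]  # (gate name, qubits it acts on)

# Hypothetical lookup table: unordered pairs of single-qubit gate names that
# commute when applied to the same qubit (all are diagonal in the Z basis).
_COMMUTING_PAIRS: Set[FrozenSet[str]] = {
    frozenset(pair)
    for pair in [("z", "s"), ("z", "t"), ("s", "t"), ("z",), ("s",), ("t",)]
}


class ToyCommutationChecker:
    """Toy checker; `gates` restricts which gate names are analysed at all."""

    def __init__(self, gates: Optional[Set[str]] = None):
        self.gates = gates
        self.lookups = 0  # counts how often the "expensive" path is taken

    def commute(self, op_a: Op, op_b: Op) -> bool:
        (name_a, qubits_a), (name_b, qubits_b) = op_a, op_b
        # Early exit: a gate outside the subset is treated as non-commuting.
        if self.gates is not None and not {name_a, name_b} <= self.gates:
            return False
        # Operations on disjoint qubits always commute.
        if set(qubits_a).isdisjoint(qubits_b):
            return True
        # Stand-in for the expensive part (cache lookup or matrix check).
        self.lookups += 1
        return frozenset((name_a, name_b)) in _COMMUTING_PAIRS


checker = ToyCommutationChecker(gates={"z", "s", "t"})
print(checker.commute(("z", (0,)), ("s", (0,))))  # both in the subset
print(checker.commute(("h", (0,)), ("z", (0,))))  # "h" is filtered out early
print(checker.lookups)
```

Returning False for filtered gates is safe for a cancellation pass, since it only means a possible cancellation is not attempted.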

Speeeeeed data

To check, I timed the compilation of a 100-qubit QFT, given by

from qiskit import transpile
from qiskit.circuit.library import QFT

circuit = QFT(100).decompose()
basis_gates = ["id", "sx", "x", "rz", "cx"]
_ = transpile(circuit, basis_gates=basis_gates, optimization_level=2)

and on my laptop the transpile time is reduced by a factor of 2 (averaged over 10 repetitions):

(main) Took 4.818293190002441 +- 0.40586691417164533s
(this PR) Took 2.3364986181259155 +- 0.11652586081640869s
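For anyone who wants to reproduce numbers of this shape, a minimal timing harness could look like the sketch below. The helper name `time_repeated` is hypothetical, and the use of `time.perf_counter` and the sample standard deviation are assumptions; the script that produced the figures above is not shown here.

```python
import statistics
import time


def time_repeated(func, repetitions=10):
    """Call `func` repeatedly and return (mean, sample std) of wall time in seconds."""
    durations = []
    for _ in range(repetitions):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), statistics.stdev(durations)


# Stand-in workload so the sketch runs without Qiskit installed.
mean, std = time_repeated(lambda: sum(range(100_000)), repetitions=10)
print(f"Took {mean} +- {std}s")
```

To time the transpilation itself, the stand-in workload would be replaced by a callable that runs the `transpile` call from the snippet above.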

For more detailed information:

  • on main, the commutation analysis takes ~85% of the transpile time: [image: profile on main]
  • with this PR the time is still high, but reduced to ~66%: [image: profile with this PR]
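Combining these shares with the laptop timings gives a rough feel for how much the commutation analysis itself sped up. This is back-of-the-envelope arithmetic only: it assumes the approximate profile percentages apply to the averaged timing runs, which may not hold exactly.

```python
# Averaged transpile times from the runs above (seconds, rounded).
main_total, pr_total = 4.818, 2.336
# Approximate share of time spent in commutation analysis, from the profiles.
main_share, pr_share = 0.85, 0.66

main_comm = main_total * main_share  # commutation analysis on main
pr_comm = pr_total * pr_share        # commutation analysis with this PR
main_rest = main_total - main_comm   # everything else on main
pr_rest = pr_total - pr_comm         # everything else with this PR

print(f"commutation analysis: {main_comm:.2f}s -> {pr_comm:.2f}s")
print(f"rest of transpile:    {main_rest:.2f}s -> {pr_rest:.2f}s")
print(f"speedup of commutation analysis alone: {main_comm / pr_comm:.2f}x")
```

Under these assumptions the rest of the pipeline is roughly unchanged, and the overall factor of 2 corresponds to a somewhat larger speedup of the commutation analysis alone.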
Finally, asv is also happy, here testing the utility scale benchmarks (though I actually would've expected more improvement from the numbers above):
Benchmarks that have improved:

| Change   | Before [239a669a] <main>   | After [f32155a0] <commutation-checker-better~1>   |   Ratio | Benchmark (Parameter)                                              |
|----------|----------------------------|---------------------------------------------------|---------|--------------------------------------------------------------------|
| -        | 2.11±0.02s                 | 1.90±0.02s                                        |    0.9  | utility_scale.UtilityScaleBenchmarks.time_qaoa('cz')               |
| -        | 1.21±0.04s                 | 1.09±0.01s                                        |    0.9  | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr') |
| -        | 83.4±1ms                   | 73.1±0.5ms                                        |    0.88 | utility_scale.UtilityScaleBenchmarks.time_bvlike('cx')             |
| -        | 82.8±1ms                   | 72.9±0.2ms                                        |    0.88 | utility_scale.UtilityScaleBenchmarks.time_bvlike('cz')             |
| -        | 84.8±0.9ms                 | 73.3±0.5ms                                        |    0.86 | utility_scale.UtilityScaleBenchmarks.time_bvlike('ecr')            |
| -        | 1.80±0.03s                 | 1.49±0.01s                                        |    0.83 | utility_scale.UtilityScaleBenchmarks.time_qaoa('ecr')              |
| -        | 4.10±0.04s                 | 3.36±0.04s                                        |    0.82 | utility_scale.UtilityScaleBenchmarks.time_qv('cx')                 |

Benchmarks that have stayed the same:

| Change   | Before [239a669a] <main>   | After [f32155a0] <commutation-checker-better~1>   | Ratio   | Benchmark (Parameter)                                                         |
|----------|----------------------------|---------------------------------------------------|---------|-------------------------------------------------------------------------------|
|          | 7.37±0.2s                  | 5.66±0.06s                                        | ~0.77   | utility_scale.UtilityScaleBenchmarks.time_qv('cz')                            |
|          | 5.57±0.08s                 | 3.59±0.02s                                        | ~0.64   | utility_scale.UtilityScaleBenchmarks.time_qv('ecr')                           |
|          | 0                          | 0                                                 | n/a     | utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('cx')                 |
|          | 0                          | 0                                                 | n/a     | utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('cz')                 |
|          | 0                          | 0                                                 | n/a     | utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('ecr')                |
|          | 72.6±0.3ms                 | 73.8±0.7ms                                        | 1.02    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('ecr')               |
|          | 72.9±0.7ms                 | 73.8±2ms                                          | 1.01    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cz')                |
|          | 23.3±0.3ms                 | 23.5±0.5ms                                        | 1.01    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cx')  |
|          | 23.2±0.4ms                 | 23.4±0.2ms                                        | 1.01    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cz')  |
|          | 23.3±0.2ms                 | 23.6±0.2ms                                        | 1.01    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('ecr') |
|          | 3.77±0.02s                 | 3.76±0.01s                                        | 1.00    | utility_scale.UtilityScaleBenchmarks.time_circSU2('ecr')                      |
|          | 6.85±0.05ms                | 6.85±0.07ms                                       | 1.00    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cz')               |
|          | 73.5±0.4ms                 | 73.5±2ms                                          | 1.00    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cx')                |
|          | 395                        | 395                                               | 1.00    | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('cx')                 |
|          | 397                        | 397                                               | 1.00    | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('cz')                 |
|          | 397                        | 397                                               | 1.00    | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('ecr')                |
|          | 300                        | 300                                               | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('cx')                |
|          | 300                        | 300                                               | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('cz')                |
|          | 300                        | 300                                               | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('ecr')               |
|          | 1483                       | 1483                                              | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cx')                   |
|          | 1488                       | 1488                                              | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cz')                   |
|          | 1488                       | 1488                                              | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('ecr')                  |
|          | 1954                       | 1954                                              | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cx')                    |
|          | 1954                       | 1954                                              | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cz')                    |
|          | 1954                       | 1954                                              | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qft_depth('ecr')                   |
|          | 2538                       | 2538                                              | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qv_depth('cx')                     |
|          | 2538                       | 2538                                              | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qv_depth('cz')                     |
|          | 2538                       | 2538                                              | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qv_depth('ecr')                    |
|          | 435                        | 435                                               | 1.00    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cx')      |
|          | 435                        | 435                                               | 1.00    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cz')      |
|          | 435                        | 435                                               | 1.00    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('ecr')     |
|          | 3.77±0.04s                 | 3.74±0s                                           | 0.99    | utility_scale.UtilityScaleBenchmarks.time_circSU2('cx')                       |
|          | 3.87±0.01s                 | 3.82±0.02s                                        | 0.99    | utility_scale.UtilityScaleBenchmarks.time_circSU2('cz')                       |
|          | 6.88±0.02ms                | 6.83±0.01ms                                       | 0.99    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cx')               |
|          | 6.88±0.08ms                | 6.81±0.03ms                                       | 0.99    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('ecr')              |
|          | 1.01±0.01s                 | 967±20ms                                          | 0.96    | utility_scale.UtilityScaleBenchmarks.time_qaoa('cx')                          |
|          | 15.6±0.04s                 | 15.0±0.08s                                        | 0.96    | utility_scale.UtilityScaleBenchmarks.time_qft('cx')                           |
|          | 899±20ms                   | 866±6ms                                           | 0.96    | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx')             |
|          | 234±3ms                    | 222±2ms                                           | 0.95    | utility_scale.UtilityScaleBenchmarks.time_bv_100('cx')                        |
|          | 267±2ms                    | 253±1ms                                           | 0.95    | utility_scale.UtilityScaleBenchmarks.time_bv_100('cz')                        |
|          | 17.2±0.07s                 | 16.2±0.05s                                        | 0.94    | utility_scale.UtilityScaleBenchmarks.time_qft('cz')                           |
|          | 1.30±0.04s                 | 1.22±0.01s                                        | 0.93    | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')             |
|          | 17.3±0.04s                 | 15.9±0.2s                                         | 0.92    | utility_scale.UtilityScaleBenchmarks.time_qft('ecr')                          |
|          | 267±1ms                    | 244±1ms                                           | 0.91    | utility_scale.UtilityScaleBenchmarks.time_bv_100('ecr')                       |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

MarcDrudis and others added 4 commits July 30, 2024 21:40
- list of pre-approved gates we know we support commutation on
- less redirections in function calls
- commutation analysis to only trigger search on gates that are actually cancelled
@Cryoris Cryoris added performance Changelog: New Feature Include in the "Added" section of the changelog labels Jul 30, 2024
@Cryoris Cryoris added this to the 1.2.0 milestone Jul 30, 2024
@Cryoris Cryoris requested a review from a team as a code owner July 30, 2024 21:08
@qiskit-bot
Collaborator

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core

Member

@mtreinish mtreinish left a comment

This looks great! I'm very happy to see us working on reducing the overhead in the commutative cancellation pass. It is by far the slowest thing we run in the transpiler right now. Just a few small inline comments.

Cryoris added 2 commits July 31, 2024 10:22
-- these need updating only on the version with parameter support

self._z_rotations = {"p", "z", "u1", "rz", "t", "s"}
self._x_rotations = {"x", "rx"}
self._gates = {"cx", "cy", "cz", "h", "y"} # Now the gates supported are hard-coded
Member

@ShellyGarion ShellyGarion Jul 31, 2024

That's a great PR!

Does this pass only check self-commuting gates, or also pairs of gates? For example:

  • Any 1q gate in _z_rotations commutes with the control of CX
  • Any 1q gate in _x_rotations commutes with the target of CX
  • Any 1q gate in _z_rotations commutes with the control and target of CZ
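The three commutation relations listed above can be verified numerically with plain matrices. The sketch below uses small hand-written helpers (`kron`, `matmul`, `commute` and the gate definitions are written for this example, not taken from Qiskit) and orders the tensor factors as control first, target second.

```python
import cmath
import math


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def kron(a, b):
    """Kronecker product; `a` acts on the first (control) factor."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def commute(a, b, tol=1e-12):
    ab, ba = matmul(a, b), matmul(b, a)
    return all(abs(x - y) < tol for ra, rb in zip(ab, ba) for x, y in zip(ra, rb))


I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
P0 = [[1, 0], [0, 0]]  # projector onto |0>
P1 = [[0, 0], [0, 1]]  # projector onto |1>


def rz(theta):
    return [[cmath.exp(-1j * theta / 2), 0], [0, cmath.exp(1j * theta / 2)]]


def rx(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]


CX = add(kron(P0, I2), kron(P1, X))  # controlled-X, control on the first factor
CZ = add(kron(P0, I2), kron(P1, Z))  # controlled-Z

print(commute(kron(rz(0.7), I2), CX))  # z-rotation on the CX control
print(commute(kron(I2, rx(1.2)), CX))  # x-rotation on the CX target
print(commute(kron(rz(0.7), I2), CZ))  # z-rotation on either CZ qubit
print(commute(kron(I2, rz(0.7)), CZ))
print(commute(kron(rx(1.2), I2), CX))  # counter-example: x-rotation on the control
```

The first four checks hold and the last one does not, which is why the pass can only move x-rotations past the target of a CX.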

Contributor Author

Regarding CX, the pass does the following:

          ┌───┐   ┌───┐   ┌─────────┐
q_0: ──■──┤ S ├───┤ T ├───┤ Rz(2.1) ├──■──
     ┌─┴─┐├───┤┌──┴───┴──┐└─────────┘┌─┴─┐
q_1: ┤ X ├┤ X ├┤ Rx(1.2) ├───────────┤ X ├
     └───┘└───┘└─────────┘           └───┘
     ┌────────────┐
q_0: ┤ Rz(4.4562) ├
     ├────────────┤
q_1: ┤ Rx(4.3416) ├
     └────────────┘

and then same for CZ. However, the pass could use some updating as it cannot handle e.g. Sdg or Tdg gates, meaning that the following circuit is unaffected by the pass

          ┌─────┐
q_0: ──■──┤ Sdg ├──■──
     ┌─┴─┐└─────┘┌─┴─┐
q_1: ┤ X ├───────┤ X ├
     └───┘       └───┘

In practice, that is caught by re-synthesizing 2q unitaries, but we could do it more cheaply with the commutative cancellation. Let's do that in a follow-up and also add support for SX gates.
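The merged angles in the CX example above can be checked by hand: up to a global phase, S and T are z-rotations by π/2 and π/4, and X is an x-rotation by π. A small sketch of that bookkeeping follows; the helper `merged_angle` and the two lookup dictionaries are illustrative names, not the pass's internals.

```python
import math

# Rotation angle contributed by each fixed gate, up to a global phase.
Z_ANGLE = {"z": math.pi, "s": math.pi / 2, "t": math.pi / 4}
X_ANGLE = {"x": math.pi}


def merged_angle(gates, fixed_angles):
    """Sum the angles of same-axis rotations given as (name, angle-or-None)."""
    total = 0.0
    for name, angle in gates:
        total += fixed_angles[name] if angle is None else angle
    return total


control = [("s", None), ("t", None), ("rz", 2.1)]  # gates on q_0 between the CXs
target = [("x", None), ("rx", 1.2)]                # gates on q_1 between the CXs

print(round(merged_angle(control, Z_ANGLE), 4))  # 4.4562
print(round(merged_angle(target, X_ANGLE), 4))   # 4.3416
```

In the same bookkeeping, Sdg and Tdg would enter as -π/2 and -π/4, which is presumably what the follow-up would need to add.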

Member

Why not add sdg and tdg to self._z_rotations?

Contributor Author

@Cryoris Cryoris Jul 31, 2024

That's exactly what to do, but I'd rather do it in a follow-up and not change the functionality so close to the 1.2 release (this PR should only affect runtime). I can open the follow-up in the next few days and we can discuss it separately 🙂

@coveralls

coveralls commented Jul 31, 2024

Pull Request Test Coverage Report for Build 10194702496

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 32 of 32 (100.0%) changed or added relevant lines in 3 files are covered.
  • 312 unchanged lines in 26 files lost coverage.
  • Overall coverage increased (+0.08%) to 89.715%

| Files with Coverage Reduction                          | New Missed Lines | %      |
|--------------------------------------------------------|------------------|--------|
| qiskit/compiler/transpiler.py                          | 1                | 92.39% |
| qiskit/primitives/backend_sampler.py                   | 1                | 98.86% |
| qiskit/transpiler/passes/basis/basis_translator.py     | 1                | 97.44% |
| qiskit/synthesis/two_qubit/xx_decompose/decomposer.py  | 1                | 95.42% |
| qiskit/providers/backend_compat.py                     | 2                | 89.73% |
| qiskit/providers/fake_provider/fake_qasm_backend.py    | 2                | 95.56% |
| qiskit/circuit/commutation_checker.py                  | 2                | 96.99% |
| crates/circuit/src/imports.rs                          | 2                | 77.78% |
| qiskit/utils/deprecation.py                            | 3                | 97.79% |
| qiskit/providers/models/backendproperties.py           | 3                | 95.36% |

| Totals                             |       |
|------------------------------------|-------|
| Change from base Build 10167476193 | 0.08% |
| Covered Lines                      | 67161 |
| Relevant Lines                     | 74860 |

💛 - Coveralls

@mtreinish mtreinish added Changelog: None Do not include in changelog and removed Changelog: New Feature Include in the "Added" section of the changelog labels Jul 31, 2024
mtreinish
mtreinish previously approved these changes Jul 31, 2024
Member

@mtreinish mtreinish left a comment

Overall this LGTM. I have one inline question about a change to the tests, but I'm not super worried about it; feel free to enqueue for merge after answering it (unless it was a mistake). This is a good improvement, and I think there might be some other low-hanging fruit we can pick up after merging this, in addition to the capability improvements discussed in the comments already.

circuit.rz(-np.pi / 2, 0)

- passmanager = PassManager(CommutativeInverseCancellation(matrix_based=matrix_based))
+ passmanager = PassManager(CommutativeInverseCancellation())
Member

So I get why you did this because a parameterized circuit can't use matrix based inverse cancellation. But I'm wondering how it passed before, and what this PR did to change that.

Contributor Author

Good catch, this should not have been changed! That was a leftover from when we added parameter support and I forgot to revert this for 1.2 🙂 Reverted in 91850b8.
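For context on why a matrix-based check and unbound parameters do not mix: comparing gates by matrix needs concrete numbers. Below is a minimal sketch of such a comparison, assuming it means "the product is the identity up to a global phase". The helper names are hypothetical and this is not the pass's actual code.

```python
import cmath
import math


def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def is_inverse_up_to_phase(a, b, tol=1e-9):
    """True if a @ b equals a unit-modulus scalar times the identity."""
    product = matmul2(a, b)
    phase = product[0][0]
    if abs(abs(phase) - 1.0) > tol:
        return False
    return all(
        abs(product[i][j] - (phase if i == j else 0)) <= tol
        for i in range(2)
        for j in range(2)
    )


def rz(theta):
    # Requires a numeric angle; a symbolic, unbound parameter has no matrix.
    return [[cmath.exp(-1j * theta / 2), 0], [0, cmath.exp(1j * theta / 2)]]


S = [[1, 0], [0, 1j]]

print(is_inverse_up_to_phase(S, rz(-math.pi / 2)))  # True
print(is_inverse_up_to_phase(S, rz(math.pi / 2)))   # False
```

S followed by Rz(-π/2) cancels only up to a global phase, which a comparison by gate name alone would not detect.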

@mtreinish mtreinish enabled auto-merge August 1, 2024 09:04
@mtreinish mtreinish added this pull request to the merge queue Aug 1, 2024
@mtreinish mtreinish added the stable backport potential The bug might be minimal and/or import enough to be port to stable label Aug 1, 2024
Merged via the queue into Qiskit:main with commit 441925e Aug 1, 2024
15 checks passed
mergify bot pushed a commit that referenced this pull request Aug 1, 2024
…12859)

* Faster commutation checker and analysis

- list of pre-approved gates we know we support commutation on
- less redirections in function calls
- commutation analysis to only trigger search on gates that are actually cancelled

* cleanup comments

* add reno

* review comments

* revert accidentially changed tests

-- these need updating only on the version with parameter support

* revert changes in test_comm_inv_canc

---------

Co-authored-by: MarcDrudis <MarcSanzDrudis@outlook.com>
(cherry picked from commit 441925e)
github-merge-queue bot pushed a commit that referenced this pull request Aug 1, 2024
…12859) (#12876)

Co-authored-by: Julien Gacon <jules.gacon@googlemail.com>
Procatv pushed a commit to Procatv/qiskit-terra-catherines that referenced this pull request Aug 1, 2024
…iskit#12859)

* Faster commutation checker and analysis

- list of pre-approved gates we know we support commutation on
- less redirections in function calls
- commutation analysis to only trigger search on gates that are actually cancelled

* cleanup comments

* add reno

* review comments

* revert accidentially changed tests

-- these need updating only on the version with parameter support

* revert changes in test_comm_inv_canc

---------

Co-authored-by: MarcDrudis <MarcSanzDrudis@outlook.com>