Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate porting unitary synthesis to rust #8774

Open
2 of 6 tasks
Tracked by #10976
mtreinish opened this issue Sep 20, 2022 · 5 comments
Open
2 of 6 tasks
Tracked by #10976

Investigate porting unitary synthesis to rust #8774

mtreinish opened this issue Sep 20, 2022 · 5 comments
Assignees
Labels
mod: transpiler Issues and PRs related to Transpiler performance priority: medium Rust This PR or issue is related to Rust code in the repository short project type: epic A theme of work that contain sub-tasks type: feature request New feature or request
Milestone

Comments

@mtreinish
Copy link
Member

mtreinish commented Sep 20, 2022

What should we add?

For level 3 compilation now, the unitary synthesis as part of the 2q block optimization becomes a large bottleneck (this was mostly supposition on my part looking at the profile we spend more time in consolidate blocks than synthesis) since we've moved layout and routing mostly to multithreaded rust code. To look at further speeding this up we should look at whether we can accelerate the numeric component of:

https://github.com/Qiskit/qiskit-terra/blob/main/qiskit/quantum_info/synthesis/two_qubit_decompose.py

and

https://github.com/Qiskit/qiskit-terra/blob/main/qiskit/quantum_info/synthesis/one_qubit_decompose.py (this was done in #9185)

by moving it to rust. For things that are already in vectorized numpy functions the performance gains may not be that big. The other constraint is that linear algebra functions that depend on blas we'll probably still want to call to numpy (likely via python since the numpy c api doesn't expose a lot of the linalg functions which leverage blas) since we don't want the complexity of linking the rust binary against blas at build time.

I'm not actually sure how much it'll speed things up since I expect a large chunk of time is spent building circuits and manipulating them. Moving the numeric portion of the modules to rust won't really change because the circuit side is in python space and rust can't really accelerate those portion of the modules. This effort is not necessarily going to turn out to be worth it, but we can't really know until we give it a try.

Tasks

  1. Rust performance synthesis
    mtreinish
  2. Rust performance synthesis
    mtreinish
  3. Rust performance synthesis
    jlapeyre
  4. performance synthesis type: feature request
  5. Rust performance synthesis
  6. Rust performance synthesis
    ShellyGarion
@mtreinish mtreinish added type: feature request New feature or request performance short project Rust This PR or issue is related to Rust code in the repository labels Sep 20, 2022
@mtreinish mtreinish added this to the 0.23.0 milestone Sep 20, 2022
@mtreinish mtreinish added the mod: transpiler Issues and PRs related to Transpiler label Sep 20, 2022
@mtreinish
Copy link
Member Author

One thought I did have is that we could do some of the work in parallel if we allow the synthesis function to take a list of unitaries as an input we can process them in parallel and then just loop over the results to build the circuits. Even if the evaluation doesn't get much faster by moving to rust doing multiple things in parallel can still provide a good speedup.

@nonhermitian
Copy link
Contributor

The internals have quite a bit of intermediate numpy array creation and numerical operations. There is likely gains to be had by removing those. Not sure how big that is in the grand scheme of things though

@mtreinish mtreinish changed the title Port unitary synthesis to rust Investigate porting unitary synthesis to rust Sep 23, 2022
mtreinish added a commit to mtreinish/qiskit-core that referenced this issue Nov 22, 2022
This commit ports the per basis parameter calculation from python to
rust. In looking at the runtime performance regression caused by Qiskit#8917
the majority is just that we're doing more work synthesizing to more
available basis to potentially produce better quality results. Profiling
the transpiler pass shows we're spending a non-trivial amount of time in
numpy/scipy (depending on whether it's before or after Qiskit#9179) computing
the determinant of the unitary. This is likely because those determinant
functions are designed to work with an arbitrarily large square matrix
while for the 1 qubit decomposer we're only ever working with a 2x2.
To remove this overhead this commit writes a dedicated rust function to
compute the determinant of a 2x2 complex matrix and then also adds
dedicated functions to calculate the angles for given basis to rust
as we can easily just return the end result from rust.

Related Qiskit#8774
mtreinish added a commit to mtreinish/qiskit-core that referenced this issue Nov 22, 2022
This commit ports the per basis parameter calculation from python to
rust. In looking at the runtime performance regression caused by Qiskit#8917
the majority is just that we're doing more work synthesizing to more
available basis to potentially produce better quality results. Profiling
the transpiler pass shows we're spending a non-trivial amount of time in
numpy/scipy (depending on whether it's before or after Qiskit#9179) computing
the determinant of the unitary. This is likely because those determinant
functions are designed to work with an arbitrarily large square matrix
while for the 1 qubit decomposer we're only ever working with a 2x2.
To remove this overhead this commit writes a dedicated rust function to
compute the determinant of a 2x2 complex matrix and then also adds
dedicated functions to calculate the angles for given basis to rust
as we can easily just return the end result from rust.

Related Qiskit#8774
mergify bot pushed a commit that referenced this issue Dec 2, 2022
* Oxidize parameter calculation for OneQubitEulerDecomposer

This commit ports the per basis parameter calculation from python to
rust. In looking at the runtime performance regression caused by #8917
the majority is just that we're doing more work synthesizing to more
available basis to potentially produce better quality results. Profiling
the transpiler pass shows we're spending a non-trivial amount of time in
numpy/scipy (depending on whether it's before or after #9179) computing
the determinant of the unitary. This is likely because those determinant
functions are designed to work with an arbitrarily large square matrix
while for the 1 qubit decomposer we're only ever working with a 2x2.
To remove this overhead this commit writes a dedicated rust function to
compute the determinant of a 2x2 complex matrix and then also adds
dedicated functions to calculate the angles for given basis to rust
as we can easily just return the end result from rust.

Related #8774

* Eliminate python function for staticmethod definition

This commit removes one layer of function calls for the
OneQubitEulerDecomposer's staticmethods for calculating parameters.
Previously, there was a python function which called an inner rust
function, this eliminates one layer and attaches the rust function
directly as a static method to the class definition.
@mtreinish mtreinish modified the milestones: 0.23.0, 0.24.0 Jan 9, 2023
Cryoris pushed a commit to Cryoris/qiskit-terra that referenced this issue Jan 12, 2023
* Oxidize parameter calculation for OneQubitEulerDecomposer

This commit ports the per basis parameter calculation from python to
rust. In looking at the runtime performance regression caused by Qiskit#8917
the majority is just that we're doing more work synthesizing to more
available basis to potentially produce better quality results. Profiling
the transpiler pass shows we're spending a non-trivial amount of time in
numpy/scipy (depending on whether it's before or after Qiskit#9179) computing
the determinant of the unitary. This is likely because those determinant
functions are designed to work with an arbitrarily large square matrix
while for the 1 qubit decomposer we're only ever working with a 2x2.
To remove this overhead this commit writes a dedicated rust function to
compute the determinant of a 2x2 complex matrix and then also adds
dedicated functions to calculate the angles for given basis to rust
as we can easily just return the end result from rust.

Related Qiskit#8774

* Eliminate python function for staticmethod definition

This commit removes one layer of function calls for the
OneQubitEulerDecomposer's staticmethods for calculating parameters.
Previously, there was a python function which called an inner rust
function, this eliminates one layer and attaches the rust function
directly as a static method to the class definition.
@jlapeyre
Copy link
Contributor

Many of the vectorized numpy functions involve small vectors and matrices, with linear dimensions like 2 or 4. Often the python overhead for numpy calls dominates for these calculations.

@jlapeyre jlapeyre self-assigned this Feb 27, 2023
@mtreinish
Copy link
Member Author

It's worth pointing out since I originally opened this issue I was able to find some benefit to using Rust for the circuit construction too in the 1q decomposer in: #9578 and #9583 so we probably could follow a similar pattern if porting the 2q decomposer too.

@kdk kdk removed this from the 0.24.0 milestone Apr 5, 2023
@mtreinish
Copy link
Member Author

mtreinish commented Sep 19, 2023

The importance of this has risen a bit, after #8779 was fixed by #10365 and #10467 we're definitely bottlenecked on the performance of UnitarySynthesis when using optimization level 3 (along with commutative analysis performance).

mtreinish added a commit to mtreinish/qiskit-core that referenced this issue Mar 4, 2024
This commit is part 1 of migrating the default 2q unitary synthesis to leverage
parallel rust described in Qiskit#8774, the eventual goal is to be able to
run unitary synthesis in parallel for all the unitary matrices in a
circuit in a single call from the `UnitarySynthesis` pass. This commit
lays the initial groundwork for doing this by starting with the largest
piece of the default 2q unitary synthesis code, the
TwoQubitWeylDecomposition class. It migrates the entire class
to be a pyclass in rust. There is still a Python subclass for it that
handles the actual QuantumCircuit generation and also the string
representations which are dependent on circuit generation. The goal of
this is to keep the same basic algorithm in place but re-implement
as-is in Rust as a common starting point for eventual improvements to
the underlying algorithm as well as parallelizing the synthesis of
multiple 2q unitary matrices.
mtreinish added a commit to mtreinish/qiskit-core that referenced this issue Mar 11, 2024
This commit is part 1 of migrating the default 2q unitary synthesis to leverage
parallel rust described in Qiskit#8774, the eventual goal is to be able to
run unitary synthesis in parallel for all the unitary matrices in a
circuit in a single call from the `UnitarySynthesis` pass. This commit
lays the initial groundwork for doing this by starting with the largest
piece of the default 2q unitary synthesis code, the
TwoQubitWeylDecomposition class. It migrates the entire class
to be a pyclass in rust. There is still a Python subclass for it that
handles the actual QuantumCircuit generation and also the string
representations which are dependent on circuit generation. The goal of
this is to keep the same basic algorithm in place but re-implement
as-is in Rust as a common starting point for eventual improvements to
the underlying algorithm as well as parallelizing the synthesis of
multiple 2q unitary matrices.
@mtreinish mtreinish self-assigned this Mar 13, 2024
@mtreinish mtreinish added this to the 1.1.0 milestone Mar 13, 2024
mtreinish added a commit to mtreinish/qiskit-core that referenced this issue Mar 14, 2024
This commit is the second part of migrating the default 2q unitary
synthesis method to leverage parallel rust as described in Qiskit#8774. The
Eventual goal is to be able to run unitary synthesis in parallel for all
the unitary matrices in a single call from the `UnitarySynthesis` pass.
The TwoQubitBasisDecomposer class is one of the default decomposers used
by the unitary synthesis plugin. After this we can build an interface
that will run the decomposition in parallel for a given decomposer.

This commit re-implements the TwoQubitBasisDecomposer class in rust. It
keeps the same algorithm from the previous python version but implements
it in rust. This builds off of Qiskit#11946 and for the operation of the
decomposer class the TwoQubitWeylDecomposition class is used solely
through rust.

This commit depends on Qiskit#11946 and will need to be rebased after Qiskit#11946
is merged.

Fixes Qiskit#12004
mtreinish added a commit to mtreinish/qiskit-core that referenced this issue Mar 14, 2024
This commit is the second part of migrating the default 2q unitary
synthesis method to leverage parallel rust as described in Qiskit#8774. The
Eventual goal is to be able to run unitary synthesis in parallel for all
the unitary matrices in a single call from the `UnitarySynthesis` pass.
The TwoQubitBasisDecomposer class is one of the default decomposers used
by the unitary synthesis plugin. After this we can build an interface
that will run the decomposition in parallel for a given decomposer.

This commit re-implements the TwoQubitBasisDecomposer class in rust. It
keeps the same algorithm from the previous python version but implements
it in rust. This builds off of Qiskit#11946 and for the operation of the
decomposer class the TwoQubitWeylDecomposition class is used solely
through rust.

This commit depends on Qiskit#11946 and will need to be rebased after Qiskit#11946
is merged.

Fixes Qiskit#12004
github-merge-queue bot pushed a commit that referenced this issue Mar 14, 2024
* Oxidize TwoQubitWeylDecomposition

This commit is part 1 of migrating the default 2q unitary synthesis to leverage
parallel rust described in #8774, the eventual goal is to be able to
run unitary synthesis in parallel for all the unitary matrices in a
circuit in a single call from the `UnitarySynthesis` pass. This commit
lays the initial groundwork for doing this by starting with the largest
piece of the default 2q unitary synthesis code, the
TwoQubitWeylDecomposition class. It migrates the entire class
to be a pyclass in rust. There is still a Python subclass for it that
handles the actual QuantumCircuit generation and also the string
representations which are dependent on circuit generation. The goal of
this is to keep the same basic algorithm in place but re-implement
as-is in Rust as a common starting point for eventual improvements to
the underlying algorithm as well as parallelizing the synthesis of
multiple 2q unitary matrices.

* Fix typo in closest_partial_swap()

This commit fixes a typo the formula in the function.
This is the same fix from #11953.

Co-authored-by: Shelly Garion <46566946+ShellyGarion@users.noreply.github.com>

* Run black and lint

* Fix potential imaginary component sign flip in determinant

* Run cargo fmt

* Simplify using numpy eigh example comment

* Add missing checks to decompose_two_qubit_product_gate()

* Use rng first trial from Python implementation

To aid in debugging and rule out rng differences causing different
results this commit switches the first iteration of the randomized loop
to have hard coded values that are identical to what the rng in numpy
was returning. It is very unlikely that this will have any impact
because the specific random numbers used shouldn't matter, this is
mostly to just rule it out as a possibility in debugging the remaining
test failures.

* Fix whitespace in error message

* Fix assert check

* Fix missing multiplier on specialized weyl gate

* Fix various mistakes

This commit fixes two fundamental issues in the code. The first is the
rz and ry matrix were being incorrectly constructed for a given angle.
This caused the specializations that were computing the 1q matrices in
the decomposition based on a product with these gates' matrices to
return invalid results. The second issue is for the MirrorControlledEquiv
specialization had the angles backwards for computing the matrix of the
rz gates used in the products for the 1q matrices:

`K1l = K1l @ Rz(K1r)` and `K1r = K1r @ Rz(K1l)` not
`K1l = K1l @ Rz(K1l)` and `K1r = K1r @ Rz(K1r)`

This was a typo from the original transcription.

* Add pickle serialization support for TwoQubitWeylDecomposition

* Remove duplicate conjs increment

* Fix fSim abmb equivalent specialization

* Add specialized test class back

* Use QiskitError for backwards compatibility

* Add release note

* Make explicit specialization private

* Use explicit inheritance from general decomposition (#23)

* Apply suggestions from code review

* Use smallvec for circuit sequence params and qubits

* Use const 2x2 identity array where possible

* circuit() and weyl_gate don't need a mutable self

* Remove unnecessary inline annotations

* Rename Specializations enum Specialization

* Add back specialize() method and deprecate

* Reorganize Python/Rust split to wrap rust instead of subclass

This commit reworks the handoff between python and rust space for the
TwoQubitWeylDecomposition class. Previously the Python side
TwoQubitWeylDecomposition was a subclass of the Rust struct/pyclass of
the same name. This was originally done to deduplicate the the getters
for the attributes. However, because of the overhead associated with
some of the rust getters and needing to do some import normalization to
a numpy array the deduplication wasn't worth the cost.

* Remove unecessary allocations

* Rename DEFAULT_ATOL to ANGLE_ZERO_EPSILON

* Stop obsessing over -0

* Handle enum to int conversion as method

* Cleanup decompose_two_qubit_product_gate()

* Use a trait for trace_to_fid()

* Use an enum instead of string for euler basis

* Fix release note typo

* Improve magic basis transformation functions

Co-authored-by: Jake Lishman <jake.lishman@ibm.com>

* Remove eigh() util function

* Revert unecessary changes to callers of TwoQubitWeylDecomposition

* Restore debug logging and add test assertions for it

* Update qiskit/synthesis/two_qubit/two_qubit_decompose.py

Co-authored-by: Lev Bishop <18673315+levbishop@users.noreply.github.com>

* Add specialization to __str__

* Add previous specialized class docstrings as inline rust code comments

* Rename fSim variants and suprress rustc warning about camel case

* Update tests for correct specialization enum name

* Expose specialization enum via private class attr and use for __repr__

---------

Co-authored-by: Shelly Garion <46566946+ShellyGarion@users.noreply.github.com>
Co-authored-by: Jake Lishman <jake@binhbar.com>
Co-authored-by: Jake Lishman <jake.lishman@ibm.com>
Co-authored-by: Lev Bishop <18673315+levbishop@users.noreply.github.com>
mtreinish added a commit to mtreinish/qiskit-core that referenced this issue Mar 14, 2024
This commit is the second part of migrating the default 2q unitary
synthesis method to leverage parallel rust as described in Qiskit#8774. The
Eventual goal is to be able to run unitary synthesis in parallel for all
the unitary matrices in a single call from the `UnitarySynthesis` pass.
The TwoQubitBasisDecomposer class is one of the default decomposers used
by the unitary synthesis plugin. After this we can build an interface
that will run the decomposition in parallel for a given decomposer.

This commit re-implements the TwoQubitBasisDecomposer class in rust. It
keeps the same algorithm from the previous python version but implements
it in rust. This builds off of Qiskit#11946 and for the operation of the
decomposer class the TwoQubitWeylDecomposition class is used solely
through rust.

This commit depends on Qiskit#11946 and will need to be rebased after Qiskit#11946
is merged.

Fixes Qiskit#12004
github-merge-queue bot pushed a commit that referenced this issue Mar 28, 2024
* Oxidize TwoQubitBasisDecomposer

This commit is the second part of migrating the default 2q unitary
synthesis method to leverage parallel rust as described in #8774. The
Eventual goal is to be able to run unitary synthesis in parallel for all
the unitary matrices in a single call from the `UnitarySynthesis` pass.
The TwoQubitBasisDecomposer class is one of the default decomposers used
by the unitary synthesis plugin. After this we can build an interface
that will run the decomposition in parallel for a given decomposer.

This commit re-implements the TwoQubitBasisDecomposer class in rust. It
keeps the same algorithm from the previous python version but implements
it in rust. This builds off of #11946 and for the operation of the
decomposer class the TwoQubitWeylDecomposition class is used solely
through rust.

This commit depends on #11946 and will need to be rebased after #11946
is merged.

Fixes #12004

* Fix errors after rebase

* Fix traces method

* Fix pulse optimized synthesis

* Add release notes

* Fix lint

* Use consts for static decomposition arrays

* Run cargo fmt

* Handle basis_fidelity inside unitary synthesis path

* Cast input to TwoQubitBasisDecomposer.num_basis_gates

* Use statics instead of consts

* Pre-allocate 2q circuit sequence outside the pulse optimal path.
@mtreinish mtreinish modified the milestones: 1.1.0, 1.2.0 Apr 30, 2024
@ShellyGarion ShellyGarion added the type: epic A theme of work that contain sub-tasks label Jun 12, 2024
@mtreinish mtreinish modified the milestones: 1.2.0, 1.3 beta Jul 24, 2024
@mtreinish mtreinish modified the milestones: 1.3 beta, 1.3.0 Sep 10, 2024
@raynelfss raynelfss linked a pull request Oct 18, 2024 that will close this issue
4 tasks
@raynelfss raynelfss removed a link to a pull request Oct 18, 2024
4 tasks
@raynelfss raynelfss modified the milestones: 1.3.0, 2.0.0 Nov 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mod: transpiler Issues and PRs related to Transpiler performance priority: medium Rust This PR or issue is related to Rust code in the repository short project type: epic A theme of work that contain sub-tasks type: feature request New feature or request
Projects
None yet
Development

No branches or pull requests

7 participants