-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Investigate porting unitary synthesis to rust #8774
Comments
One thought I did have is that we could do some of the work in parallel if we allow the synthesis function to take a list of unitaries as an input we can process them in parallel and then just loop over the results to build the circuits. Even if the evaluation doesn't get much faster by moving to rust doing multiple things in parallel can still provide a good speedup. |
The internals have quite a bit of intermediate numpy array creation and numerical operations. There is likely gains to be had by removing those. Not sure how big that is in the grand scheme of things though |
This commit ports the per basis parameter calculation from python to rust. In looking at the runtime performance regression caused by Qiskit#8917 the majority is just that we're doing more work synthesizing to more available basis to potentially produce better quality results. Profiling the transpiler pass shows we're spending a non-trivial amount of time in numpy/scipy (depending on whether it's before or after Qiskit#9179) computing the determinant of the unitary. This is likely because those determinant functions are designed to work with an arbitrarily large square matrix while for the 1 qubit decomposer we're only ever working with a 2x2. To remove this overhead this commit writes a dedicated rust function to compute the determinant of a 2x2 complex matrix and then also adds dedicated functions to calculate the angles for given basis to rust as we can easily just return the end result from rust. Related Qiskit#8774
This commit ports the per basis parameter calculation from python to rust. In looking at the runtime performance regression caused by Qiskit#8917 the majority is just that we're doing more work synthesizing to more available basis to potentially produce better quality results. Profiling the transpiler pass shows we're spending a non-trivial amount of time in numpy/scipy (depending on whether it's before or after Qiskit#9179) computing the determinant of the unitary. This is likely because those determinant functions are designed to work with an arbitrarily large square matrix while for the 1 qubit decomposer we're only ever working with a 2x2. To remove this overhead this commit writes a dedicated rust function to compute the determinant of a 2x2 complex matrix and then also adds dedicated functions to calculate the angles for given basis to rust as we can easily just return the end result from rust. Related Qiskit#8774
* Oxidize parameter calculation for OneQubitEulerDecomposer This commit ports the per basis parameter calculation from python to rust. In looking at the runtime performance regression caused by #8917 the majority is just that we're doing more work synthesizing to more available basis to potentially produce better quality results. Profiling the transpiler pass shows we're spending a non-trivial amount of time in numpy/scipy (depending on whether it's before or after #9179) computing the determinant of the unitary. This is likely because those determinant functions are designed to work with an arbitrarily large square matrix while for the 1 qubit decomposer we're only ever working with a 2x2. To remove this overhead this commit writes a dedicated rust function to compute the determinant of a 2x2 complex matrix and then also adds dedicated functions to calculate the angles for given basis to rust as we can easily just return the end result from rust. Related #8774 * Eliminate python function for staticmethod definition This commit removes one layer of function calls for the OneQubitEulerDecomposer's staticmethods for calculating parameters. Previously, there was a python function which called an inner rust function, this eliminates one layer and attaches the rust function directly as a static method to the class definition.
* Oxidize parameter calculation for OneQubitEulerDecomposer This commit ports the per basis parameter calculation from python to rust. In looking at the runtime performance regression caused by Qiskit#8917 the majority is just that we're doing more work synthesizing to more available basis to potentially produce better quality results. Profiling the transpiler pass shows we're spending a non-trivial amount of time in numpy/scipy (depending on whether it's before or after Qiskit#9179) computing the determinant of the unitary. This is likely because those determinant functions are designed to work with an arbitrarily large square matrix while for the 1 qubit decomposer we're only ever working with a 2x2. To remove this overhead this commit writes a dedicated rust function to compute the determinant of a 2x2 complex matrix and then also adds dedicated functions to calculate the angles for given basis to rust as we can easily just return the end result from rust. Related Qiskit#8774 * Eliminate python function for staticmethod definition This commit removes one layer of function calls for the OneQubitEulerDecomposer's staticmethods for calculating parameters. Previously, there was a python function which called an inner rust function, this eliminates one layer and attaches the rust function directly as a static method to the class definition.
Many of the vectorized numpy functions involve small vectors and matrices, with linear dimensions like 2 or 4. Often the python overhead for numpy calls dominates for these calculations. |
This commit is part 1 of migrating the default 2q unitary synthesis to leverage parallel rust described in Qiskit#8774, the eventual goal is to be able to run unitary synthesis in parallel for all the unitary matrices in a circuit in a single call from the `UnitarySynthesis` pass. This commit lays the initial groundwork for doing this by starting with the largest piece of the default 2q unitary synthesis code, the TwoQubitWeylDecomposition class. It migrates the entire class to be a pyclass in rust. There is still a Python subclass for it that handles the actual QuantumCircuit generation and also the string representations which are dependent on circuit generation. The goal of this is to keep the same basic algorithm in place but re-implement as-is in Rust as a common starting point for eventual improvements to the underlying algorithm as well as parallelizing the synthesis of multiple 2q unitary matrices.
This commit is part 1 of migrating the default 2q unitary synthesis to leverage parallel rust described in Qiskit#8774, the eventual goal is to be able to run unitary synthesis in parallel for all the unitary matrices in a circuit in a single call from the `UnitarySynthesis` pass. This commit lays the initial groundwork for doing this by starting with the largest piece of the default 2q unitary synthesis code, the TwoQubitWeylDecomposition class. It migrates the entire class to be a pyclass in rust. There is still a Python subclass for it that handles the actual QuantumCircuit generation and also the string representations which are dependent on circuit generation. The goal of this is to keep the same basic algorithm in place but re-implement as-is in Rust as a common starting point for eventual improvements to the underlying algorithm as well as parallelizing the synthesis of multiple 2q unitary matrices.
This commit is the second part of migrating the default 2q unitary synthesis method to leverage parallel rust as described in Qiskit#8774. The Eventual goal is to be able to run unitary synthesis in parallel for all the unitary matrices in a single call from the `UnitarySynthesis` pass. The TwoQubitBasisDecomposer class is one of the default decomposers used by the unitary synthesis plugin. After this we can build an interface that will run the decomposition in parallel for a given decomposer. This commit re-implements the TwoQubitBasisDecomposer class in rust. It keeps the same algorithm from the previous python version but implements it in rust. This builds off of Qiskit#11946 and for the operation of the decomposer class the TwoQubitWeylDecomposition class is used solely through rust. This commit depends on Qiskit#11946 and will need to be rebased after Qiskit#11946 is merged. Fixes Qiskit#12004
This commit is the second part of migrating the default 2q unitary synthesis method to leverage parallel rust as described in Qiskit#8774. The Eventual goal is to be able to run unitary synthesis in parallel for all the unitary matrices in a single call from the `UnitarySynthesis` pass. The TwoQubitBasisDecomposer class is one of the default decomposers used by the unitary synthesis plugin. After this we can build an interface that will run the decomposition in parallel for a given decomposer. This commit re-implements the TwoQubitBasisDecomposer class in rust. It keeps the same algorithm from the previous python version but implements it in rust. This builds off of Qiskit#11946 and for the operation of the decomposer class the TwoQubitWeylDecomposition class is used solely through rust. This commit depends on Qiskit#11946 and will need to be rebased after Qiskit#11946 is merged. Fixes Qiskit#12004
* Oxidize TwoQubitWeylDecomposition This commit is part 1 of migrating the default 2q unitary synthesis to leverage parallel rust described in #8774, the eventual goal is to be able to run unitary synthesis in parallel for all the unitary matrices in a circuit in a single call from the `UnitarySynthesis` pass. This commit lays the initial groundwork for doing this by starting with the largest piece of the default 2q unitary synthesis code, the TwoQubitWeylDecomposition class. It migrates the entire class to be a pyclass in rust. There is still a Python subclass for it that handles the actual QuantumCircuit generation and also the string representations which are dependent on circuit generation. The goal of this is to keep the same basic algorithm in place but re-implement as-is in Rust as a common starting point for eventual improvements to the underlying algorithm as well as parallelizing the synthesis of multiple 2q unitary matrices. * Fix typo in closest_partial_swap() This commit fixes a typo the formula in the function. This is the same fix from #11953. Co-authored-by: Shelly Garion <46566946+ShellyGarion@users.noreply.github.com> * Run black and lint * Fix potential imaginary component sign flip in determinant * Run cargo fmt * Simplify using numpy eigh example comment * Add missing checks to decompose_two_qubit_product_gate() * Use rng first trial from Python implementation To aid in debugging and rule out rng differences causing different results this commit switches the first iteration of the randomized loop to have hard coded values that are identical to what the rng in numpy was returning. It is very unlikely that this will have any impact because the specific random numbers used shouldn't matter, this is mostly to just rule it out as a possibility in debugging the remaining test failures. * Fix whitespace in error message * Fix assert check * Fix missing multiplier on specialized weyl gate * Fix various mistakes This commit fixes two fundamental issues in the code. The first is the rz and ry matrix were being incorrectly constructed for a given angle. This caused the specializations that were computing the 1q matrices in the decomposition based on a product with these gates' matrices to return invalid results. The second issue is for the MirrorControlledEquiv specialization had the angles backwards for computing the matrix of the rz gates used in the products for the 1q matrices: `K1l = K1l @ Rz(K1r)` and `K1r = K1r @ Rz(K1l)` not `K1l = K1l @ Rz(K1l)` and `K1r = K1r @ Rz(K1r)` This was a typo from the original transcription. * Add pickle serialization support for TwoQubitWeylDecomposition * Remove duplicate conjs increment * Fix fSim abmb equivalent specialization * Add specialized test class back * Use QiskitError for backwards compatibility * Add release note * Make explicit specialization private * Use explicit inheritance from general decomposition (#23) * Apply suggestions from code review * Use smallvec for circuit sequence params and qubits * Use const 2x2 identity array where possible * circuit() and weyl_gate don't need a mutable self * Remove unnecessary inline annotations * Rename Specializations enum Specialization * Add back specialize() method and deprecate * Reorganize Python/Rust split to wrap rust instead of subclass This commit reworks the handoff between python and rust space for the TwoQubitWeylDecomposition class. Previously the Python side TwoQubitWeylDecomposition was a subclass of the Rust struct/pyclass of the same name. This was originally done to deduplicate the the getters for the attributes. However, because of the overhead associated with some of the rust getters and needing to do some import normalization to a numpy array the deduplication wasn't worth the cost. * Remove unecessary allocations * Rename DEFAULT_ATOL to ANGLE_ZERO_EPSILON * Stop obsessing over -0 * Handle enum to int conversion as method * Cleanup decompose_two_qubit_product_gate() * Use a trait for trace_to_fid() * Use an enum instead of string for euler basis * Fix release note typo * Improve magic basis transformation functions Co-authored-by: Jake Lishman <jake.lishman@ibm.com> * Remove eigh() util function * Revert unecessary changes to callers of TwoQubitWeylDecomposition * Restore debug logging and add test assertions for it * Update qiskit/synthesis/two_qubit/two_qubit_decompose.py Co-authored-by: Lev Bishop <18673315+levbishop@users.noreply.github.com> * Add specialization to __str__ * Add previous specialized class docstrings as inline rust code comments * Rename fSim variants and suprress rustc warning about camel case * Update tests for correct specialization enum name * Expose specialization enum via private class attr and use for __repr__ --------- Co-authored-by: Shelly Garion <46566946+ShellyGarion@users.noreply.github.com> Co-authored-by: Jake Lishman <jake@binhbar.com> Co-authored-by: Jake Lishman <jake.lishman@ibm.com> Co-authored-by: Lev Bishop <18673315+levbishop@users.noreply.github.com>
This commit is the second part of migrating the default 2q unitary synthesis method to leverage parallel rust as described in Qiskit#8774. The Eventual goal is to be able to run unitary synthesis in parallel for all the unitary matrices in a single call from the `UnitarySynthesis` pass. The TwoQubitBasisDecomposer class is one of the default decomposers used by the unitary synthesis plugin. After this we can build an interface that will run the decomposition in parallel for a given decomposer. This commit re-implements the TwoQubitBasisDecomposer class in rust. It keeps the same algorithm from the previous python version but implements it in rust. This builds off of Qiskit#11946 and for the operation of the decomposer class the TwoQubitWeylDecomposition class is used solely through rust. This commit depends on Qiskit#11946 and will need to be rebased after Qiskit#11946 is merged. Fixes Qiskit#12004
* Oxidize TwoQubitBasisDecomposer This commit is the second part of migrating the default 2q unitary synthesis method to leverage parallel rust as described in #8774. The Eventual goal is to be able to run unitary synthesis in parallel for all the unitary matrices in a single call from the `UnitarySynthesis` pass. The TwoQubitBasisDecomposer class is one of the default decomposers used by the unitary synthesis plugin. After this we can build an interface that will run the decomposition in parallel for a given decomposer. This commit re-implements the TwoQubitBasisDecomposer class in rust. It keeps the same algorithm from the previous python version but implements it in rust. This builds off of #11946 and for the operation of the decomposer class the TwoQubitWeylDecomposition class is used solely through rust. This commit depends on #11946 and will need to be rebased after #11946 is merged. Fixes #12004 * Fix errors after rebase * Fix traces method * Fix pulse optimized synthesis * Add release notes * Fix lint * Use consts for static decomposition arrays * Run cargo fmt * Handle basis_fidelity inside unitary synthesis path * Cast input to TwoQubitBasisDecomposer.num_basis_gates * Use statics instead of consts * Pre-allocate 2q circuit sequence outside the pulse optimal path.
What should we add?
For level 3 compilation now, the unitary synthesis as part of the 2q block optimization
becomes a large bottleneck(this was mostly supposition on my part looking at the profile we spend more time in consolidate blocks than synthesis) since we've moved layout and routing mostly to multithreaded rust code. To look at further speeding this up we should look at whether we can accelerate the numeric component of:https://github.com/Qiskit/qiskit-terra/blob/main/qiskit/quantum_info/synthesis/two_qubit_decompose.py
and
https://github.com/Qiskit/qiskit-terra/blob/main/qiskit/quantum_info/synthesis/one_qubit_decompose.py(this was done in #9185)by moving it to rust. For things that are already in vectorized numpy functions the performance gains may not be that big. The other constraint is that linear algebra functions that depend on blas we'll probably still want to call to numpy (likely via python since the numpy c api doesn't expose a lot of the linalg functions which leverage blas) since we don't want the complexity of linking the rust binary against blas at build time.
I'm not actually sure how much it'll speed things up since I expect a large chunk of time is spent building circuits and manipulating them. Moving the numeric portion of the modules to rust won't really change because the circuit side is in python space and rust can't really accelerate those portion of the modules. This effort is not necessarily going to turn out to be worth it, but we can't really know until we give it a try.
Tasks
TwoQubitControlledUDecomposer
to rust #12907The text was updated successfully, but these errors were encountered: