-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Rust quantum volume function #13238
Conversation
This commit adds a new function quantum_volume used for generating a quantum volume model circuit. This new function is defined in Rust and multithreaded to improve the throughput of the circuit generation. This new function will eventually replace the existing QuantumVolume class as part of Qiskit#13046. Since quantum volume is a circuit defined by it's structure using a generator function is inline with the goals of Qiskit#13046. Right now the performance is bottlenecked by the creation of the UnitaryGate objects as these are still defined solely in Python. We'll likely need to port the class to have a rust native representation to further speed up the construction of the circuit.
One or more of the following people are relevant to this code:
|
I ran some benchmarks comparing the construction speed of the quantum volume circuit using a logarithmic sweep from 10 to 1000 qubits (with equal depth for each point) using this function in parallel and serially (by setting All these results were on my laptop with a AMD Ryzen 7 PRO 7840U. There is potentially some tuning we might want to do about the number of threads and the chunk size for each parallel worker. But this seems like a reasonable balance and nothing I've changed so far appreciably changed the performance. It might be worth testing for the serial path though if doing the unitary generation all up front, like what is done in the parallel case, improves performance. |
Pull Request Test Coverage Report for Build 11155708868Details
💛 - Coveralls |
let qr = z.view().into_faer_complex().qr(); | ||
let r = qr.compute_r().as_ref().into_ndarray_complex().to_owned(); | ||
let mut d = r.into_diag(); | ||
d.mapv_inplace(|d| d / d.norm()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this re-compute the norm of d
in every application? If yes would it make sense to pre-compute d.norm()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does an in-place substitution on the diagonal d
so that each value also named d
(which is probably why this is confusing) is replaced with d / d.norm()
for each element d
. I'll rename the diagonal to diag
or something to make it clear which is an element and which is the array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering if I should rewrite this using fixed size arrays or nalgebra's Matrix4<Complex64>
to make this all faster. Right now I'm pretty confident the overhead is mostly dealing with allocations for 2x2 arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this here: 32b4c1f the code is a lot more verbose now, but I think it's a bit more explicit and it'll be a bit more optimized (although the random unitary construction is not the bottleneck).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! The bottleneck is the UnitaryGate
creation Python-side as you wrote above, right? I think a 10x speedup (maybe more now with the explicit matrix allocation?) is already great, plus we have the circuit as function reaching the goal of #13046 😄 But if you'd like to test with pre-generating the gates I'm also happy to wait 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we are totally bottlenecked by the UnitaryGate
object creation, I ran a profile with the current state of the PR running the test script I was using to generate the data for the graph I shared above. In serial mode (py-spy doesn't handle the multithreading well) and the UnitaryGate
object creation is taking ~58% of the time, while the random unitary generation is only ~30% of the runtime (almost all of that is performing the qr factorization).
I'll do a quick test of serial mode dumping the unitaries to a vec first to see if that makes a difference performance wise (I don't expect it to), and will leave a comment here with the results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Co-authored-by: Julien Gacon <gaconju@gmail.com>
The previous commit was relying on the behavior of the annotations future import but neglected to add it. This commit corrects the oversight.
The previous implementation had 4 heap allocations for each random unitary constructed, this commit uses some fixed sized stack allocated arrays and reduces that to two allocations one for q and r from the factorization. We'll always need at least one for the `Array2` that gets stored in each `UnitaryGate` as a numpy array. But to reduce to just this we'll need a method of computing the QR factorization without an allocation for the result space, nalgebtra might be a path for doing that. While this currently isn't a bottleneck as the `UnitaryGate` python object creation is the largest source of runtime, but assuming that's fixed in the future this might have a larger impact.
When executing in the serial path we previously were working directly with an iterator where the 2q unitaries we're created on the iterator that we passed directly to circuit constructor. However testing shows that precomputing all the unitaries into a Vec and passing the iterator off of that to the circuit constructor is marginally but consistently faster. So this commit pivots to using that instead.
This commit fixes two issues in the reproducibility of the quantum volume circuit. The first was the output unitary matrices for a fixed seed would differ between the parallel and serial execution path. This was because how the RNGs were used was different in the different code paths. This change results in the serial path being marginally less efficient, but it shouldn't be a big deal when compared to getting different results in different contexts. The second was the seed usage in parallel mode was dependent on the number of threads on the local system. This was problematic because the exact circuit generated between two systems would be different even with a fixed seed. This was fixed to avoid depending on the number of threads to determine how the seeds were used across multiple threads. The last fix here was a change to the error handling so that the CircuitData constructor used to create the circuit object can handle a fallible iterator. Previously we were throwing away the python error and panicking if the Python call to generate the UnitaryGate object raised an error for any reason.
Co-authored-by: Julien Gacon <gaconju@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM and thanks for all the detailed benchmarks! 😄
* Add Rust quantum volume function This commit adds a new function quantum_volume used for generating a quantum volume model circuit. This new function is defined in Rust and multithreaded to improve the throughput of the circuit generation. This new function will eventually replace the existing QuantumVolume class as part of Qiskit#13046. Since quantum volume is a circuit defined by it's structure using a generator function is inline with the goals of Qiskit#13046. Right now the performance is bottlenecked by the creation of the UnitaryGate objects as these are still defined solely in Python. We'll likely need to port the class to have a rust native representation to further speed up the construction of the circuit. * Adjust type hints on python function Co-authored-by: Julien Gacon <gaconju@gmail.com> * Add missing __future__ import The previous commit was relying on the behavior of the annotations future import but neglected to add it. This commit corrects the oversight. * Add comment on random unitary algorithm * Reduce allocations random_unitaries The previous implementation had 4 heap allocations for each random unitary constructed, this commit uses some fixed sized stack allocated arrays and reduces that to two allocations one for q and r from the factorization. We'll always need at least one for the `Array2` that gets stored in each `UnitaryGate` as a numpy array. But to reduce to just this we'll need a method of computing the QR factorization without an allocation for the result space, nalgebtra might be a path for doing that. While this currently isn't a bottleneck as the `UnitaryGate` python object creation is the largest source of runtime, but assuming that's fixed in the future this might have a larger impact. * Preallocate unitaries for serial path When executing in the serial path we previously were working directly with an iterator where the 2q unitaries we're created on the iterator that we passed directly to circuit constructor. However testing shows that precomputing all the unitaries into a Vec and passing the iterator off of that to the circuit constructor is marginally but consistently faster. So this commit pivots to using that instead. * Fix determinism and error handling of of qv function This commit fixes two issues in the reproducibility of the quantum volume circuit. The first was the output unitary matrices for a fixed seed would differ between the parallel and serial execution path. This was because how the RNGs were used was different in the different code paths. This change results in the serial path being marginally less efficient, but it shouldn't be a big deal when compared to getting different results in different contexts. The second was the seed usage in parallel mode was dependent on the number of threads on the local system. This was problematic because the exact circuit generated between two systems would be different even with a fixed seed. This was fixed to avoid depending on the number of threads to determine how the seeds were used across multiple threads. The last fix here was a change to the error handling so that the CircuitData constructor used to create the circuit object can handle a fallible iterator. Previously we were throwing away the python error and panicking if the Python call to generate the UnitaryGate object raised an error for any reason. * Mention the new function is multithreaded in docstring * Update qiskit/circuit/library/quantum_volume.py Co-authored-by: Julien Gacon <gaconju@gmail.com> --------- Co-authored-by: Julien Gacon <gaconju@gmail.com> Co-authored-by: Julien Gacon <jules.gacon@googlemail.com>
In Qiskit#13238 we added a new function quantum_volume for generating a quantum volume model circuit with the eventual goal of replacing the existing QuantumVolume class. This new function is ~10x faster to generate the circuit than the existing python class. This commit builds off of that to internally call the new function for generating the circuit from the class. While the plan is to deprecate and remove the class for Qiskit 2.0 until that time we can give a performance boost to users of it for the lifespan of the 1.x release series.
* Add Rust quantum volume function This commit adds a new function quantum_volume used for generating a quantum volume model circuit. This new function is defined in Rust and multithreaded to improve the throughput of the circuit generation. This new function will eventually replace the existing QuantumVolume class as part of Qiskit#13046. Since quantum volume is a circuit defined by it's structure using a generator function is inline with the goals of Qiskit#13046. Right now the performance is bottlenecked by the creation of the UnitaryGate objects as these are still defined solely in Python. We'll likely need to port the class to have a rust native representation to further speed up the construction of the circuit. * Adjust type hints on python function Co-authored-by: Julien Gacon <gaconju@gmail.com> * Add missing __future__ import The previous commit was relying on the behavior of the annotations future import but neglected to add it. This commit corrects the oversight. * Add comment on random unitary algorithm * Reduce allocations random_unitaries The previous implementation had 4 heap allocations for each random unitary constructed, this commit uses some fixed sized stack allocated arrays and reduces that to two allocations one for q and r from the factorization. We'll always need at least one for the `Array2` that gets stored in each `UnitaryGate` as a numpy array. But to reduce to just this we'll need a method of computing the QR factorization without an allocation for the result space, nalgebtra might be a path for doing that. While this currently isn't a bottleneck as the `UnitaryGate` python object creation is the largest source of runtime, but assuming that's fixed in the future this might have a larger impact. * Preallocate unitaries for serial path When executing in the serial path we previously were working directly with an iterator where the 2q unitaries we're created on the iterator that we passed directly to circuit constructor. However testing shows that precomputing all the unitaries into a Vec and passing the iterator off of that to the circuit constructor is marginally but consistently faster. So this commit pivots to using that instead. * Fix determinism and error handling of of qv function This commit fixes two issues in the reproducibility of the quantum volume circuit. The first was the output unitary matrices for a fixed seed would differ between the parallel and serial execution path. This was because how the RNGs were used was different in the different code paths. This change results in the serial path being marginally less efficient, but it shouldn't be a big deal when compared to getting different results in different contexts. The second was the seed usage in parallel mode was dependent on the number of threads on the local system. This was problematic because the exact circuit generated between two systems would be different even with a fixed seed. This was fixed to avoid depending on the number of threads to determine how the seeds were used across multiple threads. The last fix here was a change to the error handling so that the CircuitData constructor used to create the circuit object can handle a fallible iterator. Previously we were throwing away the python error and panicking if the Python call to generate the UnitaryGate object raised an error for any reason. * Mention the new function is multithreaded in docstring * Update qiskit/circuit/library/quantum_volume.py Co-authored-by: Julien Gacon <gaconju@gmail.com> --------- Co-authored-by: Julien Gacon <gaconju@gmail.com> Co-authored-by: Julien Gacon <jules.gacon@googlemail.com>
* Use Rust generator function for QuantumVolume class In #13238 we added a new function quantum_volume for generating a quantum volume model circuit with the eventual goal of replacing the existing QuantumVolume class. This new function is ~10x faster to generate the circuit than the existing python class. This commit builds off of that to internally call the new function for generating the circuit from the class. While the plan is to deprecate and remove the class for Qiskit 2.0 until that time we can give a performance boost to users of it for the lifespan of the 1.x release series. * Use seed_name in string generation * Fix rng call
Summary
This commit adds a new function quantum_volume used for generating a quantum volume model circuit. This new function is defined in Rust and multithreaded to improve the throughput of the circuit generation. This new function will eventually replace the existing QuantumVolume class as part of #13046. Since quantum volume is a circuit defined by it's structure using a generator function is inline with the goals of #13046.
Right now the performance is bottlenecked by the creation of the UnitaryGate objects as these are still defined solely in Python. We'll likely need to port the class to have a rust native representation to further speed up the construction of the circuit.
Details and comments