Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Rust quantum volume function #13238

Merged
merged 12 commits into from
Oct 3, 2024
Merged

Add Rust quantum volume function #13238

merged 12 commits into from
Oct 3, 2024

Conversation

mtreinish
Copy link
Member

Summary

This commit adds a new function quantum_volume used for generating a quantum volume model circuit. This new function is defined in Rust and multithreaded to improve the throughput of the circuit generation. This new function will eventually replace the existing QuantumVolume class as part of #13046. Since quantum volume is a circuit defined by it's structure using a generator function is inline with the goals of #13046.

Right now the performance is bottlenecked by the creation of the UnitaryGate objects as these are still defined solely in Python. We'll likely need to port the class to have a rust native representation to further speed up the construction of the circuit.

Details and comments

This commit adds a new function quantum_volume used for generating a
quantum volume model circuit. This new function is defined in Rust and
multithreaded to improve the throughput of the circuit generation. This
new function will eventually replace the existing QuantumVolume class as
part of Qiskit#13046. Since quantum volume is a circuit defined by it's
structure using a generator function is inline with the goals of Qiskit#13046.

Right now the performance is bottlenecked by the creation of the
UnitaryGate objects as these are still defined solely in Python. We'll
likely need to port the class to have a rust native representation to
further speed up the construction of the circuit.
@mtreinish mtreinish added performance Changelog: New Feature Include in the "Added" section of the changelog Rust This PR or issue is related to Rust code in the repository mod: circuit Related to the core of the `QuantumCircuit` class or the circuit library labels Sep 28, 2024
@mtreinish mtreinish added this to the 1.3.0 milestone Sep 28, 2024
@mtreinish mtreinish requested a review from a team as a code owner September 28, 2024 15:59
@qiskit-bot
Copy link
Collaborator

One or more of the following people are relevant to this code:

  • @Cryoris
  • @Qiskit/terra-core
  • @ajavadia

@mtreinish
Copy link
Member Author

I ran some benchmarks comparing the construction speed of the quantum volume circuit using a logarithmic sweep from 10 to 1000 qubits (with equal depth for each point) using this function in parallel and serially (by setting QISKIT_IN_PARALLEL=TRUE) and comparing that against the Python QuantumVolume class:

construct_time

All these results were on my laptop with a AMD Ryzen 7 PRO 7840U.

There is potentially some tuning we might want to do about the number of threads and the chunk size for each parallel worker. But this seems like a reasonable balance and nothing I've changed so far appreciably changed the performance. It might be worth testing for the serial path though if doing the unitary generation all up front, like what is done in the parallel case, improves performance.

@coveralls
Copy link

coveralls commented Sep 28, 2024

Pull Request Test Coverage Report for Build 11155708868

Details

  • 136 of 143 (95.1%) changed or added relevant lines in 7 files are covered.
  • 11 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.02%) to 88.87%

Changes Missing Coverage Covered Lines Changed/Added Lines %
qiskit/circuit/library/quantum_volume.py 6 7 85.71%
crates/accelerate/src/circuit_library/quantum_volume.rs 113 119 94.96%
Files with Coverage Reduction New Missed Lines %
crates/qasm2/src/lex.rs 5 92.48%
crates/qasm2/src/parse.rs 6 97.61%
Totals Coverage Status
Change from base Build 11152794761: 0.02%
Covered Lines: 74276
Relevant Lines: 83578

💛 - Coveralls

let qr = z.view().into_faer_complex().qr();
let r = qr.compute_r().as_ref().into_ndarray_complex().to_owned();
let mut d = r.into_diag();
d.mapv_inplace(|d| d / d.norm());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this re-compute the norm of d in every application? If yes would it make sense to pre-compute d.norm()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does an in-place substitution on the diagonal d so that each value also named d (which is probably why this is confusing) is replaced with d / d.norm() for each element d. I'll rename the diagonal to diag or something to make it clear which is an element and which is the array.

Copy link
Member Author

@mtreinish mtreinish Sep 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if I should rewrite this using fixed size arrays or nalgebra's Matrix4<Complex64> to make this all faster. Right now I'm pretty confident the overhead is mostly dealing with allocations for 2x2 arrays.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this here: 32b4c1f the code is a lot more verbose now, but I think it's a bit more explicit and it'll be a bit more optimized (although the random unitary construction is not the bottleneck).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! The bottleneck is the UnitaryGate creation Python-side as you wrote above, right? I think a 10x speedup (maybe more now with the explicit matrix allocation?) is already great, plus we have the circuit as function reaching the goal of #13046 😄 But if you'd like to test with pre-generating the gates I'm also happy to wait 🙂

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we are totally bottlenecked by the UnitaryGate object creation, I ran a profile with the current state of the PR running the test script I was using to generate the data for the graph I shared above. In serial mode (py-spy doesn't handle the multithreading well) and the UnitaryGate object creation is taking ~58% of the time, while the random unitary generation is only ~30% of the runtime (almost all of that is performing the qr factorization).

I'll do a quick test of serial mode dumping the unitaries to a vec first to see if that makes a difference performance wise (I don't expect it to), and will leave a comment here with the results.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

times

I reran the numbers collecting the unitaries into a vec and then iterating off of that instead of passing the iterator directly to the circuit constructor. It looks like it's marginally faster to collect into a vec.

qiskit/circuit/library/quantum_volume.py Outdated Show resolved Hide resolved
mtreinish and others added 6 commits September 30, 2024 11:10
Co-authored-by: Julien Gacon <gaconju@gmail.com>
The previous commit was relying on the behavior of the annotations
future import but neglected to add it. This commit corrects the
oversight.
The previous implementation had 4 heap allocations for each random
unitary constructed, this commit uses some fixed sized stack allocated
arrays and reduces that to two allocations one for q and r from the
factorization. We'll always need at least one for the `Array2` that gets
stored in each `UnitaryGate` as a numpy array. But to reduce to just
this we'll need a method of computing the QR factorization without an
allocation for the result space, nalgebtra might be a path for doing
that. While this currently isn't a bottleneck as the `UnitaryGate`
python object creation is the largest source of runtime, but assuming
that's fixed in the future this might have a larger impact.
When executing in the serial path we previously were working directly
with an iterator where the 2q unitaries we're created on the iterator
that we passed directly to circuit constructor. However testing shows
that precomputing all the unitaries into a Vec and passing the iterator
off of that to the circuit constructor is marginally but consistently
faster. So this commit pivots to using that instead.
mtreinish and others added 4 commits October 2, 2024 12:46
This commit fixes two issues in the reproducibility of the quantum
volume circuit. The first was the output unitary matrices for a fixed
seed would differ between the parallel and serial execution path. This
was because how the RNGs were used was different in the different code
paths. This change results in the serial path being marginally less
efficient, but it shouldn't be a big deal when compared to getting
different results in different contexts. The second was the seed usage
in parallel mode was dependent on the number of threads on the local
system. This was problematic because the exact circuit generated between
two systems would be different even with a fixed seed. This was fixed to
avoid depending on the number of threads to determine how the seeds were
used across multiple threads.

The last fix here was a change to the error handling so that the
CircuitData constructor used to create the circuit object can handle a
fallible iterator. Previously we were throwing away the python error and
panicking if the Python call to generate the UnitaryGate object raised
an error for any reason.
Co-authored-by: Julien Gacon <gaconju@gmail.com>
@mtreinish mtreinish requested a review from Cryoris October 2, 2024 17:54
Copy link
Contributor

@Cryoris Cryoris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and thanks for all the detailed benchmarks! 😄

@Cryoris Cryoris enabled auto-merge October 3, 2024 04:32
@Cryoris Cryoris added this pull request to the merge queue Oct 3, 2024
Merged via the queue into Qiskit:main with commit 642debf Oct 3, 2024
15 checks passed
@mtreinish mtreinish deleted the rust-qv branch October 3, 2024 09:30
ElePT pushed a commit to ElePT/qiskit that referenced this pull request Oct 3, 2024
* Add Rust quantum volume function

This commit adds a new function quantum_volume used for generating a
quantum volume model circuit. This new function is defined in Rust and
multithreaded to improve the throughput of the circuit generation. This
new function will eventually replace the existing QuantumVolume class as
part of Qiskit#13046. Since quantum volume is a circuit defined by it's
structure using a generator function is inline with the goals of Qiskit#13046.

Right now the performance is bottlenecked by the creation of the
UnitaryGate objects as these are still defined solely in Python. We'll
likely need to port the class to have a rust native representation to
further speed up the construction of the circuit.

* Adjust type hints on python function

Co-authored-by: Julien Gacon <gaconju@gmail.com>

* Add missing __future__ import

The previous commit was relying on the behavior of the annotations
future import but neglected to add it. This commit corrects the
oversight.

* Add comment on random unitary algorithm

* Reduce allocations random_unitaries

The previous implementation had 4 heap allocations for each random
unitary constructed, this commit uses some fixed sized stack allocated
arrays and reduces that to two allocations one for q and r from the
factorization. We'll always need at least one for the `Array2` that gets
stored in each `UnitaryGate` as a numpy array. But to reduce to just
this we'll need a method of computing the QR factorization without an
allocation for the result space, nalgebtra might be a path for doing
that. While this currently isn't a bottleneck as the `UnitaryGate`
python object creation is the largest source of runtime, but assuming
that's fixed in the future this might have a larger impact.

* Preallocate unitaries for serial path

When executing in the serial path we previously were working directly
with an iterator where the 2q unitaries we're created on the iterator
that we passed directly to circuit constructor. However testing shows
that precomputing all the unitaries into a Vec and passing the iterator
off of that to the circuit constructor is marginally but consistently
faster. So this commit pivots to using that instead.

* Fix determinism and error handling of of qv function

This commit fixes two issues in the reproducibility of the quantum
volume circuit. The first was the output unitary matrices for a fixed
seed would differ between the parallel and serial execution path. This
was because how the RNGs were used was different in the different code
paths. This change results in the serial path being marginally less
efficient, but it shouldn't be a big deal when compared to getting
different results in different contexts. The second was the seed usage
in parallel mode was dependent on the number of threads on the local
system. This was problematic because the exact circuit generated between
two systems would be different even with a fixed seed. This was fixed to
avoid depending on the number of threads to determine how the seeds were
used across multiple threads.

The last fix here was a change to the error handling so that the
CircuitData constructor used to create the circuit object can handle a
fallible iterator. Previously we were throwing away the python error and
panicking if the Python call to generate the UnitaryGate object raised
an error for any reason.

* Mention the new function is multithreaded in docstring

* Update qiskit/circuit/library/quantum_volume.py

Co-authored-by: Julien Gacon <gaconju@gmail.com>

---------

Co-authored-by: Julien Gacon <gaconju@gmail.com>
Co-authored-by: Julien Gacon <jules.gacon@googlemail.com>
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Oct 4, 2024
In Qiskit#13238 we added a new function quantum_volume for generating a
quantum volume model circuit with the eventual goal of replacing the
existing QuantumVolume class. This new function is ~10x faster to
generate the circuit than the existing python class. This commit builds
off of that to internally call the new function for generating the
circuit from the class. While the plan is to deprecate and remove the
class for Qiskit 2.0 until that time we can give a performance boost to
users of it for the lifespan of the 1.x release series.
ElePT pushed a commit to ElePT/qiskit that referenced this pull request Oct 7, 2024
* Add Rust quantum volume function

This commit adds a new function quantum_volume used for generating a
quantum volume model circuit. This new function is defined in Rust and
multithreaded to improve the throughput of the circuit generation. This
new function will eventually replace the existing QuantumVolume class as
part of Qiskit#13046. Since quantum volume is a circuit defined by it's
structure using a generator function is inline with the goals of Qiskit#13046.

Right now the performance is bottlenecked by the creation of the
UnitaryGate objects as these are still defined solely in Python. We'll
likely need to port the class to have a rust native representation to
further speed up the construction of the circuit.

* Adjust type hints on python function

Co-authored-by: Julien Gacon <gaconju@gmail.com>

* Add missing __future__ import

The previous commit was relying on the behavior of the annotations
future import but neglected to add it. This commit corrects the
oversight.

* Add comment on random unitary algorithm

* Reduce allocations random_unitaries

The previous implementation had 4 heap allocations for each random
unitary constructed, this commit uses some fixed sized stack allocated
arrays and reduces that to two allocations one for q and r from the
factorization. We'll always need at least one for the `Array2` that gets
stored in each `UnitaryGate` as a numpy array. But to reduce to just
this we'll need a method of computing the QR factorization without an
allocation for the result space, nalgebtra might be a path for doing
that. While this currently isn't a bottleneck as the `UnitaryGate`
python object creation is the largest source of runtime, but assuming
that's fixed in the future this might have a larger impact.

* Preallocate unitaries for serial path

When executing in the serial path we previously were working directly
with an iterator where the 2q unitaries we're created on the iterator
that we passed directly to circuit constructor. However testing shows
that precomputing all the unitaries into a Vec and passing the iterator
off of that to the circuit constructor is marginally but consistently
faster. So this commit pivots to using that instead.

* Fix determinism and error handling of of qv function

This commit fixes two issues in the reproducibility of the quantum
volume circuit. The first was the output unitary matrices for a fixed
seed would differ between the parallel and serial execution path. This
was because how the RNGs were used was different in the different code
paths. This change results in the serial path being marginally less
efficient, but it shouldn't be a big deal when compared to getting
different results in different contexts. The second was the seed usage
in parallel mode was dependent on the number of threads on the local
system. This was problematic because the exact circuit generated between
two systems would be different even with a fixed seed. This was fixed to
avoid depending on the number of threads to determine how the seeds were
used across multiple threads.

The last fix here was a change to the error handling so that the
CircuitData constructor used to create the circuit object can handle a
fallible iterator. Previously we were throwing away the python error and
panicking if the Python call to generate the UnitaryGate object raised
an error for any reason.

* Mention the new function is multithreaded in docstring

* Update qiskit/circuit/library/quantum_volume.py

Co-authored-by: Julien Gacon <gaconju@gmail.com>

---------

Co-authored-by: Julien Gacon <gaconju@gmail.com>
Co-authored-by: Julien Gacon <jules.gacon@googlemail.com>
github-merge-queue bot pushed a commit that referenced this pull request Oct 16, 2024
* Use Rust generator function for QuantumVolume class

In #13238 we added a new function quantum_volume for generating a
quantum volume model circuit with the eventual goal of replacing the
existing QuantumVolume class. This new function is ~10x faster to
generate the circuit than the existing python class. This commit builds
off of that to internally call the new function for generating the
circuit from the class. While the plan is to deprecate and remove the
class for Qiskit 2.0 until that time we can give a performance boost to
users of it for the lifespan of the 1.x release series.

* Use seed_name in string generation

* Fix rng call
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: New Feature Include in the "Added" section of the changelog mod: circuit Related to the core of the `QuantumCircuit` class or the circuit library performance Rust This PR or issue is related to Rust code in the repository
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants