Add dedicated functions for memory marginalization #8051
Conversation
Thank you for opening a new pull request. Before your PR can be merged it will first need to pass continuous integration tests and be reviewed. Sometimes the review process can be slow, so please be patient. While you're waiting, please feel free to review other open PRs. While only a subset of people are authorized to approve pull requests for merging, everyone is encouraged to review open pull requests. Doing reviews helps reduce the burden on the core team and helps make the project's code better for everyone. One or more of the following people are requested to review this:
Pull Request Test Coverage Report for Build 2543893476
💛 - Coveralls
qiskit/result/utils.py
Outdated
    to 4 threads.

    Args:
        memory: The input memory list, this is a list of hexadecimal strings to be marginalized
Does this not cover IQ data or is that outside the scope of this PR?
I wasn't factoring that in when I wrote this, but we should try to support that in this function. I can add it to this PR if you have an example for what the input and output look like with IQ data.
Sure, here is what the input looks like for a single three-qubit circuit under meas level 1 and with 5 single shots:
memory=[
# qubit 0 qubit 1 qubit 2
[[-12974255.0, -28106672.0], [ 15848939.0, -53271096.0], [-18731048.0, -56490604.0]], #shot 1
[[-18346508.0, -26587824.0], [-12065728.0, -44948360.0], [14035275.0, -65373000.0]], # shot 2
[[ 12802274.0, -20436864.0], [-15967512.0, -37575556.0], [15201290.0, -65182832.0]], # ...
[[ -9187660.0, -22197716.0], [-17028016.0, -49578552.0], [13526576.0, -61017756.0]],
[[ 7006214.0, -32555228.0], [ 16144743.0, -33563124.0], [-23524160.0, -66919196.0]]
]
You can see something like this by running job_1ts = backend.run(circ, meas_level=1, memory=True, meas_return="single", shots=5). If I want to marginalize over some of the qubits then I need to remove their slots. For instance, keeping qubits 0 and 2 would result in
memory=[
[[-12974255.0, -28106672.0], [-18731048.0, -56490604.0]], #shot 1
[[-18346508.0, -26587824.0], [14035275.0, -65373000.0]], # shot 2
[[ 12802274.0, -20436864.0], [15201290.0, -65182832.0]], # ...
[[ -9187660.0, -22197716.0], [13526576.0, -61017756.0]],
[[ 7006214.0, -32555228.0], [-23524160.0, -66919196.0]]
]
If we are dealing with averaged IQ data then the input memory looks like this (again three qubits, one circuit, five shots, but now they are averaged):
memory=[[-1059254.375, -26266612.0], [-9012669.0, -41877468.0], [6027076.0, -54875060.0]]
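To make the slot selection concrete, here is a minimal NumPy sketch of that marginalization (an illustration only, not the PR's implementation; the (shots, qubits, 2) and (qubits, 2) layouts are taken from the examples above):
import numpy as np

# Single-shot level 1 memory: shape (shots, qubits, 2); first two shots from above.
single_shot = np.array([
    [[-12974255.0, -28106672.0], [15848939.0, -53271096.0], [-18731048.0, -56490604.0]],
    [[-18346508.0, -26587824.0], [-12065728.0, -44948360.0], [14035275.0, -65373000.0]],
])
# Averaged level 1 memory: shape (qubits, 2).
averaged = np.array([[-1059254.375, -26266612.0], [-9012669.0, -41877468.0], [6027076.0, -54875060.0]])

keep = [0, 2]                              # qubit slots to retain
marginal_single = single_shot[:, keep, :]  # drop the other qubits' slots in every shot
marginal_avg = averaged[keep, :]           # same selection on the averaged data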
I added support for this in e709ce8. I still need to add testing to cover all the paths, but let me know if that interface works for you.
I really don't like how it is implemented without explicit typing.
That table is what I based e709ce8 on, but yeah, the lack of explicit typing was annoying and is why I needed an avg_data kwarg to differentiate between single level 1 and avg level 0, because I couldn't figure out a way to reliably detect the difference without an explicit input type. I was trying to make this function work independently of the Results object, which I think is the only place that metadata would be stored.
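As a usage sketch of that interface (hedged: the exact signature, kwarg name, and return types are whatever the PR settles on), the avg_data flag would let the caller disambiguate averaged level 1 memory:
from qiskit.result import marginal_memory

single_iq = [
    [[-12974255.0, -28106672.0], [15848939.0, -53271096.0], [-18731048.0, -56490604.0]],
]
avg_iq = [[-1059254.375, -26266612.0], [-9012669.0, -41877468.0], [6027076.0, -54875060.0]]

# Single-shot level 1 data: the nesting depth makes the format clear.
marginal_memory(single_iq, indices=[0, 2])
# Averaged level 1 data: flagged explicitly, since the shape alone is ambiguous.
marginal_memory(avg_iq, indices=[0, 2], avg_data=True)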
src/results/converters.rs
Outdated
#[inline]
pub fn hex_char_to_bin(c: char) -> &'static str {
    match c {
        '0' => "0000",
        '1' => "0001",
        '2' => "0010",
Can we cover cases where we are, e.g., working with the third level of the transmon, i.e., we have states 0, 1, and 2?
I think this is outside the scope of this PR (though I like the idea). Because the current result object doesn't define a basis in its metadata, i.e. it always assumes binary, it is difficult to select the proper output basis (binary or ternary) from the input hex numbers alone. Having a basis argument in the marginal function seems like overkill to me. But we can implement such ternary memory in the experiment data processor, where we always need marginalization of IQ numbers to run a custom ternary discriminator.
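A small illustration of that ambiguity (not from the PR): the same hex value decodes differently depending on whether the register is assumed to be binary or ternary, so the basis has to come from somewhere other than the data itself.
# 0x5 read as a qubit register vs. as a qutrit register.
value = int("0x5", 16)

binary_outcome = format(value, "b")  # '101'  -> e.g. three qubits

digits = []
v = value
while v:
    digits.append(str(v % 3))
    v //= 3
ternary_outcome = "".join(reversed(digits)) or "0"  # '12' -> e.g. two qutrits

print(binary_outcome, ternary_outcome)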
I think we can save this for a follow-on. Having the marginalization functions work with this would be useful, but there is probably enough in this PR just working with binary to start.
This commit adds dedicated functions for memory marginalization. Previously, the marginal_counts() function had support for marginalizing memory in a Results object, but this can be inefficient, especially if your memory list is outside a Results object. The new functions added in this commit are implemented in Rust and multithreaded. Additionally, the marginal_counts() function is updated to use the same inner Rust functions.
a2a3004 to c4924f3
def marginal_memory(
    memory: List[str],
    indices: Optional[List[int]] = None,
    int_return: bool = False,
This is just a curiosity: is a list of integers more efficient in memory footprint than a binary ndarray? Given that we use memory information to run restless analysis in Qiskit Experiments, it should use a memory-efficient representation to run a parallel experiment on a 100Q+ device.
Do you mean like storing the shot memory as a 2D array where each row has n elements, one for each bit, or something else? The list of ints here will be more memory efficient than that on the Rust side because I'm using a Vec<BigUint> (which is just a Vec of digits internally) and it will not be fixed width for each shot. The Python side I expect would be similar, since the Python integer class is very similar to BigUint (a byte array of digits), although a list isn't necessarily as contiguous as a Vec<T>/ndarray. I think it would be best to test this though to be sure, and settle on a common way to represent large result values in a non-string type.
As an aside, I only used a list here because numpy doesn't have support for arbitrarily large integers (outside of using an object dtype, which ends up just being a pointer to the Python heap, for Python ints) and I was worried about the
Thanks. Sounds like the current implementation is reasonable (I was just worried about storing 2**100 for "10000...0"; in a binary array it's just 100 binary elements).
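For a rough sense of the trade-off being discussed (illustrative only; exact sizes depend on the CPython and NumPy versions):
import sys
import numpy as np

outcome_as_int = 1 << 100                        # the "10000...0" case as an arbitrary-precision int
outcome_as_bits = np.zeros(100, dtype=np.uint8)  # one array entry per bit

print(sys.getsizeof(outcome_as_int))  # a few tens of bytes per shot in CPython
print(outcome_as_bits.nbytes)         # 100 bytes of payload per shot, plus ndarray overhead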
LGTM overall. Will approve once comments are addressed.
src/results/converters.rs
Outdated
#[inline]
pub fn hex_char_to_bin(c: char) -> &'static str {
    match c {
This is fine, but it might be a really fun opportunity to write a constant expression LUT generator function :)
I leveraged lazy_static to generate a static lookup table in 5c81510. I need to benchmark it and look into expanding it to support larger chunks, but this might be a good enough start as it should eliminate most of the runtime branching.
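The idea, expressed as a Python sketch rather than the actual Rust code in src/results/converters.rs, is just a per-character table lookup in place of a match on every hex digit:
# Python sketch of the single-character lookup-table approach.
HEX_TO_BIN = {c: format(int(c, 16), "04b") for c in "0123456789abcdef"}

def hex_to_bin(hex_str: str) -> str:
    """Expand a hex string like '0x1f' into its bit string ('00011111')."""
    return "".join(HEX_TO_BIN[c] for c in hex_str.lower().removeprefix("0x"))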
I tried playing with adding chunking in groups of 4 and it was slower than doing this. I think this comes down to needing to do intermediate allocations in how I could get it to work. At this point I think just doing a single-element lookup table is probably sufficient. If we end up hitting bottlenecks down the road I feel like we can revisit this easily enough, as it's not a big deal to improve the internal implementation later.
For benchmarking I compared the code prior to 5c81510, 5c81510 itself, and my local chunked implementation using:
import time
import random
from qiskit.result import marginal_memory
random.seed(42)
memory = [hex(random.randint(0, 4096)) for _ in range(500000)]
start = time.perf_counter()
res = marginal_memory(memory, indices=[0, 3, 5, 9])
stop = time.perf_counter()
print(stop - start)
The geometric mean of each implementation over 10 trials was:
match: 0.08678476060453334
LUT: 0.08359493472968436
Chunked LUT: 0.10288708564573844
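For completeness, the geometric mean over repeated trials can be computed like this (a sketch of the assumed methodology, not code from the PR):
import math

def geometric_mean(samples):
    # Geometric mean of positive timing samples, e.g. the 10 runtimes per implementation.
    return math.exp(sum(math.log(s) for s in samples) / len(samples))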
Ok(out_mem
    .iter()
    .map(|x| BigUint::parse_bytes(x.as_bytes(), 2).unwrap())
    .collect::<Vec<BigUint>>()
Yay turbo fish! ::<<>>
Co-authored-by: Kevin Hartman <kevin@hart.mn>
In the recently merged Qiskit#8051 we create a lookup table in Rust to speed up the hex->bin conversion used internally as part of the marginal_memory() function. This was previously done using the lazy_static crate, which is used to lazily evaluate dynamic code to create a static at runtime on the first access. The typical use case for this is to create a static Vec or HashMap. However, for the marginal_counts() usage we didn't need to do this because we were creating a fixed size array, so the static can be evaluated at compile time assuming the array is constructed with a const function. This commit removes the lazy_static usage and switches to a true static to further improve the performance of the lookup table by avoiding the construction overhead.
Refactor marginal_memory() hex to bin lookup table to be a true static (#8223)
* Reduce number of empty entries in LUT
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Summary
This commit adds dedicated functions for memory marginalization.
Previously, the marginal_counts() function had support for marginalizing
memory in a Results object, but this can be inefficient, especially if
your memory list is outside a Results object. The new functions added in
this commit are implemented in Rust and multithreaded. Additionally, the
marginal_counts() function is updated to use the same inner Rust
functions.
Details and comments
TODO: