Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port circuit_to_dag to Rust #13036

Merged
merged 28 commits into from
Sep 10, 2024
Merged

Port circuit_to_dag to Rust #13036

merged 28 commits into from
Sep 10, 2024

Conversation

raynelfss
Copy link
Contributor

@raynelfss raynelfss commented Aug 26, 2024

Is tracked by and resolves #13001 and #13004

Summary

These commits bring circuit_to_dag into rust, leveraging all of the api's introduced by #12550. This also resulted in the addition of a rust-native method. DAGCircuit::from_circuit_data to convert from CircuitData to DAGCircuit from Rust alone.

Details and comments

The following commits aim to add the circuit converter circuit_to_dag to rust while leveraging the rust-side logic for DAGCircuit. This is originally used to turn an instance of QuantumCircuit into DAGCircuit. A rust method to convert CircuitData to DAGCircuit was also added.

Inital changes

  • Add new method DAGCircuit::from_quantum_circuit which uses the data from a QuantumCircuit instance to build a dag_circuit.
  • A new method DAGCircuit::from_circuit_data uses the previously describe method to create a DAGCircuit based only on the CircuitData from a circuit.
    • This method aims to focus on the core structure of the circuit and not any of the other attributes that are only used through python (such as Registers, Variables, Calibrations, etc.)
    • Should be used with caution as CircuitData != QuantumCircuit and results may differ.
  • Expose the method through a Python interface with circuit_to_dag which goes by the alias of core_circuit_to_dag and is called by the original method.
  • Add an arbitrary structure QuantumCircuitData that extract attributes from the python QuantumCircuit instance and makes it easier to access in rust.
    • This structure is for attribute extraction only and is not clonable/copyable.
  • Expose a new module converters which should store all of the rust-side converters whenever they get brought into rust.

Tasks

  • Fix the testing suite.
  • Clean up API
  • Benchmark

Blockers

Questions and suggestions

Feel free to ask any questions and give any feedback below, thank you! 🚀

@raynelfss raynelfss added performance Rust This PR or issue is related to Rust code in the repository mod: circuit Related to the core of the `QuantumCircuit` class or the circuit library on hold Can not fix yet labels Aug 26, 2024
@raynelfss raynelfss force-pushed the rusty-circ-to-dag branch 2 times, most recently from 34ba18d to 923ae3f Compare August 26, 2024 19:01
@raynelfss
Copy link
Contributor Author

There are still some bugs to fix but so far benchmarks look promising for this:

All benchmarks:

| Change   | Before [cc87318f] <main>   | After [24df33df] <rusty-circ-to-dag>   | Ratio   | Benchmark (Parameter)                                        |
|----------|----------------------------|----------------------------------------|---------|--------------------------------------------------------------|
|          | n/a                        | n/a                                    | n/a     | converters.ConverterBenchmarks.time_circuit_to_dag(14, 8192) |
|          | n/a                        | n/a                                    | n/a     | converters.ConverterBenchmarks.time_circuit_to_dag(20, 2048) |
|          | n/a                        | n/a                                    | n/a     | converters.ConverterBenchmarks.time_circuit_to_dag(20, 8192) |
|          | n/a                        | n/a                                    | n/a     | converters.ConverterBenchmarks.time_circuit_to_dag(32, 2048) |
|          | n/a                        | n/a                                    | n/a     | converters.ConverterBenchmarks.time_circuit_to_dag(32, 8192) |
|          | n/a                        | n/a                                    | n/a     | converters.ConverterBenchmarks.time_circuit_to_dag(53, 2048) |
|          | n/a                        | n/a                                    | n/a     | converters.ConverterBenchmarks.time_circuit_to_dag(53, 8192) |
| -        | 17.3±0.2μs                 | 7.56±0.09μs                            | 0.44    | converters.ConverterBenchmarks.time_circuit_to_dag(1, 8)     |
| -        | 29.3±0.4μs                 | 11.1±0.1μs                             | 0.38    | converters.ConverterBenchmarks.time_circuit_to_dag(2, 8)     |
| -        | 11.9±0.1ms                 | 3.67±0.1ms                             | 0.31    | converters.ConverterBenchmarks.time_circuit_to_dag(53, 128)  |
| -        | 283±2μs                    | 84.9±0.4μs                             | 0.30    | converters.ConverterBenchmarks.time_circuit_to_dag(20, 8)    |
| -        | 191±0.8μs                  | 55.1±0.5μs                             | 0.29    | converters.ConverterBenchmarks.time_circuit_to_dag(14, 8)    |
| -        | 62.3±0.2μs                 | 18.0±0.1μs                             | 0.29    | converters.ConverterBenchmarks.time_circuit_to_dag(5, 8)     |
| -        | 882±10μs                   | 259±3μs                                | 0.29    | converters.ConverterBenchmarks.time_circuit_to_dag(53, 8)    |
| -        | 106±0.8μs                  | 30.4±0.4μs                             | 0.29    | converters.ConverterBenchmarks.time_circuit_to_dag(8, 8)     |
| -        | 434±8μs                    | 120±2μs                                | 0.28    | converters.ConverterBenchmarks.time_circuit_to_dag(32, 8)    |
| -        | 169±3μs                    | 44.2±0.4μs                             | 0.26    | converters.ConverterBenchmarks.time_circuit_to_dag(1, 128)   |
| -        | 34.5±0.3ms                 | 8.74±0.2ms                             | 0.25    | converters.ConverterBenchmarks.time_circuit_to_dag(14, 2048) |
| -        | 6.00±0.03ms                | 1.50±0.03ms                            | 0.25    | converters.ConverterBenchmarks.time_circuit_to_dag(32, 128)  |
| -        | 9.80±0.2ms                 | 2.27±0.2ms                             | 0.23    | converters.ConverterBenchmarks.time_circuit_to_dag(1, 8192)  |
| -        | 2.20±0ms                   | 502±6μs                                | 0.23    | converters.ConverterBenchmarks.time_circuit_to_dag(14, 128)  |
| -        | 3.44±0.05ms                | 805±4μs                                | 0.23    | converters.ConverterBenchmarks.time_circuit_to_dag(20, 128)  |
| -        | 76.6±0.8ms                 | 17.3±0.3ms                             | 0.23    | converters.ConverterBenchmarks.time_circuit_to_dag(8, 8192)  |
| -        | 2.44±0.01ms                | 531±4μs                                | 0.22    | converters.ConverterBenchmarks.time_circuit_to_dag(1, 2048)  |
| -        | 18.5±0.1ms                 | 4.05±0.1ms                             | 0.22    | converters.ConverterBenchmarks.time_circuit_to_dag(8, 2048)  |
| -        | 351±1μs                    | 72.6±1μs                               | 0.21    | converters.ConverterBenchmarks.time_circuit_to_dag(2, 128)   |
| -        | 47.6±0.9ms                 | 9.96±0.4ms                             | 0.21    | converters.ConverterBenchmarks.time_circuit_to_dag(5, 8192)  |
| -        | 1.24±0.01ms                | 260±5μs                                | 0.21    | converters.ConverterBenchmarks.time_circuit_to_dag(8, 128)   |
| -        | 20.6±0.09ms                | 4.10±0.2ms                             | 0.20    | converters.ConverterBenchmarks.time_circuit_to_dag(2, 8192)  |
| -        | 755±2μs                    | 147±0.08μs                             | 0.20    | converters.ConverterBenchmarks.time_circuit_to_dag(5, 128)   |
| -        | 5.14±0.06ms                | 880±4μs                                | 0.17    | converters.ConverterBenchmarks.time_circuit_to_dag(2, 2048)  |
| -        | 11.6±0.09ms                | 2.01±0.05ms                            | 0.17    | converters.ConverterBenchmarks.time_circuit_to_dag(5, 2048)  |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

@raynelfss
Copy link
Contributor Author

After fixing all of the failing tests, there was a bit of a performance regression (probably from cloning all the instructions, which I don't think is needed anymore if we do things through rust). But it's still faster than the original.

All benchmarks:

| Change   | Before [cc2edc9f] <main>   | After [7b765db2] <rusty-circ-to-dag>   | Ratio   | Benchmark (Parameter)                                        |
|----------|----------------------------|----------------------------------------|---------|--------------------------------------------------------------|
|          | n/a                        | n/a                                    | n/a     | converters.ConverterBenchmarks.time_circuit_to_dag(14, 8192) |
|          | n/a                        | n/a                                    | n/a     | converters.ConverterBenchmarks.time_circuit_to_dag(20, 2048) |
|          | n/a                        | n/a                                    | n/a     | converters.ConverterBenchmarks.time_circuit_to_dag(20, 8192) |
|          | n/a                        | n/a                                    | n/a     | converters.ConverterBenchmarks.time_circuit_to_dag(32, 2048) |
|          | n/a                        | n/a                                    | n/a     | converters.ConverterBenchmarks.time_circuit_to_dag(32, 8192) |
|          | n/a                        | n/a                                    | n/a     | converters.ConverterBenchmarks.time_circuit_to_dag(53, 2048) |
|          | n/a                        | n/a                                    | n/a     | converters.ConverterBenchmarks.time_circuit_to_dag(53, 8192) |
| -        | 17.2±0.1μs                 | 12.0±0.3μs                             | 0.70    | converters.ConverterBenchmarks.time_circuit_to_dag(1, 8)     |
| -        | 61.4±0.3μs                 | 37.9±0.5μs                             | 0.62    | converters.ConverterBenchmarks.time_circuit_to_dag(5, 8)     |
| -        | 190±1μs                    | 115±0.7μs                              | 0.61    | converters.ConverterBenchmarks.time_circuit_to_dag(14, 8)    |
| -        | 29.2±0.3μs                 | 17.9±0.2μs                             | 0.61    | converters.ConverterBenchmarks.time_circuit_to_dag(2, 8)     |
| -        | 277±0.3μs                  | 168±1μs                                | 0.61    | converters.ConverterBenchmarks.time_circuit_to_dag(20, 8)    |
| -        | 107±4μs                    | 65.0±1μs                               | 0.61    | converters.ConverterBenchmarks.time_circuit_to_dag(8, 8)     |
| -        | 425±2μs                    | 243±8μs                                | 0.57    | converters.ConverterBenchmarks.time_circuit_to_dag(32, 8)    |
| -        | 20.6±0.1ms                 | 11.3±0.2ms                             | 0.55    | converters.ConverterBenchmarks.time_circuit_to_dag(2, 8192)  |
| -        | 869±6μs                    | 459±1μs                                | 0.53    | converters.ConverterBenchmarks.time_circuit_to_dag(53, 8)    |
| -        | 34.4±2ms                   | 17.9±0.5ms                             | 0.52    | converters.ConverterBenchmarks.time_circuit_to_dag(14, 2048) |
| -        | 48.5±0.8ms                 | 24.7±0.5ms                             | 0.51    | converters.ConverterBenchmarks.time_circuit_to_dag(5, 8192)  |
| -        | 3.35±0.06ms                | 1.67±0.04ms                            | 0.50    | converters.ConverterBenchmarks.time_circuit_to_dag(20, 128)  |
| -        | 11.8±0.04ms                | 5.86±0.1ms                             | 0.50    | converters.ConverterBenchmarks.time_circuit_to_dag(53, 128)  |
| -        | 78.7±0.9ms                 | 39.5±0.6ms                             | 0.50    | converters.ConverterBenchmarks.time_circuit_to_dag(8, 8192)  |
| -        | 363±8μs                    | 178±0.9μs                              | 0.49    | converters.ConverterBenchmarks.time_circuit_to_dag(2, 128)   |
| -        | 5.84±0.06ms                | 2.84±0.05ms                            | 0.49    | converters.ConverterBenchmarks.time_circuit_to_dag(32, 128)  |
| -        | 2.23±0.1ms                 | 1.06±0.02ms                            | 0.48    | converters.ConverterBenchmarks.time_circuit_to_dag(14, 128)  |
| -        | 11.9±0.3ms                 | 5.66±0.2ms                             | 0.48    | converters.ConverterBenchmarks.time_circuit_to_dag(5, 2048)  |
| -        | 18.7±0.4ms                 | 8.94±0.3ms                             | 0.48    | converters.ConverterBenchmarks.time_circuit_to_dag(8, 2048)  |
| -        | 10.5±0.2ms                 | 4.90±0.3ms                             | 0.47    | converters.ConverterBenchmarks.time_circuit_to_dag(1, 8192)  |
| -        | 5.14±0.06ms                | 2.34±0.07ms                            | 0.46    | converters.ConverterBenchmarks.time_circuit_to_dag(2, 2048)  |
| -        | 755±2μs                    | 349±2μs                                | 0.46    | converters.ConverterBenchmarks.time_circuit_to_dag(5, 128)   |
| -        | 1.23±0.02ms                | 569±3μs                                | 0.46    | converters.ConverterBenchmarks.time_circuit_to_dag(8, 128)   |
| -        | 168±2μs                    | 76.0±4μs                               | 0.45    | converters.ConverterBenchmarks.time_circuit_to_dag(1, 128)   |
| -        | 2.52±0.05ms                | 976±9μs                                | 0.39    | converters.ConverterBenchmarks.time_circuit_to_dag(1, 2048)  |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

@raynelfss raynelfss marked this pull request as ready for review August 29, 2024 04:36
@raynelfss raynelfss requested a review from a team as a code owner August 29, 2024 04:36
@qiskit-bot
Copy link
Collaborator

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core
  • @kevinhartman
  • @mtreinish

@raynelfss raynelfss changed the title [WIP] Port circuit_to_dag to Rust Port circuit_to_dag to Rust Aug 29, 2024
@coveralls
Copy link

coveralls commented Aug 29, 2024

Pull Request Test Coverage Report for Build 10799042871

Details

  • 184 of 213 (86.38%) changed or added relevant lines in 5 files are covered.
  • 39 unchanged lines in 5 files lost coverage.
  • Overall coverage increased (+2.3%) to 89.137%

Changes Missing Coverage Covered Lines Changed/Added Lines %
crates/circuit/src/dag_circuit.rs 125 154 81.17%
Files with Coverage Reduction New Missed Lines %
qiskit/transpiler/passes/synthesis/unitary_synthesis.py 2 88.26%
crates/qasm2/src/lex.rs 3 91.73%
crates/circuit/src/dag_node.rs 4 80.24%
qiskit/synthesis/two_qubit/xx_decompose/decomposer.py 6 90.84%
crates/qasm2/src/parse.rs 24 95.77%
Totals Coverage Status
Change from base Build 10796516797: 2.3%
Covered Lines: 73175
Relevant Lines: 82093

💛 - Coveralls

@raynelfss raynelfss added this to the 1.3 beta milestone Sep 3, 2024
@raynelfss raynelfss linked an issue Sep 4, 2024 that may be closed by this pull request
Ok(new_nodes)
}

#[allow(clippy::too_many_arguments)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to not just take QuantumCircuitData as the input here? That would let you ignore this clippy and you don't have to pull out each field individually and then each argument could be set by name in the struct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that it would add some versatility to just accept the arguments, but adding it to the strctu seems to make more sense.

@raynelfss raynelfss removed the on hold Can not fix yet label Sep 6, 2024
@raynelfss
Copy link
Contributor Author

After @mtreinish suggestions, the copying of instructions one by one is more effective.

All benchmarks:

| Change   | Before [3d4bab2b] <main>   | After [7fe0a365] <rusty-circ-to-dag>   | Ratio   | Benchmark (Parameter)                                        |
|----------|----------------------------|----------------------------------------|---------|--------------------------------------------------------------|
|          | n/a                        | n/a                                    | n/a     | converters.ConverterBenchmarks.time_circuit_to_dag(14, 8192) |
|          | n/a                        | n/a                                    | n/a     | converters.ConverterBenchmarks.time_circuit_to_dag(20, 2048) |
|          | n/a                        | n/a                                    | n/a     | converters.ConverterBenchmarks.time_circuit_to_dag(20, 8192) |
|          | n/a                        | n/a                                    | n/a     | converters.ConverterBenchmarks.time_circuit_to_dag(32, 2048) |
|          | n/a                        | n/a                                    | n/a     | converters.ConverterBenchmarks.time_circuit_to_dag(32, 8192) |
|          | n/a                        | n/a                                    | n/a     | converters.ConverterBenchmarks.time_circuit_to_dag(53, 2048) |
|          | n/a                        | n/a                                    | n/a     | converters.ConverterBenchmarks.time_circuit_to_dag(53, 8192) |
| -        | 15.3±0.05μs                | 7.85±0.2μs                             | 0.51    | converters.ConverterBenchmarks.time_circuit_to_dag(1, 8)     |
| -        | 26.1±0.1μs                 | 11.7±0.03μs                            | 0.45    | converters.ConverterBenchmarks.time_circuit_to_dag(2, 8)     |
| -        | 147±2μs                    | 61.3±0.3μs                             | 0.42    | converters.ConverterBenchmarks.time_circuit_to_dag(14, 8)    |
| -        | 222±5μs                    | 93.4±0.4μs                             | 0.42    | converters.ConverterBenchmarks.time_circuit_to_dag(20, 8)    |
| -        | 81.7±0.8μs                 | 33.2±0.2μs                             | 0.41    | converters.ConverterBenchmarks.time_circuit_to_dag(8, 8)     |
| -        | 50.0±0.1μs                 | 20.2±0.07μs                            | 0.40    | converters.ConverterBenchmarks.time_circuit_to_dag(5, 8)     |
| -        | 346±1μs                    | 130±0.4μs                              | 0.38    | converters.ConverterBenchmarks.time_circuit_to_dag(32, 8)    |
| -        | 703±4μs                    | 271±0.7μs                              | 0.38    | converters.ConverterBenchmarks.time_circuit_to_dag(53, 8)    |
| -        | 9.38±0.08ms                | 3.23±0.02ms                            | 0.34    | converters.ConverterBenchmarks.time_circuit_to_dag(53, 128)  |
| -        | 4.38±0.02ms                | 1.38±0.01ms                            | 0.32    | converters.ConverterBenchmarks.time_circuit_to_dag(32, 128)  |
| -        | 2.46±0.02ms                | 719±3μs                                | 0.29    | converters.ConverterBenchmarks.time_circuit_to_dag(20, 128)  |
| -        | 1.57±0.01ms                | 436±4μs                                | 0.28    | converters.ConverterBenchmarks.time_circuit_to_dag(14, 128)  |
| -        | 145±4μs                    | 38.7±0.3μs                             | 0.27    | converters.ConverterBenchmarks.time_circuit_to_dag(1, 128)   |
| -        | 24.4±0.1ms                 | 6.38±0.2ms                             | 0.26    | converters.ConverterBenchmarks.time_circuit_to_dag(14, 2048) |
| -        | 240±1μs                    | 63.5±0.4μs                             | 0.26    | converters.ConverterBenchmarks.time_circuit_to_dag(2, 128)   |
| -        | 867±1μs                    | 226±1μs                                | 0.26    | converters.ConverterBenchmarks.time_circuit_to_dag(8, 128)   |
| -        | 531±0.6μs                  | 129±0.3μs                              | 0.24    | converters.ConverterBenchmarks.time_circuit_to_dag(5, 128)   |
| -        | 52.1±0.3ms                 | 11.9±0.2ms                             | 0.23    | converters.ConverterBenchmarks.time_circuit_to_dag(8, 8192)  |
| -        | 2.13±0.04ms                | 446±4μs                                | 0.21    | converters.ConverterBenchmarks.time_circuit_to_dag(1, 2048)  |
| -        | 8.46±0.2ms                 | 1.76±0.04ms                            | 0.21    | converters.ConverterBenchmarks.time_circuit_to_dag(1, 8192)  |
| -        | 13.1±0.08ms                | 2.78±0.01ms                            | 0.21    | converters.ConverterBenchmarks.time_circuit_to_dag(8, 2048)  |
| -        | 3.54±0.06ms                | 718±2μs                                | 0.20    | converters.ConverterBenchmarks.time_circuit_to_dag(2, 2048)  |
| -        | 13.8±0.07ms                | 2.71±0.02ms                            | 0.20    | converters.ConverterBenchmarks.time_circuit_to_dag(2, 8192)  |
| -        | 7.86±0.06ms                | 1.55±0.01ms                            | 0.20    | converters.ConverterBenchmarks.time_circuit_to_dag(5, 2048)  |
| -        | 31.6±0.2ms                 | 6.44±0.03ms                            | 0.20    | converters.ConverterBenchmarks.time_circuit_to_dag(5, 8192)  |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks great, I'm very eager to have this as besides being a speedup for conversion it gives us a native method for working with control flow block in the transpiler with minimal python overhead. My main concern is about the extra overhead with building the qubit mapping stuff when we don't have a mapping. I get from a code cleanliness perspective it's a lot easier to have an identity map. But I'm worried about the performance overhead to handle the mapping when there isn't any both in building the maps and then also adding hash lookups on every qubit..

crates/circuit/src/dag_circuit.rs Outdated Show resolved Hide resolved
let qc_data = qc.data;
let num_qubits = qc_data.num_qubits();
let num_clbits = qc_data.num_clbits();
let num_ops = qc_data.__len__();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, I feel like we should have a native len() method to CircuitData. Not for this PR of course, it's just a bit weird to see this in rust code.

crates/circuit/src/dag_circuit.rs Outdated Show resolved Hide resolved
crates/circuit/src/dag_circuit.rs Outdated Show resolved Hide resolved
crates/circuit/src/lib.rs Outdated Show resolved Hide resolved
- Introduce a method that adds a chain of `PackedInstruction` continuously avoiding the re-linking of each bit's output-node until the very end of the iterator.
   - TODO: Add handling of vars
- Add header for a `from_iter` function that will create a `DAGCircuit` based on a chain of `PackedInstruction`.
- Fix incorrect re-insertion of last_node.
- Use `entry` api to either modify or insert a value if missing.
- A bug that adds duplicate edges to vars has been temporarily fixed. However, the root of this problem hasn't been found yet. A proper fix is pending. For now skip those instances.
raynelfss and others added 16 commits September 10, 2024 13:47
- Introduce a method that adds a chain of `PackedInstruction` continuously avoiding the re-linking of each bit's output-node until the very end of the iterator.
   - TODO: Add handling of vars
- Add header for a `from_iter` function that will create a `DAGCircuit` based on a chain of `PackedInstruction`.
…circuit` crate.

- Make the `CircuitData` iter method be an exact-size iterator.
- Revert the changes making the interners and registers visible to the crate `qiskit-circuit`.
- Create methods to expose immutable borrowed views of the interners, registers and global_phase to prevent from mutating the DAGCircuit.
- Add `get_qargs` and `get_cargs` to unpack interned qargs ans cargs.
- Other tweaks and fixes.
- Correct incorrect docstrings for `qubits()` and `clbits()`

Co-authored-by: Eli Arbel <46826214+eliarbel@users.noreply.github.com>
- Add new method `DAGCircuit::from_quantum_circuit` which uses the data from a `QuantumCircuit` instance to build a dag_circuit.
- Expose the method through a `Python` interface with `circuit_to_dag` which goes by the alias of `core_circuit_to_dag` and is called by the original method.
- Add an arbitrary structure `QuantumCircuitData` that successfully extract attributes from the python `QuantumCircuit` instance and makes it easier to access in rust.
   - This structure is for attribute extraction only and is not clonable/copyable.
- Expose a new module `converters` which should store all of the rust-side converters whenever they get brought into rust.
- Other small tweaks and fixes.
- Add more accurate estimate of num_edges.
- When converting from `QuantumCircuit` to `DAGCircuit`, we were previously copying the instances of `BitData` by cloning. This would result in instances that had extra qubits that went unused. This commit fixes this oversight and reduced the amount of failing times from 160 to 12.
- Use `copy_operations` to copy all of the operations obtained from `CircuitData` by calling deepcopy.
- Initialize `._data` manually for instances of `BlueprintCircuit` by calling `._build()`.
- Other small fixes.
- Previously, the binding of qubits and clbits into bitdata was done based on the `push_back()` function behavior. This manual mapping was removed in favor of just inserting the qubits/clbits using the `add_qubits()` and `add_clbits()` methods and keeping track of the new indices.
- Remove cloning of interners, use the re-mapped entries instead.
- Remove manual re-mapping of qubits and clbits.
- Remove cloning of BitData, insert qubits directly instead.
- Other small tweaks and fixes.
- Make `QuantumCircuitData` extraction struct private.
- Rename `DAGCircuit::from_quantum_circuit` into `DAGCircuit::from_circuit` to make more generalized.
- Pass all attributes of `QuantumCircuit` that are passed from python as arguments to the `DAGCircuit::from_circuit` function.
- Add `DAGCircuit::from_circuit_data` as a wrapper to `from_circuit` to create an instance solely from the properties available in `CircuitData`.
    - Use `DAGCircuit::from_circuit` as base for `DAGCircuit::from_circuit_data`.
- `QuantumCircuitData` is an extractor helper built only to extract attributes from a Python-based `Quantumircuit` that are not yet available in rust + the qualities that already are through `CircuitData`.
- Add `Clone` trait `QuantumCircuitData` as all its properties are clonable.
- Make `circuit_to_dag` public in rust in case it needs to be used for a transpiler pass.
- Adapt to renaming of `DAGCircuit::add_from_iter` into `DAGCircuit::extend`.
- Use `QuantumCircuitData` as argument for `DAGCircuit::from_circuit`.
- Create instance of `QuantumCircuitData` in `DAGCircuit::from_circuit_data`.
- Re-add duplicate skip in extend.
- Other tweaks and fixes.
- Remove second re-mapping of bits, perform this mapping at insertion of qubits instead.
- Modify insertion of qubits to re-map the indices and store them after insertion should a custom ordering be provided.
- Remove extra `.and_modify()` for `clbits_last_node` from `DAGCircuit::extend`.
- Remove submodule addition to `circuit` module, submodule is now under `_accelerate.converters`.
- Other tweaks and fixes.
@raynelfss
Copy link
Contributor Author

All benchmarks:

| Change   | Before [3d4bab2b] <main>   | After [94f8ab42] <rusty-circ-to-dag>   | Ratio   | Benchmark (Parameter)                                        |
|----------|----------------------------|----------------------------------------|---------|--------------------------------------------------------------|
|          | n/a                        | n/a                                    | n/a     | converters.ConverterBenchmarks.time_circuit_to_dag(14, 8192) |
|          | n/a                        | n/a                                    | n/a     | converters.ConverterBenchmarks.time_circuit_to_dag(20, 2048) |
|          | n/a                        | n/a                                    | n/a     | converters.ConverterBenchmarks.time_circuit_to_dag(20, 8192) |
|          | n/a                        | n/a                                    | n/a     | converters.ConverterBenchmarks.time_circuit_to_dag(32, 2048) |
|          | n/a                        | n/a                                    | n/a     | converters.ConverterBenchmarks.time_circuit_to_dag(32, 8192) |
|          | n/a                        | n/a                                    | n/a     | converters.ConverterBenchmarks.time_circuit_to_dag(53, 2048) |
|          | n/a                        | n/a                                    | n/a     | converters.ConverterBenchmarks.time_circuit_to_dag(53, 8192) |
| -        | 15.0±0.4μs                 | 7.02±0.02μs                            | 0.47    | converters.ConverterBenchmarks.time_circuit_to_dag(1, 8)     |
| -        | 25.6±0.2μs                 | 10.9±0.09μs                            | 0.43    | converters.ConverterBenchmarks.time_circuit_to_dag(2, 8)     |
| -        | 217±0.6μs                  | 88.0±0.2μs                             | 0.41    | converters.ConverterBenchmarks.time_circuit_to_dag(20, 8)    |
| -        | 82.3±2μs                   | 33.0±2μs                               | 0.40    | converters.ConverterBenchmarks.time_circuit_to_dag(8, 8)     |
| -        | 145±0.7μs                  | 56.4±0.4μs                             | 0.39    | converters.ConverterBenchmarks.time_circuit_to_dag(14, 8)    |
| -        | 49.5±0.4μs                 | 18.8±0.3μs                             | 0.38    | converters.ConverterBenchmarks.time_circuit_to_dag(5, 8)     |
| -        | 342±10μs                   | 123±1μs                                | 0.36    | converters.ConverterBenchmarks.time_circuit_to_dag(32, 8)    |
| -        | 711±1μs                    | 259±5μs                                | 0.36    | converters.ConverterBenchmarks.time_circuit_to_dag(53, 8)    |
| -        | 9.27±0.07ms                | 2.95±0.04ms                            | 0.32    | converters.ConverterBenchmarks.time_circuit_to_dag(53, 128)  |
| -        | 4.39±0.03ms                | 1.34±0.02ms                            | 0.31    | converters.ConverterBenchmarks.time_circuit_to_dag(32, 128)  |
| -        | 2.43±0.01ms                | 689±3μs                                | 0.28    | converters.ConverterBenchmarks.time_circuit_to_dag(20, 128)  |
| -        | 1.58±0.02ms                | 417±1μs                                | 0.26    | converters.ConverterBenchmarks.time_circuit_to_dag(14, 128)  |
| -        | 145±1μs                    | 34.8±0.2μs                             | 0.24    | converters.ConverterBenchmarks.time_circuit_to_dag(1, 128)   |
| -        | 24.1±0.1ms                 | 5.68±0.03ms                            | 0.24    | converters.ConverterBenchmarks.time_circuit_to_dag(14, 2048) |
| -        | 236±2μs                    | 57.1±0.3μs                             | 0.24    | converters.ConverterBenchmarks.time_circuit_to_dag(2, 128)   |
| -        | 883±6μs                    | 216±3μs                                | 0.24    | converters.ConverterBenchmarks.time_circuit_to_dag(8, 128)   |
| -        | 529±3μs                    | 119±0.5μs                              | 0.22    | converters.ConverterBenchmarks.time_circuit_to_dag(5, 128)   |
| -        | 51.6±0.4ms                 | 10.9±0.2ms                             | 0.21    | converters.ConverterBenchmarks.time_circuit_to_dag(8, 8192)  |
| -        | 8.17±0.05ms                | 1.61±0.03ms                            | 0.20    | converters.ConverterBenchmarks.time_circuit_to_dag(1, 8192)  |
| -        | 13.3±0.2ms                 | 2.66±0.04ms                            | 0.20    | converters.ConverterBenchmarks.time_circuit_to_dag(8, 2048)  |
| -        | 2.06±0.01ms                | 395±6μs                                | 0.19    | converters.ConverterBenchmarks.time_circuit_to_dag(1, 2048)  |
| -        | 31.6±0.3ms                 | 5.98±0.07ms                            | 0.19    | converters.ConverterBenchmarks.time_circuit_to_dag(5, 8192)  |
| -        | 3.51±0.03ms                | 648±4μs                                | 0.18    | converters.ConverterBenchmarks.time_circuit_to_dag(2, 2048)  |
| -        | 13.8±0.2ms                 | 2.42±0.02ms                            | 0.18    | converters.ConverterBenchmarks.time_circuit_to_dag(2, 8192)  |
| -        | 7.84±0.08ms                | 1.41±0.01ms                            | 0.18    | converters.ConverterBenchmarks.time_circuit_to_dag(5, 2048)  |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

After @mtreinish 's suggestions there was a slight performance increase.

@raynelfss raynelfss requested a review from mtreinish September 10, 2024 17:53
Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks great, thanks for the fast update. Avoiding the extra overhead of an identity mapping definitely is worth it and the code looks much simpler now that everything is in a single collection. I just had one inline question/suggestion that I think could improve performance a bit more at the cost of some code duplication, but I'm curious what you think.

Comment on lines +6729 to +6777
let instructions: Vec<PackedInstruction> = qc_data
.iter()
.map(|instr| -> PyResult<PackedInstruction> {
// Re-map the qubits
let new_qargs = if let Some(qubit_mapping) = &qubit_map {
let qargs = qc_data
.get_qargs(instr.qubits)
.iter()
.map(|bit| qubit_mapping[bit.0 as usize])
.collect();
new_dag.qargs_interner.insert_owned(qargs)
} else {
new_dag
.qargs_interner
.insert(qc_data.get_qargs(instr.qubits))
};
// Remap the clbits
let new_cargs = if let Some(clbit_mapping) = &clbit_map {
let qargs = qc_data
.get_cargs(instr.clbits)
.iter()
.map(|bit| clbit_mapping[bit.0 as usize])
.collect();
new_dag.cargs_interner.insert_owned(qargs)
} else {
new_dag
.cargs_interner
.insert(qc_data.get_cargs(instr.clbits))
};
// Copy the operations

Ok(PackedInstruction {
op: if copy_op {
instr.op.py_deepcopy(py, None)?
} else {
instr.op.clone()
},
qubits: new_qargs,
clbits: new_cargs,
params: instr.params.clone(),
extra_attrs: instr.extra_attrs.clone(),
#[cfg(feature = "cache_pygates")]
py_op: OnceCell::new(),
})
})
.collect::<PyResult<Vec<_>>>()?;

// Finally add all the instructions back
new_dag.extend(py, instructions)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the only reason you're collecting into a vec here because the deepcopy() call is fallible. I was kind of excited to have this all work with iterators in rust so we avoided a big allocation like this. I don't think we want to mask the error in python if a deepcopy fails, that is important for users to get the python exception. While it ends up being a big block of duplicated code, do you think it would make sense to move the copy_op conditional outside the iterator so you do something like:

if copy_op {
   let instructions = iterator.... .collect::<PyResult<Vec<_>>>()
} else {
   new_dag.extend(py, iterator)
}

where iterator is basically this entire chain just with or without the pyresult depending on whether we call clone() or deepcopy.

Copy link
Contributor Author

@raynelfss raynelfss Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other reason we're re-collecting is to re-intern the qargs. But I also believe this could be avoided. After all the only requirement here is that whatever we send to extend() can turn into an iterator of PackedInstruction. I think we can avoid the allocation by not collecting the Vec but we would still have to re-intern things since we're not copying the interners over.

Edit: This earlier suggestion may not work due to the iterator mutably borrowing the dag while the method already holds right to mutate. :/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can do this without re-interning at all in the case when we're not reordering any qubits or clbits. Since the interners should be the same for the dag and circuit.

But, lets save this as a potential followup then. The gains from doing something like this would be be measurable but even with a collection into a Vec this has a good improvement in speed and the logic to figure out how to do this is probably will probably be pretty verbose to handle all the edge cases around whether we are reordering any bits or not, or we need to return a python exception from deepcopy or not. I'll open an issue after this merges so we can track it as a potential future optimization.

mtreinish
mtreinish previously approved these changes Sep 10, 2024
Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small nit/question inline but this LGTM thanks for doing this. If you don't want to update the nit feel free to enqueue this for merge.

crates/circuit/src/dag_circuit.rs Outdated Show resolved Hide resolved
Co-authored-by: Matthew Treinish <mtreinish@kortar.org>
@mtreinish mtreinish enabled auto-merge September 10, 2024 19:10
@mtreinish mtreinish added this pull request to the merge queue Sep 10, 2024
Merged via the queue into Qiskit:main with commit 86a1b49 Sep 10, 2024
15 checks passed
@raynelfss raynelfss added the Changelog: None Do not include in changelog label Nov 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: None Do not include in changelog mod: circuit Related to the core of the `QuantumCircuit` class or the circuit library performance priority: high Rust This PR or issue is related to Rust code in the repository
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create a DAGCircuit::from_circuit_data to Rust Port circuit_to_dag to Rust
4 participants