-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable avoiding Python operation creation in transpiler #12692
Enable avoiding Python operation creation in transpiler #12692
Conversation
Since Qiskit#12459 accessing `node.op` in the transpiler eagerly creates a Python object on access. This is because we now are no longer storing a Python object internally and we need to rebuild the object to return the python object as expected by the api. This is causing a significant performance regression because of the extra overhead. The longer term goal is to move as much of the performance critical passes to operate in rust which will eliminate this overhead. But in the meantime we can mitigate the performance overhead by changing the Python access patterns to avoid the operation object creation. This commit adds some new getter methods to DAGOpNode to give access to the inner rust data so that we can avoid the extra overhead. As a proof of concept this updates the unitary synthesis pass in isolation. Doing this fixes the regression caused by Qiskit#12459 for that pass. We can continue this migration for everything else in follow up PRs. This commit is mostly to establish the pattern and add the python space access methods.
One or more of the following people are relevant to this code:
|
Pull Request Test Coverage Report for Build 9718970161Details
💛 - Coveralls |
Pull Request Test Coverage Report for Build 9719830598Details
💛 - Coveralls |
Pull Request Test Coverage Report for Build 9725168342Details
💛 - Coveralls |
This commit updates the commutative cancellation and commutation analysis transpiler pass. It builds off of Qiskit#12692 to adjust access patterns in the python transpiler path to avoid eagerly creating a Python space operation object. The goal of this PR is to mitigate the performance regression on these passes introduced by the extra conversion cost of Qiskit#12459. As part of this the commutation checker is rewritten in rust since all that requires is gates in rust which we've had a representation of since Qiskit#12459 merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it's nice to have an intermediate solution until the passes are migrated to rust, but I wonder how do we want to approach the follow-ups. Do we want to try migrating as many python passes to not access node.op
first, independently of when we plan to have a rust implementation for them? Or does it make sense to wait and see which passes we don't manage to migrate to rust in the near term and only update those?
py, op, None, None, params, label, duration, unit, condition, | ||
)?; | ||
instruction.qubits = qargs.into(); | ||
instruction.clbits = cargs.into(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be missing something here, but why aren't the qubits and clbits directly given to the CircuitInstruction
constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's because of a type mismatch. The constructor for a circuit instruction takes in an impl IntoIterator
while we have a PyTuple
here which isn't doesn't implement the IntoItertator
trait. I thought about collecting the tuples into a Vec
so we could pass them to new()
, but at the end of the day the CircuitInstruction
has a PyTuple
in it, so I just elected to do this to avoid needing any conversions or tweaks to the interfaces.
Yeah I was thinking of trying to do this first. At this point I'm a bit skeptical we'll get any passes ported before the 1.2 PR proposal freeze deadline in ~2 weeks given the current state of the dagcircuit PR. So I pushed this up to give us a way to mitigate the regression caused by #12459 in the short term for 1.2. But there are probably some passes that we can work with just |
This commit updates the commutative cancellation and commutation analysis transpiler pass. It builds off of Qiskit#12692 to adjust access patterns in the python transpiler path to avoid eagerly creating a Python space operation object. The goal of this PR is to mitigate the performance regression on these passes introduced by the extra conversion cost of Qiskit#12459.
This commit moves to use rust gates for the ConsolidateBlocks transpiler pass. Instead of generating the unitary matrices for the gates in a 2q block Python side and passing that list to a rust function this commit switches to passing a list of DAGOpNodes to the rust and then generating the matrices inside the rust function directly. This is similar to what was done in Qiskit#12650 for Optimize1qGatesDecomposition. Besides being faster to get the matrix for standard gates, it also reduces the eager construction of Python gate objects which was a significant source of overhead after Qiskit#12459. To that end this builds on the thread of work in the two PRs Qiskit#12692 and Qiskit#12701 which changed the access patterns for other passes to minimize eager gate object construction.
This commit updates the BasisTranslator transpiler pass. It builds off of Qiskit#12692 and Qiskit#12701 to adjust access patterns in the python transpiler path to avoid eagerly creating a Python space operation object. The goal of this PR is to mitigate the performance regression introduced by the extra conversion cost of Qiskit#12459 on the BasisTranslator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Approving!
This commit updates the commutative cancellation and commutation analysis transpiler pass. It builds off of Qiskit#12692 to adjust access patterns in the python transpiler path to avoid eagerly creating a Python space operation object. The goal of this PR is to mitigate the performance regression on these passes introduced by the extra conversion cost of Qiskit#12459.
This commit moves to use rust gates for the ConsolidateBlocks transpiler pass. Instead of generating the unitary matrices for the gates in a 2q block Python side and passing that list to a rust function this commit switches to passing a list of DAGOpNodes to the rust and then generating the matrices inside the rust function directly. This is similar to what was done in Qiskit#12650 for Optimize1qGatesDecomposition. Besides being faster to get the matrix for standard gates, it also reduces the eager construction of Python gate objects which was a significant source of overhead after Qiskit#12459. To that end this builds on the thread of work in the two PRs Qiskit#12692 and Qiskit#12701 which changed the access patterns for other passes to minimize eager gate object construction.
* Avoid Python op creation in commutative cancellation This commit updates the commutative cancellation and commutation analysis transpiler pass. It builds off of #12692 to adjust access patterns in the python transpiler path to avoid eagerly creating a Python space operation object. The goal of this PR is to mitigate the performance regression on these passes introduced by the extra conversion cost of #12459. * Remove stray print * Don't add __array__ to DAGOpNode or CircuitInstruction
This commit moves to use rust gates for the ConsolidateBlocks transpiler pass. Instead of generating the unitary matrices for the gates in a 2q block Python side and passing that list to a rust function this commit switches to passing a list of DAGOpNodes to the rust and then generating the matrices inside the rust function directly. This is similar to what was done in Qiskit#12650 for Optimize1qGatesDecomposition. Besides being faster to get the matrix for standard gates, it also reduces the eager construction of Python gate objects which was a significant source of overhead after Qiskit#12459. To that end this builds on the thread of work in the two PRs Qiskit#12692 and Qiskit#12701 which changed the access patterns for other passes to minimize eager gate object construction.
This commit updates the BasisTranslator transpiler pass. It builds off of Qiskit#12692 and Qiskit#12701 to adjust access patterns in the python transpiler path to avoid eagerly creating a Python space operation object. The goal of this PR is to mitigate the performance regression introduced by the extra conversion cost of Qiskit#12459 on the BasisTranslator.
This commit moves to use rust gates for the ConsolidateBlocks transpiler pass. Instead of generating the unitary matrices for the gates in a 2q block Python side and passing that list to a rust function this commit switches to passing a list of DAGOpNodes to the rust and then generating the matrices inside the rust function directly. This is similar to what was done in Qiskit#12650 for Optimize1qGatesDecomposition. Besides being faster to get the matrix for standard gates, it also reduces the eager construction of Python gate objects which was a significant source of overhead after Qiskit#12459. To that end this builds on the thread of work in the two PRs Qiskit#12692 and Qiskit#12701 which changed the access patterns for other passes to minimize eager gate object construction.
This commit moves to use rust gates for the ConsolidateBlocks transpiler pass. Instead of generating the unitary matrices for the gates in a 2q block Python side and passing that list to a rust function this commit switches to passing a list of DAGOpNodes to the rust and then generating the matrices inside the rust function directly. This is similar to what was done in Qiskit#12650 for Optimize1qGatesDecomposition. Besides being faster to get the matrix for standard gates, it also reduces the eager construction of Python gate objects which was a significant source of overhead after Qiskit#12459. To that end this builds on the thread of work in the two PRs Qiskit#12692 and Qiskit#12701 which changed the access patterns for other passes to minimize eager gate object construction.
* Use rust gates for ConsolidateBlocks This commit moves to use rust gates for the ConsolidateBlocks transpiler pass. Instead of generating the unitary matrices for the gates in a 2q block Python side and passing that list to a rust function this commit switches to passing a list of DAGOpNodes to the rust and then generating the matrices inside the rust function directly. This is similar to what was done in #12650 for Optimize1qGatesDecomposition. Besides being faster to get the matrix for standard gates, it also reduces the eager construction of Python gate objects which was a significant source of overhead after #12459. To that end this builds on the thread of work in the two PRs #12692 and #12701 which changed the access patterns for other passes to minimize eager gate object construction. * Add rust filter function for DAGCircuit.collect_2q_runs() * Update crates/accelerate/src/convert_2q_block_matrix.rs --------- Co-authored-by: John Lapeyre <jlapeyre@users.noreply.github.com>
This commit updates the BasisTranslator transpiler pass. It builds off of #12692 and #12701 to adjust access patterns in the python transpiler path to avoid eagerly creating a Python space operation object. The goal of this PR is to mitigate the performance regression introduced by the extra conversion cost of #12459 on the BasisTranslator.
* Avoid Python operation creation in transpiler Since Qiskit#12459 accessing `node.op` in the transpiler eagerly creates a Python object on access. This is because we now are no longer storing a Python object internally and we need to rebuild the object to return the python object as expected by the api. This is causing a significant performance regression because of the extra overhead. The longer term goal is to move as much of the performance critical passes to operate in rust which will eliminate this overhead. But in the meantime we can mitigate the performance overhead by changing the Python access patterns to avoid the operation object creation. This commit adds some new getter methods to DAGOpNode to give access to the inner rust data so that we can avoid the extra overhead. As a proof of concept this updates the unitary synthesis pass in isolation. Doing this fixes the regression caused by Qiskit#12459 for that pass. We can continue this migration for everything else in follow up PRs. This commit is mostly to establish the pattern and add the python space access methods. * Remove unused import * Add path to avoid StandardGate conversion in circuit_to_dag * Add fast path through dag_to_circuit
* Avoid Python op creation in commutative cancellation This commit updates the commutative cancellation and commutation analysis transpiler pass. It builds off of Qiskit#12692 to adjust access patterns in the python transpiler path to avoid eagerly creating a Python space operation object. The goal of this PR is to mitigate the performance regression on these passes introduced by the extra conversion cost of Qiskit#12459. * Remove stray print * Don't add __array__ to DAGOpNode or CircuitInstruction
* Use rust gates for ConsolidateBlocks This commit moves to use rust gates for the ConsolidateBlocks transpiler pass. Instead of generating the unitary matrices for the gates in a 2q block Python side and passing that list to a rust function this commit switches to passing a list of DAGOpNodes to the rust and then generating the matrices inside the rust function directly. This is similar to what was done in Qiskit#12650 for Optimize1qGatesDecomposition. Besides being faster to get the matrix for standard gates, it also reduces the eager construction of Python gate objects which was a significant source of overhead after Qiskit#12459. To that end this builds on the thread of work in the two PRs Qiskit#12692 and Qiskit#12701 which changed the access patterns for other passes to minimize eager gate object construction. * Add rust filter function for DAGCircuit.collect_2q_runs() * Update crates/accelerate/src/convert_2q_block_matrix.rs --------- Co-authored-by: John Lapeyre <jlapeyre@users.noreply.github.com>
This commit updates the BasisTranslator transpiler pass. It builds off of Qiskit#12692 and Qiskit#12701 to adjust access patterns in the python transpiler path to avoid eagerly creating a Python space operation object. The goal of this PR is to mitigate the performance regression introduced by the extra conversion cost of Qiskit#12459 on the BasisTranslator.
Summary
Since #12459 accessing
node.op
in the transpiler eagerly creates a Python object on access. This is because we now are no longer storing a Python object internally and we need to rebuild the object to return the python object as expected by the api. This is causing a significant performance regression because of the extra overhead. The longer term goal is to move as much of the performance critical passes to operate in rust which will eliminate this overhead. But in the meantime we can mitigate the performance overhead by changing the Python access patterns to avoid the operation object creation. This commit adds some new getter methods to DAGOpNode to give access to the inner rust data so that we can avoid the extra overhead. As a proof of concept this updates the unitary synthesis pass in isolation. Doing this fixes the regression caused by #12459 for that pass. We can continue this migration for everything else in follow up PRs. This commit is mostly to establish the pattern and add the python space access methods.Details and comments