Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving memory consumption of DagDependency when lists of transitive predecessors and successors are not required #8525

Merged
merged 11 commits into from
Sep 26, 2022

Conversation

alexanderivrii
Copy link
Member

Summary

This is a follow-up on PR #8184. DAGDependency is a canonical representation of non-commutativity in a circuit. Currently, when a DAGDependency is constructed, for every node we automatically construct the lists of all of its transitive predecessors and successors. This makes the construction slow and especially highly memory-intensive for circuits with many gates. On the other hand, these lists of predecessors/successors are not needed for many applications of DagDependency (see PR #8319).

This PR modifies the code for constructing DAGDependency, so that only direct (immediate) predecessors of a node are considered, and introduces an additional option create_preds_and_succs to converters circuit_to_dagdependency and dag_to_dagdependency to avoid constructing transitive predecessors/successors when the value of this option is False. This improves the runtime and memory consumption of DAGDependency constructions.

Details and comments

Some experimental results (compared with the results in #8184).

Constructing DAGDependency (without predecessors/successors):

Before:

DagDependency: #qubits = 5, #gates = 100, seed = 0, #edges = 151, depth = 30, time = 0.0439
DagDependency: #qubits = 5, #gates = 1000, seed = 0, #edges = 1618, depth = 336, time = 1.0921
DagDependency: #qubits = 5, #gates = 5000, seed = 0, #edges = 8007, depth = 1650, time = 19.8801
DagDependency: #qubits = 10, #gates = 100, seed = 0, #edges = 162, depth = 22, time = 0.0469
DagDependency: #qubits = 10, #gates = 1000, seed = 0, #edges = 1672, depth = 200, time = 1.2437
DagDependency: #qubits = 10, #gates = 5000, seed = 0, #edges = 8333, depth = 943, time = 22.9258

New:

#qubits = 5, #gates =  100: size = 100, #edges = 151, depth = 30, time = 0.0389
#qubits = 5, #gates =  1000: size = 1000, #edges = 1618, depth = 336, time = 0.5190
#qubits = 5, #gates =  5000: size = 5000, #edges = 8007, depth = 1650, time = 10.3534
#qubits = 10, #gates =  100: size = 100, #edges = 162, depth = 22, time = 0.0399
#qubits = 10, #gates =  1000: size = 1000, #edges = 1672, depth = 200, time = 0.5398
#qubits = 10, #gates =  5000: size = 5000, #edges = 8333, depth = 943, time = 10.3706

The DAGDependency statistics (number of nodes, number of edges, depth of the circuit) are of course unchanged, but the times are improved 2x.

Memory consumption (without predecessors/successors)
Before:

#qubits = 5, #gates = 100, memory = 0.3 MB
#qubits = 5, #gates = 1000, memory = 9.4 MB
#qubits = 5, #gates = 5000, memory = 209.3 MB
#qubits = 10, #gates = 100, memory = 0.1 MB
#qubits = 10, #gates = 1000, memory = 7.8 MB
#qubits = 10, #gates = 5000, memory = 199.1 MB

New:

#qubits = 5, #gates = 100, memory = 0.4 MB
#qubits = 5, #gates = 1000, memory = 1.7 MB
#qubits = 5, #gates = 5000, memory = 5.7 MB
#qubits = 10, #gates = 100, memory = 0.0 MB
#qubits = 10, #gates = 1000, memory = 0.0 MB
#qubits = 10, #gates = 5000, memory = 2.8 MB

@alexanderivrii alexanderivrii requested a review from a team as a code owner August 11, 2022 15:14
@qiskit-bot
Copy link
Collaborator

Thank you for opening a new pull request.

Before your PR can be merged it will first need to pass continuous integration tests and be reviewed. Sometimes the review process can be slow, so please be patient.

While you're waiting, please feel free to review other open PRs. While only a subset of people are authorized to approve pull requests for merging, everyone is encouraged to review open pull requests. Doing reviews helps reduce the burden on the core team and helps make the project's code better for everyone.

One or more of the the following people are requested to review this:

  • @Qiskit/terra-core

@coveralls
Copy link

coveralls commented Aug 14, 2022

Pull Request Test Coverage Report for Build 3130817582

  • 26 of 26 (100.0%) changed or added relevant lines in 3 files are covered.
  • 5 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.004%) to 84.413%

Files with Coverage Reduction New Missed Lines %
qiskit/extensions/quantum_initializer/squ.py 2 79.78%
qiskit/dagcircuit/dagdependency.py 3 86.73%
Totals Coverage Status
Change from base Build 3130815002: -0.004%
Covered Lines: 59603
Relevant Lines: 70609

💛 - Coveralls

Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a fine incremental improvement (except for one potential bug in line), but my concern which may be larger than this PR is that DAGDependency doesn't seem to leverage any of the graph data structure to analyze structural aspects of the graph. In other words we shouldn't need to store in the node it's list of adjacent nodes because the underlying graph object makes that quick to lookup when needed.

So I guess my higher level question about this is instead of just reducing the set of what is stored in the node why can't we just remove that node.successors and node.predecessors completely because the data is duplicated with what we're building in the graph already.

@@ -454,6 +454,7 @@ def _gather_succ(self, node_id, direct_succ):
the lists of direct successors are put into a single one
"""
gather = self._multi_graph
gather.get_node_data(node_id).successors = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still lost on why this is needed? Storing attributes of graph structural properties like this seem weird to me when we're building this all around a graph data structure. Like if you want the list of successor nodes from a given node in the graph this is just self._multi_graph.successors(node_id): https://qiskit.org/documentation/retworkx/apiref/retworkx.PyDiGraph.successors.html#retworkx.PyDiGraph.successors. There is also a version with a filter function built-in if needed: https://qiskit.org/documentation/retworkx/apiref/retworkx.PyDiGraph.find_successors_by_edge.html#retworkx.PyDiGraph.find_successors_by_edge

If you need to full descendants set you can use rx.descendants() (https://qiskit.org/documentation/retworkx/apiref/retworkx.descendants.html) caching any of these graph relationships at the node level seems a bit unnecessary when we have the graph data structure already and can query it on demand. Unless there is something I'm missing here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am guessing that DagDependency was originally created to be primarily used in TemplateOptimization pass. For instance, look at the fields of DAGDepNode:

__slots__ = ["type", "_op", "name", "_qargs", "cargs", "sort_key", "node_id", "successors", "predecessors", "reachable", matchedwith", "isblocked", "successorstovisit", "qindices", "cindices"]

many of these only make sense in the context of TemplateOptimization.

For caching vs. not caching lists of direct and transitive predecessors and successors, there is likely some tradeoff between performance and memory consumption, depending on how many times TemplateOptimization happens to call these. I haven't really experimented with TemplateOptimization, so I don't know, but for this PR I don't want to risk making TemplateOptimization even slower than it is now, even though I completely agree that it probably makes perfect sense to use self._multi_graph.successors instead of self.get_node_data(node_id).successors. I do want this PR, as an enabler for block collection strategies that take commutativity into account (#8319), as these are currently slow and extremely memory-hungry to be applied to large circuits.

I do think it would be a good idea to refactor DagDepNode, DagDependency and TemplateOptimization pass.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mine putting a TODO comment somewhere expressing the desire to remove the this caching in the dag nodes then. I think I was letting my discontent with how this structure is currently built cloud my review here. If we just have an inline comment somewhere saying this refactor is on the table in the future that'll be enough for me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the "TODO" comment. Just to be clear, I think that we should definitely refactor DAGDependency, however a lot of it might go hand-in-hand with doing changes to TempateOptimization and I did not want to touch this in the current PR.

Comment on lines 480 to 496
Updates DagDependency by adding edges to the newly added node (max_node)
from the previously added nodes.
For each previously added node (prev_node), an edge from prev_node to max_node
is added if max_node is "reachable" from prev_node (this means that the two
nodes can be made adjacent by commuting them with other nodes), but the two nodes
themselves do not commute.
"""
max_node_id = len(self._multi_graph) - 1
max_node = self._multi_graph.get_node_data(max_node_id)
max_node = self.get_node(max_node_id)

for current_node_id in range(0, max_node_id):
self._multi_graph.get_node_data(current_node_id).reachable = True
# Check the commutation relation with reachable node, it adds edges if it does not commute
reachable = [True] * max_node_id

# Analyze nodes in the reverse topological order.
# An improvement to the original algorithm is to consider only direct predecessors
# and to avoid constructing the lists of forward and backward reachable predecessors
# for every node when not required.
for prev_node_id in range(max_node_id - 1, -1, -1):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code won't necessarily work if there are any removals in the graph. The list of node ids in a rustworkx graph might have holes if there are intermediate removals (see the first code example on https://qiskit.org/documentation/retworkx/apiref/retworkx.PyDiGraph.html#retworkx.PyDiGraph or https://qiskit.org/documentation/retworkx/tutorial/introduction.html#removing-elements-from-a-graph) you might end up trying to use an index not in the graph by doing a reverse range like this. It's better to rely on graph.node_indices() to get a sequence of node indices in the graph and iterate over that in the order required.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I completely agree, but note that I have not changed how nodes in DagDependency are iterated, the only real change is to consider direct predecessors instead of transitive predecessors, and to avoid storing transitive predecessors if not required (and the same for successors). This whole function _update_edges is very strange and (imho) can only be used when constructing DagDependency from QuantumCircuit or from DagCircuit (which is the case right now), where we don't need to worry about holes in the graph.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, do you mind leaving a comment here explaining the context around this because every time I read this code I'll have the same thought. So basically just say inline this context doesn't have removals since it's only called when creating a new dag dependency from another representation of a circuit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added this comment to the doc string of _update_edges, and added another TODO to rethink DAGDependency API. At the moment, it's not clear to me whether there are potential usages of DAGDependency that will require to call _update_edges except when DAGDependency is constructed from another representation of a circuit.

@alexanderivrii alexanderivrii changed the title Improved construction of DagDependency Improving memory consumption of DagDependency when lists of transitive predecessors and successors are not required Sep 12, 2022
@alexanderivrii
Copy link
Member Author

@mtreinish, thanks for the review! I presume that

except for one potential bug in line

refers to iterating over nodes without taking removals into account, but (as I commented above) this code is currently only used when DagDependency is originally constructed.

My goal is to continue with collecting blocks of gates taking gate commutativity into account, and this PR is needed to make block collection faster and significantly less memory-hungry for large circuits.

I do think that it would be a good idea to rethink and refactor both DagDepNode and TemplateOptimization code to make better use of rustworkx capabilities.

Is there anything specific that I should do for this PR?

@alexanderivrii alexanderivrii added this to the 0.22 milestone Sep 14, 2022
@mtreinish mtreinish added the Changelog: New Feature Include in the "Added" section of the changelog label Sep 22, 2022
Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think given that you're saying that my review comments were just temporary conditions and you're leaving the current interface (and potential bugs) as is and concentrating on just memory overhead/performance this PR looks fine to me. I think we should just put some comments inline explaining this and then lets move forward on this.

Comment on lines 480 to 496
Updates DagDependency by adding edges to the newly added node (max_node)
from the previously added nodes.
For each previously added node (prev_node), an edge from prev_node to max_node
is added if max_node is "reachable" from prev_node (this means that the two
nodes can be made adjacent by commuting them with other nodes), but the two nodes
themselves do not commute.
"""
max_node_id = len(self._multi_graph) - 1
max_node = self._multi_graph.get_node_data(max_node_id)
max_node = self.get_node(max_node_id)

for current_node_id in range(0, max_node_id):
self._multi_graph.get_node_data(current_node_id).reachable = True
# Check the commutation relation with reachable node, it adds edges if it does not commute
reachable = [True] * max_node_id

# Analyze nodes in the reverse topological order.
# An improvement to the original algorithm is to consider only direct predecessors
# and to avoid constructing the lists of forward and backward reachable predecessors
# for every node when not required.
for prev_node_id in range(max_node_id - 1, -1, -1):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, do you mind leaving a comment here explaining the context around this because every time I read this code I'll have the same thought. So basically just say inline this context doesn't have removals since it's only called when creating a new dag dependency from another representation of a circuit.

@@ -454,6 +454,7 @@ def _gather_succ(self, node_id, direct_succ):
the lists of direct successors are put into a single one
"""
gather = self._multi_graph
gather.get_node_data(node_id).successors = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mine putting a TODO comment somewhere expressing the desire to remove the this caching in the dag nodes then. I think I was letting my discontent with how this structure is currently built cloud my review here. If we just have an inline comment somewhere saying this refactor is on the table in the future that'll be enough for me.

@alexanderivrii
Copy link
Member Author

Hmm, building docs started to fail after merging the main branch. Is this an instance of "things may go wrong on week-ends"?

@mtreinish
Copy link
Member

Hmm, building docs started to fail after merging the main branch. Is this an instance of "things may go wrong on week-ends"?

There was a new sphinx release over the weekend that broke the docs build, they pushed out 5.2.1 yesterday evening to fix it

mtreinish
mtreinish previously approved these changes Sep 26, 2022
Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now, with a small typo in the release. Thanks for adding the comments I think that makes it clear this is just an incremental step and in a coming future PR we'll have a larger refactor of the class and template optimization

Co-authored-by: Matthew Treinish <mtreinish@kortar.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: New Feature Include in the "Added" section of the changelog performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants