Restructuring of the heuristic system #410

Merged
merged 53 commits into from
Jan 16, 2024

EliasLF
Collaborator

@EliasLF EliasLF commented Jan 2, 2024

Description

This PR addresses several problems that are too interconnected to split into separate PRs:

  • Two new heuristics, GateCountSumDistanceMinusSharedSwaps and GateCountMaxDistanceOrSumDistanceMinusSharedSwaps. The former is the sum of all qubit pair distances minus an upper bound on how many swaps could potentially be saved by sharing with other moving qubits. The latter is the dominating heuristic over the previous GateCountMaxDistance and the new GateCountSumDistanceMinusSharedSwaps.
    Tested on a subset of MQT Bench, mapping to IBM Brisbane, GateCountMaxDistanceOrSumDistanceMinusSharedSwaps reduces the effective branching rate by about 10% compared to GateCountMaxDistance, while of course yielding the same results (if lookahead is disabled), as both heuristics are principally admissible.
    Unfortunately, the current default lookahead heuristic GateCountMaxDistance does not seem to work well with this new heuristic, as the combination results in slightly higher costs. This is probably due to scaling issues (the new heuristic is closer to the real cost and therefore larger, which reduces the impact of the lookahead penalty), but it might be solvable with higher lookahead factors or a new, better-suited lookahead heuristic.
  • Introducing a more flexible system for heuristics (both for the main heuristic and the lookahead heuristic). In the new system, any implementation specifics of a heuristic are isolated to a single function calculating a heuristic value from a search node. Outside of those functions, only a few characteristics of the heuristics are relevant for the mapper:
    • principal admissibility (i.e. admissibility at least on the optimal solution path)
    • tightness (i.e. the heuristic is 0 in all goal nodes)
    • fidelity-awareness
  • Fixing the handling of CNOT reversals in Dijkstra, the fixed-cost calculation, and all pre-existing heuristics. The current implementation resulted in both non-admissible heuristics and fixed costs that did not accurately reflect the gates added by QMAP (the cost calculation currently assumes that at most 1 reversal is added, while QMAP actually inserts a reversal for each backwards CNOT). The new implementation is as robust as possible, only failing to produce admissible heuristics in edge cases where cumulative reversal costs on one edge surpass SWAP costs, resulting in a non-convex cost space (which generally does not allow for heuristics that are both admissible and tight).
  • Moving all methods from HeuristicMapper::Node to HeuristicMapper (since they grew more and more dependent on data structures from HeuristicMapper with recent PRs) and making them more atomic (i.e. reducing points at which nodes are in an inconsistent state)
  • Added support for semi-directional architectures (i.e. architectures with both bidirectional and unidirectional edges)

List of minor changes and bug fixes:

  • Cleaning up HeuristicMapper::Node:
    • Removing nswaps because of redundancy with swaps.size()
    • Making swaps a one-dimensional vector (I assume the inner dimension was originally intended for the possibility of adding multiple swaps per node. However, this is currently not implemented, so all inner vectors are just of length 1)
    • Renaming done to validMapping, which better describes the property now that non-tight heuristics have been introduced to QMAP
  • Fixed tracking of HeuristicMapper::Node::validMappedTwoQubitGates and also activated it for non-fidelity-aware heuristics, enabling a more efficient check of whether a node has a valid mapping.
  • Reduced the register sizes in the example circuits to the actual number of qubits used in each circuit, to allow for easier checks in the tests for the minimum required architecture size
  • Made the ordering of validly mapped search nodes consistent (as defined by operator>), where previously all validly mapped nodes with the same total cost were considered equal (resulting in an arbitrary ordering in the priority queue and thereby potentially different optimal solutions for different principally admissible heuristics)
  • Fixing a bug in the handling of teleportation qubits in HeuristicMapper::createInitialMapping that caused this function to enter an endless loop in some edge cases (instead of randomly iterating over the whole coupling map, it only iterated over a part of it because the upper limit for the RNG was too low, which causes an endless loop once that subgraph is fully mapped but free teleportation qubits remain)
  • Removing redundant Mapper::fidelities
  • Simplifying Dijkstra::buildEdgeSkipTable by internally calling Dijkstra::buildTable for the 0th dimension
  • Moving checks for invalid settings from HeuristicMapper::map to new method HeuristicMapper::checkParameters
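The endless-loop bug in the teleportation-qubit handling follows a common pattern that can be reproduced in isolation. The sketch below is not the code from HeuristicMapper::createInitialMapping; `pickByRejection`, `pickFreeQubit`, `kNoQubit`, and the `maxAttempts` cap are hypothetical and exist only to make the failure mode observable without actually hanging.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Sentinel meaning "no free physical qubit found" (hypothetical).
const std::size_t kNoQubit = std::numeric_limits<std::size_t>::max();

// Rejection sampling: draw indices in [0, upperBound] until a free qubit is
// hit. If upperBound is too low, qubits above it are never drawn, and once
// all qubits in range are occupied an uncapped loop would never terminate.
// maxAttempts only exists so that this sketch always returns.
inline std::size_t pickByRejection(const std::vector<bool>& occupied,
                                   std::size_t upperBound, std::mt19937_64& rng,
                                   std::size_t maxAttempts) {
  assert(upperBound < occupied.size());
  std::uniform_int_distribution<std::size_t> dist(0, upperBound);
  for (std::size_t attempt = 0; attempt < maxAttempts; ++attempt) {
    const std::size_t q = dist(rng);
    if (!occupied[q]) {
      return q;
    }
  }
  return kNoQubit;
}

// Loop-free alternative: collect the free qubits first, then draw among them.
inline std::size_t pickFreeQubit(const std::vector<bool>& occupied,
                                 std::mt19937_64& rng) {
  std::vector<std::size_t> freeQubits;
  for (std::size_t q = 0; q < occupied.size(); ++q) {
    if (!occupied[q]) {
      freeQubits.push_back(q);
    }
  }
  if (freeQubits.empty()) {
    return kNoQubit;
  }
  std::uniform_int_distribution<std::size_t> dist(0, freeQubits.size() - 1);
  return freeQubits[dist(rng)];
}
```

With three physical qubits of which only the last is free, `pickByRejection` with `upperBound = 1` can never succeed, whereas `upperBound = 2` (the full range) or `pickFreeQubit` finds qubit 2.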

Checklist:

  • The pull request only contains commits that are related to it.
  • I have added appropriate tests and documentation.
  • I have made sure that all CI jobs on GitHub pass.
  • The pull request introduces no new warnings and follows the project's style guidelines.

@EliasLF EliasLF added the feature, c++, fix, and code quality labels Jan 2, 2024
@EliasLF EliasLF marked this pull request as ready for review January 12, 2024 06:26
Member

@burgholzer burgholzer left a comment

Many thanks for another great contribution. What a nice way to start the year!
I really like the direction this PR is taking. Makes the code much more organized.
Most of the comments below are really just comments. Feel free to read them, think about them, and discard them if you don't agree with them.
I think the only point where I might not be that happy is the Architecture class as some redundancy seems to be introduced by the PR. We should be able to clarify this quickly though!

@EliasLF
Collaborator Author

EliasLF commented Jan 13, 2024

Alright, first of all thank you very much for the quick review!

Some of the design choices in this PR were made with future plans/issues in mind, which are (or were previously) not mentioned in the PR description above. Before going into detail in your code comments, maybe let's first discuss them in general here:

  • semi-directional architectures (i.e. architectures with both bi- and unidirectional edges)
  • user-defined gate costs (for the non-fidelity-aware case)
  • arbitrary 2q gates

If I'm not mistaken, we already talked about all of these in person but it's probably a good idea to also get this into writing here on GitHub:

The order above is (most likely) also the order of their relevancy (from low to high) and unfortunately also the order of the incurred cost/complexity (from low to high).

To my knowledge, no semi-directional architectures currently exist in practice. However, our heuristic mapper is implemented abstractly enough that allowing for such architectures comes almost for free (2 extra jump instructions per search node (for purely bi-/unidirectional architectures), a boolean field Architecture::isUnidirectional, a getter method for that field and 1 small else-case in the loop of Architecture::createDistanceTable).
Originally, the plan was to tackle this issue in a future PR, but thanks to your comments I just realized that there were only 2 lines remaining (in the heuristic mapper) that assumed pure directionality. I therefore just pushed these remaining changes (+ a few optimizations) and added this "feature" to the PR description.
If you disagree with the decision to allow semi-directional architectures, it would be quite easy to revert the changes. In that case, however, I think we should at least specifically check and disallow loading such coupling maps (which is currently not the case; they are just not handled correctly during mapping).
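To make the term concrete, here is a small sketch of how a coupling map could be classified and how a single gate could be checked for a needed reversal. It is an illustration only: the `Directionality` enum, `classify`, and `needsReversal` are hypothetical and do not mirror Architecture::isUnidirectional or Architecture::createDistanceTable.

```cpp
#include <cassert>
#include <set>
#include <utility>

// Directed edge (control, target) of a coupling map.
using Edge = std::pair<int, int>;

enum class Directionality { Bidirectional, Unidirectional, SemiDirectional };

// An edge counts as bidirectional if its reverse is also in the coupling map.
// A map with both kinds of edges is semi-directional.
inline Directionality classify(const std::set<Edge>& couplingMap) {
  assert(!couplingMap.empty());
  bool anyBidirectional = false;
  bool anyUnidirectional = false;
  for (const Edge& e : couplingMap) {
    if (couplingMap.count(Edge{e.second, e.first}) > 0) {
      anyBidirectional = true;
    } else {
      anyUnidirectional = true;
    }
  }
  if (anyBidirectional && anyUnidirectional) {
    return Directionality::SemiDirectional;
  }
  return anyUnidirectional ? Directionality::Unidirectional
                           : Directionality::Bidirectional;
}

// A CNOT needs a reversal only if it sits on an edge that exists solely in
// the opposite direction; this is the per-edge decision a mapper has to make
// once architectures are no longer purely bi- or unidirectional.
inline bool needsReversal(const std::set<Edge>& couplingMap, int control,
                          int target) {
  return couplingMap.count(Edge{control, target}) == 0 &&
         couplingMap.count(Edge{target, control}) > 0;
}
```

For the map {(0,1), (1,0), (1,2)}, `classify` reports a semi-directional architecture, and only a CNOT with control 2 and target 1 needs a reversal.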

User-defined gate costs and arbitrary 2q gates are probably much more relevant, but also more complex, which I only realized recently. As mentioned in the PR description, even the current implementation, strictly speaking, wrongly assumes a convex cost space, because cumulative reversal costs on logical edges can surpass swap costs (in the non-fidelity-aware case on non-bidirectional architectures). E.g. if there are 9 congruent 2q-gates in a layer, which are already validly mapped to a back-edge incurring 9*4=36 in reversal costs, swapping them to a neighboring forward-edge with 1 swap only costs 34, i.e. the first goal node on the search path is not the optimal goal node.
This is currently not a huge problem, since it's rare to find 9 congruent CNOTs in 1 layer (and even only possible with Disjoint2qBlock layering) and could easily be optimized down to at most 3 CNOTs in the pre-optimization stage. However, with more evened out costs between H and CNOT, or different reversal operations for other 2q gates, this non-convexity might become more significant.
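The arithmetic behind the 9-gate example can be written out as a short sketch. The values 4 (reversal) and 34 (swap) come from the comment above; splitting them into 4 single-qubit gates at cost 1 and 3 CNOTs at cost 10 plus 4 single-qubit gates is an assumption made here for illustration, as are all constant and function names.

```cpp
#include <cassert>

// Assumed unit costs (hypothetical names); only the derived values 4 and 34
// are taken from the discussion.
const int kCostSingleQubitGate = 1;
const int kCostCnot = 10;
const int kCostReversal = 4 * kCostSingleQubitGate;  // 4
const int kCostUnidirectionalSwap =
    3 * kCostCnot + 4 * kCostSingleQubitGate;  // 34

// Cost of leaving `gates` congruent CNOTs on a back-edge: one reversal each.
inline int reversalCostOnBackEdge(int gates) {
  assert(gates >= 0);
  return gates * kCostReversal;
}

// True if moving the gates to a neighboring forward-edge with a single swap
// (after which no reversals are needed) is strictly cheaper than staying.
inline bool swapIsCheaper(int gates) {
  return reversalCostOnBackEdge(gates) > kCostUnidirectionalSwap;
}

// Smallest number of congruent gates for which the swap wins:
// floor(34 / 4) + 1 = 9.
inline int smallestGateCountFavoringSwap() {
  return kCostUnidirectionalSwap / kCostReversal + 1;
}
```

Under these assumptions, 8 gates cost 32 in reversals and staying put is optimal, while 9 gates cost 36 and the single swap at 34 wins. The node that is already validly mapped is therefore a goal node but not the cheapest one.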
The new distance/Dijkstra system not only fixes the currently flawed, non-admissible system, but also gives flexibility for future heuristics tackling this non-convexity problem by dropping the tightness constraint. In contrast to the previous custom Dijkstra solution, edge-skipping Dijkstra will return the correct distance for any combination of swap cost and reversal cost; and a single execution with 1 skip (as it is currently used in the non-fidelity-aware case) only increases the runtime complexity from O(E*log V) to O(E*log V + V²*E), which is not too bad for something that runs only once per mapping.
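One generic way to compute shortest paths in which exactly one edge is charged an alternative cost is a layered Dijkstra over (vertex, skips used) states. The sketch below illustrates that idea only. It is not the algorithm in Dijkstra::buildEdgeSkipTable, its complexity differs from the figures quoted above, and `WeightedEdge`, `Graph`, and `distancesWithOneSkip` are hypothetical names. The skipped edge stands for the edge on which the gate is finally executed, so its `skipCost` would be 0 on a forward edge and the reversal cost on a back-edge.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <tuple>
#include <vector>

// Outgoing edge: `cost` when swapped over, `skipCost` when it is the one
// edge that is skipped (hypothetical model of "gate executed here").
struct WeightedEdge {
  std::size_t to;
  double cost;
  double skipCost;
};
using Graph = std::vector<std::vector<WeightedEdge>>;

// For every vertex v: cheapest cost of a walk from `source` to v on which
// exactly one edge is charged its skipCost instead of its cost.
inline std::vector<double> distancesWithOneSkip(const Graph& g,
                                                std::size_t source) {
  assert(source < g.size());
  const double inf = std::numeric_limits<double>::infinity();
  // dist[v][k]: best known cost to reach v with k skips used (k = 0 or 1).
  std::vector<std::array<double, 2>> dist(
      g.size(), std::array<double, 2>{{inf, inf}});
  using State = std::tuple<double, std::size_t, int>;  // cost, vertex, skips
  std::priority_queue<State, std::vector<State>, std::greater<State>> open;
  dist[source][0] = 0.0;
  open.emplace(0.0, source, 0);

  const auto relax = [&](std::size_t v, int skips, double cost) {
    if (cost < dist[v][skips]) {
      dist[v][skips] = cost;
      open.emplace(cost, v, skips);
    }
  };

  while (!open.empty()) {
    const State top = open.top();
    open.pop();
    const double cost = std::get<0>(top);
    const std::size_t v = std::get<1>(top);
    const int skips = std::get<2>(top);
    if (cost > dist[v][skips]) {
      continue;  // outdated queue entry
    }
    for (const WeightedEdge& e : g[v]) {
      assert(e.cost >= 0.0 && e.skipCost >= 0.0);  // required by Dijkstra
      relax(e.to, skips, cost + e.cost);  // traverse the edge normally
      if (skips == 0) {
        relax(e.to, 1, cost + e.skipCost);  // spend the single skip here
      }
    }
  }

  std::vector<double> result(g.size(), inf);
  for (std::size_t v = 0; v < g.size(); ++v) {
    result[v] = dist[v][1];
  }
  return result;
}
```

On a directed line 0→1→2 with swap cost 34 in both directions, skip cost 0 along an edge and 4 against it, the sketch yields 0 from qubit 0 to its neighbor 1 and 34 to qubit 2, but 4 from qubit 2 to 1 and 38 from qubit 2 to 0. Because the skip cost is an arbitrary per-edge value, the same routine also covers the case in which a reversal is more expensive than a swap.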

@burgholzer
Member

> Alright, first of all thank you very much for the quick review!
>
> Some of the design choices in this PR were made with future plans/issues in mind, which are (or were previously) not mentioned in the PR description above. Before going into detail in your code comments, maybe let's first discuss them in general here:

Thanks for the detailed answer. This cleared up quite a lot. I'll also start here before going in-depth with the comments.

>   • semi-directional architectures (i.e. architectures with both bi- and unidirectional edges)
>   • user-defined gate costs (for the non-fidelity-aware case)
>   • arbitrary 2q gates
>
> If I'm not mistaken, we already talked about all of these in person but it's probably a good idea to also get this into writing here on GitHub:
>
> The order above is (most likely) also the order of their relevancy (from low to high) and unfortunately also the order of the incurred cost/complexity (from low to high).
>
> To my knowledge, no semi-directional architectures currently exist in practice. However, our heuristic mapper is implemented abstractly enough that allowing for such architectures comes almost for free (2 extra jump instructions per search node (for purely bi-/unidirectional architectures), a boolean field Architecture::isUnidirectional, a getter method for that field and 1 small else-case in the loop of Architecture::createDistanceTable). Originally, the plan was to tackle this issue in a future PR, but thanks to your comments I just realized that there were only 2 lines remaining (in the heuristic mapper) that assumed pure directionality. I therefore just pushed these remaining changes (+ a few optimizations) and added this "feature" to the PR description. If you disagree with the decision to allow semi-directional architectures, it would be quite easy to revert the changes. In that case, however, I think we should at least specifically check and disallow loading such coupling maps (which is currently not the case; they are just not handled correctly during mapping).

Now that makes a lot more sense! Although such architectures do not exist in practice at the moment, it could very much happen and it almost never hurts to be a little more general. So I am happy with the changes here! Even better to hear that the review helped in identifying the remaining places that needed changes.

> User-defined gate costs and arbitrary 2q gates are probably much more relevant, but also more complex, which I only realized recently.

I feared as much, although I still hope that the number of cases that need to be added to handle arbitrary two-qubit gates stays reasonably low. In fact, for bidirectional architectures there shouldn't be too many changes at all. There might be some more optimization potential (e.g., commutation rules involving controlled gates), but we are not taking advantage of that at the moment anyway. The unidirectional case might be trickier, but I think the existing PR already laid a solid foundation for that.

> As mentioned in the PR description, even the current implementation, strictly speaking, wrongly assumes a convex cost space, because cumulative reversal costs on logical edges can surpass swap costs (in the non-fidelity-aware case on non-bidirectional architectures). E.g. if there are 9 congruent 2q-gates in a layer, which are already validly mapped to a back-edge incurring 9*4=36 in reversal costs, swapping them to a neighboring forward-edge with 1 swap only costs 34, i.e. the first goal node on the search path is not the optimal goal node. This is currently not a huge problem, since it's rare to find 9 congruent CNOTs in 1 layer (and even only possible with Disjoint2qBlock layering) and could easily be optimized down to at most 3 CNOTs in the pre-optimization stage. However, with more evened out costs between H and CNOT, or different reversal operations for other 2q gates, this non-convexity might become more significant. The new distance/Dijkstra system not only fixes the currently flawed, non-admissible system, but also gives flexibility for future heuristics tackling this non-convexity problem by dropping the tightness constraint. In contrast to the previous custom Dijkstra solution, edge-skipping Dijkstra will return the correct distance for any combination of swap cost and reversal cost; and a single execution with 1 skip (as it is currently used in the non-fidelity-aware case) only increases the runtime complexity from O(E*log V) to O(E*log V + V²*E), which is not too bad for something that runs only once per mapping.

I agree that this mostly seems like a theoretical issue for now. Especially since I would guess that the cost difference between single- and two-qubit gates will always stay rather big.
I also agree that the runtime trade-off is definitely worth it. Technically it is not once per mapping, but once per architecture. That information could be pre-computed and re-used, right? At some point we might need a system for such pre-computations (similar to how we did that for the sub-architectures feature; that is unfortunately not as neatly integrated as I would love it to be).

@EliasLF
Collaborator Author

EliasLF commented Jan 15, 2024

> Technically it is not once per mapping, but once per architecture. That information could be pre-computed and re-used, right? At some point we might need a system for such pre-computations

I agree, pre-computing all the distance values for common architectures (and maybe even a system for saving them to a file for custom architectures) is a great idea and would open the possibility of using metrics that are otherwise too expensive to compute for each mapping process.
For example, by computing all possible distances for any reversal cost (which are finitely many, since the cheapest path can only change so many times before skipping the most expensive edge), we could solve the reversal cost problem robustly for the general case.

Too bad, the same is not possible/useful for fidelity distances because of their variability over time. Pre-computing viable swap sharing combinations would solve so much of the gap between the heuristic and true cost there.
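The "once per architecture" idea amounts to memoizing distance tables by coupling map. A minimal in-memory sketch, assuming unit-cost hops and an all-pairs table built with Floyd-Warshall, could look like this; `DistanceCache` and its members are hypothetical, and nothing here reflects how QMAP stores or would persist such tables.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <map>
#include <set>
#include <utility>
#include <vector>

using CouplingMap = std::set<std::pair<std::size_t, std::size_t>>;
using DistanceTable = std::vector<std::vector<double>>;

// Caches one distance table per (qubit count, coupling map), so that the
// expensive computation runs once per architecture instead of once per
// mapping.
class DistanceCache {
public:
  const DistanceTable& get(std::size_t nQubits, const CouplingMap& cm) {
    const Key key{nQubits, cm};
    auto it = tables.find(key);
    if (it == tables.end()) {
      ++computations;  // counts cache misses, for illustration only
      it = tables.emplace(key, compute(nQubits, cm)).first;
    }
    return it->second;
  }

  std::size_t computations = 0;

private:
  using Key = std::pair<std::size_t, CouplingMap>;

  // Hop-count distances via Floyd-Warshall; every edge is treated as
  // traversable in both directions, since swaps ignore edge direction.
  static DistanceTable compute(std::size_t n, const CouplingMap& cm) {
    const double inf = std::numeric_limits<double>::infinity();
    DistanceTable d(n, std::vector<double>(n, inf));
    for (std::size_t i = 0; i < n; ++i) {
      d[i][i] = 0.0;
    }
    for (const auto& e : cm) {
      assert(e.first < n && e.second < n);
      d[e.first][e.second] = 1.0;
      d[e.second][e.first] = 1.0;
    }
    for (std::size_t k = 0; k < n; ++k) {
      for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
          d[i][j] = std::min(d[i][j], d[i][k] + d[k][j]);
        }
      }
    }
    return d;
  }

  std::map<Key, DistanceTable> tables;
};
```

Requesting the table for the same coupling map twice triggers a single computation. A persistent variant would serialize the table to a file keyed by the architecture, and for fidelity-based distances the key would additionally have to include the calibration data.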

@burgholzer
Member

> > Technically it is not once per mapping, but once per architecture. That information could be pre-computed and re-used, right? At some point we might need a system for such pre-computations
>
> I agree, pre-computing all the distance values for common architectures (and maybe even a system for saving them to a file for custom architectures) is a great idea and would open the possibility of using metrics that are otherwise too expensive to compute for each mapping process.
>
> For example, by computing all possible distances for any reversal cost (which are finitely many, since the cheapest path can only change so many times before skipping the most expensive edge), we could solve the reversal cost problem robustly for the general case.
>
> Too bad, the same is not possible/useful for fidelity distances because of their variability over time. Pre-computing viable swap sharing combinations would solve so much of the gap between the heuristic and true cost there.

Although fidelity data varies over time, whether it is worth investing the time to pre-compute these values depends on the frequency of calibrations.
If calibration only happens once every couple of hours and computing the tables takes a couple of minutes (assuming HPC resources are available), it could be worth it.

@EliasLF
Collaborator Author

EliasLF commented Jan 15, 2024

> Although fidelity data varies over time, whether it is worth investing the time to pre-compute these values depends on the frequency of calibrations.
> If calibration only happens once every couple of hours and computing the tables takes a couple of minutes (assuming HPC resources are available), it could be worth it.

Hm, interesting point. I guess if you would map multiple circuits per calibration cycle, that's true, yes. I will keep it as an option in mind then👍

@EliasLF EliasLF merged commit 63eaf08 into cda-tum:main Jan 16, 2024
34 checks passed