Approximation degree should be zero by default #8043
To see why this is a mess, consider the following graph, which shows heavy outputs from a single QV32 circuit that has been transpiled at O3 100 times. One sees that the best results (just worry about the blue dots here) are at the largest number of CX gates. This is quite counterintuitive, and completely wrong. What is happening is that the O3 preset pass manager is randomly selecting qubits, based on Sabre, that give a good mapping. However, because of variations across the system, the approximation that is on by default is approximating the circuit to varying degrees. Thus the "best" circuit in terms of CX gates has actually been approximated severely because (at least this is my guess) it landed on less-than-ideal qubits. One can see from the orange dots that by moving the circuits around one can do better, but not really any better than the 40-CX circuits. This is because HOP is measured with respect to the original circuit distribution, to which the heavily approximated circuits are no longer strictly faithful. Beyond this example, there is general uncertainty as to exactly which circuits the approximation level is affecting. And that is the real problem with having approximation on by default.
Just to let you know what's happening here: while I personally would vote for the same behaviour, I think we need an answer on our desired design direction for the transpiler from its technical/research leads, so I've assigned Lev and Kevin as proxies for them, and I'll add it to the agenda for the Terra meetings. I can't promise it'll be top of the priority queue, especially since there's a workaround, but I'll add it for discussion. The question of whether our current approximation strategies actually produce better output in all cases is related, but slightly different; we can merge changes to improve those whether or not they're the default. I also do think it's expected behaviour that changing the backend changes the output.
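For context, the workaround referred to above is to set approximation_degree explicitly when calling transpile; 1.0 means no approximation. A minimal sketch, with the fake backend, circuit size, and seeds being illustrative rather than taken from the discussion (the fake_provider import path assumes a 2022-era qiskit-terra):

```python
from qiskit import transpile
from qiskit.circuit.library import QuantumVolume
from qiskit.providers.fake_provider import FakeMontreal  # illustrative 27-qubit device model

backend = FakeMontreal()

# A seeded quantum-volume circuit, as in the plots discussed above.
qc = QuantumVolume(5, seed=42)
qc.measure_all()

# approximation_degree=1.0 forces exact synthesis of every two-qubit block,
# overriding the error-based approximation that was on by default at the time.
exact = transpile(
    qc,
    backend=backend,
    optimization_level=3,
    seed_transpiler=12345,
    approximation_degree=1.0,
)
print(exact.count_ops())
```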
Can you plot this with approximation turned off?

I tend to generally agree with the issue, but not for quite the same reason you state. To me, when you give a backend, you are not just giving a coupling map; you are giving backend properties too, which means the compiler should take them into account. If you just want a mapping, then you should pass only the coupling map. But my reason for turning approximation off by default is that we currently do not have a good way of deciding what approximation degree is best (this is open research). Our decisions are local, which may not play well in global circuits. If we gain a better understanding of making approximations, then I'm fine with turning this back on. But I would like to add my own feature request: if the user sets approximation to something non-zero, and gate fidelities are known, the approximation should be some function of those gate fidelities (i.e., approximate more on bad gates, less on good gates).
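To illustrate the distinction being drawn here, a hedged sketch follows (the fake backend and toy circuit are illustrative): passing only the coupling map and basis gates asks for mapping alone, whereas passing the backend object also hands the compiler device properties it may act on.

```python
from qiskit import QuantumCircuit, transpile
from qiskit.providers.fake_provider import FakeMontreal  # illustrative device model

backend = FakeMontreal()

qc = QuantumCircuit(3)
qc.h(0)
qc.cx(0, 1)
qc.cx(1, 2)
qc.measure_all()

# Mapping only: the compiler sees connectivity and basis gates but no error rates,
# so there is nothing for noise-aware passes (or error-based approximation) to use.
mapped_only = transpile(
    qc,
    coupling_map=backend.configuration().coupling_map,
    basis_gates=backend.configuration().basis_gates,
    optimization_level=3,
    seed_transpiler=0,
)

# Full backend: device properties (gate and readout errors) are also available,
# which is where calibration-dependent decisions can enter.
with_properties = transpile(qc, backend=backend, optimization_level=3, seed_transpiler=0)
```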
I understand this viewpoint. However, looking at things from the point of view of a user who is not well versed in the code base, I am not sure they would appreciate this. For example, there is no tutorial that explains these nuances and instructs the user on how to proceed in the manner you suggest above (which is the proper way to do things here). Even within the source code it is hard to tell. One would have to know to go look at the relevant synthesis pass, and there it defaults to the highest level of approximation, while the default value passed in from transpile is something else entirely.
I don't think many users would actually be able to identify this as an issue. The cases that have been reported are so wrong that they get flagged, but in many cases the issues with this setting are likely to fly under the radar, given the vast number of other things that can fluctuate when dealing with hardware. In my case, the only way I realized this is that I had seeded the QV circuit and the transpiler, but noticed that the CX gate count in my plot changed when I moved to a different system while trying to understand why more CX gates gave better HOP.
Thanks for raising this @nonhermitian. We discussed this at yesterday's Terra meeting, and I spent a bit of time thinking about the issues here, in #7961, and in #7341. I think there are a couple of factors and more than one bug at play that complicate the issue, but you are spot on that this is an area where it's currently easy for the transpiler to do something other than what the user expects. I've been writing up some thoughts, but wanted to make sure I understand your QV example first.
I'm not sure I 100% follow the explanation here. This is for a single QV 32 circuit, with 100 different initial layouts and routings based on the randomness in Sabre, and it sounds like you're saying that circuits with higher final CX counts amount to more 2q blocks and thus more opportunities for approximation, leading to erroneously high HOPs. I would naively expect here, though, that the outputs with fewer CX gates would be those which have been heavily approximated (assuming the same seeding as the second example, approximation would be the only way to end up with fewer than 39, 42, or 45 CX gates), and those don't necessarily have higher HOPs. One interesting (but maybe difficult to answer) question is whether or not, for a fixed layout and routing, approximation leads to a higher or lower HOP in general. Maybe something like: for a fixed set of transpiler seeds, plot the HOP obtained with the default approximation against the HOP obtained with approximation disabled.
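A hedged sketch of the fixed-seed comparison suggested here (the HOP evaluation itself is omitted and CX counts are used as a stand-in; the backend, circuit size, and seeds are illustrative):

```python
from qiskit import transpile
from qiskit.circuit.library import QuantumVolume
from qiskit.providers.fake_provider import FakeMontreal  # illustrative device model

backend = FakeMontreal()
qc = QuantumVolume(5, seed=7)
qc.measure_all()

for seed in range(5):
    cx_counts = {}
    # None lets the (pre-#8595) error-based default approximation kick in;
    # 1.0 forces exact synthesis of each two-qubit block.
    for degree in (None, 1.0):
        tqc = transpile(
            qc,
            backend=backend,
            optimization_level=3,
            seed_transpiler=seed,
            approximation_degree=degree,
        )
        cx_counts[degree] = tqc.count_ops().get("cx", 0)
    print(f"seed={seed}: default-approx cx={cx_counts[None]}, exact cx={cx_counts[1.0]}")
```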
@kdk thanks for taking a look! The first plot is indeed the exact same initial circuit, with the same heavy outputs from which to assign a HOP value. The spread of CX values is indeed from Sabre, but also from the default approximation being turned on. The second plot shows the range of CX with no approximation, and there each "bar" differs by 3 CX, i.e. a SWAP. The issue is not with the high HOP values at high CX count per se; the second graph shows that that is in line with what one would expect. Rather, it is that lower CX counts do not equate to a higher HOP. All else being equal, because each gate has error, the lower the CX count the better your QV circuits should be at achieving high HOP values. Indeed, this is why a lot of effort is spent on minimizing the CX count via integer programming. However, the first plot shows that, systematically, the lower the number of CX gates the worse the HOP is; the complete opposite of what one would expect. This is bad because what I would naively do is transpile multiple times to get the lowest CX count (thinking it is all due to swaps and mapping). In this case that is a bad idea.
This is correct; lower CX is higher approximation. The fact that they do not lead to higher HOP than circuits with up to 8 more CX gates is counterintuitive, and really goes against the whole point of an optimization; my optimization should not make things markedly worse (and here it does so by default!). The high level of CX truncation is likely because the circuit randomly landed on high-error qubits and the compiler threw away CX gates per block when it thought it could. Executing on those same qubits could of course lead to bad outcomes. As such, you can move the circuit around and bring the HOP up to the same level as the larger-CX-count circuits, although this necessarily violates the approximation that was done. The fact that you can't bring it higher than the larger-CX-count circuits is a hint that one is limited by the truncation, which targets a different distribution than the original full unitary, and thus my HOP values take a hit as I am aiming for the wrong target distribution.

My personal take on this whole thing is that, by default, the transpiler should preserve the unitary up to known linear transformations that I can keep track of (permutations, similarity transformations, swap reordering, etc). We should have that as the bedrock principle of what the transpiler does, and anything beyond that should be explicit to activate. I do get @ajavadia's point, though, that if a user passes a backend we should try to do our best (even if that is not what might be going on here). I would say, though, that doing so should be explicit, as stated above, and is really a matter of getting users to understand the workflow and options. We in fact already do this. For example, O3 gives you the best mappings by far, yet it is not the default. The end user must recognize that O3 is what you should use, and that you might have to run it several times to get a good mapping. With respect to the approximations used here, this is very similar to options like those: they should be explicit opt-ins.
Since #8595 has merged, I think this has been implemented, so I'm going to close this. If I'm missing something here, though, or misinterpreting what was needed for this issue, please feel free to reopen it.
What should we add?
There have been several instances where having the transpiler approximation_degree nonzero by default has caused trouble; e.g., see #7961 and #7341. This has once again become a problem, as it took me a long time to understand why my circuits were not the same across several backends even though all the seeds in the transpiler were set. As a consequence I was getting odd results and gate counts. As this is at least the third time that this issue has popped up, and no user has said this behavior is desired by default, it should be turned off unless explicitly set otherwise.