[MachinePipeliner] Add an abstract layer to manipulate Data Dependenc… #109918
Conversation
Force-pushed from e02f4ab to 2424835
ping
In general, this is a mostly mechanical fix which looks fine, except that it ignores the fact that antidependences in memory and explicit physical registers also create loop-carried dependences without having any phi nodes. The existing algorithm sometimes takes these into account, and sometimes not, which makes it truly confusing. As only the phi-node anti-dependences are turned into data nodes with distance 1 in this path, at least some of the places where the code is marked "To preserve previous behavior and prevent regression" are likely necessary and can't be removed "if this doesn't have significant impact on performance".
I'm particularly concerned about the fact that these antidependences are not reversed before calling createAdjacencyStructure, so the circuits they imply won't be seen at all, and that's going to do ... something, so I think we need to leave in the reversal for now. It might be possible to merge this behavior into createAdjacencyStructure.
I also note that I would expect some performance changes as a result of this change simply because nodes do get added to queues in a different order; previously most queues would process things in "within iteration, then loop-dependent" order.
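The reversal being discussed can be illustrated with simplified stand-in types. This is a hypothetical sketch, not LLVM's actual `SUnit`/`SDep` classes; the names `Edge` and `buildAdjacency` are illustrative only. The idea is that an anti-dependence from a PHI to its within-iteration use represents a loop-carried edge in the opposite direction, so flipping it before circuit finding closes the cycle:

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for a dependence edge; not LLVM's SDep class.
struct Edge {
  int Src, Dst;
  bool IsAnti; // models a loop-carried anti-dependence in this toy example
};

// Hypothetical sketch of the reversal: anti-dependences are flipped before
// building the adjacency structure, so the loop-carried direction
// (Dst -> Src across iterations) closes a cycle the circuit finder can see.
std::vector<std::vector<int>> buildAdjacency(const std::vector<Edge> &Edges,
                                             int NumNodes) {
  std::vector<std::vector<int>> Adj(NumNodes);
  for (const Edge &E : Edges) {
    if (E.IsAnti)
      Adj[E.Dst].push_back(E.Src); // reversed anti edge
    else
      Adj[E.Src].push_back(E.Dst);
  }
  return Adj;
}
```

For example, with a PHI at node 0 and a use/def at node 1 (a data edge 0->1 plus an anti edge 0->1), the reversed anti edge yields the adjacency cycle 0->1->0 that the loop-carried dependence implies; without the reversal, no cycle is visible.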
The existing implementation doesn't express loop-carried anti-dependencies in memory and physical registers directly by anti-edges. For memory, such dependencies are detected by calling
Now
Yes, the iteration order of edges does change. However, I believe that for most functions, their behavior is independent of the order of the edges. I forgot to mention it in the description, but I used llvm-test-suite to verify that there is no change in the compiled result before and after this change. Of course, I didn't try every combination of compilation options, but I believe that this patch doesn't change the existing behavior. However, as you said, the existing algorithm is confusing, so there is a great possibility of missing or misunderstanding something.
Hi, @dpenry Thank you for your previous comment. On second thought, I am beginning to think that the code marked "To preserve previous behavior and prevent regression" should be left in place. My guess remains the same as before. That is, I think the handling of these anti-dependencies looks unintentional to me. If loop-carried physical register dependencies are to be handled correctly, at least the following must also be considered (but are not now).
However, it is possible that the current implementation just happens to handle loop-carried dependencies correctly and works fine. So, now I agree with you that we should not remove code marked as "To preserve previous behavior and prevent regression". But these are just workarounds and should be fixed to treat other loop-carried dependencies correctly in the future. What do you think?
Physical register dependency support is not unintentional, just not widely used and somewhat under-tested, because software pipelining is enabled for only a handful of architectures and the first ones for which it was used supported decrement-and-branch instructions. When I added support for SMS for ARM and had to support using normal conditional branches using the condition codes, I had to insert some special cases for physical registers because some dependences were not being respected. See the comments in dcb7764.

I'm not sure what the godbolt link you've supplied is supposed to show, as there isn't a loop, so you can't say anything about whether loop-carried dependences are handled correctly. The proper output and anti-dependencies for the MIR are present in the data graph. One might argue that an anti-dependence between SU(1) and SU(4) is missing, but the anti-dependence from 1->2 combined with the output dependence from 2->4 makes it unnecessary, and I think the DDG builder might notice that.
Sorry, something was wrong with my previous godbolt link. The following is the correct one: https://godbolt.org/z/3v5rnPG3o I think that the loop-carried dependence SU(5) -> SU(2) is not handled correctly in the current implementation. If we're just talking about this case, it doesn't matter because instructions that read/write
That godbolt makes more sense! I do note that in this particular instance, an SU(5)->SU(2) dependence carried across the loop is not seen because SU(7) kills the liveness of SU(5)'s CPSR write in the output direction, and SU(1) kills the liveness of whatever entered the basic block in the input direction. Is that a problem in general? I'd have to think more on it. Note that CPSR isn't called out specially in the code; all physical registers are treated alike.

Are all instructions annotated properly with implicit physical register def/use/kill information? That's up to the instruction set definitions to get right. SVE's FFR is an interesting beast because it's a cumulative fault register, so there isn't really a sensible notion of an output dependence, kind of like FPSR flags. Getting the order of writes to it wrong generally doesn't matter until it's read. However, RDFFR/RDFFRS/SETFFR/WRFFR probably should be marked as unpipelineable instructions so that we don't try to pipeline some odd code that actually does care about the order.

I do think we've gotten a bit far afield here. We don't (yet) have an example of a failure case due to loop-carried physical register or memory dependencies with the old code, and we're leaving in most of the bits of logic that purport to handle it. The only question is whether the removal of the reversal code around createAdjacencyStructure is going to create a new failure case in the presence of loop-carried physical register or memory dependencies. That removal results in fewer computed circuits. Circuits are used for MII calculation and for checkValidNodeOrder. MII calculation is not a correctness issue. As for validity, being in a circuit makes it legal to be after both successors and predecessors, so fewer circuits means fewer legal orders, so I don't think we can introduce new failure cases, just reduced opportunities for pipelining.

So, I think we're good for now. I would like to see what others have to say.
So, after all that discussion, I'm good with this patch but I'd like to see what other people have to say about the new abstraction itself.
Thanks for your detailed comments!
Unless we can determine that it's not going to be a problem, I think we should be conservative with dependency analysis. However, as you said, it's also true that I don't have an example that triggers invalid code generation. It also complicates things that missing dependencies do not necessarily cause invalid code to be generated. In any case, the FIXME comments I added in this patch need to be changed.
I believe that the circuits to be detected do not change with or without this patch, because
Also,
// To preserve previous behavior and prevent regression
// FIXME: Remove if this doesn't have significant impact on performance
for (const auto &OE : DDG->getOutEdges(SU))
  if (OE.getDistance() == 1)
    Worklist.push_back(OE.getDst());
@dpenry
Let me make sure. In terms of correctness, we can remove the for statement here, right? I added this for statement to preserve the behavior of the original implementation. This appends all predecessors of the SU, which may contain anti-dependencies from a PHI. There is a similar part in normalizeNonPipelinedInstructions, which I believe we can remove as well.
Because the original code is adding the loop-carried antidependences for just the PHI nodes and I think those edges are actually in the new DDG, the dependent nodes should already be in the worklist. If this is true, then the second for statement should be removable.
As for the code in normalizeNonPipelinedInstructions, I think the FIXME section maybe wasn't needed to start with because the original SU.Preds didn't contain the loop-carried antidependences, so adding them in wasn't needed.
I think those edges are actually in the new DDG, the dependent nodes should already be in the worklist
This is true.
Then, it looks like that it's safe to remove both of them. I'd like to remove them in the future to simplify the code, if there is no performance degradation. Thank you for your review!
@madhur13490 @androm3da
Gentle ping
Minor nits
Thanks for your review!
Hi. Thank you to everyone who reviewed it. It seems to me that no one is currently in a position to be a code owner of it. I believe that this patch won't cause any performance degradation. Also, this is necessary to fix correctness issues related to dependency analysis.
Thanks for adding this change. I'm in favor of adding the abstraction as it is a big improvement over the existing way of using the DAG. I'm sure there will be changes or other improvements, but it's a step in the right direction. I've only superficially gone through the changes, so I defer to @dpenry, who seems good with the patch.
@bcahoon Thanks for your response. Yes, this is the first step of improvements. I would like to submit more patches after this is merged. @dpenry Please let me check. Your review status is now "Request changes". What kind of changes are you requesting? Or did you simply misstate the status? Your last comment looks to me like you approve of this patch.
…e Graph

In MachinePipeliner, a DAG class is used to represent the Data Dependence Graph. Data Dependence Graphs generally contain cycles, so it's not appropriate to use DAG classes. In fact, some "hacks" are used to express back-edges in the current implementation. This patch adds a new class to provide a better interface for manipulating dependencies. Our approach is as follows:

- To build the graph, we use the ScheduleDAGInstrs class as it is, because it has powerful functions and the current implementation depends heavily on it.
- After the graph construction is finished (i.e., during scheduling), we use the new class DataDependenceGraph to manipulate the dependencies.

Since we don't change the dependencies during scheduling, the new class only provides functions to read them. Also, this patch is only a refactoring, i.e., scheduling results should not change with or without this patch.
Force-pushed from b71f265 to f6f6c72
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/140/builds/13677 Here is the relevant piece of the build log for the reference
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/153/builds/18383 Here is the relevant piece of the build log for the reference
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/174/builds/10572 Here is the relevant piece of the build log for the reference
Distance = 1;
std::swap(Src, Dst);
auto Reg = Pred.getReg();
Pred = SDep(Src, SDep::Kind::Data, Reg);
I think there is a bug here: this change to "Pred" causes the querying of isAntiDep() below to be incorrect.
Can you fix it?
Could you elaborate the problem?
Could you elaborate on the problem? Here the dependency type is changed from Anti to Data only when the source of the edge is a PHI. IIUIC, the anti dependency from a PHI node is used to represent a back-edge. In this patch, the code that handles the back-edges has been completely changed, and the anti-dependencies for physical registers are left in place. So I don't think this part is a problem. As far as I have tested locally with llvm-test-suite, there are no differences between generated binaries built with and without this patch. Do you have any cases where it causes problems?
I found the issue I think.
I had a bug (compiler crash on assertion) on an offline target (not in LLVM community).
I investigated and got to a conclusion that this is the issue.
You changed the phi anti-dep handling.
So when originally having,
some_instruction -> phi; anti dep
You have in your new DAG:
some_instruction -> phi, data dep
In the original consideration, some_instruction is a successor of the phi with an anti-dep, and it was missed out in succ_L.
See my fix, does it make sense to you?
Thanks for your investigation. I see, so there are actual cases where the compiler crashes. Hmm, but I'm not sure what the root cause is.
- A dependence type is changed from `anti` to `data` only if the source instruction is a PHI. So, in your example, the dependence type is not intended to be replaced unless the `some_instr` is a PHI.
- When the dependence is replaced with `data`, the source and the destination are swapped at the same time. In your case, the `DDG` behaves as if there is an edge from the `phi` to the `some_instr`. Therefore, this edge should be handled in the previous loop `for (const auto &OE : DDG->getOutEdges(SU)) { ... }`.

So I'm not sure why your fix changes the behavior. Could you give me more details of your case (e.g., what exactly are `phi` and `some_instr`)?
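The conversion under discussion can be modeled with toy types. This is only a sketch: `SimpleEdge` and `normalize` are hypothetical names, not the patch's code, and since the exact PHI condition is what the thread is debating, the sketch takes it as a boolean parameter rather than asserting one direction:

```cpp
#include <cassert>
#include <utility>

// Toy edge model; not LLVM's SDep or the patch's DDG edge class.
enum class DepKind { Data, Anti, Output };

struct SimpleEdge {
  int Src, Dst;
  DepKind Kind;
  unsigned Distance; // >= 1 means loop-carried
};

// Hypothetical normalization mirroring the quoted snippet: a PHI-related
// anti edge is rewritten as a swapped, distance-1 data edge; all other
// edges are left untouched.
SimpleEdge normalize(SimpleEdge E, bool IsPhiAnti) {
  if (E.Kind == DepKind::Anti && IsPhiAnti) {
    std::swap(E.Src, E.Dst);
    E.Kind = DepKind::Data;
    E.Distance = 1;
  }
  return E;
}
```

In this model, an anti edge 1->0 qualifying as a PHI back-edge comes out as a data edge 0->1 with distance 1, which is why such edges would then appear in `getOutEdges` of the PHI rather than as anti in-edges.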
Let's track what happens when sending this edge to your DDGEdge constructor:
Input:
some_instruction -> phi; anti dep
When evaluating the phi's successor to initialize the edge, DDGEdge will be called with the following arguments:
PredOrSucc: phi
IsSucc = true,
Dep: some_instruction; anti dep
Meaning now we have:
Src = phi
Dst = some_instruction
Pred's SUnit will be: phi
Then this code will be executed:
Meaning the edge lost its "anti" kind and now we model it as data.
My problem is that now we don't consider that some_instruction is a successor of the phi in some parts of the pipeliner's code:
computeNodeOrder
succ_L
I believe that, in addition to checking the anti-deps on the input edges, we need to check edges that have a non-zero distance.
And, pred_L. Probably other cases too. I think the check for isAntiDep() can be replaced by getDistance(). That is, with the check for getDistance(), isAntiDep() isn't needed anymore.
Where does the assertion occur? With some more details it might be possible to create a test case. I assume some invalid node order is created? Would it be possible to show the pipeliner debug output for the use case (with and without the check for getDistance())?
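The suggestion above — keying on getDistance() rather than isAntiDep() — could look roughly like the following over toy types. This is a sketch only: `StubEdge` and `predsOf` are illustrative names, not the pipeliner's API:

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy stand-in for a DDG edge; Distance >= 1 marks a loop-carried dependence.
struct StubEdge {
  int Src, Dst;
  unsigned Distance;
};

// Hypothetical pred_L analog: collect predecessors of the nodes in NodeOrder.
// Instead of special-casing anti dependences, any loop-carried out-edge is
// treated as a back-edge whose destination acts as a predecessor.
std::set<int> predsOf(const std::set<int> &NodeOrder,
                      const std::vector<StubEdge> &Edges) {
  std::set<int> Preds;
  for (const StubEdge &E : Edges) {
    // Ordinary in-edge from outside the set.
    if (NodeOrder.count(E.Dst) && !NodeOrder.count(E.Src))
      Preds.insert(E.Src);
    // Loop-carried out-edge: its destination precedes us in the next
    // iteration, so it is also reported as a predecessor.
    if (E.Distance >= 1 && NodeOrder.count(E.Src) && !NodeOrder.count(E.Dst))
      Preds.insert(E.Dst);
  }
  return Preds;
}
```

For a node set {1} with a distance-0 in-edge 0->1 and a distance-1 out-edge 1->2, this yields predecessors {0, 2}: node 2 is picked up via the distance check alone, with no anti-dependence special case.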
The anti dependence from `some_instr` to `phi` becomes the data dependence from `phi` to `some_instr`, as I intended. And now, the vector returned by something like `DDG->getOutEdges(phi)` should contain that edge. That is, in your case, the `some_instr` should be appended to `Succs` in the for loop at line 1921.
I suspect that the `DDG` is not constructed as I intended, if your change (adding the distance check) fixes the problem. I think there is another issue that we should really address.
And, pred_L. Probably other cases too.
Yes, I agree with this.
I think the check for isAntiDep() can be replaced by getDistance(). That is, with the check for getDistance(), isAntiDep() isn't needed anymore.
I don't think so. Before this patch was merged, the `pred_L` was defined as follows:
static bool pred_L(SetVector<SUnit *> &NodeOrder,
                   SmallSetVector<SUnit *, 8> &Preds,
                   const NodeSet *S = nullptr) {
  Preds.clear();
  for (const SUnit *SU : NodeOrder) {
    for (const SDep &Pred : SU->Preds) {
      if (S && S->count(Pred.getSUnit()) == 0)
        continue;
      if (ignoreDependence(Pred, true))
        continue;
      if (NodeOrder.count(Pred.getSUnit()) == 0)
        Preds.insert(Pred.getSUnit());
    }
    // Back-edges are predecessors with an anti-dependence.
    for (const SDep &Succ : SU->Succs) {
      if (Succ.getKind() != SDep::Anti)
        continue;
      if (S && S->count(Succ.getSUnit()) == 0)
        continue;
      if (NodeOrder.count(Succ.getSUnit()) == 0)
        Preds.insert(Succ.getSUnit());
    }
  }
  return !Preds.empty();
}
The inline comment says that the second loop handles back-edges, but actually the loop body only checks whether the edge is an anti dependence. That is, in `pred_L`, anti forward-edges (I think many of them are WAR dependencies for some physical register) are handled as predecessors. The same goes for `succ_L`. I suspect this is a mishandling of back-edges, but to keep the original behavior, the anti dependence check is still needed.
Anyway, I also want to see some more details in this case.
Hi, I'll send you a debug log, but first I need to understand your change a bit more.
You said and I quote:
"The anti dependence from some_instr to phi becomes the data dependence from phi to some_instr, as I intended."
By looking at this code:
Src will be some_instr
Dst will be the phi.
So I believe you meant to tell me that data dep is from some_instr to phi?
"The anti dependence from some_instr to phi becomes the data dependence from phi to some_instr, as I intended."
By looking at this code:
Src will be some_instr
Dst will be the phi.So I believe you meant to tell me that data dep is from some_instr to phi?
Yes, I meant that, but now I think something is incorrect. In your previous comment, you said:
Input:
some_instruction -> phi; anti dep
Also
PredOrSucc: phi
IsSucc = true,
Dep: some_instruction; anti dep
This seems to be wrong. IIUIC, `IsSucc` should be `false` in this case. This argument doesn't mean the "actual semantics" of the dependency. It just indicates whether the `Dep` is contained in `PredOrSucc->Preds` or in `PredOrSucc->Succs`. Inserting some validation code like the following at the top of the ctor might be helpful (I haven't checked whether this code works):
const auto &Edges = (IsSucc ? PredOrSucc->Succs : PredOrSucc->Preds);
bool Valid = false;
for (const SDep &D : Edges)
  if (D == Dep)
    Valid = true;
assert(Valid && "isSucc may be incorrect");
…e Graph
In MachinePipeliner, a DAG class is used to represent the Data Dependence Graph. Data Dependence Graphs generally contain cycles, so it's not appropriate to use DAG classes. In fact, some "hacks" are used to express back-edges in the current implementation. This patch adds a new class to provide a better interface for manipulating dependencies. Our approach is as follows:

- To build the graph, we use the ScheduleDAGInstrs class as it is, because it has powerful functions and the current implementation depends heavily on it.
- After the graph construction is finished (i.e., during scheduling), we use the new class DataDependenceGraph to manipulate the dependencies.
Since we don't change the dependencies during scheduling, the new class only provides functions to read them. Also, this patch is only a refactoring, i.e., scheduling results should not change with or without this patch.
This patch is a part of the improvement of MachinePipeliner discussed on https://discourse.llvm.org/t/machinepipeliner-replace-swingschedulerdag-with-directed-graph-that-allows-cycles/76465 . There are other patches that depend on this and will be submitted later. Specifically, I am planning to submit the following:
I've written several `FIXME` comments in this patch, and they are resolved by the subsequent patches listed above.
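As a rough illustration of the abstraction the description talks about — not the patch's actual interface — a read-only dependence graph that tolerates cycles might look like this. `DataDepGraph`, `DepEdge`, and the accessor names are hypothetical:

```cpp
#include <cassert>
#include <vector>

// Toy dependence edge; Distance >= 1 marks a loop-carried dependence.
struct DepEdge {
  int Src, Dst;
  unsigned Distance;
};

// Read-only graph layer: the dependencies are fixed once construction
// finishes, so only accessors are exposed. Unlike a DAG, cycles are allowed,
// so back-edges need no special "hack" to be represented.
class DataDepGraph {
  std::vector<std::vector<DepEdge>> Out, In;

public:
  DataDepGraph(int NumNodes, const std::vector<DepEdge> &Edges)
      : Out(NumNodes), In(NumNodes) {
    for (const DepEdge &E : Edges) {
      Out[E.Src].push_back(E);
      In[E.Dst].push_back(E);
    }
  }
  const std::vector<DepEdge> &getOutEdges(int N) const { return Out[N]; }
  const std::vector<DepEdge> &getInEdges(int N) const { return In[N]; }
};
```

A loop-carried edge is just an ordinary edge with a non-zero distance here, which is the property the scheduling queries in the discussion above (e.g. `getOutEdges(SU)` with `getDistance() == 1`) rely on.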