
[MachinePipeliner] Add an abstract layer to manipulate Data Dependence Graph #109918

Merged
merged 5 commits into from
Dec 24, 2024

Conversation

kasuga-fj
Contributor


In MachinePipeliner, a DAG class is used to represent the Data Dependence Graph. A data dependence graph generally contains cycles, so a DAG class is not an appropriate representation. In fact, the current implementation uses some "hacks" to express back-edges. This patch adds a new class that provides a better interface for manipulating dependencies. Our approach is as follows:

  • To build the graph, we use the ScheduleDAGInstrs class as it is, because it has powerful functions and the current implementation depends heavily on it.
  • After the graph construction is finished (i.e., during scheduling), we use the new class DataDependenceGraph to manipulate the dependencies.

Since we don't change the dependencies during scheduling, the new class only provides functions to read them. Also, this patch is purely a refactoring; scheduling results should not change with or without it.
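The two-phase approach described above (build with the existing machinery, then read through an edge-with-distance interface) can be sketched as follows. This is a hypothetical, simplified model for illustration, not the real SwingSchedulerDDG or ScheduleDAGInstrs API: the point is that a loop-carried back-edge becomes an ordinary edge with Distance == 1 instead of a special-cased reversed edge.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for a data dependence graph that allows cycles.
// Edges carry an iteration distance: 0 = within one iteration,
// 1 = loop-carried. Mutation happens only during construction;
// scheduling code only reads.
struct Edge {
  std::size_t Src, Dst;
  unsigned Distance;
};

class DataDependenceGraph {
  std::size_t NumNodes;
  std::vector<std::vector<Edge>> Out, In;

public:
  explicit DataDependenceGraph(std::size_t N) : NumNodes(N), Out(N), In(N) {}

  // Called only while building the graph.
  void addEdge(std::size_t Src, std::size_t Dst, unsigned Distance) {
    Edge E{Src, Dst, Distance};
    Out[Src].push_back(E);
    In[Dst].push_back(E);
  }

  // Read-only accessors used during scheduling.
  const std::vector<Edge> &getOutEdges(std::size_t N) const { return Out[N]; }
  const std::vector<Edge> &getInEdges(std::size_t N) const { return In[N]; }
  std::size_t size() const { return NumNodes; }
};
```

With this shape, a three-node loop body with one loop-carried dependence is just `addEdge(2, 0, /*Distance=*/1)`; no DAG invariant is violated.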


This patch is part of the MachinePipeliner improvements discussed at https://discourse.llvm.org/t/machinepipeliner-replace-swingschedulerdag-with-directed-graph-that-allows-cycles/76465 . There are other patches that depend on this one and will be submitted later. Specifically, I am planning to submit the following:

  1. This one.
  2. (Optional) Remove unnatural code surfaced by this patch.
  3. Fix the currently missed output/sequence dependencies.

I've written several FIXME comments in this patch; they will be resolved by the subsequent patches listed above.

@kasuga-fj kasuga-fj force-pushed the machinepipeliner-ddg-abstract-layer branch from e02f4ab to 2424835 Compare September 26, 2024 01:28
@kasuga-fj
Contributor Author

ping

Contributor

@dpenry dpenry left a comment


In general, this is a mostly mechanical fix that looks fine, except that it ignores the fact that anti-dependences on memory and explicit physical registers also create loop-carried dependences without any phi nodes being involved. The existing algorithm sometimes takes these into account and sometimes does not, which makes it truly confusing. Since only the phi-node anti-dependences are turned into data edges with distance 1 in this patch, at least some of the places where the code is marked "To preserve previous behavior and prevent regression" are likely necessary and can't be removed "if this doesn't have significant impact on performance".

I'm particularly concerned about the fact that these antidependences are not reversed before calling createAdjacencyStructure, so the circuits they imply won't be seen at all, and that's going to do ... something, so I think we need to leave in the reversal for now. It might be possible to merge this behavior into createAdjacencyStructure.

I also note that I would expect some performance changes as a result of this change simply because nodes do get added to queues in a different order; previously most queues would process things in "within iteration, then loop-dependent" order.

@kasuga-fj
Contributor Author

except that it ignores the fact that antidependences in memory and explicit physical registers also create loop-carried dependences without having any phi nodes

The existing implementation doesn't express loop-carried anti-dependencies on memory and physical registers directly as anti-edges. For memory, such dependencies are detected by calling isLoopCarriedDep. I left that as it is, so there should be no problem. Loop-carried anti-dependencies on physical registers don't seem to be taken into account at all. I think this should be fixed, but the problem is independent of this patch. Regarding the code marked "To preserve previous behavior and prevent regression", I don't think it takes loop-carried dependencies on memory and physical registers into account. This is based on the fact that if it did, data and output dependencies would also have to be checked, but they are not.

I'm particularly concerned about the fact that these antidependences are not reversed before calling createAdjacencyStructure

Now SwingSchedulerDDG is used instead of SwingSchedulerDAG in createAdjacencyStructure, so the detected cycles should remain the same.

I also note that I would expect some performance changes as a result of this change simply because nodes do get added to queues in a different order

Yes, the iteration order of edges does change. However, I believe that for most functions the behavior is independent of the order of the edges. Circuits::circuit should be affected, but the order is the same as the order in which nodes were queued, not "within iteration, then loop-dependent".

I forgot to mention this in the description, but I used llvm-test-suite to verify that there is no change in the compiled results before and after this change. Of course, I didn't try every combination of compilation options, but I believe that this patch doesn't change the existing behavior. However, as you said, the existing algorithm is confusing, so there is a real possibility that I have missed or misunderstood something.

@kasuga-fj kasuga-fj requested a review from dpenry October 11, 2024 12:14
@kasuga-fj
Contributor Author

kasuga-fj commented Oct 18, 2024

Hi, @dpenry

Thank you for your previous comment. On second thought, I am beginning to think that the code marked "To preserve previous behavior and prevent regression" should be left in place.

My guess remains the same as before: the handling of these anti-dependencies looks unintentional to me. If loop-carried physical-register dependencies are to be handled correctly, at least the following must also be considered (but currently are not):

  • Not only anti-dependencies but also data and output dependencies need to be taken into account
  • If there are multiple WAR dependencies on the same physical register, the current checks are insufficient
    • Example: https://godbolt.org/z/6vqKMb5TK
      • This is neither a loop nor in SSA form, but this type of physical-register dependency can occur in MachinePipeliner
    • There is a loop-carried dependence between SU(3) and SU(0), but it is not handled

However, it is possible that the current implementation just happens to handle loop-carried dependencies correctly and works fine. So I now agree with you that we should not remove the code marked "To preserve previous behavior and prevent regression". But these are just workarounds, and the other loop-carried dependencies should be handled correctly in the future.

What do you think?

@dpenry
Contributor

dpenry commented Oct 21, 2024

My guess remains the same as before. That is, I think the handling of these anti-dependencies looks unintentional to me. If loop-carried physical register dependencies are to be handled correctly, at least the following must also be considered (but are not now).

* Not only anti-dependencies, but also data-dependencies and output-dependencies need to be taken into account

* If there are multiple WAR dependencies of the same physical register, the current checks are insufficient
  
  * Example: https://godbolt.org/z/6vqKMb5TK
    
    * This is neither a loop nor SSA form, but this type of physical register dependencies can occur in MachinePipeliner
  * There is a loop-carried dependence between SU(3) and SU(0), but it is not handled

Physical register dependency support is not unintentional, just not widely used and somewhat under-tested, because software pipelining is enabled for only a handful of architectures, and the first ones it was used for supported decrement-and-branch instructions. When I added SMS support for ARM and had to support normal conditional branches using the condition codes, I had to insert some special cases for physical registers because some dependences were not being respected. See the comments in dcb7764.

I'm not sure what the godbolt link you've supplied is supposed to show, as there isn't a loop, so you can't say anything about whether loop-carried dependences are handled correctly. The proper output and anti-dependencies for the MIR are present in the data graph. One might argue that an anti-dependence between SU(1) and SU(4) is missing, but the anti-dependence from 1->2 combined with the output dependence from 2->4 make it unnecessary, and I think the DDG builder might notice that.

@kasuga-fj
Contributor Author

I'm not sure what the godbolt link you've supplied is supposed to show, as there isn't a loop, so you can't say anything about whether loop-carried dependences are handled correctly. The proper output and anti-dependencies for the MIR are present in the data graph. One might argue that an anti-dependence between SU(1) and SU(4) is missing, but the anti-dependence from 1->2 combined with the output dependence from 2->4 make it unnecessary, and I think the DDG builder might notice that.

Sorry, something was wrong with my previous godbolt link. The following is the correct one.

https://godbolt.org/z/3v5rnPG3o

I think the loop-carried dependence SU(5) -> SU(2) is not handled correctly in the current implementation. If we're only talking about this case, it doesn't matter, because the instructions that read/write $nzcv (which I think is equivalent to $cpsr on ARM) are ultimately scheduled to stage 0. However, the problem can occur if similar dependencies are created by other physical registers. As fixed in dcb7764, dependencies created by the status register are handled correctly, but those created by other physical registers seem not to be. I'm not sure whether such dependencies exist, but I think the FFR register of Arm SVE could create one. This is what I meant by "should be fixed to treat other loop-carried dependencies correctly in the future" in my previous comment.

@dpenry
Contributor

dpenry commented Oct 22, 2024

https://godbolt.org/z/3v5rnPG3o

I think that the loop-carried dependence SU(5) -> SU(2) is not handled correctly in the current implementation. If we're just talking about this case, it doesn't matter because instructions that read/write $nzcv (which I think is equivalent to $cpsr in ARM) are ultimately scheduled to stage 0. However, the problem can occur if similar dependencies are created by any other different physical registers. As fixed in dcb7764, dependencies created by the status register are handled correctly, but those created by other physical registers seem not to be. I'm not sure if such dependencies exist, but I think FFR register of Arm SVE could create such one. This is what I meant in the previous comment "should be fixed to treat other loop-carried dependencies correctly in the future".

That godbolt makes more sense! I do note that in this particular instance, an SU(5)->SU(2) dependence carried across the loop is not seen because SU(7) kills the liveness of SU(5)'s CPSR write in the output direction and SU(1) kills the liveness of whatever entered the basic block in the input direction. Is that a problem in general? I'd have to think more on it.

Note that CPSR isn't called out specially in the code; all physical registers are treated alike. Are all instructions annotated properly with implicit physical register def/use/kill information? That's up to the instruction set definitions to get right.

SVE's FFR is an interesting beast because it's a cumulative fault register, so there isn't really a sensible notion of an output dependence, kind of like FPSR flags. Getting the order of writes to it wrong generally doesn't matter until it's read. However, RDFFR/RDFFRS/SETFFR/WRFFR probably should be marked as unpipelineable instructions so that we don't try to pipeline some odd code that actually does care about the order.

I do think we've gotten a bit far afield here. We don't (yet) have an example of a failure case due to loop-carried physical registers or memory dependencies with the old code, and we're leaving in most of the bits of logic that purport to handle it. The only question is whether the removal of the reversal code around createAdjacencyStructure is going to create a new failure case in the presence of loop-carried physical register or memory dependencies. That removal results in fewer computed circuits. Circuits are used for MII calculation and for checkValidNodeOrder. MII calculation is not a correctness issue. As for validity, being in a circuit makes it legal to be after both successors and predecessors, so fewer circuits means fewer legal orders, so I don't think we can introduce new failure cases, just reduced opportunities for pipelining. So, I think we're good for now. I would like to see what others have to say.

Contributor

@dpenry dpenry left a comment


So, after all that discussion, I'm good with this patch but I'd like to see what other people have to say about the new abstraction itself.

@kasuga-fj
Contributor Author

Thanks for your detailed comments!

Is that a problem in general? I'd have to think more on it.

Unless we can determine that it's not going to be a problem, I think we should be conservative with dependency analysis. However, as you said, it's also true that I don't have an example that triggers invalid code generation. It also complicates things that missing dependencies do not necessarily cause invalid code to be generated.

In any case, the FIXME comments I added in this patch need to be changed.

The only question is whether the removal of the reversal code around createAdjacencyStructure is going to create a new failure case in the presence of loop-carried physical register or memory dependencies. That removal results in fewer computed circuits. Circuits are used for MII calculation and for checkValidNodeOrder. MII calculation is not a correctness issue. As for validity, being in a circuit makes it legal to be after both successors and predecessors, so fewer circuits means fewer legal orders, so I don't think we can introduce new failure cases, just reduced opportunities for pipelining.

I believe that the circuits to be detected do not change with or without this patch, because

  • If I understand correctly, all memory dependencies are represented as order dependences, so they are not affected by swapAntiDependences.
  • In the original implementation, anti-dependencies originating from physical registers are completely ignored.

Also, checkValidNodeOrder does not prevent loops with an invalid node order from being scheduled. It just checks the order; it doesn't inhibit subsequent processing.
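The circuits argument in this exchange (a circuit search only sees the cycles formed by the edges it is handed, so dropping or re-orienting an edge can only remove circuits, never add failure modes) can be illustrated with a small standalone sketch. This is hypothetical illustration code, not the actual Circuits or createAdjacencyStructure implementation: it counts the simple circuits of a digraph by restricting each DFS to nodes numbered at least the start node, so every circuit is found exactly once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count the simple circuits of a digraph given as adjacency lists.
// Each search starts at node S and may only visit nodes >= S, so a
// circuit is counted once, at its smallest-numbered node.
static void dfs(const std::vector<std::vector<int>> &Adj, int Start, int Cur,
                std::vector<bool> &OnPath, int &Count) {
  OnPath[Cur] = true;
  for (int Next : Adj[Cur]) {
    if (Next == Start)
      ++Count; // closed a simple circuit back to the start node
    else if (Next > Start && !OnPath[Next])
      dfs(Adj, Start, Next, OnPath, Count);
  }
  OnPath[Cur] = false;
}

int countSimpleCircuits(const std::vector<std::vector<int>> &Adj) {
  int Count = 0;
  for (int S = 0; S < (int)Adj.size(); ++S) {
    std::vector<bool> OnPath(Adj.size(), false);
    dfs(Adj, S, S, OnPath, Count);
  }
  return Count;
}
```

Removing an edge from the input (analogous to an anti-dependence that is never presented to the search in the right direction) can only lower the circuit count, which mirrors the "fewer circuits, fewer legal orders, but no new failure cases" reasoning above.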

Comment on lines +3184 to +3141
// To preserve previous behavior and prevent regression
// FIXME: Remove if this doesn't have significant impact on performance
for (const auto &OE : DDG->getOutEdges(SU))
if (OE.getDistance() == 1)
Worklist.push_back(OE.getDst());
Contributor Author


@dpenry
Let me make sure: in terms of correctness, we can remove the for statement here, right? I added this for statement to preserve the behavior of the original implementation. It appends all predecessors of the SU, which may contain anti-dependencies from a PHI. There is a similar part in normalizeNonPipelinedInstructions, which I believe we can remove as well.

Contributor


Because the original code added the loop-carried anti-dependences only for the PHI nodes, and I think those edges are actually present in the new DDG, the dependent nodes should already be in the worklist. If this is true, then the second for statement should be removable.

As for the code in normalizeNonPipelinedInstructions, I think the FIXME section maybe wasn't needed to begin with, because the original SU.Preds didn't contain the loop-carried anti-dependences, so adding them in wasn't needed.

Contributor Author


I think those edges are actually in the new DDG, the dependent nodes should already be in the worklist

This is true.

Then it looks like it's safe to remove both of them. I'd like to remove them in the future to simplify the code, if there is no performance degradation. Thank you for your review!
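The pattern in the snippet under discussion, pushing onto a worklist every successor reachable through a distance-1 out-edge, can be modeled in a self-contained way. Edge and the graph shape below are hypothetical simplifications for illustration, not the real SwingSchedulerDDG interface:

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical simplified edge: destination node plus iteration distance.
struct Edge {
  int Dst;
  unsigned Distance;
};

// Starting from Seed, repeatedly follow out-edges with Distance == 1 and
// collect every node reached, mirroring the worklist loop in the snippet.
std::vector<int>
collectLoopCarriedSuccs(const std::vector<std::vector<Edge>> &OutEdges,
                        int Seed) {
  std::vector<int> Result;
  std::vector<bool> Seen(OutEdges.size(), false);
  std::queue<int> Worklist;
  Worklist.push(Seed);
  Seen[Seed] = true;
  while (!Worklist.empty()) {
    int SU = Worklist.front();
    Worklist.pop();
    for (const Edge &OE : OutEdges[SU]) {
      if (OE.Distance == 1 && !Seen[OE.Dst]) {
        Seen[OE.Dst] = true;
        Result.push_back(OE.Dst);
        Worklist.push(OE.Dst);
      }
    }
  }
  return Result;
}
```

If the distance-1 edges are already present in the graph (as the review concluded for the new DDG), this explicit propagation adds nothing new, which is why the quoted for statement looks removable.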

@kasuga-fj
Contributor Author

@madhur13490 @androm3da
Hi, could you please review this?

@kasuga-fj
Contributor Author

Gentle ping

Contributor

@aankit-ca aankit-ca left a comment


Minor nits

Contributor Author

@kasuga-fj kasuga-fj left a comment


Thanks for your review!

@kasuga-fj
Contributor Author

Hi. Thank you to everyone who reviewed it.

It seems to me that no one is currently in a position to act as code owner of MachinePipeliner, so no one can make the decision to approve or reject this patch. What should I do in such a case? Should I wait longer?

I believe that this patch won't cause any performance degradation. Also, this is necessary to fix correctness issues related to dependency analysis.

@bcahoon
Contributor

bcahoon commented Dec 18, 2024

Thanks for adding this change. I'm in favor of adding the abstraction as it is a big improvement over the existing way of using the DAG. I'm sure there will be changes or other improvements, but it's a step in the right direction. I've only superficially gone through the changes, so I defer to @dpenry, who seems good with the patch.

@kasuga-fj
Copy link
Contributor Author

@bcahoon Thanks for your response. Yes, this is the first step of improvements. I would like to submit more patches after this is merged.

@dpenry Please let me check: your review status is now "Request changes". What kind of changes are you requesting? Or did you simply set the wrong status? Your last comment looks to me like you approve of this patch.

@kasuga-fj kasuga-fj force-pushed the machinepipeliner-ddg-abstract-layer branch from b71f265 to f6f6c72 Compare December 23, 2024 23:18
@kasuga-fj kasuga-fj merged commit 0d6a584 into llvm:main Dec 24, 2024
8 checks passed
@llvm-ci
Collaborator

llvm-ci commented Dec 24, 2024

LLVM Buildbot has detected a new failure on builder openmp-offload-sles-build-only running on rocm-worker-hw-04-sles while building llvm at step 8 "Add check check-llvm".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/140/builds/13677

Here is the relevant piece of the build log for reference:
Step 8 (Add check check-llvm) failure: test (failure)
******************** TEST 'LLVM :: ExecutionEngine/JITLink/x86-64/LocalDependencyPropagation.s' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/bin/llvm-mc -triple=x86_64-apple-macosx10.9 -filetype=obj -o /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/test/ExecutionEngine/JITLink/x86-64/Output/LocalDependencyPropagation.s.tmp /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/llvm/test/ExecutionEngine/JITLink/x86-64/LocalDependencyPropagation.s
+ /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/bin/llvm-mc -triple=x86_64-apple-macosx10.9 -filetype=obj -o /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/test/ExecutionEngine/JITLink/x86-64/Output/LocalDependencyPropagation.s.tmp /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/llvm/test/ExecutionEngine/JITLink/x86-64/LocalDependencyPropagation.s
RUN: at line 3: /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/bin/llvm-jitlink -debug-only=orc -noexec -abs _external_func=0x1    -entry=_foo /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/test/ExecutionEngine/JITLink/x86-64/Output/LocalDependencyPropagation.s.tmp 2>&1 | /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/bin/FileCheck /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/llvm/test/ExecutionEngine/JITLink/x86-64/LocalDependencyPropagation.s
+ /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/bin/FileCheck /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/llvm/test/ExecutionEngine/JITLink/x86-64/LocalDependencyPropagation.s
+ /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/bin/llvm-jitlink -debug-only=orc -noexec -abs _external_func=0x1 -entry=_foo /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/test/ExecutionEngine/JITLink/x86-64/Output/LocalDependencyPropagation.s.tmp
/home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/llvm/test/ExecutionEngine/JITLink/x86-64/LocalDependencyPropagation.s:13:10: error: CHECK: expected string not found in input
# CHECK: In main emitting {{.*}}_foo{{.*}}
         ^
<stdin>:1:1: note: scanning from here
Defining MU <Absolute Symbols> for main (tracker: 0x0, default will be used)
^
<stdin>:38:1: note: possible intended match here
In main defining materializing symbols { }
^

Input file: <stdin>
Check file: /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/llvm/test/ExecutionEngine/JITLink/x86-64/LocalDependencyPropagation.s

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            1: Defining MU <Absolute Symbols> for main (tracker: 0x0, default will be used) 
check:13'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
            2:  { ("_external_func", [Data]) } 
check:13'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            3: Defining MU /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/test/ExecutionEngine/JITLink/x86-64/Output/LocalDependencyPropagation.s.tmp for main (tracker: default) { ("_foo", [Callable]), ("_baz", [Callable]) } 
check:13'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            4: Looking up { (_foo, RequiredSymbol) } in [ ("main", MatchAllSymbols), ("Process", MatchExportedSymbolsOnly) ] (required state: Ready) 
check:13'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            5: Dispatching MaterializationUnits... 
check:13'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            6: Done dispatching MaterializationUnits. 
check:13'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            .
            .
            .
           33: No dependencies to register 
check:13'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           34: Dispatching MaterializationUnits... 
check:13'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           35:  Dispatching "/home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/test/ExecutionEngine/JITLink/x86-64/Output/LocalDependencyPropagation.s.tmp" 
check:13'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           36: Dispatching: Materialization task: /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/test/ExecutionEngine/JITLink/x86-64/Output/LocalDependencyPropagation.s.tmp in main 
check:13'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...

@llvm-ci
Collaborator

llvm-ci commented Dec 24, 2024

LLVM Buildbot has detected a new failure on builder premerge-monolithic-linux running on premerge-linux-1 while building llvm at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/153/builds/18383


@llvm-ci
Collaborator

llvm-ci commented Dec 24, 2024

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-gcc-ubuntu running on sie-linux-worker3 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/174/builds/10572

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: ExecutionEngine/JITLink/x86-64/LocalDependencyPropagation.s' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/bin/llvm-mc -triple=x86_64-apple-macosx10.9 -filetype=obj -o /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/test/ExecutionEngine/JITLink/x86-64/Output/LocalDependencyPropagation.s.tmp /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/llvm/test/ExecutionEngine/JITLink/x86-64/LocalDependencyPropagation.s
+ /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/bin/llvm-mc -triple=x86_64-apple-macosx10.9 -filetype=obj -o /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/test/ExecutionEngine/JITLink/x86-64/Output/LocalDependencyPropagation.s.tmp /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/llvm/test/ExecutionEngine/JITLink/x86-64/LocalDependencyPropagation.s
RUN: at line 3: /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/bin/llvm-jitlink -debug-only=orc -noexec -abs _external_func=0x1    -entry=_foo /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/test/ExecutionEngine/JITLink/x86-64/Output/LocalDependencyPropagation.s.tmp 2>&1 | /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/bin/FileCheck /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/llvm/test/ExecutionEngine/JITLink/x86-64/LocalDependencyPropagation.s
+ /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/bin/llvm-jitlink -debug-only=orc -noexec -abs _external_func=0x1 -entry=_foo /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/test/ExecutionEngine/JITLink/x86-64/Output/LocalDependencyPropagation.s.tmp
+ /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/bin/FileCheck /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/llvm/test/ExecutionEngine/JITLink/x86-64/LocalDependencyPropagation.s
/home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/llvm/test/ExecutionEngine/JITLink/x86-64/LocalDependencyPropagation.s:13:10: error: CHECK: expected string not found in input
# CHECK: In main emitting {{.*}}_foo{{.*}}
         ^
<stdin>:1:1: note: scanning from here
Defining MU <Absolute Symbols> for main (tracker: 0x0, default will be used)
^
<stdin>:38:1: note: possible intended match here
In main defining materializing symbols { }
^

Input file: <stdin>
Check file: /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/llvm/test/ExecutionEngine/JITLink/x86-64/LocalDependencyPropagation.s

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            1: Defining MU <Absolute Symbols> for main (tracker: 0x0, default will be used) 
check:13'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
            2:  { ("_external_func", [Data]) } 
check:13'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            3: Defining MU /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/test/ExecutionEngine/JITLink/x86-64/Output/LocalDependencyPropagation.s.tmp for main (tracker: default) { ("_baz", [Callable]), ("_foo", [Callable]) } 
check:13'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            4: Looking up { (_foo, RequiredSymbol) } in [ ("main", MatchAllSymbols), ("Process", MatchExportedSymbolsOnly) ] (required state: Ready) 
check:13'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            5: Dispatching MaterializationUnits... 
check:13'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            6: Done dispatching MaterializationUnits. 
check:13'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            7: Entering OL_applyQueryPhase1: 
check:13'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            8:  Lookup kind: Static 
check:13'0     ~~~~~~~~~~~~~~~~~~~~~
            9:  Search order: [ ("main", MatchAllSymbols), ("Process", MatchExportedSymbolsOnly) ], Current index = 0 (entering new JITDylib) 
check:13'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           10:  Lookup set: { (_foo, RequiredSymbol) } 
check:13'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           11:  Definition generator candidates: { (_foo, RequiredSymbol) } 
check:13'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           12:  Definition generator non-candidates: { } 
...

Distance = 1;
std::swap(Src, Dst);
auto Reg = Pred.getReg();
Pred = SDep(Src, SDep::Kind::Data, Reg);
Contributor

I think there is a bug here: this change to "Pred" causes the isAntiDep() query below to be incorrect.
Can you fix it?

Contributor Author

Could you elaborate on the problem?
Here the dependency type is changed from Anti to Data only when the source of the edge is a PHI. IIUIC, the anti dependency from a PHI node is used to represent a back-edge. In this patch, the code that handles back-edges has been completely changed, and the anti-dependencies for physical registers are left in place. So I don't think this part is a problem. As far as I have tested locally with llvm-test-suite, there are no differences between the binaries generated with and without this patch. Do you have any cases where it causes problems?

Contributor

I think I found the issue.
I had a bug (a compiler crash on an assertion) on an offline target (not in the LLVM community).
I investigated and concluded that this is the issue.

You changed the PHI anti-dependence handling.
So where we originally had:

some_instruction -> phi; anti dep

you now have in your new DAG:

some_instruction -> phi; data dep

In the original code, some_instruction is considered a successor of the phi via the anti-dependence, and it was missed in succ_L.
See my fix; does it make sense to you?
[screenshot]

Contributor Author

Thanks for your investigation. I see, so there are actual cases where the compiler crashes. Hmm, but I'm not sure what the root cause is.

  • The dependence type is changed from anti to data only if the source instruction is a PHI. So, in your example, the dependence type is not intended to be replaced unless some_instr is a PHI.
  • When the dependence is replaced with data, the source and the destination are swapped at the same time. In your case, the DDG behaves as if there is an edge from the phi to some_instr. Therefore, this edge should be handled in the previous loop, for (const auto &OE : DDG->getOutEdges(SU)) { ... }.

So I'm not sure why your fix changes the behavior. Could you give me more details of your case (e.g., what exactly phi and some_instr are)?

Contributor

@mmarjieh commented Feb 4, 2025

Let's track what happens when sending this edge to your DDGEdge constructor:
Input:
some_instruction -> phi; anti dep

When evaluating the phi's successor to initialize the edge, DDGEdge will be called with the following arguments:
[screenshot]

PredOrSucc: phi
IsSucc = true,
Dep: some_instruction; anti dep

First you swap Src and Dst:
[screenshot]

Meaning now we have:
Src = phi
Dst = some_instruction
Pred's SUnit will be: phi

Then this code will be executed:
[screenshot]
Meaning the edge lost its "anti" kind and now we model it as data.

My problem is that we now don't consider some_instruction a successor of the phi in some parts of the pipeliner's code: computeNodeOrder and succ_L.

I believe that, in addition to checking the anti-deps on the input edges, we need to check edges that have a non-zero distance.

Contributor

And, pred_L. Probably other cases too. I think the check for isAntiDep() can be replaced by getDistance(). That is, with the check for getDistance(), isAntiDep() isn't needed anymore.
Where does the assertion occur? With some more details it might be possible to create a test case. I assume some invalid node order is created? Would it be possible to show the pipeliner debug output for the use case (with and without the check for getDistance())?

Contributor Author

The anti dependence from some_instr to phi becomes the data dependence from phi to some_instr, as I intended. And now, the vector returned by something like DDG->getOutEdges(phi) should contain that edge. That is, in your case, some_instr should be appended to Succs in the for loop at line 1921.

https://github.com/kasuga-fj/llvm-project/blob/f6f6c72322bfef93e21e3b043b4c4fc4f8a13fba/llvm/lib/CodeGen/MachinePipeliner.cpp#L1921

If your change (adding the distance check) fixes the problem, I suspect that the DDG is not constructed as I intended. I think there is another issue that we should really address.

And, pred_L. Probably other cases too.

Yes, I agree with this.

I think the check for isAntiDep() can be replaced by getDistance(). That is, with the check for getDistance(), isAntiDep() isn't needed anymore.

I don't think so. Before this patch was merged, the pred_L was defined as follows:

static bool pred_L(SetVector<SUnit *> &NodeOrder,
                   SmallSetVector<SUnit *, 8> &Preds,
                   const NodeSet *S = nullptr) {
  Preds.clear();
  for (const SUnit *SU : NodeOrder) {
    for (const SDep &Pred : SU->Preds) {
      if (S && S->count(Pred.getSUnit()) == 0)
        continue;
      if (ignoreDependence(Pred, true))
        continue;
      if (NodeOrder.count(Pred.getSUnit()) == 0)
        Preds.insert(Pred.getSUnit());
    }
    // Back-edges are predecessors with an anti-dependence.
    for (const SDep &Succ : SU->Succs) {
      if (Succ.getKind() != SDep::Anti)
        continue;
      if (S && S->count(Succ.getSUnit()) == 0)
        continue;
      if (NodeOrder.count(Succ.getSUnit()) == 0)
        Preds.insert(Succ.getSUnit());
    }
  }
  return !Preds.empty();
}

The inline comment says that the second loop handles back-edges, but the loop body actually only checks whether the edge is an anti dependence. That is, in pred_L, anti forward-edges (I think many of them are WAR dependencies on physical registers) are handled as predecessors. The same goes for succ_L. I suspect this is a mishandling of back-edges, but to keep the original behavior, the anti dependence check is still needed.

Anyway, I also want to see some more details in this case.

Contributor

Hi, I'll send you a debug log, but first I need to understand your change a bit more.
You said and I quote:
"The anti dependence from some_instr to phi becomes the data dependence from phi to some_instr, as I intended."
[screenshot]

By looking at this code:
Src will be some_instr
Dst will be the phi.

So I believe you meant to tell me that data dep is from some_instr to phi?

Contributor Author

@kasuga-fj commented Feb 13, 2025

"The anti dependence from some_instr to phi becomes the data dependence from phi to some_instr, as I intended."

By looking at this code:
Src will be some_instr
Dst will be the phi.

So I believe you meant to tell me that data dep is from some_instr to phi?

Yes, I meant that, but now I think something is incorrect. In your previous comment, you said:

Input:
some_instruction -> phi; anti dep

Also

PredOrSucc: phi
IsSucc = true,
Dep: some_instruction; anti dep

This seems to be wrong. IIUIC, IsSucc should be false in this case. This argument doesn't reflect the "actual semantics" of the dependency; it just indicates whether the Dep is contained in PredOrSucc->Preds or PredOrSucc->Succs. Inserting some validation code like the following at the top of the ctor might be helpful (I haven't checked that this code works):

const auto &Edges = (IsSucc ? PredOrSucc->Succs : PredOrSucc->Preds);
bool Valid = false;
for (const SDep &D : Edges)
  if (D == Dep)
    Valid = true;
assert(Valid && "IsSucc may be incorrect");
