Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARM64 - Investigate why a mov is inserted #83618

Open
TIHan opened this issue Mar 17, 2023 · 6 comments
Open

ARM64 - Investigate why a mov is inserted #83618

TIHan opened this issue Mar 17, 2023 · 6 comments
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Milestone

Comments

@TIHan
Copy link
Contributor

TIHan commented Mar 17, 2023

This PR had a few diffs that looked like this:

R2RTest.CompileSerpCommand+<FilterAssembliesNoSimpleNameDuplicates>d__32:MoveNext():bool:this

-            cmp     w0, #0
+            tbz     w0, #31, G_M20047_IG08
+						;; size=92 bbWeight=2 PerfScore 60.00
+G_M20047_IG06:        ; bbWeight=0.50, gcrefRegs=380000 {x19 x20 x21}, byrefRegs=0000 {}, byref
             mov     x0, x19
             ; gcrRegs +[x0]
-            bge     G_M20047_IG08
-						;; size=100 bbWeight=2 PerfScore 62.00
-G_M20047_IG06:        ; bbWeight=0.50, gcrefRegs=300001 {x0 x20 x21}, byrefRegs=0000 {}, byref
-            ; gcrRegs -[x19]

It is curious that mov x0, x19 was inserted between the cmp and bge before. But now that we transform those two instructions into a single tbz, the mov is inserted after a branching operation instead of before - which may cause a perf regression.

Might be worthwhile to investigate why the mov was inserted in the first place. There were not many of these cases as a result of the PR, so it's probably a non-issue in general.

@TIHan TIHan added arch-arm64 tenet-performance Performance related issue area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Mar 17, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Mar 17, 2023
@ghost
Copy link

ghost commented Mar 17, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

This PR had a few diffs that looked like this:

R2RTest.CompileSerpCommand+<FilterAssembliesNoSimpleNameDuplicates>d__32:MoveNext():bool:this

-            cmp     w0, #0
+            tbz     w0, #31, G_M20047_IG08
+						;; size=92 bbWeight=2 PerfScore 60.00
+G_M20047_IG06:        ; bbWeight=0.50, gcrefRegs=380000 {x19 x20 x21}, byrefRegs=0000 {}, byref
             mov     x0, x19
             ; gcrRegs +[x0]
-            bge     G_M20047_IG08
-						;; size=100 bbWeight=2 PerfScore 62.00
-G_M20047_IG06:        ; bbWeight=0.50, gcrefRegs=300001 {x0 x20 x21}, byrefRegs=0000 {}, byref
-            ; gcrRegs -[x19]

It is curious that mov x0, x19 was inserted between the cmp and bge before. But now that we transform those two instructions into a single tbz, the mov is inserted after a branching operation instead of before - which may cause a perf regression.

Might be worthwhile to investigate why the mov was inserted in the first place. There are not many of these cases so it's probably a non-issue in general.

Author: TIHan
Assignees: -
Labels:

arch-arm64, tenet-performance, area-CodeGen-coreclr

Milestone: -

@kunalspathak
Copy link
Member

Did you check the jitdump?

@jakobbotsch
Copy link
Member

It looks like a resolution move, that should be the only way we can get a mov in between a flags def and use. It was moved to the beginning of the next block because it interferes with w0 that is used by the JCMP node.

@TIHan
Copy link
Contributor Author

TIHan commented Mar 18, 2023

Did you check the jitdump?

I didn't get a jitdump of this - I guess that's part of the investigation :)

It looks like a resolution move, that should be the only way we can get a mov in between a flags def and use. It was moved to the beginning of the next block because it interferes with w0 that is used by the JCMP node.

It does look that way.

@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Mar 20, 2023
@JulieLeeMSFT JulieLeeMSFT added this to the Future milestone Mar 20, 2023
@kunalspathak
Copy link
Member

It does look that way.

Should this be closed or are you planning to investigate this more?

@TIHan
Copy link
Contributor Author

TIHan commented Mar 28, 2023

Should this be closed or are you planning to investigate this more

I could investigate more but it's not a priority.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

4 participants