Fix arm64 apple hfa<float>. #46034

Merged
merged 5 commits into from
Dec 18, 2020

@sandreenko (Contributor) commented Dec 14, 2020

Apple Arm64 does not require struct alignment for an HFA of floats; instead, it uses 4-byte stack alignment for this type. Add JIT support for this. Closes #45780.
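To make the alignment difference concrete, here is a minimal sketch of how stack offsets could be assigned under each rule. It is not JIT code: `StackArg`, `RoundUp`, and `AssignStackOffsets` are hypothetical names invented for this illustration, and the 8-byte alignment used for comparison is an assumption about the non-Apple behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one stack-passed argument (illustrative only; not a JIT type).
struct StackArg
{
    std::size_t size;      // bytes occupied by the value itself
    std::size_t alignment; // required alignment of the start of its stack slot
};

// Round `value` up to the next multiple of `alignment` (a power of two).
constexpr std::size_t RoundUp(std::size_t value, std::size_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

// Give each argument the lowest suitably aligned offset after the previous one.
std::vector<std::size_t> AssignStackOffsets(const std::vector<StackArg>& args)
{
    std::vector<std::size_t> offsets;
    std::size_t              next = 0;
    for (const StackArg& arg : args)
    {
        // The rounding trick above only works for power-of-two alignments.
        assert(arg.alignment != 0 && (arg.alignment & (arg.alignment - 1)) == 0);
        next = RoundUp(next, arg.alignment);
        offsets.push_back(next);
        next += arg.size;
    }
    return offsets;
}
```

For a 4-byte float followed by a 12-byte HFA of three floats, `AssignStackOffsets({StackArg{4, 4}, StackArg{12, 4}})` places the HFA at offset 4, while giving the HFA 8-byte alignment instead places it at offset 8.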

The changes:

0296b35: A small fix in dump printing.

ac6fada: Keep the aligned size in fgArgInfo.

There were no users that required the exact size, but keeping it required rounding all over the place.
So now both fgArgInfo and GenTreePutArg keep the aligned size (with padding after the arg value), but the alignment differs between platforms.
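The idea of storing the padded size once, rather than rounding at every use, can be sketched as follows. `ArgRecord` and `RoundUpTo` are hypothetical stand-ins written for this illustration; they are not the fields or helpers of fgArgInfo or GenTreePutArg.

```cpp
#include <cassert>

// Round `size` up to the next multiple of `alignment` (must be a power of two).
constexpr unsigned RoundUpTo(unsigned size, unsigned alignment)
{
    return (size + alignment - 1) & ~(alignment - 1);
}

// Hypothetical per-argument record; not the JIT's actual data structure.
struct ArgRecord
{
    unsigned alignment;   // slot alignment, which depends on platform and type
    unsigned alignedSize; // value size plus trailing padding, computed once

    ArgRecord(unsigned exactSize, unsigned align)
        : alignment(align), alignedSize(RoundUpTo(exactSize, align))
    {
        assert(align != 0 && (align & (align - 1)) == 0);
    }

    // Consumers read the stored size directly and never round again.
    unsigned EndOffset(unsigned startOffset) const
    {
        return startOffset + alignedSize;
    }
};
```

With this shape, a 12-byte argument recorded with 4-byte alignment keeps `alignedSize == 12`, and the same argument recorded with 8-byte alignment keeps `alignedSize == 16`; only the constructor needs to know the platform rule.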

ee4076f: Fix float HFA stack passing.

Apple doesn't require struct alignment for it.
The main change is introducing eeGetArgAlignment that returns alignment for an argument type. I am planning to extend this method soon (see #46026).
The second change is a fix for LclVarDsc::lvSize() so it returns 12 for such variables.

Also, I am trying to use less #if defined(OSX_ARM64_ABI) to make #45501 easier.
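A rough sketch of what a per-type alignment query might look like, and how a size of 12 follows from it. This is an assumption-laden illustration, not the real eeGetArgAlignment or lvSize: the `Abi` enum, `ArgType`, `ArgAlignment`, and `ArgStackSize` are invented here, and the 8-byte default is assumed. It selects the ABI with a runtime value rather than a preprocessor check, in the spirit of reducing `#if defined(OSX_ARM64_ABI)` blocks.

```cpp
#include <cassert>

// Illustrative ABI selector; the real JIT distinguishes targets differently.
enum class Abi
{
    Arm64Generic,
    Arm64Apple
};

// Hypothetical summary of an argument's type.
struct ArgType
{
    unsigned size;       // exact size in bytes
    bool     isFloatHfa; // true for a homogeneous aggregate of floats
};

constexpr unsigned kSlotSize = 8; // assumed default stack slot size

// Alignment required for an argument of this type on the given ABI.
unsigned ArgAlignment(Abi abi, const ArgType& type)
{
    if (abi == Abi::Arm64Apple && type.isFloatHfa)
    {
        return static_cast<unsigned>(sizeof(float)); // element alignment, no struct alignment
    }
    return kSlotSize;
}

// Size the argument occupies once rounded up to its alignment.
unsigned ArgStackSize(Abi abi, const ArgType& type)
{
    const unsigned alignment = ArgAlignment(abi, type);
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (type.size + alignment - 1) & ~(alignment - 1);
}
```

Under these assumptions, a three-float HFA (`ArgType{12, true}`) occupies 12 bytes on `Abi::Arm64Apple` and 16 bytes on `Abi::Arm64Generic`.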

2d87c52: Fix lvaMapSimd12ToSimd16 for arm64 apple arguments.

Now that lvSize returns 12 for SIMD12 arguments on arm64 apple, we need to fix some asserts that do not expect it.

Fixes 46 tests (including pri1 tests):
JIT/Methodical/explicit/rotate/_dbgrotarg_double/_dbgrotarg_double.sh
JIT/Methodical/explicit/rotate/_opt_dbgrotarg_float/_opt_dbgrotarg_float.sh
JIT/Methodical/explicit/rotate/_opt_dbgrotarg_double/_opt_dbgrotarg_double.sh
JIT/Methodical/explicit/rotate/_il_relrotarg_objref/_il_relrotarg_objref.sh
JIT/jit64/hfa/main/testG/hfa_nd2G_r/hfa_nd2G_r.sh
JIT/jit64/hfa/main/testG/hfa_sd2G_d/hfa_sd2G_d.sh
JIT/jit64/hfa/main/testG/hfa_sd2G_r/hfa_sd2G_r.sh
JIT/jit64/hfa/main/testG/hfa_nd2G_d/hfa_nd2G_d.sh
JIT/jit64/hfa/main/testG/hfa_nf2G_d/hfa_nf2G_d.sh
JIT/jit64/hfa/main/testA/hfa_sf1A_d/hfa_sf1A_d.sh
JIT/jit64/hfa/main/testA/hfa_nf1A_r/hfa_nf1A_r.sh
JIT/jit64/hfa/main/testA/hfa_sf0A_d/hfa_sf0A_d.sh
JIT/jit64/hfa/main/testA/hfa_nf0A_r/hfa_nf0A_r.sh
JIT/jit64/hfa/main/testA/hfa_nf2A_d/hfa_nf2A_d.sh
JIT/jit64/hfa/main/testA/hfa_sf2A_r/hfa_sf2A_r.sh
JIT/jit64/hfa/main/testA/hfa_sd2A_r/hfa_sd2A_r.sh
JIT/jit64/hfa/main/testA/hfa_nd2A_d/hfa_nd2A_d.sh
JIT/jit64/hfa/main/testA/hfa_nf1A_d/hfa_nf1A_d.sh
JIT/jit64/hfa/main/testA/hfa_sf1A_r/hfa_sf1A_r.sh
JIT/jit64/hfa/main/testA/hfa_nf0A_d/hfa_nf0A_d.sh
JIT/jit64/hfa/main/testA/hfa_sf0A_r/hfa_sf0A_r.sh
JIT/jit64/hfa/main/testA/hfa_sf2A_d/hfa_sf2A_d.sh
JIT/jit64/hfa/main/testA/hfa_nf2A_r/hfa_nf2A_r.sh
JIT/jit64/hfa/main/testA/hfa_nd2A_r/hfa_nd2A_r.sh
JIT/jit64/hfa/main/testA/hfa_sd2A_d/hfa_sd2A_d.sh
JIT/jit64/hfa/main/testC/hfa_sf1C_r/hfa_sf1C_r.sh
JIT/jit64/hfa/main/testC/hfa_nf1C_d/hfa_nf1C_d.sh
JIT/jit64/hfa/main/testC/hfa_sf0C_r/hfa_sf0C_r.sh
JIT/jit64/hfa/main/testC/hfa_nf0C_d/hfa_nf0C_d.sh
JIT/jit64/hfa/main/testC/hfa_nf1C_r/hfa_nf1C_r.sh
JIT/jit64/hfa/main/testC/hfa_sf1C_d/hfa_sf1C_d.sh
JIT/jit64/hfa/main/testC/hfa_nf0C_r/hfa_nf0C_r.sh
JIT/jit64/hfa/main/testC/hfa_sf0C_d/hfa_sf0C_d.sh
JIT/jit64/hfa/main/testC/hfa_nf2C_d/hfa_nf2C_d.sh
JIT/jit64/hfa/main/testB/hfa_nf2B_d/hfa_nf2B_d.sh
JIT/jit64/hfa/main/testB/hfa_sf2B_r/hfa_sf2B_r.sh
JIT/jit64/hfa/main/testB/hfa_sd2B_r/hfa_sd2B_r.sh
JIT/jit64/hfa/main/testB/hfa_nd2B_d/hfa_nd2B_d.sh
JIT/jit64/hfa/main/testB/hfa_sf0B_d/hfa_sf0B_d.sh
JIT/jit64/hfa/main/testB/hfa_nf0B_r/hfa_nf0B_r.sh
JIT/jit64/hfa/main/testB/hfa_sf2B_d/hfa_sf2B_d.sh
JIT/jit64/hfa/main/testB/hfa_nf2B_r/hfa_nf2B_r.sh
JIT/jit64/hfa/main/testB/hfa_nd2B_r/hfa_nd2B_r.sh
JIT/jit64/hfa/main/testB/hfa_sd2B_d/hfa_sd2B_d.sh
JIT/jit64/hfa/main/testB/hfa_nf0B_d/hfa_nf0B_d.sh
JIT/jit64/hfa/main/testB/hfa_sf0B_r/hfa_sf0B_r.sh

Note: it does not fully fix

JIT/HardwareIntrinsics/General/Vector128/Vector128_r/Vector128_r.sh
JIT/HardwareIntrinsics/General/Vector128/Vector128_ro/Vector128_ro.sh
JIT/HardwareIntrinsics/General/Vector256/Vector256_r/Vector256_r.sh
JIT/HardwareIntrinsics/General/Vector256/Vector256_ro/Vector256_ro.sh

because they are using reflection and additional VM fixes are needed.

@sandreenko added the arch-arm64, os-mac-os-x (macOS aka OSX), and area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) labels Dec 14, 2020
@sandreenko
Contributor Author

/azp list

@sandreenko
Contributor Author

/azp run runtime-coreclr outerloop, runtime-coreclr jitstress

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

Sergey Andreenko added 4 commits December 14, 2020 17:43
There were no users that required the exact size, but keeping it required rounding all over the place.
Apple doesn't require struct alignment for it.
@sandreenko sandreenko marked this pull request as ready for review December 15, 2020 22:12
@sandreenko changed the title from "Fix arm64 apple hfa." to "Fix arm64 apple hfa<float>." Dec 15, 2020
@sandreenko
Contributor Author

PTAL @dotnet/jit-contrib , cc @sdmaclea @mangod9

@sandreenko
Contributor Author

ping @dotnet/jit-contrib

Contributor

@sdmaclea sdmaclea left a comment

LGTM. Looked through each commit. Each looked reasonable and logical.

Comment on lines +3656 to +3666
// TODO-Review: Sometimes we get called on ARM with HFA struct variables that have been promoted,
// where the struct itself is no longer used because all access is via its member fields.
// When that happens, the struct is marked as unused and its type has been changed to
// TYP_INT (to keep the GC tracking code from looking at it).
// See Compiler::raAssignVars() for details. For example:
// N002 ( 4, 3) [00EA067C] ------------- return struct $346
// N001 ( 3, 2) [00EA0628] ------------- lclVar struct(U) V03 loc2
// float V03.f1 (offs=0x00) -> V12 tmp7
// f8 (last use) (last use) $345
// Here, the "struct(U)" shows that the "V03 loc2" variable is unused. Not shown is that V03
// is now TYP_INT in the local variable table. It's not really unused, because it's in the tree.
Member

Realize this is a pre-existing comment, but raAssignVars is long gone....

Contributor Author

Thank you, I will fix the comment in the next PR.

@sandreenko sandreenko merged commit 3a276d5 into dotnet:master Dec 18, 2020
@sandreenko sandreenko deleted the fixArm64AppleHfa branch December 19, 2020 01:20
@ghost ghost locked as resolved and limited conversation to collaborators Jan 18, 2021
Labels
arch-arm64, area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), os-mac-os-x (macOS aka OSX)
Development

Successfully merging this pull request may close these issues.

Support HFA(float) stack passing.
3 participants