-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vector.Sum(Vector<T>) API implementation for horizontal add. #53527
Conversation
Note regarding the This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change. |
Tagging subscribers to this area: @tannergooding, @pgovind Issue DetailsThis PR adds implementation for the proposed At the moment the 128bit version depends on SSSE3 and falls back to software implementation if missing. I can add hardware accelerated version using SSE2 (shuffles and adds), but doing this will add at least 100 lines of code in the JIT. Let me know if it's worth it and I will add it. Closes #35626 //cc @tannergooding @Sergio0694
|
I think requiring SSSE3 is fine, this will be basically any CPU from 2006 onwards, and at least according to the Steam Hardware Survey is 99.17% of all CPUs. I imagine this could be even higher for enterprise and cloud scenarios. |
@tannergooding for some reason one of the tests fails on ARM64 for Assert failure(PID 93 [0x0000005d], Thread: 112 [0x0070]): Assertion failed 'isVectorRegister(reg1)' in 'System.Numerics.Tests.GenericVectorTests:TestSum(System.Func`2[[System.UInt32[], System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.UInt32, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]])' during 'Generate code' (IL size 52)
File: /__w/1/s/src/coreclr/jit/emitarm64.cpp Line: 4387 I am unfamiliar with the assertions there. Can you provide assistance to figure out what is wrong, please? |
Thanks again for all the work here, this is looking like its near completion with the last couple comments and should hopefully pass CI here soon <3 |
FYI, the auto formatting settings and using Visual Studio constantly change a few of the lines that aren't related to this PR and are in conflict with CI checks. See this undo commit for example 1b484d7. Is something I'm doing wrong? I have muscle memory for "Format Document" and use it from time to time to make sure the formatting is correct. Should I not use it when working here, and instead rely on "Format Section"? How do you work here and make sure the code is formatted accordingly to style rules? |
This sounds like a disconnect between the clangformat settings and what VS is deciding. An issue should probably get logged so this can be fixed. To setup things, I'd do:
Then, any time you need to format you can simply run: |
CC. @dotnet/jit-contrib, @echesakovMSFT; this community PR should be ready to review now. |
/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm |
Azure Pipelines successfully started running 2 pipeline(s). |
Ping @dotnet/jit-contrib, @echesakovMSFT. Community PR should be ready for review. |
@tannergooding Thanks for the ping. I will take a look tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your change @zlatanov!
I have some suggestions how to improve usage of arm64 intrinsics
} | ||
case TYP_LONG: | ||
case TYP_ULONG: | ||
case TYP_FLOAT: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For TYP_LONG
TYP_ULONG
and TYP_DOUBLE
we should avoid cloning op1
by emitting AddPairwiseScalar
intrinsics instead (these correspond to addp
faddp
(scalar) see https://developer.arm.com/architectures/instruction-sets/intrinsics/vpaddd_f64) that operates on a single 16-byte vector.
Hello @tannergooding! Because this pull request has the p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (
|
@tannergooding I don't think the pipeline failures are related to this PR. Edit: Actually, some of the failures do seem to be related here. See https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-53527-merge-2c257229936a4cea87/System.Numerics.Vectors.Tests/console.eb12607d.log?sv=2019-07-07&se=2021-07-01T08%3A05%3A12Z&sr=c&sp=rl&sig=vtS0xe9l4%2FFLeluPICLMiAigJBqjabpwigm46IuCIm0%3D I don't see what could ne causing |
Given it was on OSX and passed on rerun, it might be an issue with hardware that has AVX but not AVX2. However, I don't see anything that would be causing it in this PR. I'll keep an eye on CI for additional flakiness and see if I can repro locally as well. |
This PR adds implementation for the proposed
Vector.Sum(Vector<T>)
API.At the moment the 128bit version depends on SSSE3 and falls back to software implementation if missing. I can add hardware accelerated version using SSE2 (shuffles and adds), but doing this will add at least 100 lines of code in the JIT. Let me know if it's worth it and I will add it.
Closes #35626
//cc @tannergooding @Sergio0694