Vector.Sum(Vector<T>) API implementation for horizontal add. #53527

zlatanov · 2021-06-01T13:30:43Z

This PR adds implementation for the proposed Vector.Sum(Vector<T>) API.

At the moment the 128-bit version depends on SSSE3 and falls back to a software implementation if it is missing. I can add a hardware-accelerated version using SSE2 (shuffles and adds), but doing this will add at least 100 lines of code in the JIT. Let me know if it's worth it and I will add it.

Closes #35626

//cc @tannergooding @Sergio0694
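For readers unfamiliar with the "shuffles and adds" alternative mentioned above, here is a minimal sketch of a horizontal add over four 32-bit lanes using only SSE2 intrinsics. It is an illustration in plain C++, not the JIT code from this PR; the helper names `sum_scalar` and `sum_sse2` are hypothetical, and the SSE2 path is only compiled when the compiler defines `__SSE2__`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Software fallback: add the lanes one by one.
// Unsigned addition wraps on overflow, matching the SIMD adds below.
inline std::uint32_t sum_scalar(const std::array<std::uint32_t, 4>& v) {
    std::uint32_t total = 0;
    for (std::uint32_t lane : v) total += lane;
    return total;
}

// Horizontal add built from two shuffle + add steps (hypothetical helper).
inline std::uint32_t sum_sse2(const std::array<std::uint32_t, 4>& v) {
#if defined(__SSE2__)
    __m128i x = _mm_loadu_si128(reinterpret_cast<const __m128i*>(v.data()));
    // Swap the 64-bit halves: lanes [0,1,2,3] become [2,3,0,1].
    __m128i swapped_halves = _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2));
    x = _mm_add_epi32(x, swapped_halves);  // [v0+v2, v1+v3, v0+v2, v1+v3]
    // Swap adjacent lanes: [0,1,2,3] become [1,0,3,2].
    __m128i swapped_pairs = _mm_shuffle_epi32(x, _MM_SHUFFLE(2, 3, 0, 1));
    x = _mm_add_epi32(x, swapped_pairs);   // every lane holds the full sum
    return static_cast<std::uint32_t>(_mm_cvtsi128_si32(x));
#else
    return sum_scalar(v);  // no SSE2 at compile time: use the fallback
#endif
}
```

Each element width needs its own shuffle sequence, which gives a rough sense of why handling every type this way would add a noticeable amount of code.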

dotnet-issue-labeler · 2021-06-01T13:30:47Z

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

ghost · 2021-06-01T13:30:49Z

Tagging subscribers to this area: @tannergooding, @pgovind
See info in area-owners.md if you want to be subscribed.

Author:	zlatanov
Assignees:	-
Labels:	`area-System.Numerics`, `new-api-needs-documentation`
Milestone:	-

tannergooding · 2021-06-01T15:15:12Z

> At the moment the 128bit version depends on SSSE3 and falls back to software implementation if missing. I can add hardware accelerated version using SSE2 (shuffles and adds), but doing this will add at least 100 lines of code in the JIT. Let me know if it's worth it and I will add it.

I think requiring SSSE3 is fine. It is supported by basically any CPU from 2006 onwards and, at least according to the Steam Hardware Survey, by 99.17% of all CPUs. I imagine this could be even higher for enterprise and cloud scenarios.
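As a rough illustration of the SSSE3 path with a software fallback, the sketch below sums four 32-bit lanes with two `_mm_hadd_epi32` steps. It is not the JIT implementation: the JIT decides at run time based on the CPU, whereas this sketch assumes a compile-time check on `__SSSE3__`, and the names `sum_software` and `sum_with_hadd` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#if defined(__SSSE3__)
#include <tmmintrin.h>  // SSSE3 intrinsics, including _mm_hadd_epi32
#endif

// Software fallback used when SSSE3 is not enabled at compile time.
inline std::uint32_t sum_software(const std::array<std::uint32_t, 4>& v) {
    std::uint32_t total = 0;
    for (std::uint32_t lane : v) total += lane;  // wraps on overflow
    return total;
}

// Horizontal add via two pairwise horizontal adds (hypothetical helper).
inline std::uint32_t sum_with_hadd(const std::array<std::uint32_t, 4>& v) {
#if defined(__SSSE3__)
    __m128i x = _mm_loadu_si128(reinterpret_cast<const __m128i*>(v.data()));
    x = _mm_hadd_epi32(x, x);  // [v0+v1, v2+v3, v0+v1, v2+v3]
    x = _mm_hadd_epi32(x, x);  // every lane now holds v0+v1+v2+v3
    return static_cast<std::uint32_t>(_mm_cvtsi128_si32(x));
#else
    return sum_software(v);
#endif
}
```

With GCC or Clang the SSSE3 branch is only compiled when the target enables it (for example with `-mssse3`); otherwise the function silently uses the loop.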

zlatanov · 2021-06-02T11:51:26Z

@tannergooding for some reason one of the tests fails on ARM64 for uint.

Assert failure(PID 93 [0x0000005d], Thread: 112 [0x0070]): Assertion failed 'isVectorRegister(reg1)' in 'System.Numerics.Tests.GenericVectorTests:TestSum(System.Func`2[[System.UInt32[], System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.UInt32, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]])' during 'Generate code' (IL size 52)

File: /__w/1/s/src/coreclr/jit/emitarm64.cpp Line: 4387

I am unfamiliar with the assertions there. Can you provide assistance to figure out what is wrong, please?

tannergooding · 2021-06-03T14:38:18Z

Thanks again for all the work here. This looks like it's near completion with the last couple of comments and should hopefully pass CI soon <3

zlatanov · 2021-06-03T14:53:41Z

FYI, the auto-formatting settings in Visual Studio constantly change a few lines that aren't related to this PR, and those changes conflict with the CI checks. See this undo commit for example: 1b484d7.

Is there something I'm doing wrong? I have muscle memory for "Format Document" and use it from time to time to make sure the formatting is correct. Should I not use it when working here, and instead rely on "Format Section"? How do you work here and make sure the code is formatted according to the style rules?

tannergooding · 2021-06-03T15:18:35Z

> I have muscle memory for "Format Document" and use it from time to time to make sure the formatting is correct

This sounds like a disconnect between the clang-format settings and what VS is deciding. An issue should probably get logged so this can be fixed.

To set things up, I'd do:

1. Clone https://github.com/dotnet/jitutils
2. Build using bootstrap.cmd
3. Add <jitutils-repo-root>/bin to your path

Then, any time you need to format, you can simply run: `jit-format -c <runtime-repo-root>\src\coreclr -f`

tannergooding · 2021-06-03T18:43:30Z

CC. @dotnet/jit-contrib, @echesakovMSFT; this community PR should be ready to review now.

tannergooding · 2021-06-03T18:43:55Z

/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm

azure-pipelines · 2021-06-03T18:44:19Z

Azure Pipelines successfully started running 2 pipeline(s).

tannergooding · 2021-06-08T22:07:23Z

Ping @dotnet/jit-contrib, @echesakovMSFT. Community PR should be ready for review.

echesakov · 2021-06-09T00:14:44Z

@tannergooding Thanks for the ping. I will take a look tomorrow.

echesakov

Thank you for your change @zlatanov!

I have some suggestions on how to improve the usage of the Arm64 intrinsics.

echesakov · 2021-06-10T00:44:39Z

src/coreclr/jit/simdashwintrinsic.cpp

+                        }
+                        case TYP_LONG:
+                        case TYP_ULONG:
+                        case TYP_FLOAT:


For TYP_LONG, TYP_ULONG and TYP_DOUBLE we should avoid cloning op1 by emitting the AddPairwiseScalar intrinsic instead. It corresponds to the scalar forms of addp and faddp (see https://developer.arm.com/architectures/instruction-sets/intrinsics/vpaddd_f64) and operates on a single 16-byte vector.
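To make the suggestion concrete, here is a small sketch using the C NEON intrinsics rather than the JIT's own node types. It assumes `vpaddd_f64` and `vpaddd_u64` as the C-level counterparts of the scalar pairwise add, and `vaddvq_u32` as an across-vector add for four 32-bit lanes. The function names are hypothetical, and a plain scalar fallback is compiled on targets that do not define `__aarch64__`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#if defined(__aarch64__)
#include <arm_neon.h>  // A64 NEON intrinsics
#endif

// Two 64-bit lanes: a single scalar pairwise add yields the total,
// so the input vector is consumed once and never needs to be duplicated.
inline double sum_f64x2(const std::array<double, 2>& v) {
#if defined(__aarch64__)
    return vpaddd_f64(vld1q_f64(v.data()));  // faddp (scalar)
#else
    return v[0] + v[1];
#endif
}

inline std::uint64_t sum_u64x2(const std::array<std::uint64_t, 2>& v) {
#if defined(__aarch64__)
    return vpaddd_u64(vld1q_u64(v.data()));  // addp (scalar)
#else
    return v[0] + v[1];  // unsigned add wraps on overflow
#endif
}

// Four 32-bit lanes: an across-vector add reduces all lanes at once.
inline std::uint32_t sum_u32x4(const std::array<std::uint32_t, 4>& v) {
#if defined(__aarch64__)
    return vaddvq_u32(vld1q_u32(v.data()));  // addv
#else
    return v[0] + v[1] + v[2] + v[3];
#endif
}
```

In all three helpers the reduction takes one vector operand, which is the property the review comment is after.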

ghost · 2021-06-10T16:35:35Z

Hello @tannergooding!

Because this pull request has the auto-merge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (`@msftbot`) and give me an instruction to get started! Learn more here.

zlatanov · 2021-06-11T08:44:47Z

@tannergooding I don't think the pipeline failures are related to this PR.

Edit:

Actually, some of the failures do seem to be related here. See https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-53527-merge-2c257229936a4cea87/System.Numerics.Vectors.Tests/console.eb12607d.log?sv=2019-07-07&se=2021-07-01T08%3A05%3A12Z&sr=c&sp=rl&sig=vtS0xe9l4%2FFLeluPICLMiAigJBqjabpwigm46IuCIm0%3D

I don't see what could be causing the NullReferenceException in the tests though.

tannergooding · 2021-06-11T16:50:04Z

> I don't see what could be causing the NullReferenceException in the tests though.

Given it was on OSX and passed on rerun, it might be an issue with hardware that has AVX but not AVX2. However, I don't see anything that would be causing it in this PR.

I'll keep an eye on CI for additional flakiness and see if I can repro locally as well.

Vector.Sum(Vector<T>) API implementation for horizontal add.

800418f

dotnet-issue-labeler bot added area-System.Numerics new-api-needs-documentation labels Jun 1, 2021

Fixed incorrect reference to Arm64 AddAcross intrinsic function.

ee8eeea

SingleAccretion reviewed Jun 1, 2021

src/coreclr/jit/simdashwintrinsic.cpp

tannergooding reviewed Jun 1, 2021

src/coreclr/jit/simdashwintrinsiclistarm64.h

zlatanov added 3 commits June 2, 2021 11:37

Added implementation for hardware accelerated Vector<T>.Sum for long, ulong, float, double on ARM64.

7be5f33

Merge branch 'dotnet:main' into vector-sum

61a3e7b

Fixed formatting issue.

5dd13f6

tannergooding reviewed Jun 2, 2021

src/coreclr/jit/simdashwintrinsiclistarm64.h

tannergooding reviewed Jun 2, 2021

src/coreclr/jit/simdashwintrinsic.cpp

zlatanov added 3 commits June 3, 2021 12:42

Correctness.

8ed3936

Fixed compiler error for ARM64.

2e67c3b

Formatting issue.

1b484d7

tannergooding reviewed Jun 3, 2021

src/coreclr/jit/simdashwintrinsic.cpp

tannergooding reviewed Jun 3, 2021

src/coreclr/jit/simdashwintrinsic.cpp

zlatanov added 2 commits June 3, 2021 17:43

More explicit switch statement. Fixed wrong simd size for NI_Vector64_ToScalar.

913c44d

Fixed auto formatting issue.

5142923

tannergooding approved these changes Jun 3, 2021

echesakov self-requested a review June 9, 2021 00:14

echesakov suggested changes Jun 10, 2021

Use AddPairwiseScalar for double, long and ulong on ARM64 for VectorT128_Sum.

10678f6

zlatanov requested a review from echesakov June 10, 2021 16:04

echesakov reviewed Jun 10, 2021

src/coreclr/jit/simdashwintrinsic.cpp

Forgot ToScalar call after AddPairwiseScalar.

1db0605

echesakov approved these changes Jun 10, 2021

tannergooding added the auto-merge label Jun 10, 2021

Fixed wrong return type.

11f2483

ghost removed the auto-merge label Jun 10, 2021

Merge branch 'dotnet:main' into vector-sum

f9821bb

tannergooding merged commit 6afe03e into dotnet:main Jun 11, 2021

tannergooding mentioned this pull request Jun 13, 2021

Ensure Vector.Sum uses SSE3, rather than SSSE3, for floating-point #54123

Merged

zlatanov deleted the vector-sum branch July 2, 2021 14:55

ghost locked as resolved and limited conversation to collaborators Aug 1, 2021
