GC's VXSort question #64164
Comments
Tagging subscribers to this area: @tommcdon
Issue Details
I noticed that GC uses VXSort (AVX2/AVX512) but only on Windows-x64. So I assume it has to be enabled for Linux-x64 and rewritten to NEON for Arm64? I only tested it on the Plaintext-MVC benchmark (allocates a lot of short-lived objects) on our perf-lab and it seems like VXSort regresses P95 across 7 runs and has no effect on RPS. Also, it adds 227 KB to native size (for coreclr.dll+clrgc.dll combined). Is there a scenario I can simulate on our perf-lab to see benefits from it, or does it only target real-world large workloads?
Tagging subscribers to this area: @dotnet/gc
Linking #37159
Some notes about
Thanks @EgorBo for the data. That's interesting, because if you are just allocating some temp objects you shouldn't even hit the vectorized sorting code path. If you took a trace with CPU samples, do you see any samples captured in
I don't think this needs to be 7.0 but let me know if you don't agree.
I noticed that GC uses VXSort (AVX2/AVX512) but only on Windows-x64. So I assume it has to be enabled for Linux-x64 and rewritten to NEON for Arm64?
![image](https://user-images.githubusercontent.com/523221/150682770-e1517d32-060a-4478-b1b2-f8726e5fb8a8.png)
(a screenshot, because it's not possible to reference lines in gc.cpp via github 😄)
I only tested it on the Plaintext-MVC benchmark (allocates a lot of short-lived objects) on our perf-lab, and it seems like VXSort regresses P90 across 7 runs and has no effect on RPS. Also, it adds 227 KB to native size (for coreclr.dll+clrgc.dll combined).
![image](https://user-images.githubusercontent.com/523221/150685512-5ced8bf9-e4be-4eb7-b866-bc913698b2e7.png)
Is there a scenario I can simulate on our perf-lab to see benefits from it, or does it only target real-world large workloads?
I am asking because I am wondering if it is worth porting to NEON SIMD for Arm64.