Progress on porting ML.NET native SIMD algorithms to managed code #1

Open
45 of 46 tasks
briancylui opened this issue Jul 3, 2018 · 1 comment

briancylui commented Jul 3, 2018

Goals

  1. Gain an understanding of SIMD operations and use cases
  2. Port ML.NET C++ SIMD algorithms to C#
  3. Increase ML.NET performance by using AVX operations when supported and where beneficial
  4. Ensure C# Hardware Intrinsics feature meets the needs of ML.NET
  5. Unit test all functions and get performance benchmark numbers for before and after changes
  6. (Stretch) provide software fallback implementations to support more architectures
  7. (Stretch) Implement ARM64 SIMD algorithms

Progress

Week 1: Get familiar with .NET development

Week 2: Learn SIMD operations and use them in .NET outside of ML.NET

  • Complete first connect with recruiter
  • Implement SSE support and software fallbacks in managed code for all key intrinsics
  • Comply with coding style standard
  • Implement working unit tests for all key intrinsics
  • Implement working performance tests for all key intrinsics using BenchmarkDotNet (slides and recording)
  • Present performance results in a table (SsePerf-report-github.pdf)
BenchmarkDotNet=v0.10.14, OS=Windows 10.0.15063.1155 (1703/CreatorsUpdate/Redstone2)
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
Frequency=3515623 Hz, Resolution=284.4446 ns, Timer=TSC
.NET Core SDK=2.1.300
  [Host]     : .NET Core 2.1.0 (CoreCLR 4.6.26515.07, CoreFX 4.6.26515.06), 64bit RyuJIT
  DefaultJob : .NET Core 2.1.0 (CoreCLR 4.6.26515.07, CoreFX 4.6.26515.06), 64bit RyuJIT

Method Mean Error StdDev
NativeDotUPerf 363.2 us 7.7293 us 18.8143 us
MyDotUPerf 340.2 us 6.7218 us 8.0018 us
NativeDotSUPerf 2,178.3 us 43.4641 us 40.6563 us
MyDotSUPerf 2,144.7 us 19.1638 us 16.0027 us
NativeSumSqUPerf 540.6 us 3.0299 us 2.8342 us
MySumSqUPerf 538.8 us 2.5507 us 2.3859 us
NativeAddUPerf 313.9 us 2.5163 us 2.3537 us
MyAddUPerf 303.3 us 4.5125 us 4.2210 us
NativeAddSUPerf 2,691.8 us 29.4588 us 27.5558 us
MyAddSUPerf 2,658.1 us 51.3336 us 64.9206 us
NativeAddScaleUPerf 300.0 us 5.5529 us 5.1941 us
MyAddScaleUPerf 309.8 us 5.3974 us 4.7846 us
NativeAddScaleSUPerf 2,550.9 us 21.8322 us 20.4218 us
MyAddScaleSUPerf 2,805.3 us 20.5171 us 19.1917 us
NativeScaleUPerf 131.4 us 0.6347 us 0.5626 us
MyScaleUPerf 130.7 us 1.2159 us 1.1373 us
NativeDist2Perf 336.4 us 2.0555 us 1.9227 us
MyDist2Perf 335.2 us 8.3427 us 11.4196 us
NativeSumAbsUPerf 258.0 us 1.6470 us 1.5406 us
MySumAbsqUPerf 258.9 us 0.9447 us 0.7889 us
NativeMulElementWiseUPerf 466.4 us 1.9625 us 1.6388 us
MyMulElementWiseUPerf 467.2 us 4.3560 us 4.0747 us
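
The Week 2 item on SSE support with software fallbacks can be illustrated with a small sketch. It is written in C++ to mirror the native side of the port; the managed version would use the .NET hardware intrinsics API instead. The name `dot_u` and the `SKETCH_HAS_SSE` macro are hypothetical stand-ins, and the assumption that `DotU` is a dense dot product over unaligned arrays comes from the method name, not from the issue text.

```cpp
#include <cassert>
#include <cstddef>

// Use SSE when the compiler targets it; otherwise only the scalar loop runs.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Dense dot product over unaligned float arrays (hypothetical stand-in for DotU).
float dot_u(const float* a, const float* b, std::size_t count) {
    assert((a != nullptr && b != nullptr) || count == 0);
    float result = 0.0f;
    std::size_t i = 0;
#if SKETCH_HAS_SSE
    // Vector path: multiply and accumulate four floats per iteration.
    __m128 acc = _mm_setzero_ps();
    for (; i + 4 <= count; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  // unaligned load
        __m128 vb = _mm_loadu_ps(b + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }
    // Horizontal sum of the four accumulator lanes.
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    result = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#endif
    // Software fallback, also used for the leftover tail elements.
    for (; i < count; ++i) {
        result += a[i] * b[i];
    }
    return result;
}
```

The scalar loop doubles as the tail handler, so the same function is correct whether or not the vector path is compiled in.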

Week 3-5: Port algorithms to C#, write unit tests and performance tests, check in code

  • Investigate why managed code for the "sparse" intrinsics is slower than native code
  • Apply real data to test implemented managed code using BenchmarkDotNet
  • Schedule a meeting for midpoint review with Dan, Eric, Santi, Tanner, and Ivan on Skype at the end of Week 5 on July 20
  • Get familiar with the entire ML.NET pipeline by creating an ML project
  • Integrate local code into the ML.NET repo to prepare for checking in code, including:
      • C# implementations of intrinsics
      • Unit tests
      • Performance tests
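
For context on the "sparse" intrinsics mentioned above, here is a minimal C++ sketch of what such a kernel looks like. It assumes that `DotSU` sums `a[idx[i]] * b[i]`, which is my reading of the method name and not something the issue states; `dot_su`, `a_len`, and `SKETCH_SPARSE_SSE` are hypothetical. The sketch does not try to answer why the managed version is slower; it only shows that a sparse kernel reads one operand through an index array, so each vector has to be assembled from four separate scalar loads (SSE has no gather instruction).

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_SPARSE_SSE 1
#else
#define SKETCH_SPARSE_SSE 0
#endif

// Sparse dot product: sum of a[idx[i]] * b[i] for i in [0, count).
// Hypothetical stand-in for DotSU; a_len is only used for bounds checks.
float dot_su(const float* a, std::size_t a_len,
             const float* b, const int* idx, std::size_t count) {
    for (std::size_t k = 0; k < count; ++k) {
        assert(idx[k] >= 0 && static_cast<std::size_t>(idx[k]) < a_len);
    }
    float result = 0.0f;
    std::size_t i = 0;
#if SKETCH_SPARSE_SSE
    __m128 acc = _mm_setzero_ps();
    for (; i + 4 <= count; i += 4) {
        // Manual gather: four indexed scalar loads packed into one vector.
        __m128 va = _mm_setr_ps(a[idx[i]], a[idx[i + 1]],
                                a[idx[i + 2]], a[idx[i + 3]]);
        __m128 vb = _mm_loadu_ps(b + i);  // b is read contiguously
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    result = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#endif
    // Scalar fallback and tail.
    for (; i < count; ++i) {
        result += a[idx[i]] * b[i];
    }
    return result;
}
```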

Week 6

  • Participate in Microsoft Hackathon
  • Attend IEEE conference

Week 7

  • Respond to PR comments and Intel partners
  • Fix build issues in multi-targeting and disabling netcoreapp3.0 test projects
  • Hard-code unit tests
  • Introduce a custom random seed in perf tests, read from environment variables, for better testing
  • Make major style changes to best utilize existing libraries and ensure aggressive inlining wherever needed
  • Document follow-up action items for performance enhancement in an issue page (Suggestions on CpuMath enhancement #2)
  • Fix perf issues of some SSE intrinsics in compliance with C# 7.3 updates
  • Fix merge conflicts and obtain green builds for PR
  • Get the PR for the key SSE intrinsics, their unit tests, and their perf tests (with multi-targeting) approved
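
The custom random seed item above can be sketched as follows, in C++ for consistency with the native code. This is an illustration of the idea only: the helper names `seed_from_env` and `make_input`, and any variable name passed to them, are hypothetical and not taken from the ML.NET perf tests.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <random>
#include <vector>

// Read a seed from an environment variable; use `fallback` when the
// variable is unset, empty, or not a plain decimal number.
unsigned int seed_from_env(const char* name, unsigned int fallback) {
    assert(name != nullptr);
    const char* text = std::getenv(name);
    if (text == nullptr || *text == '\0') {
        return fallback;
    }
    char* end = nullptr;
    unsigned long value = std::strtoul(text, &end, 10);
    if (*end != '\0') {
        return fallback;  // trailing garbage: ignore the variable
    }
    return static_cast<unsigned int>(value);
}

// Fill a benchmark input array from a seeded generator, so two runs with
// the same seed operate on identical data.
std::vector<float> make_input(std::size_t count, unsigned int seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
    std::vector<float> data(count);
    for (float& x : data) {
        x = dist(rng);
    }
    return data;
}
```

A perf test would call something like `make_input(n, seed_from_env("SOME_SEED_VARIABLE", 253421u))`, where both the variable name and the default are placeholders.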

Week 8-9

  • Scale up implementation, unit tests, and performance tests to cover all SSE intrinsics
  • Write AVX implementations
  • Performance test before and after. We should see some perf gains here.
  • Check in code to ML.NET (submitted PR)
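
As a rough picture of what "AVX when supported" means for a kernel, the sketch below sums an array eight floats at a time when AVX is available and falls back to a scalar loop otherwise. It is C++ with a compile-time check; a managed port would more likely branch at run time on whether the instruction set is supported. The name `sum_u` is a hypothetical stand-in for `SumU`.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__AVX__)
#include <immintrin.h>
#endif

// Sum of an unaligned float array (hypothetical stand-in for SumU).
float sum_u(const float* src, std::size_t count) {
    assert(src != nullptr || count == 0);
    float result = 0.0f;
    std::size_t i = 0;
#if defined(__AVX__)
    // 256-bit path: eight floats per iteration, twice the SSE width.
    __m256 acc = _mm256_setzero_ps();
    for (; i + 8 <= count; i += 8) {
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(src + i));
    }
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    for (float lane : lanes) {
        result += lane;
    }
#endif
    // Scalar fallback and tail.
    for (; i < count; ++i) {
        result += src[i];
    }
    return result;
}
```

Doubling the vector width halves the number of loop iterations, but memory bandwidth and the scalar tail limit how much of that shows up in wall-clock time, which is why a before/after measurement is still needed.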

Perf test results for all active SSE hardware intrinsics:

BenchmarkDotNet=v0.10.14, OS=Windows 10.0.17134
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-alpha1-20180720-2
  [Host] : .NET Core 3.0.0-preview1-26710-03 (CoreCLR 4.6.26710.05, CoreFX 4.6.26708.04), 64bit RyuJIT

Toolchain=InProcessToolchain
Method Mean Error StdDev Median
NativeAddScalarUPerf 221.7 us 4.323 us 5.467 us 220.8 us
ManagedAddScalarUPerf 217.3 us 4.207 us 3.729 us 215.5 us
NativeScaleUPerf 219.0 us 2.368 us 2.215 us 218.9 us
ManagedScaleUPerf 182.2 us 2.677 us 2.504 us 182.4 us
NativeScaleSrcUPerf 252.4 us 4.404 us 3.904 us 250.8 us
ManagedScaleSrcUPerf 271.5 us 5.357 us 6.377 us 272.0 us
NativeScaleAddUPerf 230.6 us 3.230 us 3.021 us 230.5 us
ManagedScaleAddUPerf 232.3 us 3.281 us 2.908 us 231.8 us
NativeAddScaleUPerf 317.5 us 4.360 us 4.079 us 316.0 us
ManagedAddScaleUPerf 317.1 us 4.778 us 3.990 us 317.5 us
NativeAddScaleSUPerf 4,135.9 us 66.596 us 62.294 us 4,126.9 us
ManagedAddScaleSUPerf 4,812.6 us 39.148 us 34.704 us 4,803.0 us
NativeAddScaleCopyUPerf 505.4 us 5.658 us 4.725 us 503.8 us
ManagedAddScaleCopyUPerf 481.7 us 9.140 us 8.550 us 480.0 us
NativeAddUPerf 316.5 us 5.698 us 5.330 us 314.7 us
ManagedAddUPerf 335.2 us 12.130 us 23.944 us 321.9 us
NativeAddSUPerf 4,249.0 us 58.001 us 54.255 us 4,254.0 us
ManagedAddSUPerf 4,583.9 us 78.739 us 73.652 us 4,556.6 us
NativeMulElementWiseUPerf 552.5 us 7.078 us 5.911 us 551.5 us
ManagedMulElementWiseUPerf 507.9 us 7.059 us 6.258 us 507.8 us
NativeSumUPerf 289.2 us 5.435 us 5.084 us 287.6 us
ManagedSumUPerf 288.3 us 2.815 us 2.350 us 287.8 us
NativeSumSqUPerf 283.2 us 1.572 us 1.393 us 283.3 us
ManagedSumSqUPerf 289.8 us 2.493 us 2.210 us 288.8 us
NativeSumSqDiffUPerf 289.4 us 3.621 us 3.387 us 289.4 us
ManagedSumSqDiffUPerf 290.9 us 2.772 us 2.593 us 290.0 us
NativeSumAbsUPerf 289.2 us 4.836 us 4.524 us 287.0 us
ManagedSumAbsUPerf 293.1 us 1.338 us 1.186 us 293.2 us
NativeSumAbsDiffUPerf 290.7 us 5.000 us 4.677 us 288.8 us
ManagedSumAbsDiffUPerf 294.4 us 5.242 us 4.903 us 293.0 us
NativeMaxAbsUPerf 288.0 us 3.924 us 3.671 us 285.8 us
ManagedMaxAbsUPerf 290.1 us 2.614 us 2.317 us 289.0 us
NativeMaxAbsDiffUPerf 292.1 us 4.805 us 4.495 us 289.6 us
ManagedMaxAbsDiffUPerf 290.6 us 2.083 us 1.846 us 290.3 us
NativeDotUPerf 328.8 us 3.844 us 3.407 us 328.6 us
ManagedDotUPerf 333.8 us 2.154 us 1.910 us 333.3 us
NativeDotSUPerf 3,414.2 us 67.058 us 68.864 us 3,393.7 us
ManagedDotSUPerf 3,753.1 us 37.440 us 33.189 us 3,737.5 us
NativeDist2Perf 332.3 us 3.152 us 2.632 us 332.0 us
ManagedDist2Perf 333.7 us 4.368 us 3.647 us 332.0 us
NativeSdcaL1UpdateUPerf 607.5 us 8.506 us 7.957 us 608.7 us
ManagedSdcaL1UpdateUPerf 600.8 us 12.003 us 27.820 us 591.3 us
NativeSdcaL1UpdateSUPerf 13,445.5 us 116.336 us 108.821 us 13,447.1 us
ManagedSdcaL1UpdateSUPerf 13,824.3 us 97.564 us 86.488 us 13,795.3 us

Perf test results for all managed intrinsics with AVX enhancement:

BenchmarkDotNet=v0.10.14, OS=Windows 10.0.17134
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-alpha1-20180720-2
  [Host] : .NET Core 3.0.0-preview1-26710-03 (CoreCLR 4.6.26710.05, CoreFX 4.6.26708.04), 64bit RyuJIT

Toolchain=InProcessToolchain
Method Mean Error StdDev
ManagedAddScalarUPerf 157.3 us 1.3138 us 1.1647 us
ManagedScaleUPerf 177.0 us 3.5143 us 7.5649 us
ManagedScaleSrcUPerf 260.5 us 0.9317 us 0.8715 us
ManagedScaleAddUPerf 170.3 us 1.6569 us 1.5499 us
ManagedAddScaleUPerf 272.5 us 5.4200 us 9.2035 us
ManagedAddScaleSUPerf 5,253.6 us 105.0419 us 163.5375 us
ManagedAddScaleCopyUPerf 448.2 us 11.0005 us 19.8362 us
ManagedAddUPerf 263.4 us 2.5347 us 2.2469 us
ManagedAddSUPerf 4,256.5 us 38.0944 us 33.7697 us
ManagedMulElementWiseUPerf 441.7 us 3.2423 us 2.8742 us
ManagedSumUPerf 161.0 us 1.3688 us 1.2134 us
ManagedSumSqUPerf 165.0 us 0.4772 us 0.4230 us
ManagedSumSqDiffUPerf 179.5 us 1.1673 us 1.0919 us
ManagedSumAbsUPerf 174.9 us 3.4667 us 5.9799 us
ManagedSumAbsDiffUPerf 178.7 us 0.6264 us 0.4529 us
ManagedMaxAbsUPerf 168.2 us 1.1892 us 1.0542 us
ManagedMaxAbsDiffUPerf 179.7 us 1.9884 us 1.7626 us
ManagedDotUPerf 258.1 us 2.6630 us 2.2237 us
ManagedDotSUPerf 3,297.7 us 23.2337 us 19.4012 us
ManagedDist2Perf 258.8 us 3.9883 us 3.5355 us
ManagedSdcaL1UpdateUPerf 545.0 us 10.7959 us 17.1234 us
ManagedSdcaL1UpdateSUPerf 13,624.1 us 34.6645 us 32.4252 us

Combined summary:

BenchmarkDotNet=v0.10.14, OS=Windows 10.0.17134
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-alpha1-20180720-2
  [Host] : .NET Core 3.0.0-preview1-26710-03 (CoreCLR 4.6.26710.05, CoreFX 4.6.26708.04), 64bit RyuJIT

Toolchain=InProcessToolchain
Type Method Mean Error StdDev Median
AvxPerformanceTests AddScalarU 192.3 us 3.835 us 5.2489 us 192.1 us
NativePerformanceTests AddScalarU 225.9 us 4.407 us 6.7300 us 225.0 us
SsePerformanceTests AddScalarU 240.7 us 5.306 us 15.3944 us 237.7 us
AvxPerformanceTests ScaleU 163.9 us 2.477 us 2.0687 us 163.7 us
NativePerformanceTests ScaleU 188.9 us 2.688 us 2.2447 us 189.3 us
SsePerformanceTests ScaleU 234.1 us 6.896 us 20.3319 us 234.2 us
AvxPerformanceTests ScaleSrcU 281.5 us 4.158 us 3.6856 us 280.5 us
NativePerformanceTests ScaleSrcU 298.0 us 7.632 us 21.8963 us 292.2 us
SsePerformanceTests ScaleSrcU 271.6 us 5.157 us 5.0645 us 271.0 us
AvxPerformanceTests ScaleAddU 182.6 us 3.654 us 3.2395 us 181.7 us
NativePerformanceTests ScaleAddU 231.1 us 3.641 us 3.2279 us 230.9 us
SsePerformanceTests ScaleAddU 210.4 us 7.888 us 23.1345 us 198.2 us
AvxPerformanceTests AddScaleU 295.9 us 5.907 us 15.5625 us 296.1 us
NativePerformanceTests AddScaleU 336.4 us 5.054 us 4.7274 us 336.4 us
SsePerformanceTests AddScaleU 330.2 us 7.823 us 10.7077 us 328.1 us
AvxPerformanceTests AddScaleSU 4,603.2 us 113.641 us 326.0574 us 4,494.6 us
NativePerformanceTests AddScaleSU 3,985.1 us 54.772 us 45.7368 us 3,982.6 us
SsePerformanceTests AddScaleSU 4,441.8 us 83.317 us 77.9344 us 4,416.8 us
AvxPerformanceTests AddScaleCopyU 534.5 us 10.504 us 23.2753 us 531.9 us
NativePerformanceTests AddScaleCopyU 548.6 us 10.743 us 15.0600 us 543.8 us
SsePerformanceTests AddScaleCopyU 504.5 us 9.430 us 9.2616 us 505.9 us
AvxPerformanceTests AddU 272.2 us 5.391 us 12.7072 us 271.3 us
NativePerformanceTests AddU 331.7 us 6.306 us 6.7473 us 333.0 us
SsePerformanceTests AddU 283.4 us 5.639 us 11.2608 us 278.0 us
AvxPerformanceTests AddSU 4,482.2 us 90.556 us 200.6652 us 4,408.2 us
NativePerformanceTests AddSU 4,132.2 us 81.246 us 113.8950 us 4,109.6 us
SsePerformanceTests AddSU 4,164.2 us 82.393 us 88.1599 us 4,144.5 us
AvxPerformanceTests MulElementWiseU 470.3 us 8.353 us 7.4044 us 467.7 us
NativePerformanceTests MulElementWiseU 465.5 us 8.192 us 6.8406 us 465.1 us
SsePerformanceTests MulElementWiseU 392.9 us 7.107 us 6.6481 us 390.3 us
AvxPerformanceTests SumU 154.2 us 2.413 us 2.2572 us 153.6 us
NativePerformanceTests SumU 283.2 us 3.950 us 3.6952 us 282.1 us
SsePerformanceTests SumU 271.7 us 2.715 us 2.5394 us 271.3 us
AvxPerformanceTests SumSqU 180.7 us 3.583 us 8.1606 us 180.7 us
NativePerformanceTests SumSqU 282.3 us 5.702 us 5.6003 us 280.8 us
SsePerformanceTests SumSqU 270.2 us 1.125 us 0.9397 us 270.0 us
AvxPerformanceTests SumSqDiffU 165.9 us 2.453 us 2.1745 us 166.0 us
NativePerformanceTests SumSqDiffU 287.9 us 3.850 us 3.6011 us 288.0 us
SsePerformanceTests SumSqDiffU 276.2 us 5.080 us 4.7515 us 273.5 us
AvxPerformanceTests SumAbsU 160.1 us 3.095 us 3.0401 us 159.8 us
NativePerformanceTests SumAbsU 289.0 us 5.743 us 6.6134 us 286.2 us
SsePerformanceTests SumAbsU 278.2 us 1.676 us 1.3994 us 278.3 us
AvxPerformanceTests SumAbsDiffU 163.8 us 1.891 us 1.5792 us 163.8 us
NativePerformanceTests SumAbsDiffU 288.5 us 5.688 us 5.3210 us 288.7 us
SsePerformanceTests SumAbsDiffU 278.6 us 4.304 us 4.0259 us 277.7 us
AvxPerformanceTests MaxAbsU 157.9 us 2.158 us 2.0189 us 157.7 us
NativePerformanceTests MaxAbsU 281.5 us 2.903 us 2.5732 us 281.9 us
SsePerformanceTests MaxAbsU 278.0 us 2.890 us 2.7033 us 277.3 us
AvxPerformanceTests MaxAbsDiffU 168.7 us 2.555 us 2.3895 us 168.2 us
NativePerformanceTests MaxAbsDiffU 285.9 us 5.610 us 5.5096 us 283.7 us
SsePerformanceTests MaxAbsDiffU 276.0 us 3.051 us 2.7046 us 274.7 us
AvxPerformanceTests DotU 229.6 us 4.586 us 4.2898 us 228.6 us
NativePerformanceTests DotU 314.1 us 5.461 us 4.8413 us 313.5 us
SsePerformanceTests DotU 295.9 us 4.912 us 4.5950 us 293.9 us
AvxPerformanceTests DotSU 3,302.5 us 49.913 us 44.2461 us 3,294.7 us
NativePerformanceTests DotSU 3,741.2 us 112.502 us 178.4404 us 3,720.5 us
SsePerformanceTests DotSU 3,492.2 us 56.641 us 47.2981 us 3,485.0 us
AvxPerformanceTests Dist2 234.0 us 4.405 us 3.9045 us 233.6 us
NativePerformanceTests Dist2 319.0 us 6.373 us 7.0833 us 319.7 us
SsePerformanceTests Dist2 299.2 us 5.823 us 5.1618 us 298.7 us
AvxPerformanceTests SdcaL1UpdateU 604.1 us 11.995 us 35.3680 us 593.6 us
NativePerformanceTests SdcaL1UpdateU 664.3 us 12.715 us 12.4873 us 661.9 us
SsePerformanceTests SdcaL1UpdateU 593.3 us 11.658 us 16.3430 us 594.1 us
AvxPerformanceTests SdcaL1UpdateSU 12,363.5 us 161.361 us 143.0421 us 12,339.6 us
NativePerformanceTests SdcaL1UpdateSU 12,678.7 us 202.557 us 179.5616 us 12,661.0 us
SsePerformanceTests SdcaL1UpdateSU 11,670.2 us 122.880 us 108.9298 us 11,645.9 us

Week 10-11 (Stretch)

  • Provide software fallback implementations (stretch goals)
  • Respond to PR feedback for AVX intrinsics
  • Streamline perf test layout
  • Report improvement in running time of intrinsics: 17.78% on average
  • Report improvement in running time of end-to-end real-life user scenarios: 13.88%
  • Get ML.NET to run on Raspberry Pi
  • Present on August 31 (11am-12nn 25/3365, also on Skype)
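
The issue does not say how the improvement percentages above were computed or which baseline they use. One plausible reading, shown here purely as an assumption, is the relative reduction in mean running time against the native baseline. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Relative reduction in running time, in percent.
// Positive means the candidate is faster than the baseline.
double improvement_percent(double baseline_us, double candidate_us) {
    assert(baseline_us > 0.0);
    return (baseline_us - candidate_us) / baseline_us * 100.0;
}

// Unweighted average of per-method improvements.
double average_improvement(const std::vector<double>& baseline_us,
                           const std::vector<double>& candidate_us) {
    assert(baseline_us.size() == candidate_us.size());
    assert(!baseline_us.empty());
    double total = 0.0;
    for (std::size_t i = 0; i < baseline_us.size(); ++i) {
        total += improvement_percent(baseline_us[i], candidate_us[i]);
    }
    return total / static_cast<double>(baseline_us.size());
}
```

For example, with the SumU rows of the combined summary (native 283.2 us, AVX 154.2 us), `improvement_percent(283.2, 154.2)` comes out at roughly 45.6%. Averaging per-method percentages and averaging raw times give different answers, so the method matters when quoting a single figure.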

Week 12

  • Improve perf by optimizing loops and alignment issues (Suggestions on CpuMath enhancement #2) at the assembly/instruction level
  • Clean up, presentation, close out remaining issues
  • Write blog post on how ML.NET is taking advantage of .NET Core hardware intrinsics, and AVX vs SSE comparisons (both implementation and runtime perf)
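
The alignment item above is not spelled out in this issue. A common technique, sketched here as an assumption about what such work involves, is to process a few leading elements with scalar code until the pointer reaches a vector-width boundary, so the main loop can use aligned loads. `leading_scalar_count` is a hypothetical helper that computes how many floats that prologue must cover.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of leading floats to handle with scalar code before `p` reaches
// an `alignment`-byte boundary (16 for SSE, 32 for AVX). Returns `count`
// when the boundary cannot be reached within the array.
std::size_t leading_scalar_count(const float* p, std::size_t count,
                                 std::size_t alignment) {
    // The mask trick below requires a power-of-two alignment.
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    std::uintptr_t address = reinterpret_cast<std::uintptr_t>(p);
    std::size_t misalignment = static_cast<std::size_t>(address & (alignment - 1));
    if (misalignment == 0) {
        return 0;  // already aligned
    }
    if (misalignment % sizeof(float) != 0) {
        return count;  // stepping by whole floats never hits the boundary
    }
    std::size_t lead = (alignment - misalignment) / sizeof(float);
    return lead < count ? lead : count;
}
```

With a 32-byte-aligned buffer `buf`, `leading_scalar_count(buf + 1, 15, 32)` is 7: seven scalar iterations bring the pointer to `buf + 8`, which is the next 32-byte boundary.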

Latest perf results:

BenchmarkDotNet=v0.11.1, OS=Windows 10.0.17134.228 (1803/April2018Update/Redstone4)
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-alpha1-20180720-2
  [Host] : .NET Core 3.0.0-preview1-26710-03 (CoreCLR 4.6.26710.05, CoreFX 4.6.26708.04), 64bit RyuJIT

Toolchain=InProcessToolchain
Type Method Mean Error StdDev Median
AvxPerformanceTests AddScalarU 157.3 us 2.680 us 2.376 us 157.3 us
NativePerformanceTests AddScalarU 186.7 us 3.253 us 3.043 us 185.7 us
SsePerformanceTests AddScalarU 184.0 us 3.382 us 2.824 us 183.5 us
AvxPerformanceTests ScaleU 157.5 us 1.754 us 1.465 us 157.3 us
NativePerformanceTests ScaleU 174.9 us 3.437 us 3.529 us 173.8 us
SsePerformanceTests ScaleU 184.4 us 3.158 us 2.799 us 184.2 us
AvxPerformanceTests ScaleSrcU 271.6 us 4.723 us 3.944 us 270.1 us
NativePerformanceTests ScaleSrcU 281.0 us 3.579 us 3.173 us 280.7 us
SsePerformanceTests ScaleSrcU 284.6 us 4.786 us 4.242 us 283.6 us
AvxPerformanceTests ScaleAddU 181.4 us 2.791 us 2.610 us 181.6 us
NativePerformanceTests ScaleAddU 192.1 us 2.769 us 2.312 us 191.6 us
SsePerformanceTests ScaleAddU 189.6 us 2.190 us 1.829 us 189.4 us
AvxPerformanceTests AddScaleU 284.1 us 6.002 us 5.615 us 282.2 us
NativePerformanceTests AddScaleU 327.1 us 5.215 us 4.623 us 326.5 us
SsePerformanceTests AddScaleU 321.2 us 3.093 us 2.742 us 321.0 us
AvxPerformanceTests AddScaleSU 4,630.5 us 58.590 us 51.939 us 4,619.5 us
NativePerformanceTests AddScaleSU 3,910.6 us 43.011 us 40.233 us 3,910.2 us
SsePerformanceTests AddScaleSU 4,487.7 us 88.687 us 82.958 us 4,489.2 us
AvxPerformanceTests AddScaleCopyU 465.9 us 9.862 us 9.225 us 463.1 us
NativePerformanceTests AddScaleCopyU 493.9 us 5.991 us 5.604 us 494.0 us
SsePerformanceTests AddScaleCopyU 501.1 us 6.755 us 5.988 us 500.8 us
AvxPerformanceTests AddU 281.8 us 3.346 us 2.794 us 281.4 us
NativePerformanceTests AddU 353.7 us 4.312 us 3.600 us 353.2 us
SsePerformanceTests AddU 351.6 us 2.268 us 1.894 us 352.2 us
AvxPerformanceTests AddSU 4,435.3 us 38.197 us 31.896 us 4,433.4 us
NativePerformanceTests AddSU 4,309.1 us 50.212 us 46.968 us 4,313.8 us
SsePerformanceTests AddSU 4,821.4 us 60.796 us 53.894 us 4,812.6 us
AvxPerformanceTests MulElementWiseU 522.2 us 7.380 us 6.543 us 521.1 us
NativePerformanceTests MulElementWiseU 472.6 us 9.435 us 17.721 us 476.1 us
SsePerformanceTests MulElementWiseU 470.9 us 8.913 us 7.901 us 467.3 us
AvxPerformanceTests SumU 165.3 us 1.332 us 1.180 us 165.0 us
NativePerformanceTests SumU 291.6 us 2.791 us 2.474 us 291.5 us
SsePerformanceTests SumU 288.7 us 1.568 us 1.390 us 288.8 us
AvxPerformanceTests SumSqU 167.8 us 1.376 us 1.220 us 167.9 us
NativePerformanceTests SumSqU 262.7 us 2.607 us 2.439 us 261.9 us
SsePerformanceTests SumSqU 263.3 us 1.857 us 1.646 us 262.9 us
AvxPerformanceTests SumSqDiffU 181.2 us 2.185 us 1.937 us 180.6 us
NativePerformanceTests SumSqDiffU 297.9 us 5.733 us 5.888 us 294.8 us
SsePerformanceTests SumSqDiffU 297.9 us 2.855 us 2.671 us 297.1 us
AvxPerformanceTests SumAbsU 187.8 us 3.503 us 3.277 us 186.7 us
NativePerformanceTests SumAbsU 261.9 us 1.809 us 1.510 us 262.6 us
SsePerformanceTests SumAbsU 274.4 us 1.539 us 1.439 us 274.3 us
AvxPerformanceTests SumAbsDiffU 190.1 us 1.878 us 1.568 us 190.6 us
NativePerformanceTests SumAbsDiffU 294.4 us 2.982 us 2.644 us 293.7 us
SsePerformanceTests SumAbsDiffU 311.4 us 2.179 us 1.931 us 311.0 us
AvxPerformanceTests MaxAbsU 186.8 us 2.503 us 2.219 us 187.6 us
NativePerformanceTests MaxAbsU 263.0 us 2.535 us 2.371 us 262.5 us
SsePerformanceTests MaxAbsU 274.8 us 1.778 us 1.576 us 274.3 us
AvxPerformanceTests MaxAbsDiffU 192.3 us 3.816 us 3.918 us 190.8 us
NativePerformanceTests MaxAbsDiffU 295.9 us 1.960 us 1.737 us 295.7 us
SsePerformanceTests MaxAbsDiffU 311.4 us 2.292 us 2.144 us 311.0 us
AvxPerformanceTests DotU 279.6 us 4.530 us 4.237 us 279.4 us
NativePerformanceTests DotU 358.4 us 7.314 us 16.207 us 351.9 us
SsePerformanceTests DotU 357.9 us 3.730 us 3.306 us 356.8 us
AvxPerformanceTests DotSU 3,374.0 us 43.577 us 38.630 us 3,373.5 us
NativePerformanceTests DotSU 3,443.8 us 49.761 us 46.546 us 3,422.8 us
SsePerformanceTests DotSU 3,959.1 us 60.141 us 56.256 us 3,968.8 us
AvxPerformanceTests Dist2 268.9 us 3.041 us 2.845 us 268.0 us
NativePerformanceTests Dist2 364.2 us 4.073 us 3.401 us 363.7 us
SsePerformanceTests Dist2 359.5 us 4.037 us 3.578 us 359.1 us
AvxPerformanceTests SdcaL1UpdateU 588.4 us 12.117 us 15.756 us 588.0 us
NativePerformanceTests SdcaL1UpdateU 635.4 us 12.245 us 10.855 us 632.8 us
SsePerformanceTests SdcaL1UpdateU 628.8 us 5.655 us 4.722 us 628.7 us
AvxPerformanceTests SdcaL1UpdateSU 13,943.0 us 127.516 us 113.040 us 13,973.4 us
NativePerformanceTests SdcaL1UpdateSU 13,014.6 us 124.704 us 116.649 us 13,024.6 us
SsePerformanceTests SdcaL1UpdateSU 13,957.6 us 55.439 us 49.145 us 13,956.9 us
@briancylui briancylui self-assigned this Jul 3, 2018
@danmoseley

"Check in code" implies authoring the build and packaging to multitarget for .NET Standard (which includes .NET Framework and .NET Core 2.1), which gets cpunative, and for .NET Core 3.0, which gets your new implementations.
