This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Improve struct promotion for 256-bit SIMD fields #19663

Merged (1 commit) on Aug 29, 2018

@fiigii fiigii commented Aug 24, 2018

This PR improves struct promotion to unwrap more 256-bit SIMD fields, which makes the PacketTracer benchmark (#19662) about 31% faster.

Performance data (rendering a 2k image)

| Execution time | Windows | Linux |
|---|---|---|
| PacketTracer (class) | 1.20s | 1.35s |
| PacketTracer (struct) | 0.83s | 0.93s |
| Performance gain | 31% | 31% |

The data was collected on:

  • Intel Core i9 7900X (Skylake-X) @ 3.3GHz, HT on, Turbo on, 16GB DDR4 2666MHz
  • Windows 10 and Ubuntu 16.04
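
As a quick check on the table above, the 31% figure matches the relative reduction in execution time, (class - struct) / class. The sketch below redoes that arithmetic; the helper names are made up for illustration and the timings are the ones reported in the table.

```cpp
#include <cmath>

// Relative reduction in execution time, as a percentage of the baseline.
// Hypothetical helper for illustration; not part of this PR.
constexpr double PercentReduction(double baselineSeconds, double improvedSeconds)
{
    return (baselineSeconds - improvedSeconds) / baselineSeconds * 100.0;
}

// Rounds to a whole percent, the precision used in the table.
long RoundedPercent(double percent)
{
    return std::lround(percent);
}

// Timings from the table above (seconds to render a 2k image).
const double kWindowsGain = PercentReduction(1.20, 0.83); // about 30.8
const double kLinuxGain   = PercentReduction(1.35, 0.93); // about 31.1
```

Expressed as a ratio of execution times instead (class / struct), the same data is roughly 1.45x on both platforms.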

VTune characterization (module level)

[Figures: VTune module-level charts for Windows and Linux]

The most obvious module-level change is that the runtime (GC) overhead drops (~33% -> ~11%); managed code also gets a shorter path length (smaller code size).

VTune characterization (managed code)

[Figures: VTune managed-code charts for Windows and Linux]

Overall, managed code improves because of the smaller code size, but there is still some inefficient codegen that I will continue to investigate and open other issues to discuss (mainly related to https://github.com/dotnet/coreclr/issues/16619).

VTune characterization (CoreCLR runtime)

[Figures: VTune CoreCLR runtime charts for Windows and Linux]

@fiigii
Author

fiigii commented Aug 24, 2018

@tannergooding
Member

Do you have some assembly diffs you can share?

const int MaxOffset = MAX_NumOfFieldsInPromotableStruct * XMM_REGSIZE_BYTES;
// This will allow promotion of 4 Vector<T> fields on AVX2 or Vector256<T> on AVX,
// or 8 Vector<T>/Vector128<T> fields on SSE2.
const int MaxOffset = MAX_NumOfFieldsInPromotableStruct * YMM_REGSIZE_BYTES;
Member

How does this impact machines without AVX support?

Author

I did not detect any impact from the Vector3 benchmark.
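
To make the effect of the reviewed line concrete, here is a minimal sketch of the arithmetic, reading the snippet as a change from the XMM-based limit to the YMM-based one. The constant names mirror the snippet, but the definitions below are standalone stand-ins, `FieldsThatFit` is a hypothetical helper, and the field-count limit of 4 is an assumption inferred from the "4 SIMD16 or 2 SIMD32" remark later in this thread rather than something shown in the diff.

```cpp
#include <cstddef>

// Register widths in bytes: XMM is 128-bit (SSE), YMM is 256-bit (AVX).
constexpr std::size_t XMM_REGSIZE_BYTES = 16;
constexpr std::size_t YMM_REGSIZE_BYTES = 32;

// ASSUMPTION: the field-count limit is 4 (not shown in the reviewed snippet).
constexpr std::size_t MAX_NumOfFieldsInPromotableStruct = 4;

// Limit before and after the change, in bytes.
constexpr std::size_t OldMaxOffset = MAX_NumOfFieldsInPromotableStruct * XMM_REGSIZE_BYTES; // 64
constexpr std::size_t NewMaxOffset = MAX_NumOfFieldsInPromotableStruct * YMM_REGSIZE_BYTES; // 128

// Hypothetical helper: how many back-to-back SIMD fields of a given width
// fit under a byte limit.
constexpr std::size_t FieldsThatFit(std::size_t maxOffset, std::size_t fieldSizeBytes)
{
    return maxOffset / fieldSizeBytes;
}

static_assert(FieldsThatFit(OldMaxOffset, YMM_REGSIZE_BYTES) == 2, "old limit: 2 x 32-byte fields");
static_assert(FieldsThatFit(NewMaxOffset, YMM_REGSIZE_BYTES) == 4, "new limit: 4 x 32-byte fields");
static_assert(FieldsThatFit(NewMaxOffset, XMM_REGSIZE_BYTES) == 8, "new limit: 8 x 16-byte fields");
```

Under these assumptions the byte limit doubles from 64 to 128, which lines up with the counts given in the code comment (4 fields of 32 bytes, or 8 fields of 16 bytes).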

@fiigii
Author

fiigii commented Aug 24, 2018

I have run jit-diff on this change, and it shows no differences in corelib/tests/frameworks.

Although jit-diff uses crossgen that does not work with SIMD code, we can say this change has no impact on the current scalar code base.

I also measured RayTracer (Vector3 benchmark), which has no execution time regression.

@tannergooding
Member

Although jit-diff uses crossgen that does not work with SIMD code

Have you also tried to get the pmi diffs? CC. @AndyAyersMS

@fiigii
Author

fiigii commented Aug 24, 2018

Have you also tried to get the pmi diffs?

Will try later, but there seems to be no managed code with more than 4 SIMD16 or 2 SIMD32 struct fields.

@CarolEidt CarolEidt left a comment

LGTM

@CarolEidt

@dotnet/jit-contrib - I'd like to have another JIT dev weigh in on this.
@fiigii - just to be conservative I'd like to see the pmi diffs.

@tannergooding
Member

tannergooding commented Aug 25, 2018

Will try later, but there seems to be no managed code with more than 4 SIMD16 or 2 SIMD32 struct fields.

It might be useful/interesting to create a simple 5x4 matrix struct and see what the codegen diff looks like.

Just because CoreFX doesn't have any code that leverages it, doesn't mean other libraries don't (and we don't want to accidentally regress them).

@4creators

@dotnet-bot test Ubuntu arm Cross Checked Innerloop Build and Test
@dotnet-bot test Ubuntu arm Cross Checked no_tiered_compilation_innerloop Build and Test

@AndyAyersMS
Member

Because we generally only promote structs with primitive typed fields it's hard to get suitably large field offsets for structs with small numbers of fields. Aside from SIMD it would require a fixed field or an explicit layout. And I would guess we don't have very many of these cases floating around in the framework code (otherwise we might have spotted #19149 sooner).

So you should try PMI across the test suite, but even there jit-diffs won't look as broadly as one might hope.

We could also try an SPMI run on desktop I suppose.

@fiigii
Author

fiigii commented Aug 27, 2018

@CarolEidt @AndyAyersMS @tannergooding I have run the PMI diff, and it shows no difference:

Analyzing diffs...
PMI Diffs for assemblies in D:\workspace\coreclr\bin\tests\Windows_NT.x64.Release for  default jit
Summary:
(Lower is better)
Total bytes of diff: 0 (NaN of base)
0 total files with size differences (0 improved, 0 regressed), 0 unchanged.
0 total methods with size differences (0 improved, 0 regressed), 0 unchanged.
Completed analysis in 2.96s

@tannergooding
Member

@fiigii, was this just for CoreCLR or did you also try the PMI diffs for the tests, CoreFX, and various benchmarks we have?

@fiigii
Author

fiigii commented Aug 27, 2018

Yes, I ran the PMI diff on corelib/tests/frameworks/benchmarks (no diffs in any of them). How do I run jit-diff on CoreFX?

@AndyAyersMS
Member

@fiigii can you post the last line of the analysis, showing how many methods were examined, e.g. something like

3074 total methods with size differences (48 improved, 3026 regressed), 221530 unchanged.

because from the above it looks like things ran too fast and maybe didn't look at any methods at all.
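
The sanity check described here (did the run look at any methods at all?) can be sketched as a small parser over that summary line. This is only an illustration: `MethodSummary`, `ParseMethodSummary`, and `MethodsExamined` are hypothetical names, not part of jit-diff, and the format string simply mirrors the summary text quoted in this thread.

```cpp
#include <cstdio>

// Counts taken from a jit-diff "total methods" summary line.
struct MethodSummary
{
    long changed;   // methods with size differences
    long improved;
    long regressed;
    long unchanged;
};

// Parses a line such as
// "3074 total methods with size differences (48 improved, 3026 regressed), 221530 unchanged."
// Returns true only if all four counts were read.
bool ParseMethodSummary(const char* line, MethodSummary* out)
{
    return std::sscanf(line,
                       " %ld total methods with size differences (%ld improved, %ld regressed), %ld unchanged",
                       &out->changed, &out->improved, &out->regressed, &out->unchanged) == 4;
}

// Total number of methods the analysis looked at.
long MethodsExamined(const MethodSummary& s)
{
    return s.changed + s.unchanged;
}
```

For the example line above this gives 224604 methods examined, while the summary from the first run gives 0, which is the sign that the run did not analyze anything.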

@AndyAyersMS
Member

Or maybe you already did? And the number is zero? It should be ~380K.

To run PMI in its most general mode, make sure you've built the tests, and then do something like this (note the -f):

jit-diff diff --pmi --base --base_root ... --diff -f --test_root D:\workspace\coreclr\bin\tests\Windows_NT.x64.Release

The summary should start with something like:

PMI Diffs for System.Private.CoreLib.dll, framework assemblies, assemblies in d:\repos\coreclr\bin\tests\Windows_NT.x64.Release for x64 default jit

@fiigii
Author

fiigii commented Aug 27, 2018

@AndyAyersMS thanks for the guidance, will re-run to make sure.

@fiigii
Author

fiigii commented Aug 27, 2018

@AndyAyersMS @CarolEidt @tannergooding I re-ran the PMI diff, and it showed some differences (improvements). The crossgen diff still shows no diffs.
The earlier PMI diff #19663 (comment) had some build errors; sorry for the mistake.

The new PMI diff result should be correct.

Corelib (no diff):

PS D:\workspace\coreclr-struct> jit-diff diff --diff --base --base_root D:\workspace\coreclr --pmi
Using --output D:\workspace\coreclr-struct\bin\diffs
Using --base D:\workspace\coreclr\bin\Product\Windows_NT.x64.Checked
Using --diff D:\workspace\coreclr-struct\bin\Product\Windows_NT.x64.Checked
Using --arch x64
Using --core_root D:\workspace\coreclr-struct\bin\tests\Windows_NT.x64.Checked\Tests\Core_Root

Warning: it is best practice to use a Release build for --core_root, --crossgen, and --test_root.

No assemblies specified; defaulting to corelib
Beginning PMI Diffs for System.Private.CoreLib.dll
\ Finished 1/1 Base 1/1 Diff [154.1 sec]
Completed PMI Diffs for System.Private.CoreLib.dll in 154.11s
Diffs (if any) can be viewed by comparing: D:\workspace\coreclr-struct\bin\diffs\dasmset_4\base D:\workspace\coreclr-struct\bin\diffs\dasmset_4\diff
Analyzing diffs...
PMI Diffs for System.Private.CoreLib.dll for x64 default jit
Summary:
(Lower is better)
Total bytes of diff: 0 (0.00% of base)
0 total files with size differences (0 improved, 0 regressed), 1 unchanged.
0 total methods with size differences (0 improved, 0 regressed), 22188 unchanged.
Completed analysis in 1.67s

Tests (improvement):

PS D:\workspace\coreclr-struct> jit-diff diff --diff --base --base_root D:\workspace\coreclr --tests --pmi
Using --output D:\workspace\coreclr-struct\bin\diffs
Using --base D:\workspace\coreclr\bin\Product\Windows_NT.x64.Checked
Using --diff D:\workspace\coreclr-struct\bin\Product\Windows_NT.x64.Checked
Using --arch x64
Using --core_root D:\workspace\coreclr-struct\bin\tests\Windows_NT.x64.Checked\Tests\Core_Root
Using --test_root D:\workspace\coreclr-struct\bin\tests\Windows_NT.x64.Checked

Warning: it is best practice to use a Release build for --core_root, --crossgen, and --test_root.

Beginning PMI Diffs for assemblies in D:\workspace\coreclr-struct\bin\tests\Windows_NT.x64.Checked
\ Finished 3040/3040 Base 3040/3040 Diff [4090.8 sec]
Completed PMI Diffs for assemblies in D:\workspace\coreclr-struct\bin\tests\Windows_NT.x64.Checked in 4091.97s
Diffs (if any) can be viewed by comparing: D:\workspace\coreclr-struct\bin\diffs\dasmset_3\base D:\workspace\coreclr-struct\bin\diffs\dasmset_3\diff
Analyzing diffs...
Found 57 files with textual diffs.
PMI Diffs for assemblies in D:\workspace\coreclr-struct\bin\tests\Windows_NT.x64.Checked for x64 default jit
Summary:
(Lower is better)
Total bytes of diff: -16826 (-0.02% of base)
    diff is an improvement.
Top file improvements by size (bytes):
        -314 : JIT\HardwareIntrinsics\X86\Sse2\CompareEqualOrderedScalar_ro\CompareEqualOrderedScalar_ro.dasm (-0.15% of base)
        -314 : JIT\HardwareIntrinsics\X86\Sse2\CompareEqualScalar_ro\CompareEqualScalar_ro.dasm (-0.15% of base)
        -314 : JIT\HardwareIntrinsics\X86\Sse2\CompareEqualUnorderedScalar_ro\CompareEqualUnorderedScalar_ro.dasm (-0.15% of base)
        -314 : JIT\HardwareIntrinsics\X86\Sse2\CompareGreaterThanOrderedScalar_ro\CompareGreaterThanOrderedScalar_ro.dasm (-0.15% of base)
        -314 : JIT\HardwareIntrinsics\X86\Sse2\CompareGreaterThanOrEqualOrderedScalar_ro\CompareGreaterThanOrEqualOrderedScalar_ro.dasm (-0.15% of base)
57 total files with size differences (57 improved, 0 regressed), 2983 unchanged.
Top method regressions by size (bytes):
          24 : JIT\HardwareIntrinsics\X86\Sse2\CompareEqualOrderedScalar_ro\CompareEqualOrderedScalar_ro.dasm - IntelHardwareIntrinsicTest.Program:PrintError(struct,ref,ref,ref) (56 methods)
          24 : JIT\HardwareIntrinsics\X86\Sse2\CompareEqualScalar_ro\CompareEqualScalar_ro.dasm - IntelHardwareIntrinsicTest.Program:PrintError(struct,ref,ref,ref) (56 methods)
          24 : JIT\HardwareIntrinsics\X86\Sse2\CompareEqualUnorderedScalar_ro\CompareEqualUnorderedScalar_ro.dasm - IntelHardwareIntrinsicTest.Program:PrintError(struct,ref,ref,ref) (56 methods)
          24 : JIT\HardwareIntrinsics\X86\Sse2\CompareGreaterThanOrderedScalar_ro\CompareGreaterThanOrderedScalar_ro.dasm - IntelHardwareIntrinsicTest.Program:PrintError(struct,ref,ref,ref) (56 methods)
          24 : JIT\HardwareIntrinsics\X86\Sse2\CompareGreaterThanOrEqualOrderedScalar_ro\CompareGreaterThanOrEqualOrderedScalar_ro.dasm - IntelHardwareIntrinsicTest.Program:PrintError(struct,ref,ref,ref) (56 methods)
Top method improvements by size (bytes):
        -136 : JIT\HardwareIntrinsics\X86\Sse2\CompareEqualOrderedScalar_ro\CompareEqualOrderedScalar_ro.dasm - IntelHardwareIntrinsicTest.TestTableSse2`2[Vector`1,Int64][System.Numerics.Vector`1[System.Single],System.Int64]:CheckUnpack(ref):bool:this (3 methods)
        -136 : JIT\HardwareIntrinsics\X86\Sse2\CompareEqualScalar_ro\CompareEqualScalar_ro.dasm - IntelHardwareIntrinsicTest.TestTableSse2`2[Vector`1,Int64][System.Numerics.Vector`1[System.Single],System.Int64]:CheckUnpack(ref):bool:this (3 methods)
        -136 : JIT\HardwareIntrinsics\X86\Sse2\CompareEqualUnorderedScalar_ro\CompareEqualUnorderedScalar_ro.dasm - IntelHardwareIntrinsicTest.TestTableSse2`2[Vector`1,Int64][System.Numerics.Vector`1[System.Single],System.Int64]:CheckUnpack(ref):bool:this (3 methods)
        -136 : JIT\HardwareIntrinsics\X86\Sse2\CompareGreaterThanOrderedScalar_ro\CompareGreaterThanOrderedScalar_ro.dasm - IntelHardwareIntrinsicTest.TestTableSse2`2[Vector`1,Int64][System.Numerics.Vector`1[System.Single],System.Int64]:CheckUnpack(ref):bool:this (3 methods)
        -136 : JIT\HardwareIntrinsics\X86\Sse2\CompareGreaterThanOrEqualOrderedScalar_ro\CompareGreaterThanOrEqualOrderedScalar_ro.dasm - IntelHardwareIntrinsicTest.TestTableSse2`2[Vector`1,Int64][System.Numerics.Vector`1[System.Single],System.Int64]:CheckUnpack(ref):bool:this (3 methods)
388 total methods with size differences (282 improved, 106 regressed), 226916 unchanged.
Completed analysis in 48.04s

Frameworks (improvement only):

PS D:\workspace\coreclr-struct> jit-diff diff --diff --base --base_root D:\workspace\coreclr --pmi -f
Using --output D:\workspace\coreclr-struct\bin\diffs
Using --base D:\workspace\coreclr\bin\Product\Windows_NT.x64.Checked
Using --diff D:\workspace\coreclr-struct\bin\Product\Windows_NT.x64.Checked
Using --arch x64
Using --core_root D:\workspace\coreclr-struct\bin\tests\Windows_NT.x64.Checked\Tests\Core_Root

Warning: it is best practice to use a Release build for --core_root, --crossgen, and --test_root.

Beginning PMI Diffs for System.Private.CoreLib.dll, framework assemblies
Warning: can't find framework assembly D:\workspace\coreclr-struct\bin\tests\Windows_NT.x64.Checked\Tests\Core_Root\xunit.runner.utility.dotnet.dll
\ Finished 129/129 Base 129/129 Diff [628.0 sec]
Completed PMI Diffs for System.Private.CoreLib.dll, framework assemblies in 628.04s
Diffs (if any) can be viewed by comparing: D:\workspace\coreclr-struct\bin\diffs\dasmset_5\base D:\workspace\coreclr-struct\bin\diffs\dasmset_5\diff
Analyzing diffs...
Found 1 files with textual diffs.
PMI Diffs for System.Private.CoreLib.dll, framework assemblies for x64 default jit
Summary:
(Lower is better)
Total bytes of diff: -59 (0.00% of base)
    diff is an improvement.
Top file improvements by size (bytes):
         -59 : System.Reflection.Metadata.dasm (-0.01% of base)
1 total files with size differences (1 improved, 0 regressed), 128 unchanged.
Top method improvements by size (bytes):
         -24 : System.Reflection.Metadata.dasm - System.Reflection.Metadata.Ecma335.CustomAttributeDecoder`1[Vector`1][System.Numerics.Vector`1[System.Single]]:DecodeFixedArgumentType(byref,bool):struct:this
         -19 : System.Reflection.Metadata.dasm - System.Reflection.Metadata.Ecma335.CustomAttributeDecoder`1[Vector`1][System.Numerics.Vector`1[System.Single]]:DecodeArrayArgument(byref,struct):struct:this
         -16 : System.Reflection.Metadata.dasm - System.Reflection.Metadata.Ecma335.CustomAttributeDecoder`1[Vector`1][System.Numerics.Vector`1[System.Single]]:DecodeNamedArgumentType(byref,bool):struct:this
3 total methods with size differences (3 improved, 0 regressed), 224600 unchanged.
Completed analysis in 13.88s

@fiigii
Author

fiigii commented Aug 27, 2018

The small regressions in the above test PMI diff come mainly from expanding a call to CORINFO_HELP_MEMCPY into a vmovupd sequence.
I think this can be eliminated by https://github.com/dotnet/coreclr/issues/16619 in some scenarios.

-mov      rbp, rax
-lea      rcx, bword ptr [rbp+8]
-lea      rdx, bword ptr [rsp+C0H]
-lea      rdx, bword ptr [rsp+C0H]
-mov      r8d, 128
-call     CORINFO_HELP_MEMCPY
+lea      r8, bword ptr [rax+8]
+vmovupd  ymm0, ymmword ptr[rsp+C0H]
+vmovupd  ymmword ptr[r8], ymm0
+vmovupd  ymm0, ymmword ptr[rsp+E0H]
+vmovupd  ymmword ptr[r8+32], ymm0
+vmovupd  ymm0, ymmword ptr[rsp+100H]
+vmovupd  ymmword ptr[r8+64], ymm0
+vmovupd  ymm0, ymmword ptr[rsp+120H]
+vmovupd  ymmword ptr[r8+96], ymm0
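
The shape of that change can be sketched in plain C++: one call into a general-purpose copy routine versus four fixed-size 32-byte copies. This is only a stand-in for the idea, not the JIT's output; the function names are hypothetical, and each 32-byte `std::memcpy` below plays the role of one vmovupd load/store pair in the diff.

```cpp
#include <cstddef>
#include <cstring>

constexpr std::size_t kChunkBytes  = 32;  // one 256-bit YMM register
constexpr std::size_t kStructBytes = 128; // size copied in the diff above

// Before: a single call into a general-purpose copy helper.
void CopyWithHelper(unsigned char* dst, const unsigned char* src)
{
    std::memcpy(dst, src, kStructBytes);
}

// After: four fixed-size copies at offsets 0, 32, 64 and 96,
// each standing in for one vmovupd load/store pair.
void CopyUnrolled(unsigned char* dst, const unsigned char* src)
{
    std::memcpy(dst + 0 * kChunkBytes, src + 0 * kChunkBytes, kChunkBytes);
    std::memcpy(dst + 1 * kChunkBytes, src + 1 * kChunkBytes, kChunkBytes);
    std::memcpy(dst + 2 * kChunkBytes, src + 2 * kChunkBytes, kChunkBytes);
    std::memcpy(dst + 3 * kChunkBytes, src + 3 * kChunkBytes, kChunkBytes);
}
```

Both versions copy the same 128 bytes. The unrolled form avoids the helper call but emits more instructions inline, which is consistent with the small per-method size regressions reported above.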

@fiigii
Author

fiigii commented Aug 27, 2018

BTW, the PacketTracer benchmark (#19662) gets a 16.26% code size reduction.

PMI Diffs for PacketTracer.dll for x64 default jit
Summary:
(Lower is better)
Total bytes of diff: -6936 (-16.26% of base)
    diff is an improvement.
Top file improvements by size (bytes):
       -6936 : PacketTracer.dasm (-16.26% of base)
1 total files with size differences (1 improved, 0 regressed), 0 unchanged.
Top method improvements by size (bytes):
       -1775 : PacketTracer.dasm - Packet256Tracer:GetNaturalColor(struct,byref,byref,byref,ref):struct:this
       -1222 : PacketTracer.dasm - Camera:Create(struct,struct):ref
       -1028 : PacketTracer.dasm - Packet256Tracer:Shade(byref,ref,ref,int):struct:this
        -821 : PacketTracer.dasm - Packet256Tracer:GetPoints(struct,struct,ref):struct:this
        -508 : PacketTracer.dasm - SpherePacket256:Intersect(ref):struct:this
27 total methods with size differences (27 improved, 0 regressed), 182 unchanged.

@tannergooding
Member

I got the following for CoreCLR:

CoreCLR x64 VEX
	Total bytes of diff: 0 (0.00% of base)
	0 total files with size differences (0 improved, 0 regressed), 1 unchanged.
	0 total methods with size differences (0 improved, 0 regressed), 17075 unchanged.
CoreCLR x64 VEX PMI
	Total bytes of diff: 0 (0.00% of base)
	0 total files with size differences (0 improved, 0 regressed), 1 unchanged.
	0 total methods with size differences (0 improved, 0 regressed), 17677 unchanged.

CoreCLR x64 NoVEX
	Total bytes of diff: 0 (0.00% of base)
	0 total files with size differences (0 improved, 0 regressed), 1 unchanged.
	0 total methods with size differences (0 improved, 0 regressed), 17075 unchanged.
CoreCLR x64 NoVEX PMI
	Total bytes of diff: 0 (0.00% of base)
	0 total files with size differences (0 improved, 0 regressed), 1 unchanged.
	0 total methods with size differences (0 improved, 0 regressed), 17677 unchanged.

@tannergooding
Member

tannergooding commented Aug 28, 2018

I got the following for Framework:

Framework x64 VEX
	Total bytes of diff: 0 (0.00% of base)
	0 total files with size differences (0 improved, 0 regressed), 129 unchanged.
	0 total methods with size differences (0 improved, 0 regressed), 142056 unchanged.
Framework x64 VEX PMI
	Total bytes of diff: -59 (0.00% of base)
		diff is an improvement.
	Top file improvements by size (bytes):
			 -59 : System.Reflection.Metadata.dasm (-0.01% of base)
	1 total files with size differences (1 improved, 0 regressed), 128 unchanged.
	Top method improvements by size (bytes):
			 -24 : System.Reflection.Metadata.dasm - CustomAttributeDecoder`1:DecodeFixedArgumentType(byref,bool):struct:this (5 methods)
			 -19 : System.Reflection.Metadata.dasm - CustomAttributeDecoder`1:DecodeArrayArgument(byref,struct):struct:this (5 methods)
			 -16 : System.Reflection.Metadata.dasm - CustomAttributeDecoder`1:DecodeNamedArgumentType(byref,bool):struct:this (5 methods)
	3 total methods with size differences (3 improved, 0 regressed), 192869 unchanged.

Framework x64 NoVEX
	Total bytes of diff: 0 (0.00% of base)
	0 total files with size differences (0 improved, 0 regressed), 129 unchanged.
	0 total methods with size differences (0 improved, 0 regressed), 142056 unchanged.
Framework x64 NoVEX PMI
	Total bytes of diff: 0 (0.00% of base)
	0 total files with size differences (0 improved, 0 regressed), 129 unchanged.
	0 total methods with size differences (0 improved, 0 regressed), 192872 unchanged.

@tannergooding
Member

tannergooding commented Aug 28, 2018

I got the following for Tests:

Tests x64 VEX
	Total bytes of diff: 0 (0.00% of base)
	0 total files with size differences (0 improved, 0 regressed), 2731 unchanged.
	0 total methods with size differences (0 improved, 0 regressed), 151675 unchanged.
Tests x64 VEX PMI
	Total bytes of diff: -16667 (-0.02% of base)
		diff is an improvement.
	Top file improvements by size (bytes):
			-311 : JIT\HardwareIntrinsics\X86\Sse2\CompareEqualOrderedScalar_ro\CompareEqualOrderedScalar_ro.dasm (-0.15% of base)
			-311 : JIT\HardwareIntrinsics\X86\Sse2\CompareEqualScalar_ro\CompareEqualScalar_ro.dasm (-0.15% of base)
			-311 : JIT\HardwareIntrinsics\X86\Sse2\CompareEqualUnorderedScalar_ro\CompareEqualUnorderedScalar_ro.dasm (-0.15% of base)
			-311 : JIT\HardwareIntrinsics\X86\Sse2\CompareGreaterThanOrderedScalar_ro\CompareGreaterThanOrderedScalar_ro.dasm (-0.15% of base)
			-311 : JIT\HardwareIntrinsics\X86\Sse2\CompareGreaterThanOrEqualOrderedScalar_ro\CompareGreaterThanOrEqualOrderedScalar_ro.dasm (-0.15% of base)
	57 total files with size differences (57 improved, 0 regressed), 2942 unchanged.
	Top method regressions by size (bytes):
			  27 : JIT\HardwareIntrinsics\X86\Sse2\CompareEqualOrderedScalar_ro\CompareEqualOrderedScalar_ro.dasm - Program:PrintError(struct,ref,ref,ref) (56 methods)
			  27 : JIT\HardwareIntrinsics\X86\Sse2\CompareEqualScalar_ro\CompareEqualScalar_ro.dasm - Program:PrintError(struct,ref,ref,ref) (56 methods)
			  27 : JIT\HardwareIntrinsics\X86\Sse2\CompareEqualUnorderedScalar_ro\CompareEqualUnorderedScalar_ro.dasm - Program:PrintError(struct,ref,ref,ref) (56 methods)
			  27 : JIT\HardwareIntrinsics\X86\Sse2\CompareGreaterThanOrderedScalar_ro\CompareGreaterThanOrderedScalar_ro.dasm - Program:PrintError(struct,ref,ref,ref) (56 methods)
			  27 : JIT\HardwareIntrinsics\X86\Sse2\CompareGreaterThanOrEqualOrderedScalar_ro\CompareGreaterThanOrEqualOrderedScalar_ro.dasm - Program:PrintError(struct,ref,ref,ref) (56 methods)
	Top method improvements by size (bytes):
			-136 : JIT\HardwareIntrinsics\X86\Sse2\CompareEqualOrderedScalar_ro\CompareEqualOrderedScalar_ro.dasm - TestTableSse2`2:CheckUnpack(ref):bool:this (12 methods)
			-136 : JIT\HardwareIntrinsics\X86\Sse2\CompareEqualScalar_ro\CompareEqualScalar_ro.dasm - TestTableSse2`2:CheckUnpack(ref):bool:this (12 methods)
			-136 : JIT\HardwareIntrinsics\X86\Sse2\CompareEqualUnorderedScalar_ro\CompareEqualUnorderedScalar_ro.dasm - TestTableSse2`2:CheckUnpack(ref):bool:this (12 methods)
			-136 : JIT\HardwareIntrinsics\X86\Sse2\CompareGreaterThanOrderedScalar_ro\CompareGreaterThanOrderedScalar_ro.dasm - TestTableSse2`2:CheckUnpack(ref):bool:this (12 methods)
			-136 : JIT\HardwareIntrinsics\X86\Sse2\CompareGreaterThanOrEqualOrderedScalar_ro\CompareGreaterThanOrEqualOrderedScalar_ro.dasm - TestTableSse2`2:CheckUnpack(ref):bool:this (12 methods)
	388 total methods with size differences (282 improved, 106 regressed), 192448 unchanged.

Tests x64 NoVEX
	Total bytes of diff: 0 (0.00% of base)
	0 total files with size differences (0 improved, 0 regressed), 2731 unchanged.
	0 total methods with size differences (0 improved, 0 regressed), 151675 unchanged.
Tests x64 NoVEX PMI
	Total bytes of diff: 0 (0.00% of base)
	0 total files with size differences (0 improved, 0 regressed), 2998 unchanged.
	0 total methods with size differences (0 improved, 0 regressed), 192708 unchanged.
	3 files had text diffs but not size diffs.
	JIT\HardwareIntrinsics\X86\Fma_Vector256\Fma_ro\Fma_ro.dasm had 204 diffs
	JIT\HardwareIntrinsics\X86\Avx\Avx_ro\Avx_ro.dasm had 34 diffs
	JIT\HardwareIntrinsics\X86\Avx2\Avx2_ro\Avx2_ro.dasm had 34 diffs

@tannergooding
Member

I got the following for Benchmarks:

Benchmarks x64 VEX
	Total bytes of diff: 0 (0.00% of base)
	0 total files with size differences (0 improved, 0 regressed), 82 unchanged.
	0 total methods with size differences (0 improved, 0 regressed), 1778 unchanged.
Benchmarks x64 VEX PMI
	Total bytes of diff: 0 (0.00% of base)
	0 total files with size differences (0 improved, 0 regressed), 82 unchanged.
	0 total methods with size differences (0 improved, 0 regressed), 1878 unchanged.

Benchmarks x64 NoVEX
	Total bytes of diff: 0 (0.00% of base)
	0 total files with size differences (0 improved, 0 regressed), 82 unchanged.
	0 total methods with size differences (0 improved, 0 regressed), 1778 unchanged.
Benchmarks x64 NoVEX PMI
	Total bytes of diff: 0 (0.00% of base)
	0 total files with size differences (0 improved, 0 regressed), 82 unchanged.
	0 total methods with size differences (0 improved, 0 regressed), 1878 unchanged.

@tannergooding
Member

The files that had diffs:
framework-x64-vex-pmi.zip
tests-x64-novex-pmi.zip
tests-x64-vex-pmi.7z.zip -- Actually a 7z file (otherwise it was 36MB, rather than 6MB)

@fiigii
Author

fiigii commented Aug 28, 2018

@AndyAyersMS @CarolEidt Does the data look good to you?

@CarolEidt

The results look good, and as expected. x86 diffs might be nice, but I don't think they're necessary.
@tannergooding you haven't yet approved - is there something else you'd like to see?
@AndyAyersMS - do you have any remaining concerns?
(I've already approved)

@AndyAyersMS
Member

No, no concerns.

@tannergooding
Member

No, just wanted to make sure we had any regressions covered.

@tannergooding
Member

x86 diffs might be nice, but I don't think they're necessary.

I'm working on getting x86 diffs as well, and so far they look much the same as the x64 diffs.

@tannergooding
Member

@CarolEidt, @AndyAyersMS, @fiigii. Should we get diffs again with TieredJitting disabled? (I just spent way too long debugging another issue, only to find out it wasn't working because TieredJitting disabled that optimization.)

@AndyAyersMS
Member

PMI (when run via jit-dasm-pmi, which in turn is run via jit-diff) disables tiered jitting already.

If you run PMI directly via corerun then you might need to set some env vars first.

@tannergooding
Member

Good to know. (I was using COMPlus_JitDisasm for my other case, so I had to explicitly set the env variable).

@tannergooding
Member

@CarolEidt, @AndyAyersMS. I'm merging this, since we've all signed off already.
