[JIT] X64 - More replacement sequences for integer multiplication by a constant #77137

TIHan · 2022-10-17T23:32:50Z

Description

Resolves #75119. We only plan to use replacement sequences of three instructions or fewer.

This PR does these two optimizations:

-       mov      edx, eax
-       shl      edx, 2
+       lea      edx, [4*rax]

-       mov      edx, eax
-       shl      edx, 3
+       lea      edx, [8*rax]
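Why the two forms are interchangeable can be sketched in C++. This is only an illustrative model of the instruction semantics, not JIT code; the helper names `mov_shl` and `lea_scaled` are made up for this example. It assumes the usual x64 behavior that `lea` with a 32-bit destination keeps only the low 32 bits of the computed address, so whatever is in the upper half of `rax` does not affect `edx`.

```cpp
#include <cstdint>

// Models `mov edx, eax` followed by `shl edx, N`:
// copy the 32-bit value, then shift it (the result wraps at 32 bits).
constexpr std::uint32_t mov_shl(std::uint32_t eax, unsigned shift) {
    std::uint32_t edx = eax;
    edx <<= shift;
    return edx;
}

// Models `lea edx, [scale*rax]`:
// the address is computed in 64 bits and only the low 32 bits reach edx.
constexpr std::uint32_t lea_scaled(std::uint64_t rax, std::uint64_t scale) {
    return static_cast<std::uint32_t>(rax * scale);
}

// Both sequences compute x * 4 and x * 8 modulo 2^32.
static_assert(mov_shl(3u, 2) == lea_scaled(3u, 4), "shl by 2 matches lea with scale 4");
static_assert(mov_shl(3u, 3) == lea_scaled(3u, 8), "shl by 3 matches lea with scale 8");
```

The single `lea` replaces two instructions with one and, unlike `shl`, does not modify the flags register.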

This PR also lifts the restriction that only allowed GT_LCL_VAR to be the first operand for "multiply by constant"; it now creates a temporary local if it needs to, which ensures that we can take advantage of these optimizations.

I wanted to keep the "multiply by constant" optimizations in one place, and that place is lowering. Doing this required disabling all "multiply by constant" optimizations in Tier-0, which I think is reasonable. This means Tier-0 will always emit imul, except for the cases that can emit lea.

The "multiply by constant" -> lea replacement is done only in codegen.
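For context, the multipliers that a single x64 `lea` can express follow from its addressing form, base + index*scale with scale in {1, 2, 4, 8}. The sketch below enumerates them. It is a hypothetical helper written for illustration (`LeaForm`, `singleLeaFor`, and `evalLea` are not names from the JIT), and it does not claim that the JIT handles exactly this set.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical description of one lea operand: (useBase ? x : 0) + x * scale.
struct LeaForm {
    bool useBase;    // whether x also appears as the base register
    unsigned scale;  // index scale: 1, 2, 4 or 8
};

// Returns the lea shape that computes x * multiplier,
// or std::nullopt when no single lea can do it.
constexpr std::optional<LeaForm> singleLeaFor(std::int64_t multiplier) {
    switch (multiplier) {
        case 2: return LeaForm{true, 1};   // lea r, [x + x]
        case 3: return LeaForm{true, 2};   // lea r, [x + 2*x]
        case 4: return LeaForm{false, 4};  // lea r, [4*x]
        case 5: return LeaForm{true, 4};   // lea r, [x + 4*x]
        case 8: return LeaForm{false, 8};  // lea r, [8*x]
        case 9: return LeaForm{true, 8};   // lea r, [x + 8*x]
        default: return std::nullopt;      // e.g. 6 or 7 need more than one instruction
    }
}

// Evaluates what the described lea would compute for a given x.
constexpr std::uint64_t evalLea(LeaForm form, std::uint64_t x) {
    return (form.useBase ? x : 0) + x * form.scale;
}
```

For instance, `singleLeaFor(4)` describes the `[4*rax]` form from the description, while `singleLeaFor(7)` returns no value, so such a constant would fall back to `imul` or a longer sequence.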

Notes:
Other replacement sequences were considered, but on modern CPUs their total latency is higher than that of a single imul. These cases are described and tested in the IntMultiply disasm tests.

Microbenchmark Results

    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;

    namespace Perf
    {
        public class Multiply
        {
            static ulong fieldValue = 1;

            [Benchmark]
            public void shift_2()
            {
                ulong value = fieldValue;
                for (ulong i = 1; i < 10000000; i++)
                {
                    value = i << 2;
                }
                fieldValue = value;
            }

            [Benchmark]
            public void shift_3()
            {
                ulong value = fieldValue;
                for (ulong i = 1; i < 10000000; i++)
                {
                    value = i << 3;
                }
                fieldValue = value;
            }
        }
        static class Program
        {
            static int Main(string[] args)
            {
                BenchmarkRunner.Run<Multiply>(null, args);
                return 0;
            }
        }
    }

Before:

BenchmarkDotNet=v0.13.2, OS=Windows 11 (10.0.22621.674)
AMD Ryzen 9 7950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=7.0.100-rc.2.22477.23
  [Host]     : .NET 7.0.0 (7.0.22.47203), X64 RyuJIT AVX2
  DefaultJob : .NET 7.0.0 (7.0.22.47203), X64 RyuJIT AVX2


|  Method |     Mean |     Error |    StdDev |
|-------- |---------:|----------:|----------:|
| shift_2 | 1.842 ms | 0.0032 ms | 0.0030 ms |
| shift_3 | 1.842 ms | 0.0025 ms | 0.0023 ms |

After:

BenchmarkDotNet=v0.13.2, OS=Windows 11 (10.0.22621.674)
AMD Ryzen 9 7950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=7.0.100-rc.2.22477.23
  [Host]     : .NET 7.0.0 (7.0.22.47203), X64 RyuJIT AVX2
  Job-GILPZC : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2

Toolchain=CoreRun

|  Method |     Mean |     Error |    StdDev |
|-------- |---------:|----------:|----------:|
| shift_2 | 1.828 ms | 0.0048 ms | 0.0045 ms |
| shift_3 | 1.830 ms | 0.0052 ms | 0.0049 ms |

Acceptance Criteria

Merge [JIT] X64 - Three instruction replacement sequence for multiply in certain cases #76981

ghost · 2022-10-17T23:33:04Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Description
TBD

Refactors some of the codegen multiply optimizations by moving them to lowering.

Acceptance Criteria

Merge [JIT] X64 - Three instruction replacement sequence for multiply in certain cases #76981

Author:	TIHan
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

EgorBo · 2022-10-21T18:19:17Z

Can you please run fuzz pipelines yourself once you're done with changes?

TIHan · 2022-10-21T18:40:22Z

/azp run Fuzzlyn

azure-pipelines · 2022-10-21T18:40:41Z

Azure Pipelines successfully started running 1 pipeline(s).

TIHan · 2022-10-21T21:38:29Z

Looks like Fuzzlyn x64 passed. I need to update one of the disasm checks as I forgot a minor thing.

tannergooding · 2022-10-23T03:46:21Z

What about:

-       mov      edx, eax
-       shl      edx, 1
+       lea      edx, [2*rax]

TIHan · 2022-10-24T18:39:23Z

@tannergooding We already handle that case for x << 1 in codegen; it emits this:

lea rax, [rcx+rcx]

TIHan · 2022-10-24T18:47:08Z

@dotnet/jit-contrib This is ready again - I fixed the tests and one of the earlier commits did pass fuzzlyn.

BruceForstall · 2022-10-26T04:14:22Z

Have you created a BDN microbenchmark to show the performance before/after your optimizations?

TIHan · 2022-10-26T16:50:52Z

> Have you created a BDN microbenchmark to show the performance before/after your optimizations?

Have not, but will do that now.

TIHan · 2022-10-26T18:33:09Z

@BruceForstall I've provided microbenchmark results in the description of this PR.

BruceForstall · 2022-10-26T19:03:29Z

Your SuperPMI jobs are failing with:

    Traceback (most recent call last):
      File "C:\h\w\B0FF09A8\p\superpmi.py", line 1720, in create_one_artifact
        with open(item_path, 'r') as file_handle:
    FileNotFoundError: [Errno 2] No such file or directory: 'C:\\h\\w\\B0FF09A8\\p\\artifacts\\spmi\\asm.coreclr_tests.run.windows.x64.checked\\base\\18886.dasm'

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "C:\h\w\B0FF09A8\p\superpmi.py", line 4451, in <module>
        sys.exit(main(args))
      File "C:\h\w\B0FF09A8\p\superpmi.py", line 4342, in main
        success = asm_diffs.replay_with_asm_diffs()
      File "C:\h\w\B0FF09A8\p\superpmi.py", line 1736, in replay_with_asm_diffs
        subproc_helper.run_to_completion(create_replay_artifacts, self, mch_file, asm_dotnet_vars_full_env, text_differences, base_asm_location, diff_asm_location, ".dasm")
      File "C:\h\w\B0FF09A8\p\superpmi.py", line 601, in run_to_completion
        loop.run_until_complete(self.__run_to_completion__(async_callback, *extra_args))
      File "C:\python3\lib\asyncio\base_events.py", line 647, in run_until_complete
        return future.result()
      File "C:\h\w\B0FF09A8\p\superpmi.py", line 584, in __run_to_completion__
        await asyncio.gather(*tasks)
      File "C:\h\w\B0FF09A8\p\superpmi.py", line 556, in __get_item__
        await async_callback(print_prefix, item, *extra_args)
      File "C:\h\w\B0FF09A8\p\superpmi.py", line 1726, in create_replay_artifacts
        base_txt = await create_one_artifact(self.base_jit_path, base_location, flags + base_option_flags_for_diff_artifact)
      File "C:\h\w\B0FF09A8\p\superpmi.py", line 1723, in create_one_artifact
        raise create_exception() from err
    Exception: Failure while creating JitStdOutFile.
    Exit code: 0

This has been fixed. Maybe you just need to rebase/re-push to trigger updated CI testing?

BruceForstall · 2022-11-01T01:26:19Z

@TIHan Note that the formatting job failed. If you don't already, try to get into the habit of running "jit-format -f" before pushing a PR change to GitHub.

Also, it looks like the unix-x64 superpmi-diffs job failed for unknown reasons.

TIHan · 2022-11-04T18:01:07Z

@dotnet/jit-contrib @BruceForstall looks like CI is passing - is there anything else that I should do in the PR?

BruceForstall · 2022-11-05T01:27:11Z

Diffs

TIHan added 24 commits October 12, 2022 19:20

Using 3 instruction sequence for x64 multiply

78c3247

Do not do this in morph. Do it in codegen now.

06a60e1

Fixing codegen

7fb0095

Only allow values under 127 and do not skip mov - correctness testing

86edb6c

Try to fix tests

0955d11

cleanup

865cf95

Moving to Lowering

999eac0

Quick fix

6ebf58e

Fully works in lowering now

6e8f28c

Account for all ints

94968a4

Take into account codegen opts

64b523a

Minor cleanup

1cde9c0

Minor cleanup

c6fba6e

Fixed test

ff2b9a1

Added int multiply disasm checks. Fixed SuperFileCheck namespace bug. Made SuperFileCheck anchors more likely to match.

338dbe5

Update comments

106c6b7

Update comments

8fc5b37

Update comments

e5835db

Update comments

3be48bf

Formatting

b3d4a5f

Fixing build

843617a

Fixing build again

74b1071

minor rename

90b7e7d

Moving x64 multiply codegen optimizations to lowering

ead83a5

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 17, 2022

ghost assigned TIHan Oct 17, 2022

hughbe reviewed Oct 18, 2022

src/tests/JIT/opt/Multiply/IntMultiply.cs (outdated)

markples reviewed Oct 18, 2022

src/tests/JIT/opt/Multiply/IntMultiply.csproj (outdated)

Using ReplaceWithLclVar

b5193c0

TIHan added 4 commits October 21, 2022 14:39

Update comments. Fix disasm test.

85ca0f4

Another minor test change

e633fef

Fix test

5a459d8

Fix test

f40aa1c

tannergooding reviewed Oct 25, 2022

src/coreclr/jit/codegenxarch.cpp (outdated)

Feedback

19d26d1

build-analysis bot mentioned this pull request Oct 25, 2022

Tracking Nuget 429s dotnet/arcade#10885

Closed

2 tasks

runfoapp bot mentioned this pull request Oct 26, 2022

Compiler crashes when failing to release Mutex #53420

Open

Update IntMultiply.cs

923d10b

TIHan mentioned this pull request Oct 31, 2022

Optimize arithmetic/bitwise operations of short, ushort, sbyte, and byte. #44849

Closed

Update codegenxarch.cpp

5ac902d

Update codegenxarch.cpp

c8fb73b

BruceForstall approved these changes Nov 5, 2022

TIHan merged commit 2406653 into dotnet:main Nov 5, 2022

EgorBo mentioned this pull request Nov 8, 2022

Regressions in System.Buffers.Text.Tests.Utf8FormatterTests #78041

Closed

ghost locked as resolved and limited conversation to collaborators Dec 5, 2022
