JIT: Factor loop duplication code #97506

jakobbotsch · 2024-01-25T12:39:46Z

Factor the loop duplication code out of loop cloning and loop unrolling in anticipation of also using it in loop peeling.

No diffs expected.

ghost · 2024-01-25T12:40:01Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	jakobbotsch
Assignees:	jakobbotsch
Labels:	`area-CodeGen-coreclr`
Milestone:	-

ryujit-bot · 2024-01-25T14:20:02Z

Diff results for #97506

Throughput diffs

Throughput diffs for windows/x86 ran on linux/x86

FullOpts (-0.00% to +0.01%)

Collection	PDIFF
coreclr_tests.run.windows.x86.checked.mch	+0.01%

Details here

jakobbotsch · 2024-01-25T14:57:34Z

src/coreclr/jit/flowgraph.cpp

+void FlowGraphNaturalLoop::Duplicate(BasicBlock**     insertAfter,
+                                     BlockToBlockMap* map,
+                                     weight_t         weightScale,
+                                     bool             bottomNeedsRedirection)


FWIW, we will be able to get rid of the `bottomNeedsRedirection` parameter once we no longer have fallthrough. We probably could have removed it here with some complications, but it will be much easier once fallthrough is gone, so I didn't bother.

amanasifkhalid · 2024-01-25

Thanks for pointing this out. This PR will probably get merged before #97488, so I'll plan on removing the parameter over there.
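
For readers unfamiliar with the pattern, here is a minimal sketch of what a shared loop duplication helper of this shape might do, loosely modeled on the `Duplicate` signature quoted above. Everything in it is hypothetical: `ToyBlock`, `ToyBlockMap`, `DuplicateLoop`, and the `arena` parameter are illustrative stand-ins rather than the JIT's real `BasicBlock`, `BlockToBlockMap`, or `FlowGraphNaturalLoop::Duplicate`, and the `bottomNeedsRedirection` fallthrough handling is deliberately not modeled.

```cpp
#include <cassert>
#include <deque>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-in for a basic block (not the JIT's BasicBlock).
struct ToyBlock
{
    int       id;
    double    weight;
    ToyBlock* next;   // lexical successor in the block list
    ToyBlock* target; // jump target, or nullptr if the block does not jump

    ToyBlock(int i, double w) : id(i), weight(w), next(nullptr), target(nullptr) {}
};

// Hypothetical stand-in for a block-to-block map: original block -> its copy.
typedef std::unordered_map<ToyBlock*, ToyBlock*> ToyBlockMap;

// Copy every block in `loopBlocks`, link the copies into the block list after
// *insertAfter, scale their weights by `weightScale`, and record old -> new in `map`.
// Jump targets inside the loop are redirected to the copies; targets outside the
// loop are kept unchanged. New blocks are allocated from `arena` (an assumption
// made only to keep ownership simple in this sketch).
void DuplicateLoop(const std::vector<ToyBlock*>& loopBlocks,
                   ToyBlock**                    insertAfter,
                   ToyBlockMap*                  map,
                   double                        weightScale,
                   std::deque<ToyBlock>*         arena)
{
    assert(insertAfter != nullptr && *insertAfter != nullptr);
    assert(map != nullptr && arena != nullptr);

    // Pass 1: create the copies and splice them into the list in order.
    for (ToyBlock* blk : loopBlocks)
    {
        int newId = static_cast<int>(arena->size());
        arena->emplace_back(newId, blk->weight * weightScale);
        ToyBlock* copy = &arena->back(); // deque growth keeps older pointers valid

        copy->next           = (*insertAfter)->next;
        (*insertAfter)->next = copy;
        *insertAfter         = copy;
        (*map)[blk]          = copy;
    }

    // Pass 2: now that every copy exists, fix up the jump targets.
    for (ToyBlock* blk : loopBlocks)
    {
        if (blk->target == nullptr)
        {
            continue;
        }
        ToyBlock*                   copy = map->at(blk);
        ToyBlockMap::const_iterator it   = map->find(blk->target);
        copy->target = (it != map->end()) ? it->second : blk->target;
    }
}
```

In this toy model, a caller that clones, unrolls, or peels a loop would call `DuplicateLoop` once per copy and then use `map` to wire the new blocks into the surrounding flow, which is the kind of sharing the refactoring is after. The two-pass structure is a design choice of the sketch: a back edge can only be redirected once the copy of its target exists.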

ryujit-bot · 2024-01-25T15:20:57Z

Diff results for #97506

Assembly diffs

Assembly diffs for linux/arm64 ran on windows/x64

Diffs are based on 2,498,771 contexts (1,011,240 MinOpts, 1,487,531 FullOpts).

MISSED contexts: 6,580 (0.26%)

Overall (+0 bytes)

Collection	Base size (bytes)	Diff size (bytes)
libraries_tests.run.linux.arm64.Release.mch	383,838,152	+0

FullOpts (+0 bytes)

Collection	Base size (bytes)	Diff size (bytes)
libraries_tests.run.linux.arm64.Release.mch	168,416,476	+0

Details here

Throughput diffs

Throughput diffs for linux/arm ran on windows/x86

Overall (-0.00% to +0.01%)

Collection	PDIFF
coreclr_tests.run.linux.arm.checked.mch	+0.01%
realworld.run.linux.arm.checked.mch	+0.01%

FullOpts (-0.00% to +0.01%)

Collection	PDIFF
coreclr_tests.run.linux.arm.checked.mch	+0.01%
realworld.run.linux.arm.checked.mch	+0.01%

Throughput diffs for windows/x86 ran on windows/x86

FullOpts (-0.00% to +0.01%)

Collection	PDIFF
coreclr_tests.run.windows.x86.checked.mch	+0.01%

Details here

jakobbotsch · 2024-01-25T15:40:23Z

cc @dotnet/jit-contrib PTAL @BruceForstall

No diffs.

jakobbotsch · 2024-01-26T08:51:39Z

Hmm, there is actually a single diff in linux-arm64. Going to check what that is, but I expect it to just be something around block weighting that is subtly different...

jakobbotsch · 2024-01-26T09:03:38Z

Ah, the diff is because the PR removes this quirk in unrolling:

runtime/src/coreclr/jit/optimizer.cpp

Lines 1671 to 1674 in c105e7d

    // TODO-Quirk: Skip empty blocks and go directly to their destination.
    BasicBlock* targetBlk = block->Next();
    if (targetBlk->KindIs(BBJ_ALWAYS) && targetBlk->isEmpty())
        targetBlk = targetBlk->GetTarget();

I didn't realize the quirk was still there, but given that removing it causes only a single diff, it doesn't seem necessary to do that in a separate PR.
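
As a rough illustration of why dropping the quirk can change generated code, the sketch below contrasts the two behaviors on a toy block list. `ToyBlk`, `ToyKind`, and both functions are hypothetical names invented for this example, and treating "without the quirk" as "use the lexical successor as-is" is an assumption, not a description of the actual code after this PR.

```cpp
#include <cassert>

// Hypothetical block kinds and block type for illustration only.
enum ToyKind
{
    TOY_ALWAYS, // unconditional jump
    TOY_COND,   // conditional jump
    TOY_RETURN  // return
};

struct ToyBlk
{
    ToyKind kind;
    bool    empty;  // true if the block contains no statements
    ToyBlk* next;   // lexical successor
    ToyBlk* target; // jump target, if any
};

// With the quirk: if the lexical successor is an empty unconditional jump,
// bypass it and use its destination instead.
ToyBlk* SuccessorWithQuirk(const ToyBlk* block)
{
    assert(block != nullptr && block->next != nullptr);
    ToyBlk* targetBlk = block->next;
    if (targetBlk->kind == TOY_ALWAYS && targetBlk->empty)
    {
        targetBlk = targetBlk->target;
    }
    return targetBlk;
}

// Without the quirk (assumed behavior): use the lexical successor unchanged.
ToyBlk* SuccessorWithoutQuirk(const ToyBlk* block)
{
    assert(block != nullptr && block->next != nullptr);
    return block->next;
}
```

The two functions disagree only when the successor is an empty unconditional jump. If such cases are rare in practice, that would be consistent with the single diff seen here.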

JIT: Factor loop duplication code

c7c259e

ghost assigned jakobbotsch Jan 25, 2024

dotnet-issue-labeler bot added the `area-CodeGen-coreclr` label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Jan 25, 2024

This was referenced Jan 25, 2024

Checkout failure: "Git fetch failed with exit code 128" dotnet/arcade#9009

Open

"We stopped hearing from agent Azure Pipelines 32. Verify the agent machine is running and has a healthy network connection" dotnet/dnceng#1886

Open

jakobbotsch marked this pull request as ready for review January 25, 2024 15:40

jakobbotsch requested a review from BruceForstall January 25, 2024 15:40

jakobbotsch mentioned this pull request Jan 25, 2024

JIT: Add a disabled-by-default loop peeling phase #97517

Closed

BruceForstall approved these changes Jan 25, 2024

jakobbotsch merged commit 1e8b750 into dotnet:main Jan 26, 2024

jakobbotsch deleted the factor-loop-duplication branch January 26, 2024 09:03

github-actions bot locked and limited conversation to collaborators Feb 26, 2024
