Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update the JIT to support rewriting more complex intrinsics as user calls #102702

Merged
merged 6 commits into from
May 29, 2024

Conversation

tannergooding
Copy link
Member

@tannergooding tannergooding commented May 26, 2024

This unblocks #11062 which should in turn unblock #102275

It essentially just fixes rationalization to support non-primitive arguments and return types by doing the necessary handling as part of the rewrite. This allows us to carry intrinsics that require a constant input all the way through the JIT, allowing VN, CSE, Constant Propagation, and other optimizations to still kick in. This in turn means that we can identify many more types of constants and thus allows us to avoid pessimizing as many cases to being calls.

This does change Shuffle to be carried through as an intrinsic node, but doesn't yet add the support to transform it to a proper shuffle intrinsic as I plan on doing that in a separate PR if this one is accepted. Once APIs like Shuffle which expect a constant are handled as an intrinsic we can resolve #11062 and that will unblock our ability to move a large chunk of the xplat implementation (particularly for System.Numerics vector types currently handled via simdashwintrinsic) out of the JIT and into managed code instead.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 26, 2024
@tannergooding tannergooding marked this pull request as ready for review May 27, 2024 19:13
@tannergooding
Copy link
Member Author

CC. @dotnet/jit-contrib, @EgorBo

@EgorBo
Copy link
Member

EgorBo commented May 27, 2024

CC. @dotnet/jit-contrib, @EgorBo

Can we run some outerloops? namely outerloop, jitstress and I would also run some R2R job

@tannergooding
Copy link
Member Author

/azp run runtime-coreclr outerloop, runtime-coreclr jitstress

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

GenTree* gtInlineOperands[2];
union
{
GenTree* gtInlineOperands[2];
Copy link
Member

@EgorBo EgorBo May 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I follow what gtInlineOperands is (and the logic inside SetMethodHandle does, can you explain?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gtInlineOperands is an existing field that allows GenTreeMultiOp to avoid allocating for the common case where there is 2 or less operands and instead allows us to carry the fields within the main allocation instead.

In order to facilitate rewriting the hwintrinsic back to a call, however, we need to be able to track the CORINFO_METHOD_HANDLE and we don't have enough free space to do that. Adding the field directly ends up making the node larger than TREE_NODE_SZ_LARGE as well, so to avoid pessimizing the rest of the JIT I made this into a union. SetMethodHandle will then force an allocation if gtInlineOperands was being used and set a flag that indicates that gtMethodHandle is set instead.

It's worth noting the actual gtInlineOperands field isn't ever read directly either, it's address is just handed down to GenTreeMultiOp as part of construction or ResetOperandArray call. So this ends up working fairly well and ensures that the allocation is "pay for play" and only used when actually necessary (we have an intrinsic with 1 or 2 operands that requires a constant and needs to be carried through to a later phase).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-- Notably the reason we don't have enough free space is primarily because of padding bytes caused by inheritance. There's a few places where nodes are wasting 4-7 bytes of space to maintain 8-byte alignment and that repeats several times.

We could avoid tricks like the one being employed here if we had a better mechanism for avoiding such wasted padding for derived node kinds. But that's a much more complex and independent work item.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I didn't notice it was a pre-existing field - I though you added it in this PR 🙂

@EgorBo
Copy link
Member

EgorBo commented May 27, 2024

@jakobbotsch PTAL the rationalization changes

@EgorBo
Copy link
Member

EgorBo commented May 27, 2024

One thing to be careful about - what if some Morph applies some transformation and changes node to be a different intrinsic now? (via e.g. ChangeOper) - will it leave the node in a bad state?

@tannergooding
Copy link
Member Author

One thing to be careful about - what if some Morph applies some transformation and changes node to be a different intrinsic now? (via e.g. ChangeOper) - will it leave the node in a bad state?

The same concern would be true for GT_INTRINSIC which has done this same thing since .NET Framework

In this case the only transform we can really do is if the input is constant changing it to the actual node produced by gtNewSimd*Node in which case we’ll want to clear the flag that says getting the method handle is valid (as we have no reason to rewrite it anymore)

tannergooding and others added 2 commits May 28, 2024 06:29
Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@tannergooding tannergooding merged commit ec5963d into dotnet:main May 29, 2024
112 of 114 checks passed
@tannergooding tannergooding deleted the rationalize-hwintrinsic branch May 29, 2024 05:27
Ruihan-Yin pushed a commit to Ruihan-Yin/runtime that referenced this pull request May 30, 2024
…alls (dotnet#102702)

* Update the JIT to support rewriting more complex intrinsics as user calls

* Updating the shuffle hwintrinsic to always be imported as an intrinsic

* Ensure multi-reg returns are initialized for rewritten hwintrinsics

* Apply suggestions from code review

Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>

* Adding function headers to SetMethodHandle and SetEntryPoint

* Apply formatting patch

---------

Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
@github-actions github-actions bot locked and limited conversation to collaborators Jun 28, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HW intrinsics: Poor codegen for operations requiring imm8 when the value is calculated to constant during JIT
3 participants