Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure we mark op2 as delayFree if it is rmw and the parent node returns a non-SIMD type #36226

Merged
1 commit merged into from
May 11, 2020

Conversation

tannergooding
Copy link
Member

@tannergooding tannergooding commented May 11, 2020

This resolves #36198

We were missing the scenario where the intrinsic is RMW, op2 was contained, and where the parent node produces a non-SIMD result. In that scenario, the contained node may use a register from the same set as the target operand, in which case we need op2 to be delay free.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 11, 2020
@tannergooding
Copy link
Member Author

CC. @CarolEidt, @echesakovMSFT, @kunalspathak

@tannergooding
Copy link
Member Author

This wasn't failing for 32-bit because we have special handling here to account for only RBM_BYTE_REGS being allowed for use as byte registers (according to the comment at least): https://github.com/dotnet/runtime/blob/master/src/coreclr/src/jit/lsraxarch.cpp#L2487-L2506

Copy link
Contributor

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks!

@echesakov
Copy link
Contributor

@tannergooding I didn't understand the issue description:

We were missing the scenario where the intrinsic is RMW, op2 was contained, and where the parent node produces a non-SIMD result. In that scenario, the contained node may use a register from the same set as the target operand, in which case we need op2 to be delay free.

Can you please explain how contained op2 may use a register? My understanding if op2 is contained it doesn't need a register since it's computed as a part of the parent node. An example of such situation would also be helpful

@CarolEidt
Copy link
Contributor

Can you please explain how contained op2 may use a register?

A contained op2 can be an addressing mode that uses a base and/or index register.

@echesakov
Copy link
Contributor

@CarolEidt Thanks,make sense

@tannergooding
Copy link
Member Author

Right, the x86 addressing mode allows for [baseReg + indexReg * scale].

We had Sse42.Crc32(0xffffffffU, *ptr), and were generating

       488B7D60             mov      rdi, qword ptr [rbp+60H]
       BEFFFFFFFF           mov      esi, -1
       8BFE                 mov      edi, esi
       F2400F38F03F         crc32    edi, byte  ptr [rdi]

@ghost
Copy link

ghost commented May 11, 2020

Hello @tannergooding!

Because this pull request has the auto-merge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

@ghost ghost merged commit 5922e80 into dotnet:master May 11, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020
@tannergooding tannergooding deleted the fix-36198 branch November 11, 2022 15:30
This pull request was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
4 participants