Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT produces unoptimized codegen for indirect byte comparisons against a register #33504

Closed
GrabYourPitchforks opened this issue Mar 12, 2020 · 6 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Milestone

Comments

@GrabYourPitchforks
Copy link
Member

GrabYourPitchforks commented Mar 12, 2020

Repro code:

public static void Test(ref byte theRef, byte b) {
    if (theRef == b)
    {
        Console.WriteLine("Hi!");
    }
}

Actual codegen:

; Test(Byte ByRef, Byte)
    L0000: movzx ecx, byte [rcx]
    L0003: movzx eax, dl
    L0006: cmp ecx, eax
    L0008: jnz L0024
    L000a: mov rcx, 0x282bd891e88
    L0014: mov rcx, [rcx]
    L0017: mov rax, 0x7ffac922d5e8
    L0021: jmp rax
    L0024: ret

Expected behavior:

I would expect those first three instructions to be combined into a single cmp byte ptr [rcx], dl instruction.

category:cq
theme:codegen
skill-level:intermediate
cost:medium

@GrabYourPitchforks GrabYourPitchforks added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization labels Mar 12, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Mar 12, 2020
@EgorBo
Copy link
Member

EgorBo commented Mar 12, 2020

I guess it works as expected for int and long so the issue is that int->byte->int cast is not handled.

@tannergooding
Copy link
Member

tannergooding commented Mar 12, 2020

Is this possibly because while eax/rax can be tracked "independent registers" (because writing eax clears the 32-bits to zero), al, ah, and ax are considered part of the full register and cause a "partial register access" which can introduce stalls?
There are likely some cases where the stall isn't noteworthy, such as when only reading and never writing the register, but it may be complex for the JIT to track. Emitting movzx is a trivial way to prevent that issue and also avoids needing to know which instructions are faster/slower when dealing with smaller than 32-bit inputs.

3.5.2.4 Partial Register Stalls of the Intel® 64 and IA-32 Architectures Optimization Reference Manual and 5.5 Partial-Register Writes of the Software Optimization Guide for AMD Family 15h Processors goes into more detail

@GrabYourPitchforks
Copy link
Member Author

I don't think partial stalls are a concern here. If you call movzx, you're already taking a dependency on the low byte of the source register. Regardless, even if the JIT had to be pessimistic it could still be codegened as:

movzx eax, dl
cmp byte ptr [rcx], al

The bigger concern IMO is that two registers need to be used to hold the temporary values. That could create an unnecessary stack spill.

@tannergooding
Copy link
Member

If you call movzx, you're already taking a dependency on the low byte of the source register.

The point of this is that movzx is a dependency breaking pattern (because it updates the full 32-bit register). I believe the partial stall generally only applies when also doing partial writes, however so it might be okay here.

@AndyAyersMS
Copy link
Member

This is one of those cases where the jit codegen directly reflects IL semantics (integral IL stack entries are always 32 or 64 bits). The values to be compared are widened to stack size and then compared.

What the jit is missing is an optimization that can resize and reshape computations to the optimal version for the target ISA (to be fair, the jit does some of this today, but it's hit and miss).

In this case, for x64, the optimization is narrowing. Tracking cases of these with #32504.

@BruceForstall BruceForstall removed the untriaged New issue has not been triaged by the area owner label Apr 4, 2020
@BruceForstall BruceForstall added this to the Future milestone Apr 4, 2020
@BruceForstall
Copy link
Member

Since the JIT optimization work here is tracked by a separate issue, I'm closing this one.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Projects
None yet
Development

No branches or pull requests

6 participants