Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider reworking how the x86 emitter handles the 'w' bit #35305

Closed
tannergooding opened this issue Apr 22, 2020 · 10 comments
Closed

Consider reworking how the x86 emitter handles the 'w' bit #35305

tannergooding opened this issue Apr 22, 2020 · 10 comments
Labels
arch-x64 arch-x86 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI JitUntriaged CLR JIT issues needing additional triage
Milestone

Comments

@tannergooding
Copy link
Member

tannergooding commented Apr 22, 2020

As investigated on #34550 (comment), the emitter currently has some odd support for detecting whether a given instruction uses the 'w' and 's' encoding bits.

It would likely benefit from having this be an INS_FLAGS entry that can be trivially checked to statically determine the appropriate encoding that is required.

category:implementation
theme:emitter
skill-level:intermediate
cost:medium

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Apr 22, 2020
@Dotnet-GitSync-Bot
Copy link
Collaborator

I couldn't figure out the best area label to add to this issue. Please help me learn by adding exactly one area label.

@tannergooding tannergooding added arch-x64 arch-x86 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Apr 22, 2020
@BruceForstall BruceForstall added this to the 5.0 milestone Apr 24, 2020
@AndyAyersMS AndyAyersMS removed the untriaged New issue has not been triaged by the area owner label Apr 29, 2020
@BruceForstall
Copy link
Member

I'm going to move this to Future.

@BruceForstall BruceForstall modified the milestones: 5.0, Future Jun 3, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@anthonycanino
Copy link
Contributor

Hi @tannergooding , I'm curious about the extent of the change for this issue and wanted to get some clarification.

Are we looking to add new bits to the INS_FLAGS entry, say HasWBit and HasSBit and use these to help remove some of the instruction-specific checks throughout emitxarch.cpp, such as https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/emitxarch.cpp#L10081, where we would do something like

if ((size != EA_1BYTE) && (ins != INS_imul) && HasWBit(ins) && !IsSSEInstruction(ins) &&
            !IsAVXInstruction(ins))
{
  code++;
}

or are we looking for a more extensive change that would ideally group up the logic for checking when a 'w' bit is needed in general, i.e., write some AddWBitIfNeeded function to factor out some of the sections of the code where the size of the instruction is being check, such as https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/emitxarch.cpp#L11940-L11977.

@tannergooding
Copy link
Member Author

To start with, it would simply be correctly adding a flag to the instructions that require the w or s-bit and minimally using that where applicable.

More extensive changes would ideally come later to help simplify the initial PR and isolate the changes as much as possible.

  • There's probably a bit of wonkiness in a few places since REX.W is part of the REX byte and we also have AddRex*PrefixIfNeeded
  • We already have a few helper functions like TakesRexWPrefix (I believe we also have a similar for Rex.B, not sure about Rex.X`)

@anthonycanino
Copy link
Contributor

Thanks @tannergooding . I've been inspecting the code and there is one edge case that I need a little clarification on. It's actually the above snippet unmodified.

In emitOutputAM, the following code chunk which was related to the issue is #34550

    else if (code & 0x00FF0000)
    {
        // BT supports 16 bit operands and this code doesn't handle the necessary 66 prefix.
        assert(ins != INS_bt);

        // Output the REX prefix
        dst += emitOutputRexOrVexPrefixIfNeeded(ins, dst, code);

        // Output the highest byte of the opcode
        if (code & 0x00FF0000)
        {
            dst += emitOutputByte(dst, code >> 16);
            code &= 0x0000FFFF;
        }

        // Use the large version if this is not a byte. This trick will not
        // work in case of SSE2 and AVX instructions.
        if ((size != EA_1BYTE) && (ins != INS_imul) && (ins != INS_bsf) && (ins != INS_bsr) && !IsSSEInstruction(ins) &&
            !IsAVXInstruction(ins))
        {
            code++;  // <--- HERE
        }
    }

Here, I understand that checks for bsf and bsr as they do not have the 'w' bit. But from reading B.2 GENERAL-PURPOSE INSTRUCTION FORMATS AND ENCODINGS FOR NON-64-BIT MODES and B.2.1 General Purpose Instruction Formats and Encodings for 64-Bit Mode, imul does it, just not in imul reg1, mem form.

I think this complicates encoding the 'w' bit in the INS_Flags, as it would have to be done per mode. But, I think the overflow is the problem here, because imul does not require the 'w' to be set to use full operand size in imul reg1, mem form (to my understanding, that bit is already set) if we switch to

        // Use the large version if this is not a byte. This trick will not
        // work in case of SSE2 and AVX instructions.
        if ((size != EA_1BYTE) && HasWBit(ins) && !IsSSEInstruction(ins) && !IsAVXInstruction(ins))
        {
            code |= 0x1;  // <--- HERE
        }

we would be ok, unless I have misunderstood why code++ is used instead of code |= 0x1 like in other places.

Is this the correct line of thinking here to simplify?

@tannergooding
Copy link
Member Author

INS_imul is a bit odd in that its many instructions rather than a single instruction in our tables (there are multiple INS_imul_* variants).

You might have to look at both B.2 and the general instruction page to get the full context over things:
image

This shows that picking between 8-bit and full-width (32-bit if no REX prefix or REX.W=0; 64-bit otherwise) only exists for the imul r/m variant (F6 vs F7) the other instructions only support the full-width encoding and so w is always 1.

@anthonycanino
Copy link
Contributor

Thanks for this. I see that we have an imulEAX instruction which would have the 'w' bit, but imul would not. This is very helpful, so I can encode the 'w' bit properly and reduce the entire check to:

if ((size != EA_1BYTE) && HasWBit(ins))
{
  code |= 0x1; 
}

@anthonycanino
Copy link
Contributor

@tannergooding I was re-reading your first comment,

  • There's probably a bit of wonkiness in a few places since REX.W is part of the REX byte and we also have AddRex*PrefixIfNeeded
  • We already have a few helper functions like TakesRexWPrefix (I believe we also have a similar for Rex.B, not sure about Rex.X`)

and I wanted to clarify one point. These functions determine whether to emit the REX.W bit, and as part of a larger refactoring, we'll have to consider how that interplays with parts of the code that determine whether to emit the 'w' bit, but we are only talking about tracking the 'w' and 's' bits in the INS_FLAGS entry, not an additional 'REX.W' bit, correct?

@tannergooding
Copy link
Member Author

Right, that was me conflating two different w bits in the original message. REX.W is different from the w bit actually being discussed here.

@anthonycanino
Copy link
Contributor

anthonycanino commented Aug 22, 2022

@tannergooding I think we should close this out per the merged PR?

@ghost ghost locked as resolved and limited conversation to collaborators Sep 22, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-x64 arch-x86 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI JitUntriaged CLR JIT issues needing additional triage
Projects
None yet
Development

No branches or pull requests

5 participants