Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CodeGen: Clean up instruction emitting methods #12178

Open
CarolEidt opened this issue Mar 4, 2019 · 5 comments
Open

CodeGen: Clean up instruction emitting methods #12178

CarolEidt opened this issue Mar 4, 2019 · 5 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@CarolEidt
Copy link
Contributor

The conventions for organization and naming of the methods that emit instructions are unclear and inconsistent. I believe that the existing/intended conventions are:

  • the methods on CodeGen that start with gen are concerned with generating code for a class of GenTree operators, or shared functionality across operators that isn't specific to instruction encodings.
  • the methods on CodeGen that start with inst are concerned with emitting (usually) single instructions of a given form, generally specified by the suffix of the inst method:
    • RV means "register value"
    • IV means "immediate value"
    • TT means a GenTree node that's a memory or register operand (though it may actually only be used for memory operands?)
    • e.g. inst_TT_RV has two operands: One of which is a GenTree and the other of which is a register.
  • The methods on emitter that start with emitIns are responsible for emitting a single instruction with a given format, e.g. emitIns_R_AR_I emits an instruction with a register operand, an address operand and an immediate operand (though the address is expressed by passing the base register and offset, and doesn't support an index)
    • In general, methods on emitter shouldn't take GenTree nodes, though many do: the handling of general addressing modes is done in emitHandleMemOp, and its relatives that take a GenTreeIndir, and the handling of binary instructions is done in emitInsBinary that takes GenTree nodes.

category:implementation
theme:jit-coding-style
skill-level:intermediate
cost:large

@CarolEidt
Copy link
Contributor Author

cc @dotnet/jit-contrib, @tannergooding @mikedn
This came up in the discussion of dotnet/coreclr#22944.

@tannergooding
Copy link
Member

The methods on emitter that start with emitIns are responsible for emitting a single instruction

Just a note. I think the current exception here is emitIns_SIMD methods, which will emit one or two instructions in order to handle the differences between the legacy and VEX encodings. The latter generally allowing ins dst, src, src but the former only allowing ins dst/src, src and therefore requiring a mov dst, src when dst != src.

The HWIntrinsic feature tries to exclusively use the emitIns_SIMD methods where possible; but general code does not. This often leads to weird disassembly and sometimes other places trying to handle these differences.

@mikedn
Copy link
Contributor

mikedn commented Mar 5, 2019

the methods on CodeGen that start with gen are concerned with generating code for a class of GenTree operators, or shared functionality across operators that isn't specific to instruction encodings.

Yeah, and for some the prefix is actually genCodeFor, as in genCodeForShift. Not sure if that was intended to have a meaning or if someone though that "CodeFor" adds some kind of extra value.

the methods on CodeGen that start with inst are concerned with emitting (usually) single instructions of a given form, generally specified by the suffix of the inst method:

Those inst_ functions are rather messy, probably many are leftovers from the legacy codegen:

  • many aren't used - inst_TT_SH, inst_TT_CL, inst_RV_RR (RR?!!) etc.
  • many are trivial wrappers over emitter functions (e.g. inst_ST_RV calls emitIns_S_R and it's only used in one place, that's just pointless)
  • some aren't only trivial wrappers but they're also error prone - inst_RV_RV's type parameter defaults to TYP_I_IMPL, makes it easy to accidentally emit a 64 bit instruction when you actually wanted a 32 bit one (personally I avoid inst_RV and inst_RV_RV whenever I have the chance).

There's also the ins_ prefix - ins_Load, ins_Store etc. These usually return an instruction for a give operation (load, store etc.) and type. They come with their own set of problems:

  • abuse - quite often used with constant type parameters - ins_Load(TYP_I_IMPL) when simply writing INS_mov would have been more clear
  • error prone - they return an instruction and it's up to the caller to select the proper emitAttr. For example, ins_Move_Extend(TYP_UBYTE) will give you a INS_movzx but it's up to you to use EA_1BYTE. That's a more general problem of the emitter interface where instruction and emitAttr are specified separately. They should probably be combined into a single value that functions like ins_Move_Extend can return. And emitter interface functions would have one less parameter.
  • horrible code - for some reason these ins_ functions are in instr.cpp and their body is basically a giant ifdef.

In general, methods on emitter shouldn't take GenTree nodes, though many do: the handling of general addressing modes is done in emitHandleMemOp, and its relatives that take a GenTreeIndir

Or they should take only the GenTree that's truly needed, in the case of memory operands that's usually the address tree and not the entire indir.

and the handling of binary instructions is done in emitInsBinary that takes GenTree nodes.

emitInsBinary belongs in CodeGen (and it looks like its name should actually be inst_TT_TT).

Also, the emitter interface is a complete mess:

  • unused functions - emitIns_IJ, emitIns_J_S etc.
  • oddly named functions - emitIns_I_AR. No x86 instructions have an immediate as the first operand. And indeed, the format emitIns_I_AR uses is IF_ARD_CNS. What gives?
  • oddly named function parameters - void emitIns_R_ARX(instruction ins, emitAttr attr, regNumber ireg, regNumber reg, regNumber rg2, unsigned mul, int disp);. Good luck figuring out that reg is base, rg2 is index (and that ireg probably means integer register and not index register). Well, it's not that difficult because the function definition actually uses different (!) parameter names that are better. But not always. I've heard that naming is hard but this is just ridiculous. Especially considering that x86 has pretty standard terminology - reg, base, index, scale, disp, imm.
  • duplicate code - emitIns_R_AR, emitIns_R_ARR and emitIns_R_ARX do the same thing, the only difference is what components of the address mode are specified. Well, one may think that they do the same thing, they actually do not. emitIns_R_AR has a special check for 4-byte SSE instructions. The other 2 variants do not.
  • "binary" emitIns functions that are also "unary" - for emitIns_AR_R & co. the register is optional. I have no idea why, there's also emitIns_AR. But the important part is that emitIns_AR_R can't assert that the register is not REG_NA so you have to do it yourself in codegen code. I'd expect the emitter functions do make such asserts.
  • problematic naming - it's not obvious from emitIns_R_R's name which format is used. It uses IF_RRD_RRD. So it will work correctly for instructions like cmp eax, ebx but I'm not sure what happens if you use it for mov eax, ebx or add eax, ebx.

@CarolEidt
Copy link
Contributor Author

Another inconsistency is that there are a number of methods that predict the size of an instruction (e.g. emitInsSizeAM) but these mostly don't take into account the 4-byte SSE instructions, so there's compensating code spread around to take care of them.
Furthermore, the instruction size estimation code is pretty hard to follow, and over-estimates in some cases. Ideally, the code in emitter::emitIssue1Instr() would assert if the estimate is too large (currently it just prints a line in the jitdump:

printf("Instruction predicted size = %u, actual = %u\n", emitInstCodeSz(id), csz);

@sandreenko
Copy link
Contributor

Some smaller XARCH issues that I found working on dotnet/coreclr#25050:

  1. a few methods fake instruction descriptors without proper settings; it leads to dirty code and assert (for example when you check _idDebugOnlyInfo for the faked id when is set to the original id with a different id size), example: https://github.com/dotnet/coreclr/blob/master/src/jit/emitxarch.cpp#L12385;

  2. There are 3 instructions that uses idSize when emit instruction bytes:
    -nop instructions use idSize field to indicate how many nops to insert;
    -emitOutputLJ needs idSize for INS_mov and INS_lea;

  • for the others idSize can be calculated from other fields and doesn't participate in emitOutput*, we need to fixit for nop, mov and lea;
  1. Many TODO-XArch-Cleanup that are old and look very suspicious;

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall removed design-discussion Ongoing discussion about design without consensus JitUntriaged CLR JIT issues needing additional triage labels Oct 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

No branches or pull requests

6 participants