This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Updating the emitter to more generally handle 4-Byte SSE4 instructions. #16249

Merged
merged 4 commits into from
Feb 9, 2018

tannergooding
Member

@tannergooding tannergooding commented Feb 7, 2018

FYI. @CarolEidt, @fiigii, @eerhardt

This should mostly resolve https://github.com/dotnet/coreclr/issues/15908 and https://github.com/dotnet/coreclr/issues/16216, at least for the code paths currently being executed.

@CarolEidt CarolEidt left a comment

This LGTM except that I can't figure out why you're not emitting an extra byte in emitOutputRR() (probably something I've missed).


if ((id->idInsFmt() != IF_RWR_RRD_ARD) && (id->idInsFmt() != IF_RWR_RRD_ARD_CNS))
// encode source operand reg in 'vvvv' bits in 1's compliement form

While you're here you could fix the typo: "compliement" should be "complement"

Member Author

I fixed it here, and the 12 other places it occurred in this document.
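For readers unfamiliar with the comment being corrected: the VEX prefix stores its extra source register in a 4-bit `vvvv` field (bits 6:3 of the last prefix byte), and the register number is stored inverted. A minimal sketch of that encoding follows; `encodeVvvv` is a hypothetical helper written for illustration, not a function from the emitter.

```cpp
#include <cstdint>

// Hypothetical helper (not from the JIT sources): place register number
// `reg` (0..15) into the vvvv field of the final VEX prefix byte.
// The field occupies bits 6:3 and holds the one's complement of `reg`.
constexpr std::uint8_t encodeVvvv(std::uint8_t vexByte, unsigned reg)
{
    return static_cast<std::uint8_t>(
        (vexByte & ~0x78u)          // clear the existing vvvv bits (6:3)
        | ((~reg & 0xFu) << 3));    // insert the inverted 4-bit register number
}
```

Because the value is inverted, register 0 encodes as `1111` and register 15 as `0000`.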

@@ -10323,7 +10389,7 @@ BYTE* emitter::emitOutputRR(BYTE* dst, instrDesc* id)
assert(IsAVXInstruction(ins) || IsSSE4Instruction(ins));
if ((code & 0xFF00) == 0xC000)
{
- dst += emitOutputByte(dst, (0xC0 | regCode));
+ dst += emitOutputWord(dst, code | (regCode << 8));

There's probably something I'm missing here, but how did this go from emitting a byte to emitting a word?

Member Author

From what I can tell based on the comment above, this was trying to support the smaller encoding (which isn't supported anywhere else in the emitter).

I thought I had left a comment indicating I was still looking at this in particular, but I can't seem to find it anymore (maybe I just forgot to submit it).

Member Author

@CarolEidt, I'm finishing validating locally, but it looks like this is a dead code path and is not currently hit (for emitOutputRR). I can only speculate (based on the comment above this if block) that it was meant to support the smaller 2-byte VEX prefix scenario, which isn't actually working.

However, emitOutputInstr special-cases IF_RRW_RRW_CNS, which does hit the equivalent code path, and that path requires emitOutputWord.

I believe the correct fix is to refactor all three places where this code pattern appears (emitOutputRR, emitOutputRRR, and IF_RRW_RRW_CNS in emitOutputInstr) to do the following:

// TODO-XArch-CQ: Right now support 4-byte opcode instructions only
if ((code & 0xFF00) == 0xC000)
{
    dst += emitOutputWord(dst, code | (regCode << 8));
}
else if ((code & 0xFF) == 0x00)
{
    // This case happens for SSE4/AVX instructions only
    assert(IsAVXInstruction(ins) || IsSSE4Instruction(ins));

    dst += emitOutputByte(dst, (code >> 8) & 0xFF);
    dst += emitOutputByte(dst, (0xC0 | regCode));
}
else
{
    dst += emitOutputWord(dst, code);
    dst += emitOutputByte(dst, (0xC0 | regCode));
}
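
To make the three branches easier to follow outside the emitter, here is a standalone sketch of the same dispatch. It assumes that `emitOutputWord` writes its 16-bit argument low byte first and that `regCode` already holds the ModRM reg/rm bits; `outputByte`, `outputWord`, and `encodeRegReg` are hypothetical stand-ins, and the opcode values used with it are arbitrary rather than real instruction encodings.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for emitOutputByte: append one byte.
static void outputByte(std::vector<std::uint8_t>& dst, unsigned val)
{
    dst.push_back(static_cast<std::uint8_t>(val & 0xFF));
}

// Hypothetical stand-in for emitOutputWord: append 16 bits, low byte first
// (assumed little-endian layout).
static void outputWord(std::vector<std::uint8_t>& dst, unsigned val)
{
    outputByte(dst, val);
    outputByte(dst, val >> 8);
}

// Mirrors the three-way dispatch proposed in the comment above.
std::vector<std::uint8_t> encodeRegReg(unsigned code, unsigned regCode)
{
    std::vector<std::uint8_t> dst;
    if ((code & 0xFF00) == 0xC000)
    {
        // High byte is already a ModRM template (mod = 11): merge the
        // register bits into it and emit opcode + ModRM as one word.
        outputWord(dst, code | (regCode << 8));
    }
    else if ((code & 0xFF) == 0x00)
    {
        // Low byte is empty: only the high byte is an opcode byte,
        // so emit it alone, followed by a fresh ModRM byte.
        outputByte(dst, (code >> 8) & 0xFF);
        outputByte(dst, 0xC0 | regCode);
    }
    else
    {
        // Both bytes are opcode bytes: emit them, then the ModRM byte.
        outputWord(dst, code);
        outputByte(dst, 0xC0 | regCode);
    }
    return dst;
}
```

With these assumptions, `encodeRegReg(0xC058, 0x0A)` yields the two bytes `58 CA`, `encodeRegReg(0x1700, 0x0A)` yields `17 CA`, and `encodeRegReg(0x1738, 0x0A)` yields `38 17 CA`.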

Member Author

Changed to the above. I validated that the original code path was never hit for emitOutputRR and emitOutputRRR.

Awesome! Thanks for checking this out and cleaning it up.

@tannergooding tannergooding changed the title from "[WIP] Updating the emitter to more generally handle 4-Byte SSE4 instructions." to "Updating the emitter to more generally handle 4-Byte SSE4 instructions." on Feb 8, 2018
@tannergooding
Member Author

test Windows_NT x64 Checked jitincompletehwintrinsic
test Windows_NT x64 Checked jitx86hwintrinsicnoavx
test Windows_NT x64 Checked jitx86hwintrinsicnoavx2
test Windows_NT x64 Checked jitx86hwintrinsicnosimd
test Windows_NT x64 Checked jitnox86hwintrinsic

test Windows_NT x86 Checked jitincompletehwintrinsic
test Windows_NT x86 Checked jitx86hwintrinsicnoavx
test Windows_NT x86 Checked jitx86hwintrinsicnoavx2
test Windows_NT x86 Checked jitx86hwintrinsicnosimd
test Windows_NT x86 Checked jitnox86hwintrinsic

test Ubuntu x64 Checked jitincompletehwintrinsic
test Ubuntu x64 Checked jitx86hwintrinsicnoavx
test Ubuntu x64 Checked jitx86hwintrinsicnoavx2
test Ubuntu x64 Checked jitx86hwintrinsicnosimd
test Ubuntu x64 Checked jitnox86hwintrinsic

test OSX10.12 x64 Checked jitincompletehwintrinsic
test OSX10.12 x64 Checked jitx86hwintrinsicnoavx
test OSX10.12 x64 Checked jitx86hwintrinsicnoavx2
test OSX10.12 x64 Checked jitx86hwintrinsicnosimd
test OSX10.12 x64 Checked jitnox86hwintrinsic

@tannergooding
Member Author

tannergooding commented Feb 8, 2018

The following are a separate issue, tracked by https://github.com/dotnet/coreclr/issues/16236:

  • x64_checked_windows_nt_jitstress2
  • x64_checked_windows_nt_jitstress2_jitstressregs8
  • x64_checked_windows_nt_jitstress2_jitstressregs4
  • x64_checked_windows_nt_jitstress2_jitstressregs3
  • x64_checked_windows_nt_jitstress2_jitstressregs2
  • x64_checked_windows_nt_jitstress2_jitstressregs1
  • x64_checked_windows_nt_jitstress2_jitstressregs0x1000
  • x64_checked_windows_nt_jitstress2_jitstressregs0x80
  • x64_checked_windows_nt_jitstress2_jitstressregs0x10
  • x64_checked_windows_nt_jitstress1

The following timed out and have been reset:

  • x64_checked_osx10.12_jitx86hwintrinsicnoavx_flow

The following is due to an existing issue: #16249 (comment)

  • x64_checked_windows_nt_jitstressregs4

@tannergooding
Member Author

tannergooding commented Feb 8, 2018

x64_checked_windows_nt_jitstressregs4 is not related. The AddRex*Prefix checks need to be updated to account for the prefetch instructions (which are SSE instructions, but which need the actual REX prefix, rather than the VEX prefix).

I've logged a bug (https://github.com/dotnet/coreclr/issues/16286) and should have a fix up tonight.
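
As background for the REX point above: a REX prefix is a single byte of the form `0100WRXB`, where `B` extends the ModRM r/m or base register so that r8 through r15 can be addressed. The sketch below only illustrates how that byte is built; `makeRex` is a hypothetical name and is unrelated to the emitter's actual AddRex*Prefix helpers.

```cpp
#include <cstdint>

// Hypothetical helper (not from the JIT sources): build a REX prefix byte.
// Layout: 0100 W R X B
//   W = 64-bit operand size, R = extends ModRM.reg,
//   X = extends SIB.index,   B = extends ModRM.rm / SIB.base.
constexpr std::uint8_t makeRex(bool w, bool r, bool x, bool b)
{
    return static_cast<std::uint8_t>(
        0x40 | (w ? 0x08 : 0) | (r ? 0x04 : 0) | (x ? 0x02 : 0) | (b ? 0x01 : 0));
}
```

For example, a prefetch whose address is held in r8 through r15 would need `makeRex(false, false, false, true)`, which is `0x41`.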

@4creators

The following has been reset, tests failed due to OOM, looks unrelated:

seems to be related to Test Infrastructure Failure: The paging file is too small for this operation to complete. failures in #16237

Successfully merging this pull request may close these issues.

Plumb support for Is4ByteSSEInstruction through the emitter
3 participants