Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 'w' and 's' bit to xarch instruction flags. #61198

Merged
merged 6 commits into from
Dec 18, 2021

Conversation

anthonycanino
Copy link
Contributor

This PR addresses #35305.

Changes

Change encodes 'w' and 's' in the insFlags struct and INS_FLAG entry for xarch instruction table. In addition, HasWBit and HasSBit check if this flag is set for an instruction, which allows to start simplifying some of the various ad-hoc checks for these bits that were previously done per-instruction throughout emitxarch.cpp.

Discussion

Please see #35305 for an extended discussion on adding the 'w' and 's' bit as INS_FLAGS entries.

There is one point that I need some expert knowledge on: whether or not we need isInsCMOV and !isInsCMOV at two different points. I left comments in the code to draw attention to it: 33a9ddb#diff-6b4e0f32449f2f144e05699f59f74415a564693637f643084a896dfbd081830dR10984 and 33a9ddb#diff-6b4e0f32449f2f144e05699f59f74415a564693637f643084a896dfbd081830dR11449

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 4, 2021
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Nov 4, 2021
@ghost
Copy link

ghost commented Nov 4, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

This PR addresses #35305.

Changes

Change encodes 'w' and 's' in the insFlags struct and INS_FLAG entry for xarch instruction table. In addition, HasWBit and HasSBit check if this flag is set for an instruction, which allows to start simplifying some of the various ad-hoc checks for these bits that were previously done per-instruction throughout emitxarch.cpp.

Discussion

Please see #35305 for an extended discussion on adding the 'w' and 's' bit as INS_FLAGS entries.

There is one point that I need some expert knowledge on: whether or not we need isInsCMOV and !isInsCMOV at two different points. I left comments in the code to draw attention to it: 33a9ddb#diff-6b4e0f32449f2f144e05699f59f74415a564693637f643084a896dfbd081830dR10984 and 33a9ddb#diff-6b4e0f32449f2f144e05699f59f74415a564693637f643084a896dfbd081830dR11449

Author: anthonycanino
Assignees: -
Labels:

area-CodeGen-coreclr, community-contribution

Milestone: -

@JulieLeeMSFT
Copy link
Member

@tannergooding PTAL the community PR.

@anthonycanino
Copy link
Contributor Author

I believe the Linux x64 test Interop/PInvoke/Generics/GenericsTest/GenericsTest.sh currently failing on main (#61300)

@tannergooding
Copy link
Member

The changes generally LGTM. I still need to finish going through the instruction tables to validate the instrsxarch.h changes all look good.

//
// Return Value:
// true if instruction has the 's' bit, false otherwise
bool emitter::HasSBit(instruction ins)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For s in particular, I wonder if there needs to be a specialization of it...

For example, if we look at
image

We'll see that all variants have the w bit; but only 2 variants has the s bit (and have the same pattern). There is an alternative/shorter encoding that does immediate to AL, AX, or EAX which doesn't use the s bit.

Likewise if we look at the 64-bit mode variants:
image

There are similar cases for the w/s-bit and so "just the flag" doesn't look sufficient to know "does the bit exist and is it valid to set"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The annotations otherwise all look correct for where some encoding for an instruction uses the s and or w bits, respectively.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So do you think that we need a combination of looking at what form the instruction is in and whether it has the w/s bit to properly implement HasSBit and HasWBit?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's somewhat the concern, yes.

Basically, simply annotating Has_Sbit and Has_Wbit isn't enough and might cause later downstream bugs. It really is a few factors that determine if the S/W bit are present and if it means anything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I understand.

I have to go back over the tables to understand how many cases there are that might actually arise, or if there is a general enough pattern that most of the logic can be coded in using the Has_Sbit and Has_Wbit + checks in HasSbit and HasWbit can be enough, or whether we to specialize the flag encoding further?

Do you have any thoughts on the path forward? I notice that a lot of the cases that do not have s and w bit in 64-bit mode are when Rex.W is set (which I originally thought indicates that 'w' bit is not needed).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that "w" in the legend for "The value of bit W. in REX has no effect" is actually referring to this, and not the 'w' bit that we have been discussing...

image

because to my understanding, in 64-bit mode, the operands are always 64-bit for the call instruction and the REX.W can be 0 or 1, but it won't change the operand size for this particular instruction.

The "S" in the legend for "If the value of REX.W. is 1, it overrides the presence of 66H" is used like so

image

In other words, those 'S' and 'w' legends are not referring to the 's' and 'w' bits. So your initial understanding of "there is a 32-bit instruction form (taking potential operand override prefixes) by setting the lowest bit" is correct.

Shall I update the comments to make this more clear?

Copy link
Member

@jakobbotsch jakobbotsch Dec 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that explains it.
It would be great if you could expand the comments on HasWBit and HasSBit to include an explanation and perhaps point to the table in the Intel manual. We could also consider something more descriptive such as HasRegularWideForm/HasRegularWideImmediateForm with a pointer to this being the w/s bits in the encodings from that section in the Intel manual. But I will leave this up to you -- I don't have a strong opinion as long as it is explained how to interpret the return value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some more comments and renamed to HasRegularWideForm. Happy to adjust. Do the comments make it clear what is happening here with respect to the instruction needing the 'w' bit set IF it is in a form where the 'w' bit is present?

I think this is where further refactoring will help per.

Copy link
Member

@jakobbotsch jakobbotsch Dec 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the comments explain much. I would add something like:

// Many x86/x64 instructions follow a regular encoding scheme where the
// byte-sized version of an instruction has the lowest bit of the opcode cleared
// while the 32-bit version of the instruction (taking potential prefixes to
// override operand size) has the lowest bit set. This function returns true if
// the instruction follows this format.
// Note that this bit is called `w` in the encoding table in Section B.2 of
// Volume 2 of the Intel Architecture Software Developer Manual.

and

// As above, many instructions taking immediates have a regular form used to
// encode whether the instruction takes a sign-extended 1-byte immediate or a
// (in 64-bit sign-extended) 4-byte immediate, by respectively setting and
// clearing the second lowest bit. This bit is called the s bit in table B.2.

for the other one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I've gone ahead and updated the comments. Mostly copied what you pasted into the format in the file. Does it look good?

anthonycanino and others added 3 commits December 6, 2021 11:30
Change encodes 'w' and 's' in the insFlags struct and INS_FLAG
entry for xarch instruction table. In addition, `HasWBit` and `HasSBit`
check if this flag is set for an instruction, which allows to start
simplifying some of the various ad-hoc checks for these bits that were
previously done per-instruction throughout emitxarch.cpp.
Co-authored-by: Tanner Gooding <tagoo@outlook.com>
@jakobbotsch
Copy link
Member

The CI asmdiffs leg shows no diffs: https://dev.azure.com/dnceng/public/_build/results?buildId=1499327&view=ms.vss-build-web.run-extensions-tab
Let me run the fuzzers.

/azp run Antigen, Fuzzlyn

@jakobbotsch
Copy link
Member

/azp run Antigen, Fuzzlyn

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@jakobbotsch
Copy link
Member

The Fuzzlyn failures are known. @kunalspathak, can you look if the Antigen failures are known?

@kunalspathak
Copy link
Member

The Fuzzlyn failures are known. @kunalspathak, can you look if the Antigen failures are known?

Yes, all failures are known failures.

if ((size != EA_1BYTE) && (ins != INS_imul) && (ins != INS_bsf) && (ins != INS_bsr) && (!insIsCMOV(ins)) &&
!IsSSEInstruction(ins) && !IsAVXInstruction(ins))
// Anthony: I believe we may remove !insIsCMOV check, but leaving for comparison purposes with another
// similar check below
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nits: Seems like the TODO above is outdated with this change, remove it?
Also, I would prefer to not leave author names in comments and to write them in third person or plural first person instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I have removed the todo and my comment.

insIsCMOV(ins)) &&
size != EA_1BYTE)
// Anthony: Per L10986, why is this insIsCMOV (which does not have a w bit)
// and above is !insIsCMOV
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't say exactly why these checks are there, but since we currently do not generate cmov I would be fine with removing the checks and leave it up to future work to make sure the emitter supports them if we decide to start generating it again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I have gone ahead and removed the comments and the redundant cmov checks.

@jakobbotsch jakobbotsch merged commit 3424923 into dotnet:main Dec 18, 2021
@jakobbotsch
Copy link
Member

Thanks!

@ghost ghost locked as resolved and limited conversation to collaborators Jan 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants