Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that GT_CNS_VEC is handled in LinearScan::isMatchingConstant #70171

Merged
merged 3 commits into from
Jun 3, 2022

Conversation

tannergooding
Copy link
Member

This is a small follow up to #68874 but provides decent improvements, particularly where the same constant (Zero and AllBitsSet in particular) is repeatedly used.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 2, 2022
@ghost ghost assigned tannergooding Jun 2, 2022
@ghost
Copy link

ghost commented Jun 2, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

This is a small follow up to #68874 but provides decent improvements, particularly where the same constant (Zero and AllBitsSet in particular) is repeatedly used.

Author: tannergooding
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

Copy link
Contributor

@TIHan TIHan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good and easy change.

@tannergooding
Copy link
Member Author

Example diffs

Note that ymm7 is set to "all bits set" first, and then reused on the last line rather than also using ymm6

        vpcmpeqd ymm7, ymm7, ymm7
        vxorps   ymm6, ymm7, ymm6
        vcmpps   ymm4, ymm4, ymm0, 0
        vorps    ymm4, ymm4, ymm6
-       vpcmpeqd ymm6, ymm6, ymm6
-       vpcmpeqd ymm4, ymm4, ymm6
+       vpcmpeqd ymm4, ymm4, ymm7

Note how ymm2 isn't reloaded with reloc @RDW00 because it was already in ymm2:

        vmovupd  ymm2, ymmword ptr[reloc @RWD00]
        vpshufb  ymm0, ymm0, ymm2
        vperm2i128 ymm0, ymm0, ymm0, 1
-       vmovupd  ymm2, ymmword ptr[reloc @RWD00]
        vpshufb  ymm1, ymm1, ymm2

@@ -147,6 +147,7 @@ int LinearScan::BuildNode(GenTree* tree)
case GT_CNS_INT:
case GT_CNS_LNG:
case GT_CNS_DBL:
case GT_CNS_VEC:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noting ARM64 already had a path here since it needed special handling for Zero and AllBitsSet, it just wasn't getting used in LSRA

@tannergooding
Copy link
Member Author

@kunalspathak could you take a look? Its super small (8 lines) but touches LSRA so I'd appreciate your sign-off as well.

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Co-authored-by: Egor Bogatov <egorbo@gmail.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants