Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal/opcodesextra: fix actions for VNNI and VBMI2 shifts #372

Merged
merged 3 commits into from
Mar 7, 2023

Conversation

vsivsi
Copy link
Contributor

@vsivsi vsivsi commented Feb 20, 2023

The VBMI2 concatenate and variable shift instructions VPSH{L,R}DV{W,D,Q} always both read and write the destination register in the "Op 1" position.

See:
https://www.felixcloutier.com/x86/vpshldv
https://www.felixcloutier.com/x86/vpshrdv

In cases without mask merging, Avo has defined these instructions with the incorrect destination register action (inst.W), which should always instead be inst.RW. Note: this differs from the immediate (non variable shift) versions of these instructions, which do not (absent merge masking) read their Op 1 registers.

This bug manifests as incorrect vector register scheduling when Avo doesn't recognize that these instructions have a data dependency on the destination register in Op1.

The fix is to parameterize the _yvblendmpd function defined in opcodesextra to require specifying the destination register action. This flexibility is needed because this function is shared between the VBMI2 instructions and the VNNI instructions, which do not read their Op1 registers (absent merge masking).

I believe the confusion here was originally my fault in the original PR including VBMI2 support, for not recognizing that the Golang AVX optab definitions are actually silent on whether destination register values are read in addition to being written (because the assembler doesn't need to care).

In cases without a mask register, these concatenate
and variable shift instructions were being defined with
the wrong destination register action, which is always
inst.RW. This differs from the immediate (non variable)
versions of these instructions, which do not overwrite
their outputs (absent merge masking).
@mmcloughlin
Copy link
Owner

The fix is to parameterize the _yvblendmpd function defined in opcodesextra to require specifying the destination register action. This flexibility is needed because this function is shared between the VBMI2 instructions and the VNNI instructions, which do not read their Op1 registers (absent merge masking).

Are we sure this doesn't apply to VNNI too? The pseudocode in https://www.felixcloutier.com/x86/vpdpbusd at least appears to suggest that merge masking is an option and does read ORIGDEST

(KL,VL)=(4,128), (8,256), (16,512)
ORIGDEST := DEST
FOR i := 0 TO KL-1:
    IF k1[i] or *no writemask*:
        // Byte elements of SRC1 are zero-extended to 16b and
        // byte elements of SRC2 are sign extended to 16b before multiplication.
        IF SRC2 is memory and EVEX.b == 1:
            t := SRC2.dword[0]
        ELSE:
            t := SRC2.dword[i]
        p1word := ZERO_EXTEND(SRC1.byte[4*i]) * SIGN_EXTEND(t.byte[0])
        p2word := ZERO_EXTEND(SRC1.byte[4*i+1]) * SIGN_EXTEND(t.byte[1])
        p3word := ZERO_EXTEND(SRC1.byte[4*i+2]) * SIGN_EXTEND(t.byte[2])
        p4word := ZERO_EXTEND(SRC1.byte[4*i+3]) * SIGN_EXTEND(t.byte[3])
        DEST.dword[i] := ORIGDEST.dword[i] + p1word + p2word + p3word + p4word
    ELSE IF *zeroing*:
        DEST.dword[i] := 0
    ELSE: // Merge masking, dest element unchanged
        DEST.dword[i] := ORIGDEST.dword[i]
DEST[MAX_VL-1:VL] := 0

I believe the confusion here was originally my fault in the original PR including VBMI2 support, for not recognizing that the Golang AVX optab definitions are actually silent on whether destination register values are read in addition to being written (because the assembler doesn't need to care).

This could equally well have been my fault. I did a fair amount of reshuffling.

@vsivsi
Copy link
Contributor Author

vsivsi commented Feb 20, 2023

Good catch, you are correct. In which case the fix is even simpler. I'll push the change momentarily.

@codecov-commenter
Copy link

codecov-commenter commented Feb 20, 2023

Codecov Report

Merging #372 (ee56c43) into master (fb97a21) will not change coverage.
The diff coverage is n/a.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@           Coverage Diff           @@
##           master     #372   +/-   ##
=======================================
  Coverage   76.05%   76.05%           
=======================================
  Files          65       65           
  Lines       21055    21055           
=======================================
  Hits        16013    16013           
  Misses       4959     4959           
  Partials       83       83           
Flag Coverage Δ
integration 11.70% <ø> (ø)
stress ?
unittests 73.19% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
x86/zoptab.go 92.42% <ø> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@vsivsi
Copy link
Contributor Author

vsivsi commented Feb 20, 2023

Okay, PR now reflects your correct observation that VNNI instructions also always read the destination.

I do believe that continuing to call the form helper function _yvblendmpd but hard-coding the destination register action to be inst.RW, we may be laying a bit of a future trap for the unwary (probably ourselves). This is because the eponymous VPBLENDM* functions in AVX512 do not typically read their destinations, e.g.
https://www.felixcloutier.com/x86/vpblendmd:vpblendmq

Leaving it parameterized might be a better signal that one needs to actually pay attention to this detail!

@vsivsi vsivsi changed the title Fix destination register action for VBMI2 VPSH{L,R}DV{W,D,Q} instructions Fix destination register action for VNNI & VBMI2 VPSH{L,R}DV{W,D,Q} instructions Feb 20, 2023
mmcloughlin added a commit that referenced this pull request Mar 6, 2023
In contributor PRs such as #372 we see that the pr/autoland job fails
due to permissions issues obtaining an app token from a fork.

This job is actually only required to auto-land bot PRs. Instead of
skipping just the auto-land setting step, we should be skipping the
entire thing.

Fixes #381
@mmcloughlin
Copy link
Owner

Sorry I've been trying to figure out how to rebase this PR on the latest changes (mostly to resolve #381), but failing to do so 😓 It seems like it's possible but I think I'd have to mess with your fork.

This looks good. Please could you update it to the latest master and we should be good to go.

Thanks again!

@vsivsi
Copy link
Contributor Author

vsivsi commented Mar 6, 2023

Thanks so much for fixing the PR CI failures! \o/

This PR is now caught up with master.

While you are here, can we get this one merged too? #233

It's been buried by newer issues, but I've been using it forever without problems and it would be great to get it merged.

It's also now all caught up with master.

@mmcloughlin mmcloughlin changed the title Fix destination register action for VNNI & VBMI2 VPSH{L,R}DV{W,D,Q} instructions internal/opcodesextra: fix register actions for VNNI and VBMI2 VPSH{L,R}DV{W,D,Q} Mar 7, 2023
@mmcloughlin mmcloughlin changed the title internal/opcodesextra: fix register actions for VNNI and VBMI2 VPSH{L,R}DV{W,D,Q} internal/opcodesextra: fix actions for VNNI and VBMI2 shifts Mar 7, 2023
@mmcloughlin mmcloughlin merged commit fc7bbb8 into mmcloughlin:master Mar 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants