[X86] Failure to merge X86ISD::CVTPH2PS nodes #83414

Closed
RKSimon opened this issue Feb 29, 2024 · 1 comment

RKSimon (Collaborator) commented Feb 29, 2024

define <4 x i32> @fptosi_2f16_to_4i32(<2 x half> %a) {
  %cvt = fptosi <2 x half> %a to <2 x i32>
  %ext = shufflevector <2 x i32> %cvt, <2 x i32> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  ret <4 x i32> %ext
}

llc -mcpu=x86-64-v3

fptosi_2f16_to_4i32:                    # @fptosi_2f16_to_4i32
	vpshufb	.LCPI0_0(%rip), %xmm0, %xmm1    # xmm1 = xmm0[2,3],zero,zero,zero,zero,zero,zero,xmm0[u,u,u,u,u,u,u,u]
	vcvtph2ps	%xmm1, %xmm1
	vpmovzxwq	%xmm0, %xmm0            # xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
	vcvtph2ps	%xmm0, %xmm0
	vunpcklps	%xmm1, %xmm0, %xmm0     # xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
	vcvttps2dq	%xmm0, %xmm0
	vmovq	%xmm0, %xmm0                    # xmm0 = xmm0[0],zero
	retq

Latest trunk now generates the assembly above. Ideally we would emit only a single vcvtph2ps and avoid all the shuffles, which exist only to move each f16 element into the lowest lane before it is converted separately:

fptosi_2f16_to_4i32:                    # @fptosi_2f16_to_4i32
	vcvtph2ps	%xmm0, %xmm0
	vcvttps2dq	%xmm0, %xmm0
	vmovq	%xmm0, %xmm0                    # xmm0 = xmm0[0],zero
	retq
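One way to check that the two sequences agree is to model the xmm lanes directly. The Python below is a hypothetical simulation (the helper names are invented for illustration, not LLVM code). It assumes the usual 128-bit semantics: vcvtph2ps reads only the low four f16 lanes, vcvttps2dq truncates toward zero and returns 0x80000000 for NaN or out-of-range inputs, and `vmovq %xmm0, %xmm0` zeroes the upper 64 bits. Whatever sits in f16 lanes 2..3 can only reach i32 lanes 2..3, which the final vmovq discards.

```python
import math
import struct

def half_to_float(bits):
    # Reinterpret a 16-bit pattern as IEEE binary16; every f16 is exact in f32/f64.
    return struct.unpack("<e", struct.pack("<H", bits & 0xFFFF))[0]

def vcvtph2ps(words):
    # 128-bit form: converts only the low four f16 lanes into four f32 lanes.
    return [half_to_float(w) for w in words[:4]]

def vcvttps2dq(floats):
    # Truncate toward zero; NaN or out-of-range gives 0x80000000 ("integer indefinite").
    out = []
    for f in floats:
        if math.isnan(f) or f >= 2**31 or f < -2**31:
            out.append(-2**31)
        else:
            out.append(int(f))  # int() truncates toward zero
    return out

def vmovq(lanes):
    # vmovq %xmm0, %xmm0: keep the low 64 bits (two i32 lanes), zero the rest.
    return lanes[:2] + [0, 0]

def current_codegen(words):
    # vpshufb: f16 lane 1 -> lane 0, lanes 1..3 zero (lanes 4..7 undef, never read).
    xmm1 = vcvtph2ps([words[1], 0, 0, 0, 0, 0, 0, 0])
    # vpmovzxwq: f16 lanes 0 and 1 each zero-extended to 64 bits.
    xmm0 = vcvtph2ps([words[0], 0, 0, 0, words[1], 0, 0, 0])
    # vunpcklps: xmm0[0], xmm1[0], xmm0[1], xmm1[1]
    merged = [xmm0[0], xmm1[0], xmm0[1], xmm1[1]]
    return vmovq(vcvttps2dq(merged))

def ideal_codegen(words):
    # One conversion; f16 lanes 2..3 may hold garbage, but vmovq drops them.
    return vmovq(vcvttps2dq(vcvtph2ps(words)))

# 1.5 and -2.75, then NaN and +inf as junk in the unused upper lanes.
xmm = [0x3E00, 0xC180, 0x7E00, 0x7C00, 0, 0, 0, 0]
print(current_codegen(xmm), ideal_codegen(xmm))  # [1, -2, 0, 0] [1, -2, 0, 0]
```

Because the junk lanes never survive the vmovq, the extra conversions in the single-vcvtph2ps version are harmless, assuming the default non-trapping FP environment.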

llvmbot (Member) commented Feb 29, 2024

@llvm/issue-subscribers-backend-x86

Author: Simon Pilgrim (RKSimon)


RKSimon added a commit that referenced this issue Feb 29, 2024
…TPS2PH handling

Allows us to peek through the F16 conversion nodes, mainly to simplify shuffles

An easy part of #83414
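The commit title above is truncated, so the following is only a guess at the general idea behind "peeking through" the conversion nodes: a hypothetical demanded-lanes walk (invented helper names, not LLVM's SimplifyDemandedVectorElts API). Starting from all four i32 lanes of the return value and walking back through vmovq, vcvttps2dq and vcvtph2ps, only f16 lanes 0 and 1 of the argument are ever needed, so in principle no shuffle is required to feed a single vcvtph2ps.

```python
# Hypothetical sketch (not LLVM's API): which source lanes of a 128-bit
# x86 vector op are read, given which result lanes are demanded.

def demanded_src_lanes(op, demanded_dst):
    if op == "vmovq":       # v4i32: lanes 0..1 copied, lanes 2..3 zeroed
        return {i for i in demanded_dst if i < 2}
    if op == "vcvttps2dq":  # v4f32 -> v4i32, purely lane-wise
        return set(demanded_dst)
    if op == "vcvtph2ps":   # v8i16 (f16 bits) -> v4f32: lane i reads f16 lane i, i < 4
        return {i for i in demanded_dst if i < 4}
    if op == "vcvtps2ph":   # v4f32 -> v8i16: lanes 0..3 read f32 lanes 0..3, lanes 4..7 are zero
        return {i for i in demanded_dst if i < 4}
    raise ValueError(f"unmodelled op: {op}")

def walk_back(ops, demanded):
    # ops are listed from the return value back towards the argument.
    for op in ops:
        demanded = demanded_src_lanes(op, demanded)
    return demanded

# Ideal sequence from the issue, walked back from the <4 x i32> result.
needed = walk_back(["vmovq", "vcvttps2dq", "vcvtph2ps"], {0, 1, 2, 3})
print(sorted(needed))  # [0, 1]
```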
RKSimon self-assigned this Nov 26, 2024