Skip to content

Conversation

@Pierre-vh
Copy link
Contributor

No description provided.

@Pierre-vh
Copy link
Contributor Author

I did this after looking at #69324, where we emitted a couple of muls that were obviously foldable.
Now we also have a s_add that could be folded but it's emitted from a (bitcast (build_vector lo, hi)) and we can't look through bitcast/splat in the DAG apparently.

Again no sure how effective this is, i just thought it's a nice to have. We have one more instruction in the testcase but it's because we scalarized a bit more and needed to copy to a VGPR, so overall I think it improved it.

@github-actions
Copy link

github-actions bot commented Oct 18, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

; VI-NEXT: v_addc_u32_e32 v7, vcc, v5, v4, vcc
; VI-NEXT: v_mul_lo_u32 v4, v7, s6
; VI-NEXT: s_mov_b32 s4, 0x346d900
; VI-NEXT: s_add_u32 s4, 0x4237, s4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a plan to fold this add?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet, it looks like this:

      t420: v2i32 = BUILD_VECTOR Constant:i32<16951>, Constant:i32<0>
    t421: i64 = bitcast t420
      t414: v2i32 = BUILD_VECTOR Constant:i32<54974720>, Constant:i32<0>
    t415: i64 = bitcast t414
  t190: i64 = add t421, t415

We can't do a generic combine for the bitcast I think. I tried it and it caused loops. I think the easiest would be to make a DAG equivalent of the GISel helper that looks through splats and just returns the value directly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might be able to replace dyn_cast<ConstantSDNode> checks in SelectionDAG::FoldConstantArithmetic with ComputeKnownBits calls that looks for isConstant scalar values?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what the issue is here. Why can't we fold (bitcast (build_vector constant:i32, constant:i32)) -> constant:i64 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have a ConstantFoldBITCASTofBUILD_VECTOR fold but it only runs before LegalizeTypes/Ops. Some targets need the bitcasts apparently. I tried doing it more often but only in specific cases (<=64 bit ints) but I still get infinite loops everywhere.

I think we need to add some helper to get splat values instead of checking ConstantSDNode when constant folding add. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it just be a matter of legality checks to avoid the loops?

@Pierre-vh Pierre-vh merged commit 2bc9358 into llvm:main Oct 24, 2023
@Pierre-vh Pierre-vh deleted the umul-fold branch October 24, 2023 05:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend:AMDGPU llvm:SelectionDAG SelectionDAGISel as well

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants