AMDGCN inefficient long add with constant #237

preda · 2024-12-20T21:21:04Z

Consider this OpenCL kernel:

kernel void testAdd(global long* io) {
  long C = ((long) 1) << 50;
  io[get_global_id(0)] = C + io[get_global_id(0)];
}

This ISA is generated for the long add:

	v_mov_b32_e32 v4, 0x40000
	v_add_co_u32_e32 v2, vcc, 0, v2
	v_addc_co_u32_e32 v3, vcc, v3, v4, vcc

As you can see, part of the above code is unnecessary. In particular, v_add_co_u32_e32 v2, vcc, 0, v2 does not change the value of v2 and cannot produce a carry-out, so the carry-in consumed by the following v_addc_co_u32_e32 is always zero.
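To make the argument concrete, here is a minimal C sketch that models the two sequences on the host. The helper names (add_co_u32, addc_u32, add_emitted, add_expected, self_check) are hypothetical and only approximate the semantics of the ISA instructions; this is an illustration under those assumptions, not a statement about how the compiler works internally.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of v_add_co_u32: 32-bit add that reports carry-out. */
static uint32_t add_co_u32(uint32_t a, uint32_t b, uint32_t *carry_out) {
    uint32_t sum = a + b;
    *carry_out = sum < a; /* unsigned wraparound means a carry occurred */
    return sum;
}

/* Hypothetical model of v_addc_co_u32: 32-bit add with carry-in
   (the carry-out is not needed here, so it is not modelled). */
static uint32_t addc_u32(uint32_t a, uint32_t b, uint32_t carry_in) {
    return a + b + carry_in;
}

/* Models the generated sequence: add 0 to the low word, then add
   0x40000 plus the carry to the high word. */
static uint64_t add_emitted(uint64_t x) {
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);
    uint32_t carry;
    lo = add_co_u32(0u, lo, &carry);
    hi = addc_u32(hi, 0x40000u, carry);
    return ((uint64_t)hi << 32) | lo;
}

/* Models the simpler sequence: only the high word is touched. */
static uint64_t add_expected(uint64_t x) {
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);
    hi += 0x40000u;
    return ((uint64_t)hi << 32) | lo;
}

/* Checks, on a few sample inputs, that adding 0 never carries and that
   both sequences agree with a plain 64-bit add of 1 << 50. */
static void self_check(void) {
    const uint64_t C = (uint64_t)1 << 50;
    const uint64_t samples[] = {
        0u, 1u, 0xFFFFFFFFu, 0x100000000u,
        0x0003FFFFFFFFFFFFu, 0xFFFFFFFFFFFFFFFFu
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        uint32_t carry;
        uint32_t lo = add_co_u32(0u, (uint32_t)samples[i], &carry);
        assert(carry == 0u);                  /* no carry-out possible */
        assert(lo == (uint32_t)samples[i]);   /* low word unchanged */
        assert(add_emitted(samples[i]) == samples[i] + C);
        assert(add_expected(samples[i]) == samples[i] + C);
    }
}
```

Calling self_check() from a small main() exercises the samples. The model uses unsigned arithmetic throughout, so wraparound in the high word is well defined in C.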


preda · 2024-12-20T21:24:32Z

The expected output would be something like:

v_mov_b32_e32 v4, 0x40000
v_add_co_u32_e32 v3, vcc, v3, v4

preda · 2024-12-20T21:33:36Z

A small observation: the same code is generated, and the problem is easier to see, if "long" is replaced with "unsigned long" in the sample kernel.
