AMDGCN inefficient long add with constant #237

Open
preda opened this issue Dec 20, 2024 · 2 comments

preda commented Dec 20, 2024

Consider this OpenCL kernel:

kernel void testAdd(global long* io) {
  long C = ((long) 1) << 50;
  io[get_global_id(0)] = C + io[get_global_id(0)];
}

This ISA is generated for the long add:

	v_mov_b32_e32 v4, 0x40000
	v_add_co_u32_e32 v2, vcc, 0, v2
	v_addc_co_u32_e32 v3, vcc, v3, v4, vcc

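As you can see, part of the above code is unnecessary. The low 32 bits of the constant are zero, so v_add_co_u32_e32 v2, vcc, 0, v2 does not change the value of v2 and cannot produce a carry-out. The v_addc_co_u32_e32 that follows therefore always receives a carry-in of zero.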

preda commented Dec 20, 2024

The expected code would be something like:

v_mov_b32_e32 v4, 0x40000
v_add_co_u32_e32 v3, vcc, v3, v4
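
The equivalence of the two sequences can be checked on the host with a small C model. This is only a sketch: the function names add64_two_step and add64_high_only are hypothetical, and the code models the 32-bit arithmetic of the instructions rather than anything the compiler actually does.

```c
#include <stdint.h>

/* Hypothetical model of the generated sequence: a 32-bit add with
   carry-out on the low half, then an add-with-carry on the high half. */
static uint64_t add64_two_step(uint64_t x, uint64_t c) {
    uint32_t x_lo = (uint32_t)x, x_hi = (uint32_t)(x >> 32);
    uint32_t c_lo = (uint32_t)c, c_hi = (uint32_t)(c >> 32);

    uint32_t lo = x_lo + c_lo;          /* models v_add_co_u32 */
    uint32_t carry = lo < x_lo;         /* carry-out of the low add */
    uint32_t hi = x_hi + c_hi + carry;  /* models v_addc_co_u32 */
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical model of the expected sequence: only the high half is
   added. Valid only when the low 32 bits of c are zero, because then
   the low add is a no-op and its carry-out is always 0. */
static uint64_t add64_high_only(uint64_t x, uint64_t c) {
    uint32_t x_lo = (uint32_t)x, x_hi = (uint32_t)(x >> 32);
    uint32_t hi = x_hi + (uint32_t)(c >> 32);
    return ((uint64_t)hi << 32) | x_lo;
}
```

For c = 1 << 50 the high half is 0x40000 and the low half is 0, so both functions return the same value for every x, including inputs where the 64-bit sum wraps around.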

preda commented Dec 20, 2024

A small observation: the same code is generated if "long" is replaced with "unsigned long" in the sample kernel, and the problem is easier to see in that form.
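
One likely reason the signed and unsigned kernels compile to the same instructions is that two's-complement addition produces the same bit pattern for both types. A minimal host-side C sketch of that, with hypothetical helper names and with inputs assumed not to overflow the signed type:

```c
#include <stdint.h>

/* Hypothetical helper: the kernel's add with "unsigned long" operands. */
static uint64_t add_as_unsigned(uint64_t x) {
    return x + ((uint64_t)1 << 50);
}

/* Hypothetical helper: the kernel's add with "long" operands.
   Assumes x + 2^50 does not overflow int64_t (signed overflow is
   undefined behavior in C). */
static uint64_t add_as_signed(int64_t x) {
    return (uint64_t)(x + ((int64_t)1 << 50));
}
```

Reinterpreting a negative input such as -7 as unsigned and passing it to add_as_unsigned yields the same 64 bits as add_as_signed(-7), so a single 64-bit integer add serves both variants.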
