Document hard won information learned on how to thwart the rocm compiler's excessive use of VGPRs when unrolling FFT_width.
gwoltman authored and preda committed Dec 29, 2024
1 parent 3bd26a2 commit e27a797
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions src/cl/carryfused.cl
@@ -39,6 +39,10 @@ KERNEL(G_W) carryFused(P(T2) out, CP(T2) in, u32 posROE, P(i64) carryShuttle, P(
#undef GPW
#endif

// Try this weird FFT_width call that adds a "hidden zero" when unrolling. This prevents the compiler from finding
// common sub-expressions to re-use in the second fft_WIDTH call. Re-using this data requires dozens of VGPRs
// which causes a terrible reduction in occupancy.
// fft_WIDTH(lds + (get_group_id(0) / 131072), u, smallTrig + (get_group_id(0) / 131072));
fft_WIDTH(lds, u, smallTrig);

Word2 wu[NW];
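
The "hidden zero" described in the comment can be illustrated outside the kernel. The following is a minimal sketch in plain C, not the project's code: `hidden_zero`, `sum4`, and `sum_twice` are hypothetical names, and it assumes the group id stays below 131072 so that the integer division yields 0 at run time. The compiler cannot prove that bound, so it has to treat the offset pointer as possibly different from the plain one. Whether a given compiler then declines to share work between the two calls is not guaranteed by this sketch.

```c
#include <stddef.h>

/* Hypothetical divisor, mirroring the 131072 in the commented-out call. */
#define HIDDEN_ZERO_DIVISOR 131072u

/* Returns 0 for any group id below HIDDEN_ZERO_DIVISOR (assumption: real
   launches stay below that bound). The compiler does not know the bound,
   so it must treat the result as an unknown value. */
size_t hidden_zero(size_t group_id) {
    return group_id / HIDDEN_ZERO_DIVISOR;
}

/* Hypothetical stand-in for a routine that reads from a base pointer. */
float sum4(const float *base) {
    return base[0] + base[1] + base[2] + base[3];
}

/* Calls sum4 twice: once on the plain pointer and once on the pointer
   plus the hidden zero. Both calls read the same data at run time. */
float sum_twice(const float *data, size_t group_id) {
    float first = sum4(data);
    float second = sum4(data + hidden_zero(group_id));
    return first + second;
}
```

With a group id of 7, for example, `hidden_zero(7)` is 0 and both calls in `sum_twice` read the same four elements; the result is unchanged, and only the compiler's view of the two addresses differs.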