
Experimental Kernel Selection

Braden Stefanuk edited this page Sep 30, 2024 · 8 revisions

This page is deprecated as of 2024-09-30 and will be removed for ROCm 6.4. New documentation is under active development.

Selecting the optimal kernel/solution for an arbitrary GEMM (or tensor contraction) is an area of active development. Experimental selection is provided to give users early access to the latest developments. These methods are designed to select GPU kernels intelligently and obtain good performance.

Currently, two experimental selection libraries are implemented.

The first divides the GEMM problem space into a grid and assigns to each grid region the kernel with the highest computational granularity.

It is available for:

  • MI200
  • FP16
  • NN and NT transposition types.
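The grid idea can be illustrated with a minimal sketch. It is not Tensile's implementation: the candidate tile sizes, the function names, and the definition of granularity used here (the fraction of the padded tile area that covers real output elements) are all assumptions made for illustration.

```python
import math

# Hypothetical macro-tile sizes (rows, cols); real kernels have many more parameters.
CANDIDATE_TILES = [(256, 128), (128, 128), (64, 64), (32, 32)]


def tile_granularity(m, n, tile):
    """Fraction of the launched tile area that covers real output elements."""
    tm, tn = tile
    padded_m = math.ceil(m / tm) * tm
    padded_n = math.ceil(n / tn) * tn
    return (m * n) / (padded_m * padded_n)


def best_tile(m, n, tiles=CANDIDATE_TILES):
    """Pick the tile with the highest granularity; ties go to the earlier entry."""
    return max(tiles, key=lambda t: tile_granularity(m, n, t))


def build_grid(max_m, max_n, step):
    """Precompute one choice per grid cell, using the cell's upper corner as its size."""
    grid = {}
    for i in range(math.ceil(max_m / step)):
        for j in range(math.ceil(max_n / step)):
            grid[(i, j)] = best_tile((i + 1) * step, (j + 1) * step)
    return grid


def select(grid, m, n, step):
    """Look up the precomputed choice for the cell that contains (m, n)."""
    return grid[((m - 1) // step, (n - 1) // step)]
```

Precomputing the grid turns selection at run time into a dictionary lookup; the cost is that every problem size inside a cell shares the choice made for that cell's representative size.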

The second selects high-performance kernels using a pre-trained decision tree.

It is available for:

  • MI200
  • FP16
  • NN, NT and TN transposition types.
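As a rough illustration of tree-based selection, the sketch below walks a small hand-written tree over the problem dimensions. The features, thresholds, tree shape, and kernel names are hypothetical; the trees shipped with the library are trained offline and may be structured differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    feature: Optional[str] = None   # problem dimension tested at this node
    threshold: float = 0.0          # go left if value <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    kernel: Optional[str] = None    # set only on leaves


def predict(node, problem):
    """Walk from the root to a leaf and return the kernel name stored there."""
    while node.kernel is None:
        if problem[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.kernel


# Hypothetical tree: thresholds and kernel names are placeholders, not trained values.
TREE = Node(
    "m", 512,
    left=Node(
        "k", 1024,
        left=Node(kernel="kernel_small_tile"),
        right=Node(kernel="kernel_deep_k"),
    ),
    right=Node(kernel="kernel_large_tile"),
)
```

For example, `predict(TREE, {"m": 256, "n": 256, "k": 64})` follows the left branch twice and returns the placeholder name `kernel_small_tile`.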

Experimental selection is enabled or disabled using the TENSILE_EXPERIMENTAL_SELECTION environment variable (see here for details).
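One way to set the variable for a single child process, without changing the parent environment, is sketched below. The helper name is hypothetical, and the accepted values are not listed on this page, so the value is left as a caller-supplied parameter rather than guessed.

```python
import os

VAR_NAME = "TENSILE_EXPERIMENTAL_SELECTION"


def env_with_selection(value, base=None):
    """Return a copy of the environment with the selection variable set.

    `value` must come from the environment-variable documentation;
    this sketch does not assume which values are valid.
    """
    env = dict(os.environ if base is None else base)
    env[VAR_NAME] = str(value)
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching the application that uses the library.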
