Experimental Kernel Selection
This page is deprecated as of 2024-09-30 and will be removed for ROCm 6.4. New documentation is under active development.
Selecting the optimal kernel/solution for an arbitrary GEMM (or tensor contraction) is an area of active development. Experimental selection gives users early access to the latest developments. These methods are designed to select GPU kernels intelligently and achieve good performance.
Currently, there are two experimental libraries implemented.
The first divides the GEMM space into a grid and, within that grid, assigns the kernels with the highest computational granularity.
It is available for:
- MI200
- FP16
- NN and NT transposition types.
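To make the grid idea concrete, here is a minimal sketch in Python. It is not Tensile's implementation. The `Kernel` descriptor, the grid step, the "tile must fit inside the cell" rule, and the granularity proxy MT0*MT1 / (MT0 + MT1) (multiply-adds per loaded element of one macro-tile) are all illustrative assumptions. The sketch precomputes one kernel per grid cell offline, and at run time it snaps a problem size to its cell and looks up that kernel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kernel:
    """Hypothetical kernel descriptor: a name plus macro-tile sizes in M and N."""
    name: str
    mt0: int
    mt1: int

    def granularity(self) -> float:
        # Assumed proxy for computational granularity: multiply-adds per
        # A/B element loaded for one macro-tile, MT0*MT1 / (MT0 + MT1).
        return self.mt0 * self.mt1 / (self.mt0 + self.mt1)


def build_grid(kernels, max_size, step):
    """Map each (m, n) grid corner to the highest-granularity kernel whose tile fits.

    max_size is assumed to be a multiple of step.
    """
    table = {}
    for m in range(step, max_size + 1, step):
        for n in range(step, max_size + 1, step):
            # Hypothetical applicability rule: the tile may not exceed the cell corner.
            fitting = [k for k in kernels if k.mt0 <= m and k.mt1 <= n]
            if fitting:
                table[(m, n)] = max(fitting, key=lambda k: k.granularity())
    return table


def select(table, m, n, step, max_size):
    """Snap (m, n) down to its grid corner, clamped to the grid, and look it up."""
    cm = min(max(step, (m // step) * step), max_size)
    cn = min(max(step, (n // step) * step), max_size)
    return table.get((cm, cn))


kernels = [Kernel("MT64x64", 64, 64), Kernel("MT128x64", 128, 64), Kernel("MT128x128", 128, 128)]
grid = build_grid(kernels, max_size=512, step=64)
print(select(grid, 130, 70, 64, 512).name)  # MT128x64
```

Because the table is precomputed, run-time selection reduces to integer division and a dictionary lookup. Small problems fall back to small tiles because larger tiles do not fit their cells.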
The second selects high-performance kernels using a pre-trained decision tree.
It is available for:
- MI200
- FP16
- NN, NT and TN transposition types.
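Decision-tree selection can be illustrated with a minimal Python sketch. This is not the tree Tensile ships. The hand-written tree, its thresholds, the features (M, N, K), the kernel names, and the one-tree-per-transposition-type lookup are all hypothetical stand-ins for a trained model. At run time, selection walks from the root to a leaf with one comparison per level, and the leaf names the kernel.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Leaf:
    kernel: str  # hypothetical kernel name stored at the leaf


@dataclass
class Node:
    feature: str    # problem feature to test, e.g. "M", "N", "K"
    threshold: int  # go left if the feature value <= threshold
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def predict(tree: Tree, problem: dict) -> str:
    """Walk from the root to a leaf, one threshold comparison per level."""
    node = tree
    while isinstance(node, Node):
        node = node.left if problem[node.feature] <= node.threshold else node.right
    return node.kernel


# Hypothetical hand-written tree standing in for a pre-trained one;
# thresholds and kernel names are made up for illustration.
TREE_NN = Node("M", 1024,
               Node("K", 256, Leaf("MT64x64_DU16"), Leaf("MT64x64_DU32")),
               Leaf("MT128x128_DU16"))

# Assumed: one tree per supported transposition type (only NN shown).
TREES = {"NN": TREE_NN}


def select_kernel(transpose: str, m: int, n: int, k: int) -> Optional[str]:
    tree = TREES.get(transpose)
    if tree is None:
        return None  # no tree for this case: caller falls back to default selection
    return predict(tree, {"M": m, "N": n, "K": k})


print(select_kernel("NN", 512, 512, 4096))  # MT64x64_DU32
```

A trained tree has the same shape, only deeper and with learned thresholds. Prediction cost grows with tree depth, not with the number of candidate kernels.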
Experimental selection is enabled or disabled using the TENSILE_EXPERIMENTAL_SELECTION
environment variable (see here for details).