Add support for new gfx1200 and gfx1201 targets #12372
base: master
Conversation
CC: @powderluv
@JohannesGaessler Could you please update the labels? I don't have the correct permissions for that: GraphQL: slojosic-amd does not have the correct permissions to execute
```diff
@@ -189,7 +189,7 @@ The following compilation options are also available to tweak performance:

 | Option | Legal values | Default | Description |
 |--------|--------------|---------|-------------|
-| GGML_CUDA_FORCE_MMQ | Boolean | false | Force the use of custom matrix multiplication kernels for quantized models instead of FP16 cuBLAS even if there is no int8 tensor core implementation available (affects V100, RDNA3). MMQ kernels are enabled by default on GPUs with int8 tensor core support. With MMQ force enabled, speed for large batch sizes will be worse but VRAM consumption will be lower. |
+| GGML_CUDA_FORCE_MMQ | Boolean | false | Force the use of custom matrix multiplication kernels for quantized models instead of FP16 cuBLAS even if there is no int8 tensor core implementation available (affects V100, RDNA3, RDNA4). MMQ kernels are enabled by default on GPUs with int8 tensor core support. With MMQ force enabled, speed for large batch sizes will be worse but VRAM consumption will be lower. |
```
CDNA too, maybe condense as V100, CDNA and RDNA3+
```diff
 #define GGML_CUDA_CC_RDNA1 (GGML_CUDA_CC_OFFSET_AMD + 0x1010) // RX 5000
 #define GGML_CUDA_CC_RDNA2 (GGML_CUDA_CC_OFFSET_AMD + 0x1030) // RX 6000, minimum for dp4a
 #define GGML_CUDA_CC_RDNA3 (GGML_CUDA_CC_OFFSET_AMD + 0x1100) // RX 7000, minimum for WMMA
+#define GGML_CUDA_CC_RDNA4 (GGML_CUDA_CC_OFFSET_AMD + 0x1200) // RX 9000
```
If you want to add RDNA4, you also need to change GGML_CUDA_CC_IS_RDNA3 so that it does not match RDNA4.
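A minimal sketch of what that change could look like, assuming the existing GGML_CUDA_CC_IS_RDNA3 macro is a plain lower-bound check on the AMD compute capability (the exact definitions in the tree may differ):

```cpp
// Sketch only: bound the RDNA3 check from above so it no longer matches RDNA4.
// Assumes the previous definition was simply "(cc) >= GGML_CUDA_CC_RDNA3".
#define GGML_CUDA_CC_IS_RDNA3(cc) ((cc) >= GGML_CUDA_CC_RDNA3 && (cc) < GGML_CUDA_CC_RDNA4)
#define GGML_CUDA_CC_IS_RDNA4(cc) ((cc) >= GGML_CUDA_CC_RDNA4)
```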
```diff
 cu_compute_type = CUBLAS_COMPUTE_32F;
 alpha = &alpha_f32;
 beta = &beta_f32;

+if (GGML_CUDA_CC_IS_RDNA4(compute_capability)) {
```
So you test for RDNA4 inside a branch that only runs for CDNA; that makes no sense.
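For illustration, a schematic of the problem the reviewer is pointing at; this is not the exact PR code, and the surrounding control flow is assumed from the comment:

```cpp
// Schematic only: an RDNA4 test nested inside a CDNA-only branch can never be
// true, so the RDNA4 path below is dead code on RDNA4 hardware.
if (GGML_CUDA_CC_IS_CDNA(compute_capability)) {
    cu_compute_type = CUBLAS_COMPUTE_32F;
    alpha = &alpha_f32;
    beta = &beta_f32;
    if (GGML_CUDA_CC_IS_RDNA4(compute_capability)) {  // unreachable here
        // ...
    }
}
// One fix is to test both architectures in a single outer condition, as the
// ggml_cuda_op_mul_mat_cublas hunk below does.
```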
```diff
@@ -1214,7 +1214,7 @@ static void ggml_cuda_op_mul_mat_cublas(

 CUBLAS_CHECK(cublasSetStream(ctx.cublas_handle(id), stream));

-if (GGML_CUDA_CC_IS_CDNA(compute_capability)) {
+if (GGML_CUDA_CC_IS_CDNA(compute_capability) || GGML_CUDA_CC_IS_RDNA4(compute_capability)) {
```
If V_WMMA_F32_16X16X16_F16 does better here than V_WMMA_F16_16X16X16_F16 on RDNA4, it stands to reason that it does on RDNA3 too.
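If that assumption holds, the same compute-type selection could be widened to RDNA3 as well; this is only a sketch of the reviewer's suggestion, not something backed by benchmarks in this PR:

```cpp
// Hypothetical extension of the condition above: also route RDNA3 through the
// FP32 compute type, on the unverified assumption that FP32 WMMA accumulation
// is faster there too.
if (GGML_CUDA_CC_IS_CDNA(compute_capability) ||
    GGML_CUDA_CC_IS_RDNA3(compute_capability) ||
    GGML_CUDA_CC_IS_RDNA4(compute_capability)) {
    cu_compute_type = CUBLAS_COMPUTE_32F;
}
```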