musa: refine compute capability #12493

yeahdongcn · 2025-03-21T09:01:59Z

This PR improves the handling of compute capabilities for MUSA devices with the following updates:

Adjusted Compute Capability Offset
- GGML_CUDA_CC_OFFSET_MTHREADS is now positioned between NVIDIA and AMD.

Boundary Check Update
- Updated the boundary check for NVIDIA compute capability tests using !GGML_CUDA_CC_IS_MTHREADS(cc).

Preserved Feature Availability Checks
- Ensured that NVIDIA and AMD feature availability tests remain unchanged.
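
To illustrate why the extra boundary check matters once the MUSA range sits between NVIDIA and AMD, here is a minimal, hedged sketch. It is not the code from ggml/src/ggml-cuda/common.cuh: the constant values, the `0x210` device encoding, and every function name below are assumptions chosen for illustration. Only the ordering NVIDIA < MTHREADS < AMD is taken from the description above.

```cpp
#include <cassert>

// Hypothetical offsets. The real values are defined in common.cuh and may differ;
// only the ordering NVIDIA < MTHREADS < AMD is taken from the PR description.
constexpr int CC_OFFSET_MTHREADS = 0x0100000;
constexpr int CC_OFFSET_AMD      = 0x1000000;

// Hypothetical NVIDIA threshold (Volta expressed as 100*major + 10*minor).
constexpr int CC_VOLTA = 700;

// Vendor classification by numeric range.
constexpr bool cc_is_mthreads(int cc) { return cc >= CC_OFFSET_MTHREADS && cc < CC_OFFSET_AMD; }
constexpr bool cc_is_amd(int cc)      { return cc >= CC_OFFSET_AMD; }
constexpr bool cc_is_nvidia(int cc)   { return cc < CC_OFFSET_MTHREADS; }

// An NVIDIA feature test that only excludes AMD. With MUSA placed below AMD,
// a MUSA value would wrongly pass this test.
constexpr bool volta_feature_naive(int cc) {
    return cc < CC_OFFSET_AMD && cc >= CC_VOLTA;
}

// The same test with the additional !is_mthreads guard described above.
constexpr bool volta_feature_guarded(int cc) {
    return !cc_is_mthreads(cc) && cc < CC_OFFSET_AMD && cc >= CC_VOLTA;
}

// Runtime spot checks; call from main() or a test harness.
inline void check_vendor_ranges() {
    const int nvidia_cc = 890;                        // e.g. compute capability 8.9
    const int musa_cc   = CC_OFFSET_MTHREADS + 0x210; // hypothetical encoding of a MUSA device
    const int amd_cc    = CC_OFFSET_AMD + 0x1030;     // hypothetical encoding of an AMD device

    assert(cc_is_nvidia(nvidia_cc) && !cc_is_mthreads(nvidia_cc) && !cc_is_amd(nvidia_cc));
    assert(cc_is_mthreads(musa_cc) && !cc_is_nvidia(musa_cc) && !cc_is_amd(musa_cc));
    assert(cc_is_amd(amd_cc) && !cc_is_nvidia(amd_cc) && !cc_is_mthreads(amd_cc));

    assert(volta_feature_naive(musa_cc));    // false positive without the guard
    assert(!volta_feature_guarded(musa_cc)); // guard excludes the MUSA device
    assert(volta_feature_guarded(nvidia_cc));
    assert(!volta_feature_guarded(amd_cc));
}
```

Under these assumptions, NVIDIA and AMD values are classified by the guarded test exactly as they are by the naive one, which is one way the existing feature availability tests could stay unchanged while MUSA devices are excluded.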

Testing Done

./build/bin/test-backend-ops
./build/bin/llama-cli -m ~/models/deepseek-r1_7b_q4_0.gguf -ngl 999

# ./build/bin/test-backend-ops
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 MUSA devices:
  Device 0: MTT S80, compute capability 2.1, VMM: yes
Testing 2 devices

Backend 1/2: MUSA0
  Device description: MTT S80
  Device memory: 16297 MB (16292 MB free)

  ABS(type=f16,ne_a=[128,2,2,2],v=0): OK
  ...
  4634/4634 tests passed
  Backend MUSA0: OK

Backend 2/2: CPU
  Skipping CPU backend
2/2 backends passed
OK

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

yeahdongcn · 2025-03-21T09:19:04Z

This is the initial PR. @fishingfly and I will collaborate to evaluate the features on MTT S80 and MTT S4000.

ggml/src/ggml-cuda/common.cuh

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

yeahdongcn · 2025-03-21T15:33:30Z

I re-ran the tests, and all passed.

yeahdongcn · 2025-03-22T04:55:33Z

Hi @JohannesGaessler, do you know how to retrigger the CI without pushing or force-pushing? Thanks.

JohannesGaessler · 2025-03-22T08:56:56Z

There's a button you can press as a collaborator.

musa: refine compute capability

720425f

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

yeahdongcn requested a review from JohannesGaessler as a code owner March 21, 2025 09:01

github-actions bot added the Nvidia GPU and ggml labels Mar 21, 2025

JohannesGaessler reviewed Mar 21, 2025

ggml/src/ggml-cuda/common.cuh (resolved)

ggml/src/ggml-cuda/common.cuh (outdated, resolved)

ggml/src/ggml-cuda/common.cuh (outdated, resolved)

ggml/src/ggml-cuda/common.cuh (outdated, resolved)

Address review comments

7fcde9e

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

yeahdongcn force-pushed the xd/cc branch from fecf82c to 7fcde9e Compare March 21, 2025 15:21

JohannesGaessler approved these changes Mar 21, 2025

JohannesGaessler merged commit fac63a3 into ggml-org:master Mar 22, 2025
89 of 90 checks passed
