CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 #8311

JohannesGaessler · 2024-07-04T22:34:44Z

Fixes #8254. As far as I can tell, the problem is that when determining the boundaries between CUDA blocks I implicitly assumed that the ne00/ne10 dimension of the matrices is a multiple of the number of values processed by MMQ in a single iteration (128 in this case). For DeepSeek, where ne00 is only a multiple of 64, the results were therefore wrong.
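
To illustrate the kind of rounding problem described above, here is a minimal host-side sketch. It is not the code from this patch: the helper names `round_global` and `round_in_row` are hypothetical, and the linearized work index `k` (counted in values along ne00, across consecutive rows) is an assumption made for the example. With a row length of 192 (a multiple of 64 but not of 128) and 128 values per iteration, rounding a boundary to a global multiple of 128 can place it in the middle of an iteration of some row, whereas rounding relative to the start of the row keeps it aligned.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not taken from the patch.

// Round a linearized boundary k down to a global multiple of the iteration size.
// This is only correct if the row length is itself a multiple of values_per_iter.
constexpr int64_t round_global(int64_t k, int64_t values_per_iter) {
    return k - k % values_per_iter;
}

// Round a linearized boundary k down relative to the start of the row it falls in,
// so the offset inside the row is always a multiple of values_per_iter.
constexpr int64_t round_in_row(int64_t k, int64_t row_len, int64_t values_per_iter) {
    return k - (k % row_len) % values_per_iter;
}

// Offset of a linearized boundary inside its row.
constexpr int64_t offset_in_row(int64_t k, int64_t row_len) {
    return k % row_len;
}
```

For example, with `row_len = 192` the boundary `k = 256` is left unchanged by `round_global` and ends up at in-row offset 64, while `round_in_row` moves it to 192, the start of the second row. When `row_len` is a multiple of 128 the two helpers agree.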

Co-Authored-By: Johannes Gäßler <johannesg@5d6.de>

CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0

54557c3

JohannesGaessler added the "Review Complexity : Medium" label (generally requires more time to grok but manageable by beginner to medium expertise level) Jul 4, 2024

github-actions bot added the "Nvidia GPU" label (issues specific to Nvidia GPUs) Jul 4, 2024

JohannesGaessler mentioned this pull request Jul 4, 2024

Bug: Failed to load quantizied DeepSeek-V2-Lite-Chat model #8254

Closed

slaren approved these changes Jul 4, 2024


Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Jul 5, 2024

CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 ggerganov#8311

3a9282f

Co-Authored-By: Johannes Gäßler <johannesg@5d6.de>

JohannesGaessler merged commit bcefa03 into ggerganov:master Jul 5, 2024
49 checks passed

Green-Sky mentioned this pull request Jul 5, 2024

CUDA: MMQ support for iq4_nl, iq4_xs #8278

Merged

Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Jul 6, 2024

CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 ggerganov#8311

caf1240

Co-Authored-By: Johannes Gäßler <johannesg@5d6.de>

Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Jul 11, 2024

CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 ggerganov#8311

0e3bf6e

Co-Authored-By: Johannes Gäßler <johannesg@5d6.de>

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jul 13, 2024

CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 (ggerganov#8311)

214dd3d

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jul 13, 2024

CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 (ggerganov#8311)

972fbf7
