Contributor

@suhara suhara commented Aug 17, 2024

This is a separate PR that fixes the incorrect use of ctx_split for bias tensors, following @slaren's suggestion in #8922.

Please see this message for details.

Citing @slaren's comment:

ctx_split only makes a difference when using tensor parallelism with -sm row, which is only supported on the CUDA backend when using multiple GPUs. When using -sm row, ctx_split splits the rows of the matrix between the available GPUs. This is only supported for matrix multiplication, so it should only be used with the matrix portion of linear/dense layers. The other cases are also wrong and should be corrected as well, but it doesn't need to be done here.
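To make the distinction concrete, here is a minimal, self-contained sketch of why row splitting applies to the weight matrix but not to the bias. It does not use the ggml or llama.cpp API: `Matrix`, `RowRange`, `split_rows`, and `linear_row_split` are hypothetical names introduced only for illustration, the "devices" are simulated by a plain loop, and the even split of rows is an assumption rather than a description of how the CUDA backend distributes them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major weight matrix; not a ggml type.
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data; // rows * cols values

    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Half-open range of rows [begin, end) assigned to one simulated device.
struct RowRange {
    std::size_t begin;
    std::size_t end;
};

// Assumption: rows are divided as evenly as possible between devices.
std::vector<RowRange> split_rows(std::size_t n_rows, std::size_t n_devices) {
    assert(n_devices > 0);
    std::vector<RowRange> ranges;
    const std::size_t base  = n_rows / n_devices;
    const std::size_t extra = n_rows % n_devices;
    std::size_t start = 0;
    for (std::size_t d = 0; d < n_devices; ++d) {
        const std::size_t len = base + (d < extra ? 1 : 0);
        ranges.push_back({start, start + len});
        start += len;
    }
    return ranges;
}

// Computes y = W * x + bias. Only the matrix product is partitioned by rows;
// the bias is an ordinary vector that is added once to the assembled result.
std::vector<float> linear_row_split(const Matrix & W,
                                    const std::vector<float> & x,
                                    const std::vector<float> & bias,
                                    std::size_t n_devices) {
    assert(W.data.size() == W.rows * W.cols);
    assert(x.size() == W.cols);
    assert(bias.size() == W.rows);

    std::vector<float> y(W.rows, 0.0f);
    for (const RowRange & r : split_rows(W.rows, n_devices)) {
        // Each simulated device produces only the output rows it owns.
        for (std::size_t i = r.begin; i < r.end; ++i) {
            for (std::size_t j = 0; j < W.cols; ++j) {
                y[i] += W.at(i, j) * x[j];
            }
        }
    }
    // The bias is not a matrix multiplication operand, so it is not split.
    for (std::size_t i = 0; i < W.rows; ++i) {
        y[i] += bias[i];
    }
    return y;
}
```

In this sketch the result is the same for any number of simulated devices, because only the matrix product is partitioned. That mirrors the point of the quoted comment: the split context is meant for the matrix part of a linear layer, while bias tensors belong in the regular per-layer context.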

As far as I can see, there are four such lines, all of which are fixed in this PR.

Special thanks to @slaren!

Member

@slaren slaren left a comment

Thanks for fixing this.

@slaren slaren merged commit 2fb9267 into ggml-org:master Aug 17, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Feb 25, 2025