
Conversation

@nicoboss (Contributor)
This fixes #9044.

Sets `ggml_sched_max_splits` equal to `graph_size`, as recommended by @slaren in #9044 (comment), since there is at most one split per node in the graph.

Thanks to this change I was able to run GPU-accelerated inference on BigLlama-3.1-681B-Instruct, which previously caused llama.cpp to crash.

@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Aug 15, 2024
@slaren slaren merged commit e3f6fd5 into ggml-org:master Aug 16, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
* ggml : Dynamic ggml_sched_max_splits based on graph_size

* Fixed and readded debug code for causes
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024

* ggml : Dynamic ggml_sched_max_splits based on graph_size

* Fixed and readded debug code for causes

Labels

ggml changes relating to the ggml tensor library for machine learning


Development

Successfully merging this pull request may close these issues.

Bug: GGML_SCHED_MAX_SPLITS must be increased to run BigLlama-3.1-681B-Instruct using GPU acceleration
