llama : expose llama_model_n_head_kv in the API #11997
Merged
Conversation
It's useful to be able to have this from the library layer as it's a key parameter of the model (e.g. to figure out how much KV cache memory is needed).
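The KV-cache-sizing use case mentioned above can be sketched as follows. This is a minimal illustration of the standard KV cache arithmetic, not code from the PR; all hyperparameter values are hypothetical (roughly Llama-3-8B-like, with grouped-query attention). In a real program they would be queried from the loaded model, e.g. via the `llama_model_n_head_kv()` accessor this PR adds alongside the existing `n_embd`/`n_layer` accessors.

```rust
fn main() {
    // Hypothetical hyperparameters; in practice these come from the model
    // via the llama.cpp C API (llama_model_n_head_kv() and friends).
    let n_layer: u64 = 32;       // transformer layers
    let n_embd: u64 = 4096;      // embedding width
    let n_head: u64 = 32;        // attention heads
    let n_head_kv: u64 = 8;      // KV heads (fewer than n_head under GQA)
    let n_ctx: u64 = 4096;       // context length to budget for
    let bytes_per_elem: u64 = 2; // f16 KV cache

    let head_dim = n_embd / n_head; // 128 with these numbers
    // K and V each store n_ctx * n_head_kv * head_dim elements per layer,
    // hence the leading factor of 2.
    let kv_bytes = 2 * n_layer * n_ctx * n_head_kv * head_dim * bytes_per_elem;

    println!("KV cache: {} MiB", kv_bytes / (1024 * 1024)); // 512 MiB here
}
```

Note how `n_head_kv`, not `n_head`, determines the cache size: with grouped-query attention the KV cache shrinks by the ratio `n_head / n_head_kv`, which is exactly why a library consumer needs this parameter exposed.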
ggerganov approved these changes on Feb 21, 2025.

This was referenced on Feb 21, 2025.
orca-zhang pushed a commit to orca-zhang/llama.cpp that referenced this pull request on Feb 26, 2025.
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request on Mar 8, 2025.
arthw pushed a commit to arthw/llama.cpp that referenced this pull request on Mar 19, 2025.
github-merge-queue bot pushed a commit to rust-lang/cargo that referenced this pull request on Apr 12, 2025:
If a library exists both in an added folder inside OUT_DIR and in the OS, prefer the one within OUT_DIR. Folders within OUT_DIR and folders outside OUT_DIR do not change their relative order among themselves. This is accomplished by sorting the search paths by whether we think each path is inside OUT_DIR or outside it.

### What does this PR try to resolve?

Fixes #15220. If a Rust crate builds a dynamic library and that same dynamic library is installed in the host OS, the success of the build and the behavior of executed tools depend on whether the user has the conflicting dynamic library in the external search path. If they do, the host OS library is always used, which is unexpected: updates to your Rust dependency still leave you linking and running against an old host OS library (i.e. someone who doesn't have that library installed gets silently different behavior).

### How should we test and review this PR?

This is what I did to verify my issue got resolved, though I'm sure a simpler example could be constructed:

* Make sure ALSA and libllama.so are installed (on Arch I installed alsa-lib and llama.cpp-cuda).
* Clone llama-cpp-2, init the llama.cpp submodule, and update the submodule to point to ggml-org/llama.cpp#11997 instead.
* Add plumbing to expose the new method within llama-cpp-2 as a public-facing function on the LlamaModel struct (it's basically the same code as for n_head, just calling n_head_kv from llama.cpp).
* Add cpal as a dependency in crate "foo".
* Add llama-cpp-2 via path as a dependency in crate "foo" and enable the `dynamic-link` feature.
* Add code using the newly exposed n_head_kv method in crate "foo" in main.rs. NOTE: the code just needs to compile and be exported; it doesn't have to be correct (fn main is probably easiest).
* Add some basic code that tries to initialize cpal in crate "foo" in fn main.
* Try to build / run crate "foo".

Before my change, this fails with a linker error saying it can't find `llama_model_n_head_kv`, because /usr/lib appears in the search path before the directory that contains the libllama.so built internally by the crate. This happens because cpal depends on alsa-sys, which uses pkg-config, which adds /usr/lib to the search path before the llama-cpp-sys-2 build.rs is run.

### Additional information

I'm not sure how to add tests, so I'm open to some help on that. I also wanted to make sure this approach is even correct. I coded this to change Cargo minimally and defensively, since I don't know the internals of Cargo very well (e.g. I don't know whether I have to compare against both `script_out_dir` and `script_out_dir_when_generated`, since I don't know the difference and there's not really any explanation of what they are). It's possible this over-complicates the implementation, so I'm open to any feedback.

Additionally, the sort that happens prior to each build-up of the rustc environment is not where I'd ideally place it. I think it would be more efficient to have the list of search paths be free-floating and not tied to a BuildOutput, so that it could be kept updated live and re-sorted only on insertion (since it changes less frequently than rustc is invoked). The generalized sort is correct but pessimistic: maintaining the list in sorted order could be done efficiently with some minor bookkeeping (i.e. you'd only need to sort the new paths and could then quickly inject them into the middle of a VecDeque).

In terms of correctness, I didn't do a thorough job of testing across all possible platforms. From first principles this seems directionally correct, but it's always possible it breaks someone else's workflow. I'm also uneasy that the relative position of `-L` / `-l` arguments changes in this PR, and I'm not sure whether that's observable behavior (it used to be `-L` for a crate followed by `-l` for that crate; now it's `-L` for all crates, still grouped by crate internally, followed by `-l` by crate).
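The ordering fix described above hinges on using a stable sort keyed only on "is this path inside OUT_DIR?", so the two groups separate while each keeps its internal order. A minimal sketch with hypothetical paths (this is not Cargo's actual code or data structures):

```rust
use std::path::PathBuf;

fn main() {
    // Hypothetical OUT_DIR and accumulated -L search paths.
    let out_dir = PathBuf::from("/build/foo/out");
    let mut search_paths = vec![
        PathBuf::from("/usr/lib"),              // host OS path (e.g. added by pkg-config)
        PathBuf::from("/build/foo/out/lib"),    // inside OUT_DIR
        PathBuf::from("/usr/local/lib"),        // host OS path
        PathBuf::from("/build/foo/out/vendor"), // inside OUT_DIR
    ];

    // sort_by_key is a stable sort and `false` orders before `true`, so
    // paths inside OUT_DIR move to the front while both groups preserve
    // their original relative order.
    search_paths.sort_by_key(|p| !p.starts_with(&out_dir));

    assert_eq!(
        search_paths,
        vec![
            PathBuf::from("/build/foo/out/lib"),
            PathBuf::from("/build/foo/out/vendor"),
            PathBuf::from("/usr/lib"),
            PathBuf::from("/usr/local/lib"),
        ]
    );
    println!("{:?}", search_paths);
}
```

With this ordering, the linker finds the crate-built libllama.so in OUT_DIR before the stale copy in /usr/lib, while builds that never touch OUT_DIR see no change.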