llama : expose llama_model_n_head_kv in the API #11997
Merged
Conversation
It's useful to be able to have this from the library layer as it's a key parameter of the model (e.g. to figure out how much KV cache memory is needed).
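The KV-cache-sizing use case mentioned above can be sketched as follows. This is a minimal illustration of the standard KV cache arithmetic, not code from the PR; all hyperparameter values are hypothetical (roughly Llama-3-8B-like, with grouped-query attention). In a real program they would be queried from the loaded model, e.g. via the `llama_model_n_head_kv()` accessor this PR adds alongside the existing `n_embd`/`n_layer` accessors.

```rust
fn main() {
    // Hypothetical hyperparameters; in practice these come from the model
    // via the llama.cpp C API (llama_model_n_head_kv() and friends).
    let n_layer: u64 = 32;       // transformer layers
    let n_embd: u64 = 4096;      // embedding width
    let n_head: u64 = 32;        // attention heads
    let n_head_kv: u64 = 8;      // KV heads (fewer than n_head under GQA)
    let n_ctx: u64 = 4096;       // context length to budget for
    let bytes_per_elem: u64 = 2; // f16 KV cache

    let head_dim = n_embd / n_head; // 128 with these numbers
    // K and V each store n_ctx * n_head_kv * head_dim elements per layer,
    // hence the leading factor of 2.
    let kv_bytes = 2 * n_layer * n_ctx * n_head_kv * head_dim * bytes_per_elem;

    println!("KV cache: {} MiB", kv_bytes / (1024 * 1024)); // 512 MiB here
}
```

Note how `n_head_kv`, not `n_head`, determines the cache size: with grouped-query attention the KV cache shrinks by the ratio `n_head / n_head_kv`, which is exactly why a library consumer needs this parameter exposed.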
ggerganov approved these changes on Feb 21, 2025.

This was referenced on Feb 21, 2025.
orca-zhang pushed a commit to orca-zhang/llama.cpp that referenced this pull request on Feb 26, 2025.
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request on Mar 8, 2025.
arthw pushed a commit to arthw/llama.cpp that referenced this pull request on Mar 19, 2025.
github-merge-queue bot pushed a commit to rust-lang/cargo that referenced this pull request on Apr 12, 2025:
If a library exists both in an added folder inside OUT_DIR and in the OS, prefer the one within OUT_DIR. Folders within OUT_DIR and folders outside OUT_DIR do not change their relative order among themselves. This is accomplished by sorting the search paths by whether we think each path is inside OUT_DIR or outside it.

### What does this PR try to resolve?

Fixes #15220. If a Rust crate builds a dynamic library and that same dynamic library is installed in the host OS, the success of the build and the behavior of executed tools depend on whether the user has the conflicting dynamic library in the external search path. If they do, the host OS library is always used, which is unexpected: updates to your Rust dependency still leave you linking and running against an old host OS library (i.e. someone who doesn't have that library installed gets silently different behavior).

### How should we test and review this PR?

This is what I did to verify my issue got resolved, though I'm sure a simpler example could be constructed:

* Make sure ALSA and libllama.so are installed (on Arch I installed alsa-lib and llama.cpp-cuda).
* Clone llama-cpp-2, init the llama.cpp submodule, and update the submodule to point to ggml-org/llama.cpp#11997 instead.
* Add plumbing to expose the new method within llama-cpp-2 as a public-facing function on the LlamaModel struct (it's basically the same code as for n_head, just calling n_head_kv from llama.cpp).
* Add cpal as a dependency in crate "foo".
* Add llama-cpp-2 via path as a dependency in crate "foo" and enable the `dynamic-link` feature.
* Add code using the newly exposed n_head_kv method in crate "foo" in main.rs. NOTE: the code just needs to compile and be exported; it doesn't have to be correct (fn main is probably easiest).
* Add some basic code that tries to initialize cpal in crate "foo" in fn main.
* Try to build / run crate "foo".

Before my change, this fails with a linker error saying it can't find `llama_model_n_head_kv`, because /usr/lib appears in the search path before the directory that contains the libllama.so built internally by the crate. This happens because cpal depends on alsa-sys, which uses pkg-config, which adds /usr/lib to the search path before the llama-cpp-sys-2 build.rs is run.

### Additional information

I'm not sure how to add tests, so I'm open to some help on that. I also wanted to make sure this approach is even correct. I coded this to change Cargo minimally and defensively, since I don't know the internals of Cargo very well (e.g. I don't know whether I have to compare against both `script_out_dir` and `script_out_dir_when_generated`, since I don't know the difference and there's not really any explanation of what they are). It's possible this over-complicates the implementation, so I'm open to any feedback.

Additionally, the sort that happens prior to each build-up of the rustc environment is not where I'd ideally place it. I think it would be more efficient to have the list of search paths be free-floating and not tied to a BuildOutput, so that it could be kept updated live and re-sorted only on insertion (since it changes less frequently than rustc is invoked). The generalized sort is correct but pessimistic: maintaining the list in sorted order could be done efficiently with some minor bookkeeping (i.e. you'd only need to sort the new paths and could then quickly inject them into the middle of a VecDeque).

In terms of correctness, I didn't do a thorough job of testing across all possible platforms. From first principles this seems directionally correct, but it's always possible it breaks someone else's workflow. I'm also uneasy that the relative position of `-L` / `-l` arguments changes in this PR, and I'm not sure whether that's observable behavior (it used to be `-L` for a crate followed by `-l` for that crate; now it's `-L` for all crates, still grouped by crate internally, followed by `-l` by crate).
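The ordering fix described above hinges on using a stable sort keyed only on "is this path inside OUT_DIR?", so the two groups separate while each keeps its internal order. A minimal sketch with hypothetical paths (this is not Cargo's actual code or data structures):

```rust
use std::path::PathBuf;

fn main() {
    // Hypothetical OUT_DIR and accumulated -L search paths.
    let out_dir = PathBuf::from("/build/foo/out");
    let mut search_paths = vec![
        PathBuf::from("/usr/lib"),              // host OS path (e.g. added by pkg-config)
        PathBuf::from("/build/foo/out/lib"),    // inside OUT_DIR
        PathBuf::from("/usr/local/lib"),        // host OS path
        PathBuf::from("/build/foo/out/vendor"), // inside OUT_DIR
    ];

    // sort_by_key is a stable sort and `false` orders before `true`, so
    // paths inside OUT_DIR move to the front while both groups preserve
    // their original relative order.
    search_paths.sort_by_key(|p| !p.starts_with(&out_dir));

    assert_eq!(
        search_paths,
        vec![
            PathBuf::from("/build/foo/out/lib"),
            PathBuf::from("/build/foo/out/vendor"),
            PathBuf::from("/usr/lib"),
            PathBuf::from("/usr/local/lib"),
        ]
    );
    println!("{:?}", search_paths);
}
```

With this ordering, the linker finds the crate-built libllama.so in OUT_DIR before the stale copy in /usr/lib, while builds that never touch OUT_DIR see no change.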