
llama : expose llama_model_n_head_kv in the API #11997


Merged: 1 commit merged into ggml-org:master on Feb 25, 2025

Conversation

@vlovich (Contributor) commented on Feb 21, 2025

It's useful to be able to get this from the library layer, as it's a key parameter of the model (e.g. for figuring out how much KV cache memory is needed).
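For context on the KV-cache sizing use case, here's a rough back-of-envelope sketch in Rust (not part of the llama.cpp API; the parameter names and the f16 assumption are mine) of the kind of estimate this getter enables:

```rust
// Back-of-envelope KV cache estimate. Parameter names are illustrative; the
// real layout depends on how the context is configured (f16 K/V assumed here).
fn kv_cache_bytes(
    n_layer: u64,        // number of transformer layers
    n_head_kv: u64,      // KV heads per layer (what llama_model_n_head_kv reports)
    head_dim: u64,       // embedding size per head
    n_ctx: u64,          // context length the cache has to cover
    bytes_per_elem: u64, // e.g. 2 for f16 K/V tensors
) -> u64 {
    // Two tensors (K and V) per layer, each n_ctx x (n_head_kv * head_dim) elements.
    2 * n_layer * n_ctx * n_head_kv * head_dim * bytes_per_elem
}

fn main() {
    // Example: 32 layers, 8 KV heads (GQA), 128-dim heads, 4096 context, f16.
    let bytes = kv_cache_bytes(32, 8, 128, 4096, 2);
    println!("approx KV cache: {} MiB", bytes / (1024 * 1024));
}
```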

@ggerganov merged commit 3e9a286 into ggml-org:master on Feb 25, 2025
46 checks passed
orca-zhang pushed a commit to orca-zhang/llama.cpp that referenced this pull request Feb 26, 2025
@vlovich deleted the expose-n-head-kv branch on March 1, 2025 at 06:46
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Mar 19, 2025
github-merge-queue bot pushed a commit to rust-lang/cargo that referenced this pull request Apr 12, 2025
If a library exists both in an added folder inside OUT_DIR and in the
OS, prefer the one within OUT_DIR. Folders within OUT_DIR and
folders outside OUT_DIR keep their relative order among
themselves.

This is accomplished by sorting by whether we think the path is inside
the search path or outside.
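As a rough Rust sketch of that idea (not the actual Cargo code, which operates on its own internal search-path bookkeeping rather than a bare list of paths): a stable sort keyed on containment moves OUT_DIR paths ahead of everything else while preserving order within each group.

```rust
use std::path::{Path, PathBuf};

// Illustrative only: the real change lives inside Cargo's build-output plumbing.
fn prefer_out_dir_paths(search_paths: &mut [PathBuf], out_dir: &Path) {
    // sort_by_key is stable, so paths under OUT_DIR (key = false) move to the
    // front without reordering paths within either group.
    search_paths.sort_by_key(|p| !p.starts_with(out_dir));
}

fn main() {
    let out_dir = PathBuf::from("/build/out");
    let mut paths = vec![
        PathBuf::from("/usr/lib"),
        PathBuf::from("/build/out/libllama"),
    ];
    prefer_out_dir_paths(&mut paths, &out_dir);
    assert_eq!(paths[0], PathBuf::from("/build/out/libllama"));
}
```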

### What does this PR try to resolve?

Fixes #15220. If a Rust crate builds a dynamic library & that same
dynamic library is installed in the host OS, whether the build succeeds &
whether executed tools behave consistently depends on whether or
not the user has the conflicting dynamic library in the external search
path. If they do, then the host OS library will always be used, which is
unexpected: updates to your Rust dependency will still have you linking
& running against an old host OS library (i.e. someone who doesn't have
that library sees different, silent behavior).

### How should we test and review this PR?

This is what I did to verify my issue got resolved, but I'm sure there's
a simpler example one could construct.

* Make sure Alsa and libllama.so are installed (on Arch I installed
alsa-lib and llama.cpp-cuda).
* Clone llama-cpp-2 & init llama.cpp submodule & update the submodule to
point to ggml-org/llama.cpp#11997 instead.
* Add plumbing to expose the new method within llama-cpp-2 as a public-facing
function on the LlamaModel struct (it's basically the same code
as for n_head, just calling n_head_kv from llama.cpp; see the sketch after this list).
* Add cpal as a dependency in crate "foo"
* Add llama-cpp-2 via path as a dependency in crate "foo" and enable the
`dynamic-link` feature.
* Add code using the newly exposed n_head_kv method in crate "foo" in
main.rs. NOTE: the code just needs to compile & be exported, it doesn't have to
be correct (fn main is probably easiest).
* Add some basic code that tries to initialize cpal in crate "foo" in fn
main.
* Try to build / run crate "foo"
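For the plumbing step, here's roughly what I mean (a sketch only: llama-cpp-2 actually gets its FFI declarations from bindgen in llama-cpp-sys-2, and the wrapper struct's real fields differ; this also only links if libllama is available):

```rust
// Sketch of the plumbing: forward to the C symbol this PR adds. Struct and
// field names below are placeholders, not llama-cpp-2's actual internals.
#[repr(C)]
pub struct RawLlamaModel {
    _opaque: [u8; 0], // opaque handle to llama.cpp's `struct llama_model`
}

extern "C" {
    // Declared in llama.h once this PR lands.
    fn llama_model_n_head_kv(model: *const RawLlamaModel) -> i32;
}

pub struct LlamaModel {
    model: *const RawLlamaModel, // placeholder; the real wrapper differs
}

impl LlamaModel {
    /// Number of KV attention heads, mirroring the existing n_head accessor.
    pub fn n_head_kv(&self) -> i32 {
        unsafe { llama_model_n_head_kv(self.model) }
    }
}
```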

Before my change, it fails with a linker error saying it can't find
`llama_model_n_head_kv` because /usr/lib appears in the search path
before the directory that contains the libllama.so that was built
internally by the crate. This is because cpal depends on alsa-sys which
uses pkg-config which adds /usr/lib to the search path before the
llama-cpp-sys-2 build.rs is run.

### Additional information

I'm not sure how to add tests, so I'm open to some help on that. I also wanted to
make sure that this approach is even correct. I coded this to change
Cargo minimally and defensively since I don't know the internals of
Cargo very well (e.g. I don't know if I have to compare against both
`script_out_dir` / `script_out_dir_when_generated`, since I don't know
the difference & there's not really any explanation of what they are).

It's possible this over-complicates the implementation, so I'm open to any
feedback. Additionally, the sort that happens prior to each build-up of
the rustc environment is not where I'd ideally place it. I think it
would be more efficient to have the list of search paths be
free-floating and not tied to a BuildOutput, so that it could be kept
updated live & re-sorted only on insertion (since it changes less
frequently than rustc is invoked). Additionally, the generalized sort is
correct but pessimistic: maintaining the list in sorted order could be done
efficiently with some minor bookkeeping (i.e. you'd only need to sort
the new paths & could then quickly inject them into the middle of a
VecDeque), as sketched below.
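
Roughly what I have in mind for that bookkeeping (a sketch of the alternative design, not code in this PR):

```rust
use std::collections::VecDeque;
use std::path::{Path, PathBuf};

// Keep the deque partitioned: paths under OUT_DIR first, everything else after.
// A new path then only needs one scan to find the group boundary, instead of
// re-sorting the whole list before every rustc invocation.
fn insert_search_path(paths: &mut VecDeque<PathBuf>, out_dir: &Path, new_path: PathBuf) {
    if new_path.starts_with(out_dir) {
        let boundary = paths
            .iter()
            .position(|p| !p.starts_with(out_dir))
            .unwrap_or(paths.len());
        paths.insert(boundary, new_path); // end of the "inside OUT_DIR" prefix
    } else {
        paths.push_back(new_path); // outside paths keep their arrival order
    }
}

fn main() {
    let out_dir = PathBuf::from("/build/out");
    let mut paths: VecDeque<PathBuf> =
        vec![PathBuf::from("/build/out/a"), PathBuf::from("/usr/lib")].into();
    insert_search_path(&mut paths, &out_dir, PathBuf::from("/build/out/b"));
    assert_eq!(paths[1], PathBuf::from("/build/out/b"));
}
```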

And of course, in terms of correctness, I didn't do a thorough job
testing across all possible platforms. From first principles this seems
directionally correct, but it's always possible this breaks someone
else's workflow. I'm also uneasy that the relative position of `-L` /
`-l` arguments changes in this PR & I'm not sure if that's observable
behavior or not (i.e. it used to be `-L` for a crate followed by `-l` for
that crate, but now it's `-L` for all crates, still grouped by crate
internally, followed by `-l` by crate).