
WIP: Complete removal of f16_kv, add offload_kqv field #1019


Merged
1 commit merged into abetlen:main from fix-field-struct on Dec 18, 2023

Conversation

brandonrobertz
Contributor

Adds the offload_kqv setting to the context settings fields struct and removes f16_kv. Leaving f16_kv in the struct caused a null pointer exception when running the embeddings example script:

$ python examples/high_level_api/high_level_api_embedding.py --model ../../llama/models/llama-2-7b.Q5_K_M.gguf

...
ValueError: NULL pointer access

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/brandon/src/llama-cpp-dev/llama-cpp-python/examples/high_level_api/high_level_api_embedding.py", line 11, in <module>
    print(llm.create_embedding("Hello world!"))
  File "/home/brandon/src/llama-cpp-dev/llama-cpp-python/llama_cpp/llama.py", line 1309, in create_embedding
    data.append(
SystemError: <method 'append' of 'list' objects> returned a result with an exception set

This addresses two GH issues: abetlen#995 (a request to add the KV cache offloading param) and abetlen#1006 (the NULL pointer exception shown above).
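For context on why the stale field breaks things, here's a minimal illustrative sketch (field names and ordering are abbreviated, not the exact upstream layout): the ctypes mirror of llama.cpp's llama_context_params has to match the C struct field for field, so a leftover f16_kv entry shifts every field after it and pointer-valued fields end up reading garbage.

```python
# Illustrative sketch only -- abbreviated, not the exact upstream layout.
import ctypes

class llama_context_params(ctypes.Structure):
    _fields_ = [
        # ... earlier fields elided ...
        ("mul_mat_q", ctypes.c_bool),
        # ("f16_kv", ctypes.c_bool),  # removed upstream in ggml-org/llama.cpp@af99c6f;
        #                             # keeping it here misaligns every field below
        ("logits_all", ctypes.c_bool),
        ("embedding", ctypes.c_bool),
        ("offload_kqv", ctypes.c_bool),  # new: whether to offload the KV cache to the GPU
    ]
```

Assuming the high-level wrapper forwards the setting, it then becomes reachable as e.g. Llama(model_path=..., offload_kqv=True).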

@brandonrobertz brandonrobertz changed the title from "Complete removal of f16_kv, add offload_kqv field" to "WIP: Complete removal of f16_kv, add offload_kqv field" on Dec 17, 2023
F16_KV appears to have been removed here: ggml-org/llama.cpp@af99c6f

This addresses two issues:

 - abetlen#995 which just requests to add the KV cache offloading param
 - abetlen#1006 a NULL ptr exception when using the embeddings (introduced by
   leaving f16_kv in the fields struct)
@ringohoffman

I have a question that is unrelated to your PR, which to me does seem to fix the issue in #1006.

What is the significance of the single embedding returned from llm.create_embedding("Hello world!")? Since it is composed of 4 tokens, I was expecting 4 embeddings to be returned, but len(llm.create_embedding("Hello world!")["data"]) == 1... and the data changes when you provide either "Hello world!" or "Hello world"...

Is create_embedding working as expected?

@DavidMorton

I patched this into my current system, but the embeddings object kept telling me I'd have to create it with embedding=True, even though that was clearly set in the code at the time. Not sure what happened; I'm wondering if my build of llama_cpp is somehow borked.

Do older versions work alright as a workaround for this issue (the one in #1006)?

@brandonrobertz
Contributor Author

> I have a question that is unrelated to your PR, which to me does seem to fix the issue in #1006.
>
> What is the significance of the single embedding returned from llm.create_embedding("Hello world!")? Since it is composed of 4 tokens, I was expecting 4 embeddings to be returned, but len(llm.create_embedding("Hello world!")["data"]) == 1... and the data changes when you provide either "Hello world!" or "Hello world"...
>
> Is create_embedding working as expected?

It looks like the create_embedding method accepts two different input types:

  • a string, which results in a single embedding
  • a list of strings, which results in a list of embeddings (one for each string in the list)

This basically just mirrors the behavior of the _LlamaModel.tokenize method, so the behavior doesn't appear to have changed.
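To make the two input shapes concrete, here's a minimal sketch (the model path is a placeholder; embedding=True is required to use the embeddings API):

```python
from llama_cpp import Llama

# Placeholder model path -- any GGUF model will do.
llm = Llama(model_path="llama-2-7b.Q5_K_M.gguf", embedding=True)

# A single string yields exactly one embedding in "data".
single = llm.create_embedding("Hello world!")
assert len(single["data"]) == 1

# A list of strings yields one embedding per string, not per token.
batch = llm.create_embedding(["Hello world!", "Hello world"])
assert len(batch["data"]) == 2
```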

@brandonrobertz
Contributor Author

> Do older versions work alright as a workaround for this issue (the one in #1006)?

How old a version of llama-cpp-python are you using? This PR requires relatively new llama-cpp-python and llama.cpp (roughly Dec 11, 2023 or newer).
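For reference, a quick way to confirm which wrapper version is actually installed (just a convenience snippet, not from this PR):

```python
# importlib.metadata ships with Python 3.8+.
from importlib.metadata import version

print(version("llama-cpp-python"))  # e.g. "0.2.23"
```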

@DavidMorton


pip install is 0.2.23 (Dec 13th)

My local branch of llama.cpp's latest commit is slaren's from Dec 16th.

@brandonrobertz
Contributor Author

> pip install is 0.2.23 (Dec 13th)
>
> My local branch of llama.cpp's latest commit is slaren's from Dec 16th.

Should be fine. Can you paste a traceback and the offending code?

@DavidMorton
Copy link

I managed to get it working. Cleared the pycache and re-ran.

Thanks!
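For anyone else who hits this, a small standard-library sketch of the cleanup step (not part of the PR):

```python
# Delete stale __pycache__ directories under the current tree, then re-run.
import pathlib
import shutil

for cache_dir in pathlib.Path(".").rglob("__pycache__"):
    shutil.rmtree(cache_dir, ignore_errors=True)
```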

@abetlen
Owner

abetlen commented Dec 18, 2023

@brandonrobertz thank you so much for catching this, I'll merge this now and push a new version asap.

@abetlen abetlen merged commit 62944df into abetlen:main Dec 18, 2023
@brandonrobertz brandonrobertz deleted the fix-field-struct branch December 18, 2023 21:54