
WIP: Complete removal of f16_kv, add offload_kqv field #1019


Merged
1 commit merged into abetlen:main from fix-field-struct on Dec 18, 2023

Conversation

brandonrobertz
Contributor

Adds the offload_kqv setting to the context settings fields struct and removes f16_kv. Leaving f16_kv in the struct caused a null pointer exception when running the embeddings example script:

$ python examples/high_level_api/high_level_api_embedding.py --model ../../llama/models/llama-2-7b.Q5_K_M.gguf

...
ValueError: NULL pointer access

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/brandon/src/llama-cpp-dev/llama-cpp-python/examples/high_level_api/high_level_api_embedding.py", line 11, in <module>
    print(llm.create_embedding("Hello world!"))
  File "/home/brandon/src/llama-cpp-dev/llama-cpp-python/llama_cpp/llama.py", line 1309, in create_embedding
    data.append(
SystemError: <method 'append' of 'list' objects> returned a result with an exception set

This addresses two GH issues: abetlen#995 (a request to add the KV cache offloading param) and abetlen#1006 (the NULL pointer exception shown above).
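For context on why the stale field breaks things, here's a minimal illustrative sketch (field names and ordering are abbreviated, not the exact upstream layout): the ctypes mirror of llama.cpp's llama_context_params has to match the C struct field for field, so a leftover f16_kv entry shifts every field after it and pointer-valued fields end up reading garbage.

```python
# Illustrative sketch only -- abbreviated, not the exact upstream layout.
import ctypes

class llama_context_params(ctypes.Structure):
    _fields_ = [
        # ... earlier fields elided ...
        ("mul_mat_q", ctypes.c_bool),
        # ("f16_kv", ctypes.c_bool),  # removed upstream in ggml-org/llama.cpp@af99c6f;
        #                             # keeping it here misaligns every field below
        ("logits_all", ctypes.c_bool),
        ("embedding", ctypes.c_bool),
        ("offload_kqv", ctypes.c_bool),  # new: whether to offload the KV cache to the GPU
    ]
```

Assuming the high-level wrapper forwards the setting, it then becomes reachable as e.g. Llama(model_path=..., offload_kqv=True).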

@brandonrobertz brandonrobertz changed the title from "Complete removal of f16_kv, add offload_kqv field" to "WIP: Complete removal of f16_kv, add offload_kqv field" on Dec 17, 2023
F16_KV appears to have been removed here: ggml-org/llama.cpp@af99c6f

This addresses two issues:

 - abetlen#995 which just requests to add the KV cache offloading param
 - abetlen#1006 a NULL ptr exception when using the embeddings (introduced by
   leaving f16_kv in the fields struct)
@ringohoffman

I have a question that is unrelated to your PR, which to me does seem to fix the issue in #1006.

What is the significance of the single embedding returned from llm.create_embedding("Hello world!")? Since it is composed of 4 tokens, I was expecting 4 embeddings to be returned, but len(llm.create_embedding("Hello world!")["data"]) == 1... and the data changes when you provide either "Hello world!" or "Hello world"...

Is create_embedding working as expected?

@DavidMorton

I patched this into my current system, but the embeddings object kept telling me I'd have to create it with embedding=True, even though that was clearly set in the code at the time. Not sure what happened; I'm wondering if my build of llama_cpp is somehow borked.

Do older versions work alright as a workaround for this issue (the one in #1006)?

@brandonrobertz
Contributor Author

> I have a question that is unrelated to your PR, which to me does seem to fix the issue in #1006.
>
> What is the significance of the single embedding returned from llm.create_embedding("Hello world!")? Since it is composed of 4 tokens, I was expecting 4 embeddings to be returned, but len(llm.create_embedding("Hello world!")["data"]) == 1... and the data changes when you provide either "Hello world!" or "Hello world"...
>
> Is create_embedding working as expected?

It looks like the create_embedding method accepts two different input types:

  • a string, which results in a single embedding
  • a list of strings, which results in a list of embeddings (one for each string in the list)

This basically just mirrors the behavior of the _LlamaModel.tokenize method, so the behavior doesn't appear to have changed.
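To make the two input shapes concrete, here's a minimal sketch (the model path is a placeholder; embedding=True is required to use the embeddings API):

```python
from llama_cpp import Llama

# Placeholder model path -- any GGUF model will do.
llm = Llama(model_path="llama-2-7b.Q5_K_M.gguf", embedding=True)

# A single string yields exactly one embedding in "data".
single = llm.create_embedding("Hello world!")
assert len(single["data"]) == 1

# A list of strings yields one embedding per string, not per token.
batch = llm.create_embedding(["Hello world!", "Hello world"])
assert len(batch["data"]) == 2
```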

@brandonrobertz
Contributor Author

> Do older versions work alright as a workaround for this issue (the one in #1006)?

How old a version of llama-cpp-python are you using? This PR requires relatively new llama-cpp-python and llama.cpp (roughly Dec 11, 2023 or newer).
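For reference, a quick way to confirm which wrapper version is actually installed (just a convenience snippet, not from this PR):

```python
# importlib.metadata ships with Python 3.8+.
from importlib.metadata import version

print(version("llama-cpp-python"))  # e.g. "0.2.23"
```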

@DavidMorton


pip install is 0.2.23 (Dec 13th)

My local branch of llama.cpp's latest commit is slaren's from Dec 16th.

@brandonrobertz
Contributor Author

> pip install is 0.2.23 (Dec 13th)
>
> My local branch of llama.cpp's latest commit is slaren's from Dec 16th.

Should be fine. Can you paste a traceback and the offending code?

@DavidMorton
Copy link

I managed to get it working. Cleared the pycache and re-ran.

Thanks!
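For anyone else who hits this, a small standard-library sketch of the cleanup step (not part of the PR):

```python
# Delete stale __pycache__ directories under the current tree, then re-run.
import pathlib
import shutil

for cache_dir in pathlib.Path(".").rglob("__pycache__"):
    shutil.rmtree(cache_dir, ignore_errors=True)
```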

@abetlen
Owner

abetlen commented Dec 18, 2023

@brandonrobertz thank you so much for catching this, I'll merge this now and push a new version asap.

@abetlen abetlen merged commit 62944df into abetlen:main Dec 18, 2023
@brandonrobertz brandonrobertz deleted the fix-field-struct branch December 18, 2023 21:54