Switch embed to llama_get_embeddings_seq #1263

Merged: 2 commits into abetlen:main on Mar 9, 2024
Conversation

@iamlemec (Contributor) commented on Mar 8, 2024

Due to updates in ggml-org/llama.cpp#5796, sequence level embeddings are now output through a separate channel from token level embeddings, and they are accessed with llama_get_embeddings_seq.
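For context, a minimal sketch of what a ctypes binding for this call looks like, assuming the llama.h signature `float * llama_get_embeddings_seq(struct llama_context * ctx, llama_seq_id seq_id)`; the library loading and helper name here are illustrative, not the exact code added in this PR:

```python
import ctypes

# Illustrative only: llama_cpp/llama_cpp.py handles library loading itself.
lib = ctypes.CDLL("libllama.so")

# float * llama_get_embeddings_seq(struct llama_context * ctx, llama_seq_id seq_id);
lib.llama_get_embeddings_seq.argtypes = [ctypes.c_void_p, ctypes.c_int32]  # llama_seq_id is int32_t
lib.llama_get_embeddings_seq.restype = ctypes.POINTER(ctypes.c_float)

def seq_embedding(ctx, seq_id: int, n_embd: int):
    """Return the pooled embedding for one sequence, or None if pooling is disabled."""
    ptr = lib.llama_get_embeddings_seq(ctx, seq_id)
    if not ptr:  # NULL when the context was not configured for pooled embeddings
        return None
    return ptr[:n_embd]  # copy n_embd floats into a Python list
```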

@abetlen (Owner) commented on Mar 9, 2024

@iamlemec thank you! Just so I understand, the sequence level embeddings are the ones that are pooled up to the end of the last processed batch?

Also, I think the new function in llama_cpp.py is duplicated by accident.

Review thread on llama_cpp/llama_cpp.py (outdated, resolved)
@iamlemec (Contributor, Author) commented on Mar 9, 2024

Oh yeah, I didn't see you had added it already! Yup, it's the pooled embeddings by sequence for the last batch. It works for both mean pooling and cls (first-token) pooling; with no pooling it returns null.
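To illustrate the dispatch this implies, here is a hedged sketch against the low-level `llama_cpp` bindings; the pooling-type constants, argument names, and the surrounding batch/decode setup are assumptions for illustration and not part of this PR's diff:

```python
import llama_cpp  # low-level ctypes bindings from llama_cpp/llama_cpp.py

def read_embeddings(ctx, model, pooling_type, seq_ids, n_outputs):
    """Sketch: read embeddings back after llama_decode(), branching on pooling mode."""
    n_embd = llama_cpp.llama_n_embd(model)
    if pooling_type == llama_cpp.LLAMA_POOLING_TYPE_NONE:
        # No pooling: one embedding per output token, read with llama_get_embeddings_ith.
        return [
            llama_cpp.llama_get_embeddings_ith(ctx, i)[:n_embd]
            for i in range(n_outputs)
        ]
    # Mean or CLS pooling: one embedding per sequence, read with llama_get_embeddings_seq.
    embeddings = []
    for seq_id in seq_ids:
        ptr = llama_cpp.llama_get_embeddings_seq(ctx, seq_id)
        if not ptr:  # NULL if the context was not set up for pooled embeddings
            raise RuntimeError(f"no pooled embedding for sequence {seq_id}")
        embeddings.append(ptr[:n_embd])
    return embeddings
```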

Co-authored-by: Andrei <abetlen@gmail.com>
@abetlen merged commit 2811014 into abetlen:main on Mar 9, 2024
16 checks passed