First of all: thank you very much for the continuing work on llama.cpp - I'm using it every day with various models.
For proper context management, however, I often need to know how many tokens prompts and responses contain. There is an "embedding" example, but none for "tokenization".
That's why I made my own (see my fork of llama.cpp).
It seems to work, but since I am not a C++ programmer and not really an AI expert either, I hesitate to create a pull request.
Perhaps somebody else could have a look at it or create a better example for the public...
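For reference, the example boils down to something like the sketch below. It assumes the llama.h API as I found it (`llama_init_from_file`, `llama_tokenize`, `llama_token_to_str`); names and signatures may differ in newer revisions, so please treat it as an illustration rather than the exact code in my fork:

```cpp
// tokenize.cpp - print the tokens of a prompt and their total count
// (sketch only; assumes the llama_init_from_file / llama_tokenize /
//  llama_token_to_str API - these names may have changed since)
#include "llama.h"

#include <cstdio>
#include <string>
#include <vector>

int main(int argc, char ** argv) {
    if (argc != 3) {
        fprintf(stderr, "usage: %s <model-file> <prompt>\n", argv[0]);
        return 1;
    }

    const std::string prompt = argv[2];

    llama_context_params params = llama_context_default_params();
    llama_context * ctx = llama_init_from_file(argv[1], params);
    if (ctx == nullptr) {
        fprintf(stderr, "error: failed to load model '%s'\n", argv[1]);
        return 1;
    }

    // worst case is one token per byte of the prompt, plus one for BOS
    std::vector<llama_token> tokens(prompt.size() + 1);
    const int n_tokens = llama_tokenize(ctx, prompt.c_str(), tokens.data(),
                                        (int) tokens.size(), /*add_bos=*/ true);
    if (n_tokens < 0) {
        fprintf(stderr, "error: tokenization failed\n");
        llama_free(ctx);
        return 1;
    }
    tokens.resize(n_tokens);

    // print every token id together with its text representation
    for (const llama_token token : tokens) {
        printf("%6d -> '%s'\n", token, llama_token_to_str(ctx, token));
    }
    printf("total: %d tokens\n", n_tokens);

    llama_free(ctx);
    return 0;
}
```

The key call is `llama_tokenize`, which writes the tokens into the provided buffer and returns how many it produced (or a negative value if the buffer was too small); since a token always consumes at least one byte of input, a buffer of prompt length plus one for BOS should suffice.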
Thanks for all your effort!