Tokenization Example #1193

Closed
rozek opened this issue Apr 26, 2023 · 2 comments

@rozek

rozek commented Apr 26, 2023

First of all: thank you very much for the continuing work on llama.cpp - I'm using it every day with various models.

For proper context management, however, I often need to know how many tokens prompts and responses contain. There is an "embedding" example, but none for "tokenization".

That is why I made my own (see my fork of llama.cpp).

It seems to work, but since I am neither a C++ programmer nor a real AI expert, I hesitate to create a pull request.

Perhaps somebody else could have a look at it or create a better example for the public...

Thanks for all your effort!

@SlyEcho
Collaborator

SlyEcho commented Apr 27, 2023

I think test-tokenizer-0.cpp is a good example of a minimal tokenizer.

@github-actions github-actions bot added the stale label Mar 25, 2024
Contributor

github-actions bot commented Apr 9, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.
