Can we have an example of pure C++? #112

Open
cgisky1980 opened this issue Jul 4, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@cgisky1980

cgisky1980 commented Jul 4, 2023

Can we have an example of pure C++?
Python and PyTorch are too heavy a dependency for end users.

@cgisky1980 changed the title from "Can we have an example of pure C++ Examples?" to "Can we have an example of pure C++?" on Jul 4, 2023
@saharNooby
Collaborator

> python & pytorch is too big for the end users.

I agree, but a pure C++ implementation is blocked by the tokenizer: specifically, by the Unicode normalization and Unicode regexes that 20B_tokenizer relies on. RWKV World models would be easier to support, though, since their tokenizer does not require Unicode libraries.

I have no plans to implement the tokenizer in C myself, but I welcome PRs.
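To illustrate why the World tokenizer is the easier target: assuming it works as a greedy longest-match over raw UTF-8 bytes using a trie (which is how I understand the reference World tokenizer to operate), no normalization or Unicode regex is needed at all. The following is a minimal, hypothetical C++17 sketch of that technique. The class name `ByteTrieTokenizer`, its methods, and the toy vocabulary are assumptions for illustration, not code from this repository, and loading the real vocabulary file is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical byte-level tokenizer: greedy longest-match over a trie of
// raw byte sequences. Works on UTF-8 bytes directly, so no Unicode library
// is involved. Vocabulary loading is intentionally omitted.
class ByteTrieTokenizer {
    struct Node {
        std::map<unsigned char, std::unique_ptr<Node>> children;
        int id = -1; // -1 means no token ends at this node
    };
    Node root_;

public:
    // Register one vocabulary entry: its raw bytes and its token id.
    void add_token(const std::string& bytes, int id) {
        Node* node = &root_;
        for (unsigned char c : bytes) {
            auto& child = node->children[c];
            if (!child) child = std::make_unique<Node>();
            node = child.get();
        }
        node->id = id;
    }

    // At each position, walk the trie as far as the input allows and emit
    // the longest token seen along the way, then continue after it.
    std::vector<int> encode(const std::string& text) const {
        std::vector<int> ids;
        std::size_t pos = 0;
        while (pos < text.size()) {
            const Node* node = &root_;
            int best_id = -1;
            std::size_t best_len = 0;
            for (std::size_t i = pos; i < text.size(); ++i) {
                auto it = node->children.find(static_cast<unsigned char>(text[i]));
                if (it == node->children.end()) break;
                node = it->second.get();
                if (node->id >= 0) {
                    best_id = node->id;
                    best_len = i - pos + 1;
                }
            }
            // A full byte-level vocab contains all 256 single bytes, so this
            // only fires for an incomplete (e.g. toy) vocabulary.
            if (best_id < 0) throw std::runtime_error("no token matches input byte");
            ids.push_back(best_id);
            pos += best_len;
        }
        return ids;
    }
};
```

For example, with the toy entries `a` → 1, `ab` → 3 and `abc` → 4, encoding `abcab` yields `{4, 3}`: the longest match `abc` is taken first, then `ab`. A multi-byte character such as `é` (bytes `C3 A9`) is just another byte sequence in the trie.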

@saharNooby saharNooby added the enhancement New feature or request label Jul 5, 2023
@cgisky1980
Author

https://github.com/koute/rwkv_tokenizer (written in Rust)

@cgisky1980
Author

> > python & pytorch is too big for the end users.
>
> I agree, but pure C++ implementation is blocked by implementing tokenizer, specifically -- Unicode normalization and Unicode regexes in 20B_tokenizer. Although RWKV World models are easier to support, since their tokenizer does not require Unicode libraries.
>
> I myself have no plans of implementing the tokenizer in C, but I welcome PRs.

https://github.com/mlc-ai/tokenizers-cpp

@saharNooby
Collaborator

@cgisky1980 Unfortunately, tokenizers-cpp will not work here:

You also need to turn on C++17 support.
