Unusable and does not match with token output from GPT-3 #9
When a character like “ is used, the encoder gives back faulty output, as shown below.

encode('“wrote jack a letter”');
[null, 222, 250, 42910, 14509, 257, 3850, null, 222, 251]

Whereas OpenAI's tokenizer gives:

[447, 250, 42910, 14509, 257, 3850, 447, 251]

This can also be triggered by other characters, like █ and many more.

Comments

It seems to work fine.

Hmm, maybe someone else fixed that, but it seems to work fine in the latest version; see the added test.

That's because this encoder is actually for the older models. It doesn't match up with gpt-3.5-turbo or gpt-4.

@niieani wdyt about this one? https://github.com/dqbd/tiktoken

Hmm, I can't reproduce the build. It looks promising, though. Notable pros: no TS :)

@seyfer Tiktoken JS looks good too. My gpt-tokenizer has a few extra features, though, that might be useful to you (like checking whether a given text is within the token limit or not).
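A likely explanation for the faulty output in the original report (an assumption, not confirmed in this thread): “ (U+201C) is a single code point but three bytes in UTF-8, and GPT-2-style byte-level BPE tokenizes the remapped bytes rather than the raw character, so an encoder that looks up code points directly can miss. The snippet below only demonstrates the multi-byte decomposition; it is not this repository's code:

```javascript
// “ (U+201C, LEFT DOUBLE QUOTATION MARK) is one code point…
const codePoint = '“'.codePointAt(0);
console.log(codePoint.toString(16)); // "201c"

// …but three bytes in UTF-8. A byte-level BPE (GPT-2 style)
// operates on these bytes, never on the raw code point, so a
// vocabulary lookup keyed on the character itself finds nothing.
const bytes = Array.from(new TextEncoder().encode('“'));
console.log(bytes); // [226, 128, 156]
```

The `null` entries in the reported output are consistent with a lookup that fails on those non-ASCII byte fragments, where the reference tokenizer instead merges them into tokens like 447 and 250.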