Unusable and does not match with token output from GPT-3 #9

nullreferencez · 2022-03-11T12:40:16Z

When a character such as “ is used, the encoder returns faulty output, as shown below.

encode('“wrote jack a letter”');

[null, 222, 250, 42910, 14509, 257, 3850, null, 222, 251]

Whereas OpenAI's tokenizer gives the following output:

[447, 250, 42910, 14509, 257, 3850, 447, 251]

Other characters, such as █, trigger the same problem, and so do many more.
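The IDs in both outputs are consistent with GPT-2-style byte-level BPE, which this encoder is assumed to follow. “ is three UTF-8 bytes (E2 80 9C). Each byte is mapped to a printable stand-in character, and token IDs 0 to 255 are assumed to be those single-byte symbols in the order of GPT-2's `bytes_to_unicode` table. The sketch below ports that table to JavaScript. `bytesToUnicode`, `baseByteIds` and `byteSymbols` are hypothetical helper names, not part of this package. It shows that the unmerged bytes of “ would be [158, 222, 250]. That suggests the faulty output emitted the per-byte IDs 222 and 250 but produced null for the 0xE2 byte. OpenAI's output instead merges E2 80 into 447 and keeps 250 for 0x9C.

```javascript
// Sketch of GPT-2's bytes_to_unicode table, ported to JavaScript.
// Assumption: token IDs 0-255 are the single-byte symbols in `order`.
function bytesToUnicode() {
  const order = [];
  for (let b = 33; b <= 126; b++) order.push(b);  // '!' .. '~'
  for (let b = 161; b <= 172; b++) order.push(b); // '¡' .. '¬'
  for (let b = 174; b <= 255; b++) order.push(b); // '®' .. 'ÿ'
  const chars = order.map((b) => String.fromCharCode(b));
  // Remaining control/whitespace bytes get stand-ins starting at U+0100.
  let n = 0;
  for (let b = 0; b < 256; b++) {
    if (!order.includes(b)) {
      order.push(b);
      chars.push(String.fromCharCode(256 + n));
      n++;
    }
  }
  const table = new Map(order.map((b, i) => [b, chars[i]]));
  return { order, table };
}

const { order, table } = bytesToUnicode();

// Hypothetical helper: per-byte token IDs before any BPE merges are applied.
function baseByteIds(text) {
  return Array.from(new TextEncoder().encode(text), (b) => order.indexOf(b));
}

// Hypothetical helper: the printable stand-in string the BPE step works on.
function byteSymbols(text) {
  return Array.from(new TextEncoder().encode(text), (b) => table.get(b)).join("");
}

console.log(Array.from(new TextEncoder().encode("“"))); // [ 226, 128, 156 ]
console.log(byteSymbols("“")); // âĢľ
console.log(baseByteIds("“")); // [ 158, 222, 250 ]
console.log(baseByteIds("”")); // [ 158, 222, 251 ]
```

A correct encoder never needs a null here, because every one of the 256 byte values has an entry in the table. Wrong output can appear only in the merge step or in the vocabulary lookup.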


syonfox · 2022-12-25T17:48:46Z

Hmm, maybe someone else fixed that, but it seems to work fine in the latest version; see the added test.
We can probably close this.

syonfox · 2022-12-25T17:52:15Z

https://github.com/syonfox/GPT-3-Encoder/actions/runs/3776876895

niieani · 2023-05-24T07:38:53Z

That's because this encoder is actually for the older models; it doesn't match gpt-3.5-turbo or gpt-4.
If you want better accuracy for these newer models, see my package that started off as a fork of this one: gpt-tokenizer.

seyfer · 2023-09-25T15:28:27Z

@niieani, what do you think about this one? https://github.com/dqbd/tiktoken
Is your implementation better?

syonfox · 2023-09-25T20:16:53Z

Hmm, I can't reproduce the build, but it looks promising.

syonfox#6

Notable pros: no TS :)
I think my build works.
Simple, only one version.
Good-enough estimation.

niieani · 2023-09-25T22:31:38Z

@seyfer Tiktoken JS looks good too. My gpt-tokenizer has a few extra features though that might be useful to you (like checking whether a given text is within the token limit or not).
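The token-limit check mentioned above can be sketched against any `encode` function. This is a minimal sketch under stated assumptions. `withinTokenLimit` and `byteEncode` are hypothetical names and are not gpt-tokenizer's API. `byteEncode` is a toy stand-in that counts UTF-8 bytes rather than real BPE tokens.

```javascript
// Hypothetical helper (not gpt-tokenizer's API): returns the token count when
// `text` fits within `limit`, or false when it exceeds the limit.
function withinTokenLimit(encode, text, limit) {
  const count = encode(text).length;
  return count <= limit ? count : false;
}

// Toy stand-in encoder for the demo only: one "token" per UTF-8 byte.
const byteEncode = (text) => Array.from(new TextEncoder().encode(text));

console.log(withinTokenLimit(byteEncode, "“hi”", 8)); // 8
console.log(withinTokenLimit(byteEncode, "“hi”", 7)); // false
```

An empty string returns 0, which is falsy. Callers should therefore compare the result against `false` explicitly rather than relying on truthiness.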

syonfox pushed a commit to syonfox/gptoken that referenced this issue Dec 25, 2022

add a test for issue latitudegames#9

d2b6dca

It seems to work fine
