Token counting for a whole message #31

Xan-Kun · 2024-05-19T04:25:50Z

Not sure if this fits into the scope of this project, but it could be a real killer feature, since none of the others do it.
If not, please feel free to delete :-)

What would you like to be added:

Be able to pass a whole OpenAI Message object into a function, and get the complete token count back.

Why is this needed:

Counting the tokens of a complete OpenAI message is quite tricky, since a message can now include multiple content parts, functions, tools, etc.
As far as I know, there is no C# lib that supports this, it doesn't seem like MS is adding any value here (on the contrary :-) ), and it seems everyone wants to count tokens for messages, not just text.

Anything else we need to know?

I tried to implement it following this answer: https://stackoverflow.com/a/77175648/4821032
There is also a TypeScript library that seems to come very close: https://github.com/hmarr/openai-chat-tokens

P.S.: I think it is really only needed for outgoing (prompt) messages, since the incoming chat objects already carry the actual token count.
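
For illustration, a minimal sketch of what such a function could look like, written in Python for brevity. It is not an existing API of this library. The overhead constants are an assumption borrowed from OpenAI's cookbook guidance for gpt-3.5-turbo / gpt-4 era chat models, `count_prompt_tokens` is a hypothetical name, and `encode` stands for whatever tokenizer function is available. Tool and function definitions and image parts are deliberately left out, which is exactly the tricky part described above.

```python
from typing import Callable, Iterable, Mapping, Sequence

# Assumed overheads (from OpenAI's cookbook guidance for gpt-3.5-turbo / gpt-4
# era chat models). They may not hold for other models.
TOKENS_PER_MESSAGE = 3    # framing tokens added around every message
TOKENS_PER_NAME = 1       # extra token when a "name" field is present
REPLY_PRIMING_TOKENS = 3  # every reply is primed with an assistant header


def count_prompt_tokens(
    messages: Iterable[Mapping[str, object]],
    encode: Callable[[str], Sequence],
) -> int:
    """Estimate the prompt token count of chat messages (hypothetical helper)."""
    total = REPLY_PRIMING_TOKENS
    for message in messages:
        total += TOKENS_PER_MESSAGE
        for key, value in message.items():
            if key == "name":
                total += TOKENS_PER_NAME
            if isinstance(value, str):
                # Plain string fields: role, name, simple content.
                total += len(encode(value))
            elif isinstance(value, list):
                # Multi-part content: only text parts are counted here.
                for part in value:
                    if isinstance(part, dict) and part.get("type") == "text":
                        total += len(encode(str(part.get("text", ""))))
            # Tool calls, function definitions and images are skipped in this sketch.
    return total
```

With a real tokenizer, `encode` could be something like the `encode` method of a tiktoken encoding. Passing the tokenizer in as a parameter keeps the overhead logic independent of any particular tokenizer or SDK.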

HavenDV · 2024-05-19T04:32:39Z

I'm not sure if this should be part of this library, but maybe in https://github.com/tryAGI/OpenAI? That may not be an option, though, if you already depend heavily on another OpenAI SDK.
My idea is to have a client that is completely generated from the OpenAPI specification (with some additional extensions/constructors for convenience) to provide support for new features on the day they are released.
For this, I'm putting effort into developing https://github.com/HavenDV/OpenApiGenerator because Kiota/NSwag couldn't handle it, at least when I started it.

Although this sounds quite ambitious, I'm actually making pretty good progress on it. This will also allow us to get the same for any other SDK based on an OpenAPI specification, which is very important for the rapid development of a library with a large number of integrations (LangChain .NET).

Xan-Kun · 2024-05-19T04:38:49Z

I see. Since OpenAI doesn't really give us the specs, esp. not in a machine-friendly way, that really shouldn't go in here.
IMHO there are a few topics where it is very unclear how to automate things properly: message token counting, price estimation and context window size.
It would be really nice if OpenAI could give us an API endpoint for those (and include it in the OpenAI OpenAPI [sic] spec) :-).

Xan-Kun · 2024-05-19T04:40:46Z

btw, I added your library to this highly viewed SO answer ;-)
https://stackoverflow.com/a/75804651/4821032
