
fix: Support for Non-OpenAI Models in Token Trimming #1605

Merged: 36 commits from tcm-trimTokens into develop on Jan 3, 2025

Conversation

tcm390
Collaborator

@tcm390 tcm390 commented Dec 31, 2024

Related:
#1439
#1565

Needs testing: #1439 (comment)

@tcm390 tcm390 marked this pull request as draft December 31, 2024 12:16
@tcm390 tcm390 changed the title trim tokens Support for Non-OpenAI Models in Token Trimming Dec 31, 2024
@monilpat
Collaborator

THANK YOU for doing this! It has been wrong since the beginning lol so much appreciated! This is an important fix!

@tcm390 tcm390 marked this pull request as ready for review December 31, 2024 20:44
@monilpat monilpat changed the title Support for Non-OpenAI Models in Token Trimming fix: Support for Non-OpenAI Models in Token Trimming Jan 1, 2025
@tcm390
Collaborator Author

tcm390 commented Jan 3, 2025

> tried testing with defaults for the new settings got:
>
> Error processing transcribed text: Error: Unknown model
>     at getEncodingNameForModel (file:///root/test-prs/pr1605/packages/plugin-node/dist/index.js:10984:19)
>     at encodingForModel (file:///root/test-prs/pr1605/packages/plugin-node/dist/index.js:11014:24)
>     at _TokenizationService.truncateTiktoken (file:///root/test-prs/pr1605/packages/plugin-node/dist/index.js:11055:26)
>     at _TokenizationService.trimTokens (file:///root/test-prs/pr1605/packages/plugin-node/dist/index.js:11036:29)
>     at generateMessageResponse (file:///root/test-prs/pr1605/packages/core/dist/index.js:2556:41)
>     at VoiceManager2._generateResponse (file:///root/test-prs/pr1605/packages/client-discord/dist/index.js:3178:32)
>     at VoiceManager2.handleUserMessage (file:///root/test-prs/pr1605/packages/client-discord/dist/index.js:3084:48)
>     at async VoiceManager2.processTranscription (file:///root/test-prs/pr1605/packages/client-discord/dist/index.js:3029:17)
>     at async Timeout._onTimeout (file:///root/test-prs/pr1605/packages/client-discord/dist/index.js:2961:17)
>
> with one of my discord character files:
> "modelProvider": "anthropic",
> this probably needs to work out of the box
> marking as a draft so it doesn't get merged

Yeah, since tiktoken does not support Anthropic models, you need to specify TOKENIZER_MODEL and TOKENIZER_TYPE in the env. However, I think I should fall back to a hardcoded default if they aren't specified or the tokenizer fails. 🤔
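For reference, a minimal .env sketch of these settings; the `tiktoken` value for TOKENIZER_TYPE is an assumption based on the default tokenizer named in this thread, not taken from the PR diff:

```
# Hypothetical .env entries; both variable names come from this PR, the values are illustrative.
TOKENIZER_MODEL=gpt-4o
TOKENIZER_TYPE=tiktoken
```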

> Yes, should have default values.

Updated. Now, if TOKENIZER_MODEL and TOKENIZER_TYPE aren't specified, trimTokens falls back to a hardcoded tiktoken tokenizer with gpt-4-mini as the model.
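To make the fallback concrete, here is a minimal sketch of the behavior described above, not the PR's actual code. It assumes js-tiktoken's encodingForModel/encode/decode API (the same calls visible in the stack trace) and a library version whose model list includes gpt-4o; keeping the tail of the context is also an assumption:

```ts
// Minimal sketch of the env-driven fallback (not the PR's exact code).
import { encodingForModel, type TiktokenModel } from "js-tiktoken";

const DEFAULT_MODEL: TiktokenModel = "gpt-4o"; // hardcoded fallback model

export function trimTokens(context: string, maxTokens: number): string {
    // Prefer the configured model; otherwise use the hardcoded default.
    const configured = process.env.TOKENIZER_MODEL as TiktokenModel | undefined;
    for (const model of [configured, DEFAULT_MODEL]) {
        if (!model) continue;
        try {
            const enc = encodingForModel(model);
            const tokens = enc.encode(context);
            // Keep the most recent tokens; older context is dropped first.
            return tokens.length <= maxTokens
                ? context
                : enc.decode(tokens.slice(-maxTokens));
        } catch {
            // encodingForModel throws "Unknown model" for unsupported names
            // (the error seen above); fall through to the default model.
        }
    }
    return context; // unreachable in practice: DEFAULT_MODEL is always known
}
```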

@shakkernerd
Member

> Updated. Now, if TOKENIZER_MODEL and TOKENIZER_TYPE aren't specified, trimTokens falls back to a hardcoded tiktoken tokenizer with gpt-4-mini as the model.

Great!

@tcm390
Collaborator Author

tcm390 commented Jan 3, 2025

I'm not sure we want to keep it as a service, though. If we can resolve the onnxruntime-node issue, maybe we could move it into core.

monilpat
monilpat previously approved these changes Jan 3, 2025
Collaborator

@monilpat monilpat left a comment


LGTM - thanks for the thoughtful solution. It could use more thorough testing, given how many call sites changed, but it fundamentally looks good. It could, and probably should, be moved into core as well.

We should consider adding another wrapper, trimTokensByModelClass(context, modelClass), which under the hood resolves the model and its maxInputTokens, and using it in generateObject etc. wherever trimming is currently based on model defaults rather than a configured value.
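A hedged sketch of that suggestion; the ModelClass values and getModelSettings are hypothetical stand-ins for however the runtime exposes per-class model config, not the repo's actual API:

```ts
// Hypothetical sketch of the suggested wrapper; not code from this PR.
type ModelClass = "small" | "medium" | "large";

interface ModelSettings {
    name: string;           // provider-specific model name
    maxInputTokens: number; // context budget for this model class
}

// Stand-in for a real lookup against the provider's model table.
declare function getModelSettings(modelClass: ModelClass): ModelSettings;

// Stand-in for the trimTokens discussed in this PR.
declare function trimTokens(context: string, maxTokens: number): string;

// Resolves the model's maxInputTokens from the model class, so call sites
// like generateObject don't trim against hardcoded model defaults.
export function trimTokensByModelClass(
    context: string,
    modelClass: ModelClass
): string {
    const { maxInputTokens } = getModelSettings(modelClass);
    return trimTokens(context, maxInputTokens);
}
```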

Two review threads on packages/plugin-node/src/services/tokenizer.ts (outdated, resolved).
@tcm390
Collaborator Author

tcm390 commented Jan 3, 2025

> It could, and probably should, be moved into core as well.

Updated: moved it to core.

@tcm390
Collaborator Author

tcm390 commented Jan 3, 2025

Test Scenarios:

  1. Unset Variables: Verified behavior when TOKENIZER_MODEL and TOKENIZER_TYPE are not set.
  2. Incorrect Configuration (Both Invalid): Tested with invalid values for both TOKENIZER_MODEL and TOKENIZER_TYPE.
  3. Partially Incorrect Configuration: Tested with an invalid TOKENIZER_MODEL but a valid TOKENIZER_TYPE.
  4. Correct Configuration: Tested with valid values for both TOKENIZER_MODEL and TOKENIZER_TYPE.

Results:
All scenarios functioned as expected:

  1. Unset Variables: The default tiktoken tokenizer was used with "gpt-4o" for token truncation.
  2. Incorrect Configuration (Both Invalid): Fell back to the default tiktoken tokenizer and "gpt-4o" for token truncation.
  3. Partially Incorrect Configuration: Rough truncation was performed using an estimated 4 characters per token (see the sketch after this list).
  4. Correct Configuration: Tokens were truncated successfully with the provided TOKENIZER_MODEL and TOKENIZER_TYPE.
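A minimal sketch of the rough fallback from result 3; only the 4-characters-per-token estimate comes from the test notes above, and keeping the tail of the context is an assumption:

```ts
// Character-based fallback when no usable tokenizer is available.
const CHARS_PER_TOKEN = 4; // rough estimate from the test results above

export function roughTrim(context: string, maxTokens: number): string {
    const maxChars = maxTokens * CHARS_PER_TOKEN;
    // Keep the most recent characters, mirroring token-based trimming.
    return context.length <= maxChars ? context : context.slice(-maxChars);
}
```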

@shakkernerd shakkernerd marked this pull request as ready for review January 3, 2025 14:50
Member

@shakkernerd shakkernerd left a comment


This is well done.
Great work!

@shakkernerd shakkernerd merged commit bf6ef96 into develop Jan 3, 2025
7 checks passed
@shakkernerd shakkernerd deleted the tcm-trimTokens branch January 3, 2025 15:14