Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Switch from tiktoken to js-tiktoken for worker compatibility #703

Merged
merged 1 commit into from
Nov 30, 2024

Conversation

antpb
Copy link
Contributor

@antpb antpb commented Nov 30, 2024

Relates to:

Eliza currently uses tiktoken which relies on WASM bindings. This causes build failures in Cloudflare Workers and other edge runtime environments where WASM support is limited or unavailable. This PR migrates the token truncation functionality from tiktoken to js-tiktoken, which is a pure JavaScript implementation of the same tokenization logic. This change removes the WASM dependency while maintaining the same tokenization functionality.

Risks

Seems the collaborator on both packages are the same person so this should be a trusted package. (verify pls)
https://www.npmjs.com/package/js-tiktoken
https://www.npmjs.com/package/tiktoken

Background

What does this PR do?

This PR migrates the token truncation functionality from tiktoken to js-tiktoken, which is a pure JavaScript implementation of the same tokenization logic. This change removes the WASM dependency while maintaining the same tokenization functionality.

What kind of change is this?

Pretty beefy package change and maybe opinionated but id be willing to defend the change if anyone objects. :)

Key changes:

  • Replace tiktoken import with js-tiktoken
  • Remove unnecessary TextDecoder usage since js-tiktoken.decode() returns strings directly
  • Remove encoding.free() call as JavaScript handles memory management
  • Update types to match js-tiktoken interfaces

Why are we doing this? Any context or related work?

Edge compat is gud.

  • Enables deployment to edge runtimes like Cloudflare Workers
  • Maintains same tokenization functionality and accuracy
  • Reduces bundle size by removing WASM dependency
  • Simplifies deployment by removing need for WASM file handling

Documentation changes needed?

Testing

Where should a reviewer start?

Take a look at the included test file for instructions on how to use it.

The automated tests have been done to ensure it:

  • Correctly truncates text to specified token limits
  • Handles edge cases (empty strings, invalid inputs)
  • Maintains proper error handling
  • Works in worker environments

Discord username

antpb

@antpb antpb marked this pull request as ready for review November 30, 2024 05:43
@antpb antpb changed the title Switch from tiktoken to js-tiktoken for worker compatibility fix: Switch from tiktoken to js-tiktoken for worker compatibility Nov 30, 2024
@shakkernerd shakkernerd merged commit 8c35b9e into elizaOS:main Nov 30, 2024
2 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants