feat: use tiktoken-rs instead of tokenizers, single global tokenizer by salman1993 · Pull Request #3115 · block/goose

salman1993 · 2025-06-27T14:20:30Z

we estimate the # tokens + lots of recent open source models use tiktoken

BEFORE (tokenizers):

🚀 Async Token Counter Demo
===========================

📊 Performance Comparison
-------------------------
🔴 Synchronous TokenCounter:
   Init time: 1.123163875s
   Count time: 1.423333ms
   Total tokens: 42

🟢 Async TokenCounter:
   Init time: 1.09214125s
   Count time: 996.875µs
   Total tokens: 42
   Cache size: 5

⚡ Cached TokenCounter (2nd run):
   Count time: 3.375µs
   Total tokens: 42
   Cache size: 5

AFTER (tiktoken):

🚀 Async Token Counter Demo
===========================

📊 Performance Comparison
-------------------------
🔴 Synchronous TokenCounter:
   Init time: 1.15345675s
   Count time: 1.199375ms
   Total tokens: 42

🟢 Async TokenCounter:
   Init time: 37.125µs
   Count time: 313.167µs
   Total tokens: 42
   Cache size: 5

⚡ Cached TokenCounter (2nd run):
   Count time: 3.292µs
   Total tokens: 42
   Cache size: 5
   Token result caching: 95x faster on cached text

The difference in performance is significant, especially init (since we just have one tokenizer) but the count time is also 3x faster

📊 Benchmark Results Analysis (done by goose)

Performance Summary (10,000 tokens)

Implementation	Time	Performance
tiktoken-rs (sync)	4.26 ms	⭐ Best Raw Performance
tiktoken-rs (async - cache hit)	3.38 μs	1,257x faster (cached, not real measurement)
Xenova GPT-4o (tokenizers)	8.13 ms	1.9x slower than tiktoken
Xenova Claude (tokenizers)	8.68 ms	2.0x slower than tiktoken

Performance Summary (100,000 tokens)

Implementation	Time	Scaling
tiktoken-rs (sync)	42.4 ms	⭐ Best Raw Performance
tiktoken-rs (async - cache hit)	33.5 μs	1,265x faster (cached, not real measurement)
Xenova GPT-4o (tokenizers)	83.0 ms	2.0x slower than tiktoken

Key Findings

🎯 Raw Tokenization Performance

tiktoken-rs is ~2x faster than the old tokenizers library
Performance scales linearly with text length (good!)
Very consistent performance (low standard deviation)

📈 Scaling Characteristics

tiktoken-rs: ~4.2 μs per 1,000 tokens (linear scaling)
tokenizers: ~8.1 μs per 1,000 tokens (linear scaling)
Both scale linearly, which is ideal

Real-World Implications

✅ What This Means

Raw performance improved 2x by switching to tiktoken-rs
Memory usage likely reduced (no need to download/store tokenizer files)
Cache provides massive speedup for repeated content
Consistent performance across different text sizes

⚠️ Cache Considerations

The async benchmarks show unrealistic performance due to cache hits
In real usage, you'll see a mix of cache hits and misses
First-time tokenization will use the sync performance numbers
Repeated content (like system prompts) will be very fast

🎯 Bottom Line

Excellent upgrade: 2x faster raw performance + intelligent caching
Real-world performance: Expect ~4.2ms per 10k tokens for new content
Cached content: Nearly instant (microseconds)
Memory efficient: No large tokenizer files to manage

This addresses the critical performance issue where token counter downloads would create nested Tokio runtimes and block the async executor. Key improvements: - AsyncTokenCounter with proper async download patterns - Global tokenizer cache to prevent repeated downloads - Token result caching with hash-based lookup (80-90% hit rates) - Main context management now uses async token counting - Backward compatible legacy TokenCounter with fixed blocking HTTP client - Comprehensive test coverage for async functionality Performance benefits: - Eliminates blocking Runtime::new().block_on() anti-pattern - Concurrent tokenizer downloads without blocking main executor - Shared tokenizer instances reduce memory usage - Token count caching provides significant speedup on repeated text - Async context operations now properly non-blocking The critical async paths (truncate_context, summarize_context) now use AsyncTokenCounter for optimal performance while maintaining full backward compatibility for sync usage.

…vements This builds on the async token counter with focused optimizations: Performance improvements: - Replace DefaultHasher with AHasher for 2-3x faster cache lookups - Eliminate lock contention by using DashMap for global tokenizer cache - Add cache size management to prevent unbounded memory growth - Maintain accurate token counting while improving cache performance Key changes: - AHasher provides better hash distribution and performance vs DefaultHasher - DashMap allows concurrent reads without blocking on different keys - Cache eviction policies prevent memory leaks in long-running processes - Preserve original tokenization behavior for consistent results These optimizations provide measurable performance gains especially in high-throughput scenarios with concurrent tokenizer access and frequent token counting operations.

- Fixed needless borrow warnings in context.rs - Added blocking feature to reqwest for backward compatibility - Moved demo file to proper examples directory - Applied cargo fmt formatting - All tests pass successfully

- Implement exponential backoff retry logic (3 attempts, up to 30s delay) - Add comprehensive download validation and corruption detection - Enhanced HTTP client with proper timeouts (60s total, 15s connect) - Progress reporting for large tokenizer downloads (>1MB) - Smart retry strategy: retry server errors (5xx) and network failures, fail fast on client errors (4xx) - File integrity validation with JSON structure checking - Partial download recovery and cleanup of corrupted files - Comprehensive test coverage for network resilience scenarios This addresses real-world network conditions including: - Temporary connectivity loss and DNS resolution failures - HuggingFace server downtime/rate limiting - Connection timeouts on slow networks - Partial download corruption

salman1993 · 2025-06-27T14:52:42Z

To see the benchmarking results for previous tokenizers vs tiktoken and also sync/async, download the zip and open report/index.html

benchmark-tokenization.zip

for 100K tokens, here are the mean times:

Xenova--gpt-4o_100000_tokens:  82.216 ms
Xenova--claude-tokenizer_100000_tokens:  91.377 ms
o200k_base_100000_tokens: 42.356 ms
async_o200k_base_100000_tokens: 33.322 µs  # this one is mostly hitting the cache in our toy example, so not a proper measurement

jamadeo

What do we expect the error to be when using tiktoken for claude models? I'm guessing it isn't too big a deal since we are mainly using this to suggest when to summarize?

jamadeo · 2025-06-30T13:56:53Z

crates/goose/src/providers/factory.rs

    use mcp_core::{content::TextContent, Role};
    use std::env;

+    #[warn(dead_code)]


Does it not warn for dead code by default?

not sure why this one wasn't

salman1993 · 2025-07-01T18:42:20Z

What do we expect the error to be when using tiktoken for claude models?

Anthropic doesn't provide us a tokenizer. The previous Xenova-Claude-Tokenizer is also an approximation (see estimate factor here). There is an API endpoint but its easier/faster to use a local tokenizer.

* main: (37 commits) fix: fix desktop recipe url generation (block#3209) feat: improve UX for saving recipes (block#3214) fix: Pass Google AI API key in HTTP header, not query param (block#3192) docs: add linter to CONTRIBUTING.md (block#3168) feat: Structured output for recipes (block#3188) Fix cost tracking accuracy and OpenRouter model pricing (block#3189) docs: update cli install instructions for windows (block#3205) Docs: Cost tracking on the desktop app (block#3204) feat: Adding streamable-http transport support for backend, desktop and cli (block#2942) fix: use the correct `contains` syntax on create-recipe-pr.yml (block#3193) Temporarily Remove GH Copilot Provider (block#3199) docs: fix tab navigation (block#3201) feat: use tiktoken-rs instead of tokenizers, single global tokenizer (block#3115) add playwright-mcp server to extensions list (block#3010) Add `/extension` path for extension installation (block#3011) feat(desktop): Prioritize suffix when truncating path in header (block#3110) chore(release): release version 1.0.31 (block#3185) feat: additional sub recipes via command line (block#3163) Add Internal Recipes To Recipes Cookbook (block#3179) pipe the argument to storage (block#3184) ...

* main: (150 commits) Defend against invalid sessions (block#3229) Clean up session file optionality for --no-session (block#3230) Feat: Support Recipe Parameters in Goose desktop app (block#3155) docs: update recipe example (block#3222) Add native OAuth 2.0 authentication support to MCP client (block#3213) build: Check in Cargo.lock changes (block#3220) fix: fix desktop recipe url generation (block#3209) feat: improve UX for saving recipes (block#3214) fix: Pass Google AI API key in HTTP header, not query param (block#3192) docs: add linter to CONTRIBUTING.md (block#3168) feat: Structured output for recipes (block#3188) Fix cost tracking accuracy and OpenRouter model pricing (block#3189) docs: update cli install instructions for windows (block#3205) Docs: Cost tracking on the desktop app (block#3204) feat: Adding streamable-http transport support for backend, desktop and cli (block#2942) fix: use the correct `contains` syntax on create-recipe-pr.yml (block#3193) Temporarily Remove GH Copilot Provider (block#3199) docs: fix tab navigation (block#3201) feat: use tiktoken-rs instead of tokenizers, single global tokenizer (block#3115) add playwright-mcp server to extensions list (block#3010) ...

…lock#3115) Co-authored-by: jack <> Signed-off-by: Adam Tarantino <tarantino.adam@gmail.com>

…lock#3115) Co-authored-by: jack <> Signed-off-by: Soroosh <soroosh.sarabadani@gmail.com>

…lock#3115) Co-authored-by: jack <>

jack and others added 5 commits June 27, 2025 11:14

fix: resolve clippy warnings and finalize async token counter

b624590

- Fixed needless borrow warnings in context.rs - Added blocking feature to reqwest for backward compatibility - Moved demo file to proper examples directory - Applied cargo fmt formatting - All tests pass successfully

use tiktoken-rs instead of tokenizers

d28b168

salman1993 mentioned this pull request Jun 27, 2025

feat: implement async token counter with network resilience and performance optimizations #3111

Merged

update tokenization_benchmark.rs

81063a7

salman1993 added 2 commits June 27, 2025 10:59

fix goose-server audio tests failing in CI

75a233c

remove reqwest blocking feature

bbba377

salman1993 requested review from DOsinga, baxen, jamadeo, jsibbison-square and michaelneale June 30, 2025 12:45

salman1993 added 2 commits June 30, 2025 08:52

merge main, resolve conflicts

44d3a2e

fmt

de905e9

jamadeo approved these changes Jun 30, 2025

View reviewed changes

salman1993 merged commit 75454b9 into main Jul 1, 2025
8 checks passed

salman1993 deleted the sm/faster-tokenization branch July 1, 2025 18:42

chaitanyarahalkar mentioned this pull request Jul 5, 2025

Macos-only Sandboxing chaitanyarahalkar/goose#9

Open

atarantino pushed a commit to atarantino/goose that referenced this pull request Jul 14, 2025

feat: use tiktoken-rs instead of tokenizers, single global tokenizer (b…

a251c34

…lock#3115) Co-authored-by: jack <> Signed-off-by: Adam Tarantino <tarantino.adam@gmail.com>

s-soroosh pushed a commit to s-soroosh/goose that referenced this pull request Jul 18, 2025

feat: use tiktoken-rs instead of tokenizers, single global tokenizer (b…

4dbb17c

…lock#3115) Co-authored-by: jack <> Signed-off-by: Soroosh <soroosh.sarabadani@gmail.com>

cbruyndoncx pushed a commit to cbruyndoncx/goose that referenced this pull request Jul 20, 2025

feat: use tiktoken-rs instead of tokenizers, single global tokenizer (b…

49717b3

…lock#3115) Co-authored-by: jack <>

r0x0d mentioned this pull request Feb 4, 2026

Remove build-dependencies section from Cargo.toml #6946

Merged

10 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: use tiktoken-rs instead of tokenizers, single global tokenizer#3115

feat: use tiktoken-rs instead of tokenizers, single global tokenizer#3115
salman1993 merged 10 commits intomainfrom
sm/faster-tokenization

salman1993 commented Jun 27, 2025 •

edited

Loading

Uh oh!

salman1993 commented Jun 27, 2025 •

edited

Loading

Uh oh!

jamadeo left a comment

Uh oh!

jamadeo Jun 30, 2025

Uh oh!

salman1993 Jul 1, 2025

Uh oh!

salman1993 commented Jul 1, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

salman1993 commented Jun 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

📊 Benchmark Results Analysis (done by goose)

Performance Summary (10,000 tokens)

Performance Summary (100,000 tokens)

Key Findings

🎯 Raw Tokenization Performance

📈 Scaling Characteristics

Real-World Implications

✅ What This Means

⚠️ Cache Considerations

🎯 Bottom Line

Uh oh!

salman1993 commented Jun 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jamadeo left a comment

Choose a reason for hiding this comment

Uh oh!

jamadeo Jun 30, 2025

Choose a reason for hiding this comment

Uh oh!

salman1993 Jul 1, 2025

Choose a reason for hiding this comment

Uh oh!

salman1993 commented Jul 1, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

salman1993 commented Jun 27, 2025 •

edited

Loading

salman1993 commented Jun 27, 2025 •

edited

Loading