feat: use tiktoken-rs instead of tokenizers, single global tokenizer#3115
feat: use tiktoken-rs instead of tokenizers, single global tokenizer#3115salman1993 merged 10 commits intomainfrom
Conversation
This addresses the critical performance issue where token counter downloads would create nested Tokio runtimes and block the async executor. Key improvements: - AsyncTokenCounter with proper async download patterns - Global tokenizer cache to prevent repeated downloads - Token result caching with hash-based lookup (80-90% hit rates) - Main context management now uses async token counting - Backward compatible legacy TokenCounter with fixed blocking HTTP client - Comprehensive test coverage for async functionality Performance benefits: - Eliminates blocking Runtime::new().block_on() anti-pattern - Concurrent tokenizer downloads without blocking main executor - Shared tokenizer instances reduce memory usage - Token count caching provides significant speedup on repeated text - Async context operations now properly non-blocking The critical async paths (truncate_context, summarize_context) now use AsyncTokenCounter for optimal performance while maintaining full backward compatibility for sync usage.
…vements This builds on the async token counter with focused optimizations: Performance improvements: - Replace DefaultHasher with AHasher for 2-3x faster cache lookups - Eliminate lock contention by using DashMap for global tokenizer cache - Add cache size management to prevent unbounded memory growth - Maintain accurate token counting while improving cache performance Key changes: - AHasher provides better hash distribution and performance vs DefaultHasher - DashMap allows concurrent reads without blocking on different keys - Cache eviction policies prevent memory leaks in long-running processes - Preserve original tokenization behavior for consistent results These optimizations provide measurable performance gains especially in high-throughput scenarios with concurrent tokenizer access and frequent token counting operations.
- Fixed needless borrow warnings in context.rs - Added blocking feature to reqwest for backward compatibility - Moved demo file to proper examples directory - Applied cargo fmt formatting - All tests pass successfully
- Implement exponential backoff retry logic (3 attempts, up to 30s delay) - Add comprehensive download validation and corruption detection - Enhanced HTTP client with proper timeouts (60s total, 15s connect) - Progress reporting for large tokenizer downloads (>1MB) - Smart retry strategy: retry server errors (5xx) and network failures, fail fast on client errors (4xx) - File integrity validation with JSON structure checking - Partial download recovery and cleanup of corrupted files - Comprehensive test coverage for network resilience scenarios This addresses real-world network conditions including: - Temporary connectivity loss and DNS resolution failures - HuggingFace server downtime/rate limiting - Connection timeouts on slow networks - Partial download corruption
|
To see the benchmarking results for previous tokenizers vs tiktoken and also sync/async, download the zip and open for 100K tokens, here are the mean times: |
jamadeo
left a comment
There was a problem hiding this comment.
What do we expect the error to be when using tiktoken for claude models? I'm guessing it isn't too big a deal since we are mainly using this to suggest when to summarize?
| use mcp_core::{content::TextContent, Role}; | ||
| use std::env; | ||
|
|
||
| #[warn(dead_code)] |
There was a problem hiding this comment.
Does it not warn for dead code by default?
There was a problem hiding this comment.
not sure why this one wasn't
Anthropic doesn't provide us a tokenizer. The previous |
* main: (37 commits) fix: fix desktop recipe url generation (block#3209) feat: improve UX for saving recipes (block#3214) fix: Pass Google AI API key in HTTP header, not query param (block#3192) docs: add linter to CONTRIBUTING.md (block#3168) feat: Structured output for recipes (block#3188) Fix cost tracking accuracy and OpenRouter model pricing (block#3189) docs: update cli install instructions for windows (block#3205) Docs: Cost tracking on the desktop app (block#3204) feat: Adding streamable-http transport support for backend, desktop and cli (block#2942) fix: use the correct `contains` syntax on create-recipe-pr.yml (block#3193) Temporarily Remove GH Copilot Provider (block#3199) docs: fix tab navigation (block#3201) feat: use tiktoken-rs instead of tokenizers, single global tokenizer (block#3115) add playwright-mcp server to extensions list (block#3010) Add `/extension` path for extension installation (block#3011) feat(desktop): Prioritize suffix when truncating path in header (block#3110) chore(release): release version 1.0.31 (block#3185) feat: additional sub recipes via command line (block#3163) Add Internal Recipes To Recipes Cookbook (block#3179) pipe the argument to storage (block#3184) ...
* main: (150 commits) Defend against invalid sessions (block#3229) Clean up session file optionality for --no-session (block#3230) Feat: Support Recipe Parameters in Goose desktop app (block#3155) docs: update recipe example (block#3222) Add native OAuth 2.0 authentication support to MCP client (block#3213) build: Check in Cargo.lock changes (block#3220) fix: fix desktop recipe url generation (block#3209) feat: improve UX for saving recipes (block#3214) fix: Pass Google AI API key in HTTP header, not query param (block#3192) docs: add linter to CONTRIBUTING.md (block#3168) feat: Structured output for recipes (block#3188) Fix cost tracking accuracy and OpenRouter model pricing (block#3189) docs: update cli install instructions for windows (block#3205) Docs: Cost tracking on the desktop app (block#3204) feat: Adding streamable-http transport support for backend, desktop and cli (block#2942) fix: use the correct `contains` syntax on create-recipe-pr.yml (block#3193) Temporarily Remove GH Copilot Provider (block#3199) docs: fix tab navigation (block#3201) feat: use tiktoken-rs instead of tokenizers, single global tokenizer (block#3115) add playwright-mcp server to extensions list (block#3010) ...
…lock#3115) Co-authored-by: jack <> Signed-off-by: Adam Tarantino <tarantino.adam@gmail.com>
…lock#3115) Co-authored-by: jack <> Signed-off-by: Soroosh <soroosh.sarabadani@gmail.com>
…lock#3115) Co-authored-by: jack <>
we estimate the # tokens + lots of recent open source models use tiktoken
BEFORE (tokenizers):
AFTER (tiktoken):
The difference in performance is significant, especially init (since we just have one tokenizer) but the count time is also 3x faster
📊 Benchmark Results Analysis (done by goose)
Performance Summary (10,000 tokens)
Performance Summary (100,000 tokens)
Key Findings
🎯 Raw Tokenization Performance
📈 Scaling Characteristics
Real-World Implications
✅ What This Means
🎯 Bottom Line