feat: lazy tokenizer loading and defensive asserts #307

ThomasK33 · 2025-10-17T21:48:53Z

Introduce shared assert helper for both main and renderer bundles.
Refactor tokenizer loading to fetch base and encodings on demand.
Add tokenizer readiness events so stores reschedule usage updates.
Ensure cached counts ignore fallback approximations with new test.

ThomasK33: I have yet to review these changes line by line.
Local testing seems fine. Cmux starts in 1.5 seconds instead of
8.8 seconds.
Code could be crap, though.

Introduce shared assert helper for both main and renderer bundles. Refactor tokenizer loading to fetch base and encodings on demand. Add tokenizer readiness events so stores reschedule usage updates. Ensure cached counts ignore fallback approximations with new test.

ammar-agent · 2025-10-17T23:52:18Z

I am 5-Pro

Essentials

Strong win: lazy tokenizer + per-encoding loading, listener replay, and LRU cache discipline. Debounced store notifications should cut render churn.
Invariants via a browser-safe assert helper are a good cross-process simplification.

Must-discuss before merge

Dynamic imports: repo policy says “never use dynamic imports,” but they’re fundamental here. Propose an explicit, documented exception scoped to tokenizer.ts (ESLint override + comment explaining perf rationale) to prevent future “cleanup” regressions.
“Ready” semantics: today “ready == all KNOWN_ENCODINGS loaded,” while UI reacts to per-encoding events. Either (a) redefine ready to “base + at least one encoding,” or (b) keep current meaning but document that UIs should rely on encoding events and treat onTokenizerModulesLoaded as niche.

Quick improvements (low effort, high value)

Add explicit generics to caches/maps (e.g., Map<string, EncodingModule>) and to BASE_MODULE_PROPS.
Symmetric error surface: catch+warn in startLoadingBase like startLoadingEncoding; message that we’ll fall back to approximate counts.
Route new main-process timing logs through log.ts for consistent verbosity control.
Doc a short note that Proxy getters may return undefined transiently while modules load.

Tests to add

Listener replay when added after an encoding loads.
Accurate count overwrites fallback cache.

Packaging sanity check

Verify dynamic-imported encodings are bundled in production and that first-render approximations don’t make tests flaky.

Verdict

+1 to merge after the dynamic-import policy exception is codified and the small typing/docs tweaks above. Clear perf and UX improvement.

— 🤖 Generated with cmux

ammario · 2025-10-18T00:09:13Z

Haven't read the code —

I've been thinking that context statistics could help elevate our users from waiting to engagement with the architecture.

If we could tell you which files cost the most in tokens or which bash commands took the longest we could give you a guiding light on how agent-friendly your codebase is.

So, I think it's worth investing in good design here even if we're not building such crazy features out yet.

ThomasK33 marked this pull request as draft October 17, 2025 21:49

ThomasK33 force-pushed the thomask33/10-17-improve_tokenizer_loading branch from 4fd89cf to 89e1709 Compare October 17, 2025 21:50

ThomasK33 force-pushed the thomask33/10-17-improve_tokenizer_loading branch from 89e1709 to 65cd283 Compare October 17, 2025 22:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: lazy tokenizer loading and defensive asserts #307

feat: lazy tokenizer loading and defensive asserts #307

ThomasK33 commented Oct 17, 2025 •

edited

Loading

Uh oh!

ammar-agent commented Oct 17, 2025

Uh oh!

ammario commented Oct 18, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

feat: lazy tokenizer loading and defensive asserts #307

Are you sure you want to change the base?

feat: lazy tokenizer loading and defensive asserts #307

Conversation

ThomasK33 commented Oct 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ammar-agent commented Oct 17, 2025

Uh oh!

ammario commented Oct 18, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ThomasK33 commented Oct 17, 2025 •

edited

Loading