Skip to content

Conversation

ThomasK33
Copy link
Member

@ThomasK33 ThomasK33 commented Oct 17, 2025

Introduce shared assert helper for both main and renderer bundles.
Refactor tokenizer loading to fetch base and encodings on demand.
Add tokenizer readiness events so stores reschedule usage updates.
Ensure cached counts ignore fallback approximations with new test.


ThomasK33: I have yet to review these changes line by line.
Local testing seems fine. Cmux starts in 1.5 seconds instead of
8.8 seconds.
Code could be crap, though.

@ThomasK33 ThomasK33 marked this pull request as draft October 17, 2025 21:49
@ThomasK33 ThomasK33 force-pushed the thomask33/10-17-improve_tokenizer_loading branch from 4fd89cf to 89e1709 Compare October 17, 2025 21:50
Introduce shared assert helper for both main and renderer bundles.
Refactor tokenizer loading to fetch base and encodings on demand.
Add tokenizer readiness events so stores reschedule usage updates.
Ensure cached counts ignore fallback approximations with new test.
@ThomasK33 ThomasK33 force-pushed the thomask33/10-17-improve_tokenizer_loading branch from 89e1709 to 65cd283 Compare October 17, 2025 22:01
@ammar-agent
Copy link
Collaborator

I am 5-Pro

Essentials

  • Strong win: lazy tokenizer + per-encoding loading, listener replay, and LRU cache discipline. Debounced store notifications should cut render churn.
  • Invariants via a browser-safe assert helper are a good cross-process simplification.

Must-discuss before merge

  • Dynamic imports: repo policy says “never use dynamic imports,” but they’re fundamental here. Propose an explicit, documented exception scoped to tokenizer.ts (ESLint override + comment explaining perf rationale) to prevent future “cleanup” regressions.
  • “Ready” semantics: today “ready == all KNOWN_ENCODINGS loaded,” while UI reacts to per-encoding events. Either (a) redefine ready to “base + at least one encoding,” or (b) keep current meaning but document that UIs should rely on encoding events and treat onTokenizerModulesLoaded as niche.

Quick improvements (low effort, high value)

  • Add explicit generics to caches/maps (e.g., Map<string, EncodingModule>) and to BASE_MODULE_PROPS.
  • Symmetric error surface: catch+warn in startLoadingBase like startLoadingEncoding; message that we’ll fall back to approximate counts.
  • Route new main-process timing logs through log.ts for consistent verbosity control.
  • Doc a short note that Proxy getters may return undefined transiently while modules load.

Tests to add

  • Listener replay when added after an encoding loads.
  • Accurate count overwrites fallback cache.

Packaging sanity check

  • Verify dynamic-imported encodings are bundled in production and that first-render approximations don’t make tests flaky.

Verdict

  • +1 to merge after the dynamic-import policy exception is codified and the small typing/docs tweaks above. Clear perf and UX improvement.

— 🤖 Generated with cmux

@ammario
Copy link
Member

ammario commented Oct 18, 2025

Haven't read the code —

I've been thinking that context statistics could help elevate our users from waiting to engagement with the architecture.

If we could tell you which files cost the most in tokens or which bash commands took the longest we could give you a guiding light on how agent-friendly your codebase is.

So, I think it's worth investing in good design here even if we're not building such crazy features out yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants