Improve token estimation accuracy for multibyte text #742

@bug-ops

Description

Parent: #740 (P0)

Problem

The current estimate_tokens() uses a bytes/3 heuristic, which overestimates token counts on multibyte text (Cyrillic, CJK) by up to 2-3x, since those scripts take 2-3 bytes per character in UTF-8. The inflated estimates cause premature context compaction and inaccurate budget allocation.

Solution

  • Replace text.len() / 3 with text.chars().count() / 4
  • Add configurable safety margin (default 1.0, recommended 1.2 for production)
  • Optionally support tiktoken-rs behind a feature flag for precise counting with cloud providers
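A minimal sketch of the proposed heuristic. The signature and the `safety_margin` parameter are assumptions for illustration; the actual `estimate_tokens` in zeph-memory may differ.

```rust
/// Hypothetical sketch: char-based token estimate with a safety margin.
/// `chars().count()` counts Unicode scalar values, so a Cyrillic or CJK
/// character contributes 1 instead of the 2-3 bytes it occupies in UTF-8.
fn estimate_tokens(text: &str, safety_margin: f32) -> usize {
    let base = text.chars().count() / 4;
    (base as f32 * safety_margin).ceil() as usize
}

fn main() {
    // "привет мир" is 19 bytes but only 10 chars:
    // old heuristic: 19 / 3 = 6 tokens; new heuristic: 10 / 4 = 2 tokens.
    let ru = "привет мир";
    assert_eq!(ru.len(), 19);
    assert_eq!(ru.chars().count(), 10);
    println!("estimate: {}", estimate_tokens(ru, 1.2));
}
```

With the 1.2 production margin, the Cyrillic example above rounds up from 2 to 3 tokens, trading a little headroom for compaction safety.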

Affected crates

  • zeph-memory (estimate_tokens function)
  • zeph-core (all call sites)

Acceptance criteria

  • Estimation accuracy within 20% on mixed ASCII/Cyrillic/CJK text
  • Safety margin configurable via memory.token_safety_margin
  • Existing tests updated
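The config key could look like the following, assuming a TOML config file with a `[memory]` table; only the key name comes from the acceptance criteria, the surrounding structure is assumed.

```toml
[memory]
# Multiplier applied on top of the chars/4 estimate; 1.0 = no margin,
# 1.2 recommended for production.
token_safety_margin = 1.2
```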
