Unboxed tokens #257

Open: kornelski (contributor) wants to merge 10 commits into master from unboxed-tokens.

This may be easier to review as individual commits, since the big diffs come from splitting and renaming files.

This refactoring removes TokenCapturer and TokenCapturerEvent, which allows Lexeme::to_token to be inlined. As a result, the Token isn't moved around as much, and Box<Token> is no longer necessary.
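
To illustrate the idea, here is a minimal sketch of the "unboxed" shape: a lexeme converts itself into a Token by value in an inlinable function, and the handler borrows that Token instead of receiving a Box<Token>. The Token, LexemeKind, Lexeme, and emit definitions are hypothetical, heavily simplified stand-ins, not lol-html's actual types or API.

```rust
// Hypothetical, simplified stand-ins; the real lol-html Token and Lexeme
// types are larger and reference ranges of the input buffer.
#[allow(dead_code)]
#[derive(Debug)]
enum Token {
    Text(String),
    StartTag { name: String, attrs: Vec<(String, String)>, self_closing: bool },
    Comment(String),
}

#[derive(Clone, Copy)]
enum LexemeKind {
    Text,
    StartTag,
    Comment,
}

struct Lexeme<'i> {
    input: &'i str,
    kind: LexemeKind,
}

impl<'i> Lexeme<'i> {
    // Returns the Token by value. With no intermediate event enum carrying
    // the token, this call can be inlined into its caller, which gives the
    // compiler a chance to build the value in place rather than moving it.
    #[inline]
    fn to_token(&self) -> Token {
        let s = self.input.to_owned();
        match self.kind {
            LexemeKind::Text => Token::Text(s),
            LexemeKind::StartTag => Token::StartTag {
                name: s,
                attrs: Vec::new(),
                self_closing: false,
            },
            LexemeKind::Comment => Token::Comment(s),
        }
    }
}

// The handler borrows the token mutably; no heap-allocated Box<Token> is needed.
fn emit(lexeme: &Lexeme<'_>, handler: &mut impl FnMut(&mut Token)) {
    let mut token = lexeme.to_token();
    handler(&mut token);
}
```

A Box<Token> is only pointer-sized, which makes it cheap to move, but each boxed token costs a heap allocation. Keeping the Token on the stack avoids that allocation, and it pays off once the token is no longer shuffled through extra layers. Whether the compiler actually elides a given move is an optimization, not a guarantee.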

This allowed TextDecoder to be simplified so that it handles only decoding, similar to TextEncoder. I've added a fast path for text chunks that is zero-copy for UTF-8 and ASCII. It could have been even faster if it didn't have to preserve the existing tricky behavior: #255.
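
A zero-copy fast path of this kind can be sketched with std::borrow::Cow: if the chunk is pure ASCII, or valid UTF-8 when the document encoding is UTF-8, the decoder returns a borrowed &str over the input bytes and copies nothing. Only other input falls back to an owned String. The Encoding enum and decode_text_chunk function are hypothetical; Latin1 stands in for "any other ASCII-compatible encoding". Unlike a real streaming decoder, this sketch does not buffer a multi-byte UTF-8 sequence split across chunk boundaries.

```rust
use std::borrow::Cow;

// Hypothetical encoding set for illustration only; a real decoder
// supports many more encodings.
#[derive(Clone, Copy)]
enum Encoding {
    Utf8,
    Latin1, // ISO-8859-1: each byte maps to the code point of the same value
}

fn decode_text_chunk(bytes: &[u8], encoding: Encoding) -> Cow<'_, str> {
    // Fast path: ASCII is valid in both encodings, so borrow it without copying.
    if bytes.is_ascii() {
        return Cow::Borrowed(std::str::from_utf8(bytes).expect("ASCII is valid UTF-8"));
    }
    match encoding {
        // Valid UTF-8 is borrowed as-is; invalid bytes become U+FFFD in an owned copy.
        Encoding::Utf8 => String::from_utf8_lossy(bytes),
        // Slow path: non-ASCII Latin-1 must be transcoded into a new String.
        Encoding::Latin1 => Cow::Owned(bytes.iter().map(|&b| b as char).collect()),
    }
}
```

Returning Cow lets callers treat both paths uniformly, while the common case of ASCII or UTF-8 text never allocates.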

Overall, it makes most benchmarks 5-8% faster, and text-heavy documents 25-40% faster.

@kornelski force-pushed the unboxed-tokens branch 3 times, most recently from f81165f to ec78225 on January 6, 2025 14:46.