Optimize regexes used in tiktoken #7020

Merged 3 commits on Feb 23, 2024

stephentoub (Member)

This ports the regex tweaks from openai/tiktoken#234. I noticed the differences because they also show up in the source for https://www.youtube.com/watch?v=zduSFxRajkE.

@tarekgh, if this conflicts with any of your changes, feel free to close this and I can re-make them after your changes land.

codecov bot commented Feb 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.83%. Comparing base (a139371) to head (849097e).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7020      +/-   ##
==========================================
- Coverage   68.83%   68.83%   -0.01%     
==========================================
  Files        1258     1258              
  Lines      250674   250672       -2     
  Branches    25615    25615              
==========================================
- Hits       172561   172543      -18     
- Misses      71484    71495      +11     
- Partials     6629     6634       +5     
| Flag | Coverage Δ |
|---|---|
| Debug | 68.83% <100.00%> (-0.01%) ⬇️ |
| production | 63.27% <100.00%> (-0.01%) ⬇️ |
| test | 88.56% <ø> (ø) |

Flags with carried forward coverage won't be shown.

| Files | Coverage Δ |
|---|---|
| ...rc/Microsoft.ML.Tokenizers/PreTokenizer/Roberta.cs | 57.14% <100.00%> (-9.53%) ⬇️ |
| ...Microsoft.ML.Tokenizers/PreTokenizer/Whitespace.cs | 100.00% <ø> (ø) |
| src/Microsoft.ML.Tokenizers/Tokenizer.cs | 82.64% <100.00%> (ø) |

... and 4 files with indirect coverage changes

tarekgh merged commit 4b89d98 into dotnet:main on Feb 23, 2024
25 checks passed
github-actions bot locked and limited conversation to collaborators on Mar 24, 2024