Commit
Use binary search in TextChunks (#71)
* Use a binary search to fill TextChunks
* Update snapshots for binary search
* Limit next section based on the encoded offsets to limit search space
* Use iterator based approach for chunk size to avoid extra allocations
* Remove unneeded flag in regex matches
* Update changelog, bump version, and use higher version of tiktoken
* Bump required versions of both tokenizer crates
* Add back onig feature for tokenizers

---------

Co-authored-by: Richard Bradfield <richard.bradfield@platformed.com>
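The diff itself did not load on this page, so the following is only a minimal sketch of the general idea behind filling a chunk with a binary search. It is not the code from this commit. The names `largest_fitting_end` and `word_ends`, the word-boundary splitting, and the character-count size function are all hypothetical, and the sketch assumes that chunk size never decreases as the prefix grows.

```rust
/// Hypothetical helper, not the crate's API: collect the byte offsets at
/// which a chunk may end, here the end of every whitespace-separated word.
fn word_ends(text: &str) -> Vec<usize> {
    let mut ends = Vec::new();
    let mut in_word = false;
    for (i, c) in text.char_indices() {
        if c.is_whitespace() {
            if in_word {
                ends.push(i);
            }
            in_word = false;
        } else {
            in_word = true;
        }
    }
    if in_word {
        ends.push(text.len());
    }
    ends
}

/// Binary search over the sorted candidate offsets in `ends` for the largest
/// one whose prefix `text[..end]` still fits within `capacity`.
/// Assumption: `size_of` is monotonically non-decreasing in prefix length.
/// Returns `None` when even the first candidate is too large.
fn largest_fitting_end<F>(
    text: &str,
    ends: &[usize],
    capacity: usize,
    size_of: F,
) -> Option<usize>
where
    F: Fn(&str) -> usize,
{
    let mut low = 0;
    let mut high = ends.len();
    let mut best = None;
    while low < high {
        let mid = low + (high - low) / 2;
        let end = ends[mid];
        if size_of(&text[..end]) <= capacity {
            // This prefix fits: remember it and look for a longer one.
            best = Some(end);
            low = mid + 1;
        } else {
            // Too large: only shorter prefixes can fit.
            high = mid;
        }
    }
    best
}
```

With `size_of` set to `|s| s.chars().count()`, calling `largest_fitting_end` on `"the quick brown fox"` with a capacity of 10 returns `Some(9)`, the end of `"the quick"`. A linear scan would call the size function once per candidate; the binary search calls it O(log n) times, which matters most when sizing a chunk means running a tokenizer.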
Showing 20 changed files with 19,765 additions and 17,811 deletions.