Releases: Hk669/bpetokenizer
v1.2.1
What's Changed
- feat: starttime-endtime added with the throughput on verbose by @Hk669 in #10
- Updates for the pretrained tokenizers. by @Hk669 in #11
Full Changelog: v1.2.0...v1.2.1
v1.0.32
Full Changelog: v1.0.31...v1.0.32
- added the min_frequency hyperparameter to control which pairs get merged and avoid extra vocab entries.
- default is set to 2.
- made some changes in the tests.
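To illustrate what a min_frequency threshold does during BPE training, here is a minimal standalone sketch. It is not the package's implementation: the function names (count_pairs, merge_pair, train_bpe) and the byte-level setup are assumptions made for this example. Only the idea of a minimum pair frequency, with a default of 2, comes from the notes above.

```python
from collections import Counter


def count_pairs(ids):
    """Count every adjacent pair of token ids."""
    return Counter(zip(ids, ids[1:]))


def merge_pair(ids, pair, new_id):
    """Replace each occurrence of `pair` in `ids` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, vocab_size, min_frequency=2):
    """Hypothetical trainer: stop merging once the most frequent pair
    occurs fewer than `min_frequency` times."""
    ids = list(text.encode("utf-8"))  # start from raw bytes (ids 0..255)
    merges = {}
    next_id = 256
    while next_id < vocab_size:
        pairs = count_pairs(ids)
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < min_frequency:
            break  # remaining pairs are too rare to deserve a vocab slot
        merges[pair] = next_id
        ids = merge_pair(ids, pair, next_id)
        next_id += 1
    return merges, ids


merges, ids = train_bpe("aaabdaaabac", vocab_size=300, min_frequency=2)
print(len(merges), "merges learned;", len(ids), "tokens remain")
```

With min_frequency=1 the same loop keeps merging pairs that occur only once, which fills the vocab with entries that are unlikely to generalize.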
v1.0.31
Full Changelog: v1.0.3...v1.0.31
- added a token visibility feature that lets developers view how text is split into tokens, as well as the text chunks produced by the split pattern.
- added more samples
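As a rough illustration of what such a view could look like, the sketch below splits text into chunks with a regular expression and lists the byte-level tokens inside each chunk. The pattern and the visualize function are assumptions for this example, not the package's own pattern or API.

```python
import re

# Hypothetical split pattern: runs of word characters, whitespace,
# or other symbols. Real tokenizers often use a more elaborate pattern.
SPLIT_PATTERN = r"\w+|\s+|[^\w\s]+"


def visualize(text, pattern=SPLIT_PATTERN):
    """Return (chunk, byte_tokens) pairs for each chunk matched by `pattern`."""
    chunks = re.findall(pattern, text)
    return [
        (chunk, [bytes([b]) for b in chunk.encode("utf-8")])
        for chunk in chunks
    ]


for chunk, tokens in visualize("hi, you"):
    print(repr(chunk), "->", tokens)
```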
v1.0.3
added the mode parameter to the save and load methods, so developers can save and load the tokenizer's vocab and merges in their desired format.
Full Changelog: v1.0.21...v1.0.3
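The sketch below shows one way a mode switch on save and load could work. It is written under assumptions: the function names, the mode values ("json" and "text"), the file extensions, and the on-disk layouts are all hypothetical and may differ from what the package actually does.

```python
import json
import os
import tempfile


def save_tokenizer(prefix, vocab, merges, mode="json"):
    """Hypothetical save: write vocab and merges in the chosen format."""
    if mode == "json":
        path = prefix + ".json"
        data = {
            # bytes are not JSON-serializable, so store them as hex strings
            "vocab": {str(idx): tok.hex() for idx, tok in vocab.items()},
            "merges": [[a, b, idx] for (a, b), idx in merges.items()],
        }
        with open(path, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2)
    elif mode == "text":
        # plain text: one merge per line; vocab is rebuilt on load
        path = prefix + ".model"
        with open(path, "w", encoding="utf-8") as f:
            for (a, b), idx in merges.items():
                f.write(f"{a} {b} {idx}\n")
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return path


def load_tokenizer(path, mode="json"):
    """Hypothetical load: inverse of save_tokenizer for the same mode."""
    if mode == "json":
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        vocab = {int(k): bytes.fromhex(v) for k, v in data["vocab"].items()}
        merges = {(a, b): idx for a, b, idx in data["merges"]}
    elif mode == "text":
        vocab = {i: bytes([i]) for i in range(256)}
        merges = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                a, b, idx = map(int, line.split())
                merges[(a, b)] = idx
                vocab[idx] = vocab[a] + vocab[b]  # merges are stored in order
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return vocab, merges


def round_trip(vocab, merges, mode):
    """Save to a temporary directory and load back, for a quick check."""
    with tempfile.TemporaryDirectory() as tmp:
        path = save_tokenizer(os.path.join(tmp, "tok"), vocab, merges, mode)
        return load_tokenizer(path, mode)
```

The JSON layout is easier to inspect and edit by hand, while the text layout is more compact because the vocab can be derived from the merges.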
v1.0.2
build now works correctly, so the upload to PyPI works.
v1.0.10
testing the PyPI package auto upload
v1.0.1
first release
adds the following functionalities:
- BPETokenizer: which can be used to build your tokenizer for the LLM
- Tokenizer: a base class which leverages the save and load of the vocab and merges