All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- [#...]: Fix a bug with...
- [#...]:
- Fixed data race in `PostProcess` and `EncodeBatch`
- Added error handling when the Tokenizer `Model` is nil.
- Cleaned up unwanted console print-out in `processor/bert`
- Added pretrained models "Roberta", "GPT2", and "BertLargeCasedWholeWordMaskingSquad"
- #14: fixed truncation and padding not working properly
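A minimal sketch of the behavior #14 restores, assuming the usual semantics: sequences longer than the limit are truncated, shorter ones padded out with a pad id. `truncateAndPad` and its parameters are hypothetical names for illustration, not the library's API:

```go
package main

import "fmt"

// truncateAndPad returns ids cut to maxLen, or padded with padID
// up to maxLen. Illustrative only; not the library's implementation.
func truncateAndPad(ids []int, maxLen, padID int) []int {
	if len(ids) > maxLen {
		// Truncation: keep only the first maxLen ids.
		return ids[:maxLen]
	}
	// Padding: copy the ids, then fill the tail with padID.
	out := make([]int, maxLen)
	copy(out, ids)
	for i := len(ids); i < maxLen; i++ {
		out[i] = padID
	}
	return out
}

func main() {
	fmt.Println(truncateAndPad([]int{1, 2, 3}, 5, 0)) // [1 2 3 0 0]
	fmt.Println(truncateAndPad([]int{1, 2, 3, 4, 5, 6}, 4, 0)) // [1 2 3 4]
}
```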
- Updated "example/truncation" and "example/pretrained"
- #13: fixed Wordpiece Decoder joining tokens incorrectly and not stripping the subword prefix.
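The Wordpiece decoding rule that #13 concerns can be sketched as follows: continuation tokens carry a `##` prefix that must be stripped and joined to the previous token without a space. The function below is an illustrative sketch of that rule, not the library's actual decoder:

```go
package main

import (
	"fmt"
	"strings"
)

// decodeWordpiece joins wordpiece tokens back into text: a token with
// the "##" continuation prefix is glued to the previous token with the
// prefix stripped; any other token starts a new space-separated word.
func decodeWordpiece(tokens []string) string {
	var b strings.Builder
	for i, tok := range tokens {
		if strings.HasPrefix(tok, "##") {
			b.WriteString(strings.TrimPrefix(tok, "##"))
		} else {
			if i > 0 {
				b.WriteString(" ")
			}
			b.WriteString(tok)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(decodeWordpiece([]string{"un", "##believ", "##able", "news"}))
	// unbelievable news
}
```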
- #12: fixed using a pointer to the Decoder interface in the Tokenizer struct.
- Updated `example_test` and `example` in README
- #11: added `addSpecialTokens` param to `EncodeSingle`, `EncodePair`, and `Tokenizer` APIs.
- Updated Changelog and README
- #10: setup pretrained subpackage to load pretrained tokenizers.
- #8: Fixed `encoding.MergeWith` merging overflowing encodings incorrectly.