Releases: huggingface/tokenizers
Rust v0.13.3
What's Changed
- Update pr docs actions by @mishig25 in #1101
- Adding rust audit. by @Narsil in #1099
- Revert "Update pr docs actions" by @mishig25 in #1107
- Bump loader-utils from 1.4.0 to 1.4.2 in /tokenizers/examples/unstable_wasm/www by @dependabot in #1108
- Include license file in Rust crate by @ankane in #1115
- Bump decode-uri-component from 0.2.0 to 0.2.2 in /bindings/node by @dependabot in #1116
- [FIX] In SentencePieceBPETokenizer, when Vocab or merges is None, unk_token cannot be used. by @SeongBeomLEE in #1120
- Fixing conda ssl location by @Narsil in #1124
- Adding stale bot ? by @Narsil in #1123
- Bump minimatch from 3.0.4 to 3.1.2 in /bindings/node by @dependabot in #1126
- Bump decode-uri-component from 0.2.0 to 0.2.2 in /tokenizers/examples/unstable_wasm/www by @dependabot in #1125
- Bump cached-path from 0.5 to 0.6 by @hvaara in #1127
- Wrap rustdoc html entity in code block by @hvaara in #1130
- Fix broken links in docs by @hvaara in #1133
- Bump derive_builder from 0.9 to 0.12 by @hvaara in #1129
- Ignore Cargo.lock for subfolders by @hvaara in #1131
- Fix one char super tiny typo by @fzyzcjy in #1137
- [FIX] In CharBPETokenizer, when Vocab or merges is None, unk_token cannot be used. by @SeongBeomLEE in #1136
- Bump json5, copy-webpack-plugin, webpack and webpack-cli in /tokenizers/examples/unstable_wasm/www by @dependabot in #1139
- Bump json5 from 2.2.0 to 2.2.3 in /bindings/node by @dependabot in #1140
- Add missing build targets by @Narsil in #1145
- Adding python 3.8 for M1 by @Narsil in #1147
- Made dirs optional by @ankane in #1148
- Update info on environment variable for threading by @mert-kurttutan in #1150
- Making Tokenizer clone. by @Narsil in #1152
- Prevent using from_pretrained on invalid ids (better error message). by @Narsil in #1153
- Improved version. by @Narsil in #1154
- Update model.rs by @thomasw21 in #1166
- Using clippy 1.67 by @Narsil in #1167
- pyo3 v0.18 migration by @mert-kurttutan in #1173
- Bump webpack from 5.75.0 to 5.76.0 in /tokenizers/examples/unstable_wasm/www by @dependabot in #1181
- Fixing infinite loop in UnigramTrainer. by @Narsil in #1182
- Bump dirs from 3.0 to 4.0 by @hvaara in #1142
- Adding ByteFallback support for tokenizers. by @Narsil in #1183
- Faster datasets train example by @lhoestq in #1192
- Adding Replace to decoder (to undo the Replace Normalizer for Metaspace split). by @Narsil in #1195
- Creating normalizers.Prepend (To be used instead of Metaspace). by @Narsil in #1194
- Adding 2 new decoders: by @Narsil in #1196
- Fixing decoder strip because of char boundaries. by @Narsil in #1197
- Add content to Strip decoder to allow decoding mid tokens. by @Narsil in #1199
- New version 0.13.3 by @Narsil in #1205
- New release by @ArthurZucker in #1207
New Contributors
- @ankane made their first contribution in #1115
- @SeongBeomLEE made their first contribution in #1120
- @hvaara made their first contribution in #1127
- @fzyzcjy made their first contribution in #1137
- @mert-kurttutan made their first contribution in #1150
- @lhoestq made their first contribution in #1192
Full Changelog: v0.13.2...v0.13.3
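Several entries above (normalizers.Prepend, the Replace decoder, ByteFallback) concern a Metaspace-style round trip: mark spaces with a visible character when normalizing, then undo that when decoding. The block below is a minimal pure-Python sketch of that idea, not the library's implementation. The helper names `normalize` and `decode`, the "▁" marker, and the `<0xNN>` byte-token spelling are assumptions made for illustration.

```python
import re

# Assumption: "▁" (U+2581) is the space marker, as in SentencePiece-style vocabularies.
SPACE_MARK = "\u2581"

# Assumption: byte-fallback tokens are spelled "<0xNN>" with two hex digits.
BYTE_TOKEN = re.compile(r"^<0x([0-9A-Fa-f]{2})>$")


def normalize(text):
    """Prepend the marker, then replace every space with the marker."""
    return (SPACE_MARK + text).replace(" ", SPACE_MARK)


def decode(tokens):
    """Merge byte tokens into UTF-8 text, join, and turn markers back into spaces."""
    pieces = []
    pending = bytearray()  # consecutive byte tokens waiting to be decoded together
    for token in tokens:
        match = BYTE_TOKEN.match(token)
        if match:
            pending.append(int(match.group(1), 16))
            continue
        if pending:
            pieces.append(pending.decode("utf-8", errors="replace"))
            pending = bytearray()
        pieces.append(token)
    if pending:
        pieces.append(pending.decode("utf-8", errors="replace"))
    text = "".join(pieces).replace(SPACE_MARK, " ")
    # Drop the single leading space introduced by the prepend step.
    return text[1:] if text.startswith(" ") else text
```

With these hypothetical helpers, `normalize("Hello world")` yields a string with a marker before each word, and `decode` maps a token list back to plain text, including tokens that only exist as raw bytes.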
Python v0.13.3
What's Changed
- Update pr docs actions by @mishig25 in #1101
- Adding rust audit. by @Narsil in #1099
- Revert "Update pr docs actions" by @mishig25 in #1107
- Bump loader-utils from 1.4.0 to 1.4.2 in /tokenizers/examples/unstable_wasm/www by @dependabot in #1108
- Include license file in Rust crate by @ankane in #1115
- Bump decode-uri-component from 0.2.0 to 0.2.2 in /bindings/node by @dependabot in #1116
- [FIX] In SentencePieceBPETokenizer, when Vocab or merges is None, unk_token cannot be used. by @SeongBeomLEE in #1120
- Fixing conda ssl location by @Narsil in #1124
- Adding stale bot ? by @Narsil in #1123
- Bump minimatch from 3.0.4 to 3.1.2 in /bindings/node by @dependabot in #1126
- Bump decode-uri-component from 0.2.0 to 0.2.2 in /tokenizers/examples/unstable_wasm/www by @dependabot in #1125
- Bump cached-path from 0.5 to 0.6 by @hvaara in #1127
- Wrap rustdoc html entity in code block by @hvaara in #1130
- Fix broken links in docs by @hvaara in #1133
- Bump derive_builder from 0.9 to 0.12 by @hvaara in #1129
- Ignore Cargo.lock for subfolders by @hvaara in #1131
- Fix one char super tiny typo by @fzyzcjy in #1137
- [FIX] In CharBPETokenizer, when Vocab or merges is None, unk_token cannot be used. by @SeongBeomLEE in #1136
- Bump json5, copy-webpack-plugin, webpack and webpack-cli in /tokenizers/examples/unstable_wasm/www by @dependabot in #1139
- Bump json5 from 2.2.0 to 2.2.3 in /bindings/node by @dependabot in #1140
- Add missing build targets by @Narsil in #1145
- Adding python 3.8 for M1 by @Narsil in #1147
- Made dirs optional by @ankane in #1148
- Update info on environment variable for threading by @mert-kurttutan in #1150
- Making Tokenizer clone. by @Narsil in #1152
- Prevent using from_pretrained on invalid ids (better error message). by @Narsil in #1153
- Improved version. by @Narsil in #1154
- Update model.rs by @thomasw21 in #1166
- Using clippy 1.67 by @Narsil in #1167
- pyo3 v0.18 migration by @mert-kurttutan in #1173
- Bump webpack from 5.75.0 to 5.76.0 in /tokenizers/examples/unstable_wasm/www by @dependabot in #1181
- Fixing infinite loop in UnigramTrainer. by @Narsil in #1182
- Bump dirs from 3.0 to 4.0 by @hvaara in #1142
- Adding ByteFallback support for tokenizers. by @Narsil in #1183
- Faster datasets train example by @lhoestq in #1192
- Adding Replace to decoder (to undo the Replace Normalizer for Metaspace split). by @Narsil in #1195
- Creating normalizers.Prepend (To be used instead of Metaspace). by @Narsil in #1194
- Adding 2 new decoders: by @Narsil in #1196
- Fixing decoder strip because of char boundaries. by @Narsil in #1197
- Add content to Strip decoder to allow decoding mid tokens. by @Narsil in #1199
New Contributors
- @ankane made their first contribution in #1115
- @SeongBeomLEE made their first contribution in #1120
- @hvaara made their first contribution in #1127
- @fzyzcjy made their first contribution in #1137
- @mert-kurttutan made their first contribution in #1150
- @lhoestq made their first contribution in #1192
Full Changelog: node-v0.13.2...python-v0.13.3rc1
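The two Strip decoder entries above (char boundaries, and a content character) can be read as: remove a bounded number of copies of one character from each end of a token, counting in characters rather than bytes. The sketch below illustrates that reading in pure Python. `strip_token` and its parameter names are hypothetical and are not the library's API.

```python
def strip_token(token, content, start, stop):
    """Remove up to `start` leading and `stop` trailing copies of `content`.

    Hypothetical helper: works on code points, so a multi-byte character
    is never cut in the middle.
    """
    chars = list(token)

    # Count leading matches, bounded by `start`.
    begin = 0
    while begin < len(chars) and begin < start and chars[begin] == content:
        begin += 1

    # Count trailing matches, bounded by `stop`, without crossing `begin`.
    end = len(chars)
    removed = 0
    while end > begin and removed < stop and chars[end - 1] == content:
        end -= 1
        removed += 1

    return "".join(chars[begin:end])
```

Because the bounds are per token, a decoder built this way could drop one leading space from a token in the middle of a sequence while leaving any further spaces intact.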
Node v0.13.3
What's Changed
- Update pr docs actions by @mishig25 in #1101
- Adding rust audit. by @Narsil in #1099
- Revert "Update pr docs actions" by @mishig25 in #1107
- Bump loader-utils from 1.4.0 to 1.4.2 in /tokenizers/examples/unstable_wasm/www by @dependabot in #1108
- Include license file in Rust crate by @ankane in #1115
- Bump decode-uri-component from 0.2.0 to 0.2.2 in /bindings/node by @dependabot in #1116
- [FIX] In SentencePieceBPETokenizer, when Vocab or merges is None, unk_token cannot be used. by @SeongBeomLEE in #1120
- Fixing conda ssl location by @Narsil in #1124
- Adding stale bot ? by @Narsil in #1123
- Bump minimatch from 3.0.4 to 3.1.2 in /bindings/node by @dependabot in #1126
- Bump decode-uri-component from 0.2.0 to 0.2.2 in /tokenizers/examples/unstable_wasm/www by @dependabot in #1125
- Bump cached-path from 0.5 to 0.6 by @hvaara in #1127
- Wrap rustdoc html entity in code block by @hvaara in #1130
- Fix broken links in docs by @hvaara in #1133
- Bump derive_builder from 0.9 to 0.12 by @hvaara in #1129
- Ignore Cargo.lock for subfolders by @hvaara in #1131
- Fix one char super tiny typo by @fzyzcjy in #1137
- [FIX] In CharBPETokenizer, when Vocab or merges is None, unk_token cannot be used. by @SeongBeomLEE in #1136
- Bump json5, copy-webpack-plugin, webpack and webpack-cli in /tokenizers/examples/unstable_wasm/www by @dependabot in #1139
- Bump json5 from 2.2.0 to 2.2.3 in /bindings/node by @dependabot in #1140
- Add missing build targets by @Narsil in #1145
- Adding python 3.8 for M1 by @Narsil in #1147
- Made dirs optional by @ankane in #1148
- Update info on environment variable for threading by @mert-kurttutan in #1150
- Making Tokenizer clone. by @Narsil in #1152
- Prevent using from_pretrained on invalid ids (better error message). by @Narsil in #1153
- Improved version. by @Narsil in #1154
- Update model.rs by @thomasw21 in #1166
- Using clippy 1.67 by @Narsil in #1167
- pyo3 v0.18 migration by @mert-kurttutan in #1173
- Bump webpack from 5.75.0 to 5.76.0 in /tokenizers/examples/unstable_wasm/www by @dependabot in #1181
- Fixing infinite loop in UnigramTrainer. by @Narsil in #1182
- Bump dirs from 3.0 to 4.0 by @hvaara in #1142
- Adding ByteFallback support for tokenizers. by @Narsil in #1183
- Faster datasets train example by @lhoestq in #1192
- Adding Replace to decoder (to undo the Replace Normalizer for Metaspace split). by @Narsil in #1195
- Creating normalizers.Prepend (To be used instead of Metaspace). by @Narsil in #1194
- Adding 2 new decoders: by @Narsil in #1196
- Fixing decoder strip because of char boundaries. by @Narsil in #1197
- Add content to Strip decoder to allow decoding mid tokens. by @Narsil in #1199
New Contributors
- @ankane made their first contribution in #1115
- @SeongBeomLEE made their first contribution in #1120
- @hvaara made their first contribution in #1127
- @fzyzcjy made their first contribution in #1137
- @mert-kurttutan made their first contribution in #1150
- @lhoestq made their first contribution in #1192
Full Changelog: node-v0.13.2...python-v0.13.3rc1
Python v0.13.3rc1
What's Changed
- Update pr docs actions by @mishig25 in #1101
- Adding rust audit. by @Narsil in #1099
- Revert "Update pr docs actions" by @mishig25 in #1107
- Bump loader-utils from 1.4.0 to 1.4.2 in /tokenizers/examples/unstable_wasm/www by @dependabot in #1108
- Include license file in Rust crate by @ankane in #1115
- Bump decode-uri-component from 0.2.0 to 0.2.2 in /bindings/node by @dependabot in #1116
- [FIX] In SentencePieceBPETokenizer, when Vocab or merges is None, unk_token cannot be used. by @SeongBeomLEE in #1120
- Fixing conda ssl location by @Narsil in #1124
- Adding stale bot ? by @Narsil in #1123
- Bump minimatch from 3.0.4 to 3.1.2 in /bindings/node by @dependabot in #1126
- Bump decode-uri-component from 0.2.0 to 0.2.2 in /tokenizers/examples/unstable_wasm/www by @dependabot in #1125
- Bump cached-path from 0.5 to 0.6 by @hvaara in #1127
- Wrap rustdoc html entity in code block by @hvaara in #1130
- Fix broken links in docs by @hvaara in #1133
- Bump derive_builder from 0.9 to 0.12 by @hvaara in #1129
- Ignore Cargo.lock for subfolders by @hvaara in #1131
- Fix one char super tiny typo by @fzyzcjy in #1137
- [FIX] In CharBPETokenizer, when Vocab or merges is None, unk_token cannot be used. by @SeongBeomLEE in #1136
- Bump json5, copy-webpack-plugin, webpack and webpack-cli in /tokenizers/examples/unstable_wasm/www by @dependabot in #1139
- Bump json5 from 2.2.0 to 2.2.3 in /bindings/node by @dependabot in #1140
- Add missing build targets by @Narsil in #1145
- Adding python 3.8 for M1 by @Narsil in #1147
- Made dirs optional by @ankane in #1148
- Update info on environment variable for threading by @mert-kurttutan in #1150
- Making Tokenizer clone. by @Narsil in #1152
- Prevent using from_pretrained on invalid ids (better error message). by @Narsil in #1153
- Improved version. by @Narsil in #1154
- Update model.rs by @thomasw21 in #1166
- Using clippy 1.67 by @Narsil in #1167
- pyo3 v0.18 migration by @mert-kurttutan in #1173
- Bump webpack from 5.75.0 to 5.76.0 in /tokenizers/examples/unstable_wasm/www by @dependabot in #1181
- Fixing infinite loop in UnigramTrainer. by @Narsil in #1182
- Bump dirs from 3.0 to 4.0 by @hvaara in #1142
- Adding ByteFallback support for tokenizers. by @Narsil in #1183
- Faster datasets train example by @lhoestq in #1192
- Adding Replace to decoder (to undo the Replace Normalizer for Metaspace split). by @Narsil in #1195
- Creating normalizers.Prepend (To be used instead of Metaspace). by @Narsil in #1194
- Adding 2 new decoders: by @Narsil in #1196
- Fixing decoder strip because of char boundaries. by @Narsil in #1197
- Add content to Strip decoder to allow decoding mid tokens. by @Narsil in #1199
New Contributors
- @ankane made their first contribution in #1115
- @SeongBeomLEE made their first contribution in #1120
- @hvaara made their first contribution in #1127
- @fzyzcjy made their first contribution in #1137
- @mert-kurttutan made their first contribution in #1150
- @lhoestq made their first contribution in #1192
Full Changelog: node-v0.13.2...python-v0.13.3rc1
Rust 0.13.2
Python 3.11 support (Python only modification)
Python 0.13.2
[0.13.2]
- [#1096] Python 3.11 support
Node 0.13.2
Python 3.11 support (Python only modification)
Rust 0.13.1
[0.13.1]
- [#1072] Fixing Roberta type ids.
Python v0.13.1
[0.13.1]
- [#1072] Fixing Roberta type ids.
Node 0.13.1
[0.13.1]
- [#1072] Fixing Roberta type ids.