Issues: huggingface/tokenizers
Issues list
Rust: How to handle models with precompiled_charsmap = null
#1627 opened Sep 4, 2024 by kallebysantos
Special token gets tokenized while training tokenizer from scratch
#1624 opened Sep 2, 2024 by LalchandPandia
ModuleNotFoundError: No module named 'tokenizers.tokenizers'
#1619 opened Aug 25, 2024 by jpferraro1
Space after unnormalized token is added when use_fast=True for Llama tokenizer
#1613 opened Aug 14, 2024 by Butanium
PreTrainedTokenizerFast char_to_token token_to_char not working as expected
Label: bug
#1620 opened Aug 10, 2024 by yonigottesman, 4 tasks
Support for Golang now or support a cli for other languages?
#1601 opened Aug 7, 2024 by xuxiaoxia96
[building on windows] onig_sys/oniguruma two or more data types in declaration specifiers
#1581 opened Jul 29, 2024 by louis030195
Risk of global variable memory leaks when calling train_from_iterator
Label: Stale
#1579 opened Jul 24, 2024 by Yikai-Liao
Issue with SentencePieceUnigramTokenizer Handling Unknown Tokens
#1576 opened Jul 22, 2024 by Munikumar09
Truncation performs slowly. Tokenizer firstly encodes long sequence and then truncates it.
Label: Feature Request
#1573 opened Jul 19, 2024 by galtimur