Description:
When using the `SentencePieceUnigramTokenizer` with a custom vocabulary, there is no attribute to handle the `unk_id`, causing errors when encoding text that is not present in the vocabulary.

Example:
`{'a': -1.23, 'b': -1.34, 'c': -1.45}`

Issue:
The current implementation does not handle the `unk_id`, leading to errors when encountering unknown tokens.

Suggested Fix:
Adding support for `unk_id` in the tokenizer initialization would resolve this issue.

Please let me know whether there is any existing solution to this.
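As a possible workaround while `SentencePieceUnigramTokenizer` lacks this option, here is a minimal sketch that builds a tokenizer directly on the core `Unigram` model, which does accept an `unk_id` parameter. The vocabulary entries and the `<unk>` score below are made-up illustration values, and the vocabulary is given as `(token, log_prob)` pairs rather than the dict shown in the example above:

```python
from tokenizers import Tokenizer
from tokenizers.models import Unigram

# Custom vocab as (token, log_prob) pairs; a dict like
# {'a': -1.23, ...} can be converted with list(vocab.items()).
# "<unk>" is placed at index 0 so that unk_id=0 points at it.
vocab = [("<unk>", -10.0), ("a", -1.23), ("b", -1.34), ("c", -1.45)]

tokenizer = Tokenizer(Unigram(vocab, unk_id=0))

# Characters outside the vocabulary now map to <unk>
# instead of raising an error during encoding.
enc = tokenizer.encode("abz")
print(enc.tokens)
```

This sidesteps the issue rather than fixing it; a proper fix would thread an `unk_id` argument through `SentencePieceUnigramTokenizer.__init__` down to the underlying `Unigram` model.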