This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Feature request
Fast tokenizer support for the LLaMA SentencePiece tokenizer.
Motivation
The `offset_mapping` is only available with fast tokenizers, so it would be useful to have fast tokenizer support for LLaMA.
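As a toy illustration of what `return_offsets_mapping=True` gives you on a fast tokenizer — a `(start, end)` character span in the original text for each token — here is a minimal sketch using a whitespace tokenizer as a stand-in (the real LLaMA tokenizer splits differently, of course):

```python
def tokenize_with_offsets(text):
    """Split on whitespace, returning tokens plus their character spans,
    analogous to the offset_mapping produced by fast tokenizers."""
    tokens, offsets = [], []
    start = None
    for i, ch in enumerate(text):
        if ch.isspace():
            if start is not None:
                tokens.append(text[start:i])
                offsets.append((start, i))
                start = None
        elif start is None:
            start = i
    if start is not None:  # flush the trailing token
        tokens.append(text[start:])
        offsets.append((start, len(text)))
    return tokens, offsets

print(tokenize_with_offsets("hello world"))
# (['hello', 'world'], [(0, 5), (6, 11)])
```

These spans are what make tasks like span labeling or answer extraction practical, since token indices can be mapped back to positions in the raw input.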
Your contribution
I have tried using an existing SentencePiece-based model as a replacement. However, the HF conversion code does not carry over byte fallback support, which means out-of-vocabulary tokens are simply mapped to `<unk>` instead of using the byte mapping inside the vocab.
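To make the missing behavior concrete, here is a minimal sketch of SentencePiece-style byte fallback using a toy vocab. The `<0xNN>` token names mirror the byte tokens present in LLaMA's vocab; the vocab and helper function here are illustrative, not the actual tokenizer code:

```python
def encode_with_byte_fallback(text, vocab):
    """Map each character to its vocab token if present; otherwise fall
    back to its UTF-8 bytes as <0xNN> tokens instead of emitting <unk>."""
    tokens = []
    for ch in text:
        if ch in vocab:
            tokens.append(ch)
        else:
            # Byte fallback: decompose the OOV character into byte tokens.
            tokens.extend(f"<0x{b:02X}>" for b in ch.encode("utf-8"))
    return tokens

toy_vocab = {"h", "i"}
print(encode_with_byte_fallback("hi好", toy_vocab))
# ['h', 'i', '<0xE5>', '<0xA5>', '<0xBD>']
```

Without byte fallback, the converted tokenizer would collapse `好` to a single `<unk>`, losing the information needed to round-trip the input.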