Ban some tokens #474

AnnaKholkina · 2023-11-22T09:06:46Z

Hello. I would like the model to never produce certain tokens (such as \n). When training a model, can I remove unnecessary tokens from the tokenizer, and if so, how? And how would removing most of the tokens affect training quality? (Let's say I want to train a model to speak only one language.)

Thanks for your answers!
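
For concreteness, one common way to keep a model from emitting specific tokens without changing the tokenizer is to mask their logits before sampling. The following is only a minimal sketch of that idea in plain Python. The toy vocabulary, the logit values, and the helper names `ban_tokens` and `softmax` are hypothetical and are not taken from this repository.

```python
import math

def ban_tokens(logits, banned_ids):
    """Return a copy of `logits` with every banned token id set to -inf."""
    masked = list(logits)
    for token_id in banned_ids:
        masked[token_id] = -math.inf
    return masked

def softmax(logits):
    """Numerically stable softmax; assumes at least one token is not banned."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical three-token vocabulary and raw model scores.
vocab = ["hello", "\n", "world"]
logits = [1.0, 3.0, 0.5]

# Ban the newline token, then renormalise over the remaining tokens.
banned = [vocab.index("\n")]
probs = softmax(ban_tokens(logits, banned))
```

After masking, the banned token has probability zero and the remaining probability mass is shared among the allowed tokens. This only affects generation; it does not shrink the vocabulary or the embedding matrix, so it does not answer the separate question of how removing tokens from the tokenizer affects training.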
