-
Notifications
You must be signed in to change notification settings - Fork 846
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expand documentation of UnigramTrainer #770
Conversation
@sgugger I think we should update The |
Oh I didn't know that, added the same there. |
bindings/python/src/trainers.rs
Outdated
/// | ||
/// n_sub_iterations (:obj:`int`): | ||
/// The number of iterations of the EM algorithm to perform before | ||
/// pruning the vocabulary. | ||
#[pyclass(extends=PyTrainer, module = "tokenizers.trainers", name=UnigramTrainer)] | ||
#[text_signature = "(self, vocab_size=8000, show_progress=True, special_tokens= [])"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs editing too. (There should be a test to detect that, let's see if it kicks in correctly.)
I had not seen this PR from @SaulLu. I think this PR is more comprehensive as it adds all the arguments the |
#773 (The rust version update) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, Thank you for taking care of this @sgugger!
Currently the documentation of the UnigramTrainer does not contain all the options you can set, and the auto-complete in an IDE does not show them either.
This PR fixes that.