Expand documentation of UnigramTrainer #770

Merged · 4 commits merged on Aug 12, 2021

sgugger (Contributor) commented Aug 4, 2021

Currently the documentation of the UnigramTrainer does not contain all the options you can set, and the auto-complete in an IDE does not show them either.

This PR fixes that.

sgugger requested review from Narsil and n1t0 August 4, 2021 09:56
Narsil (Collaborator) commented Aug 9, 2021

@sgugger I think we should update bindings/python/src/trainers.rs instead.

The .pyi files are generated automatically by running python stub.py, so the docs stay consistent across Rust and Python.
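The following sketch shows why the fix belongs in trainers.rs. A stub generator renders the Python stub from a signature string and a docstring that are both defined once on the Rust side, so a hand edit to the `.pyi` would likely be overwritten on the next regeneration. `render_stub` is a hypothetical helper. It is not the actual `stub.py`, and the class name, base class and docstring passed to it are illustrative assumptions.

```python
import textwrap


def render_stub(class_name: str, base: str, text_signature: str, doc: str) -> str:
    """Render a .pyi class stub from a signature string and a docstring.

    Hypothetical helper: it only mimics generating stubs from metadata that
    is defined once in Rust; it is not the real stub.py.
    """
    # "(self, vocab_size=8000)" -> "self, vocab_size=8000"
    params = text_signature.strip()[1:-1]
    # Re-indent the docstring so it sits inside the class body.
    docstring = textwrap.indent(f'"""\n{textwrap.dedent(doc).strip()}\n"""', " " * 4)
    init = f"    def __init__({params}):\n        pass\n"
    return f"class {class_name}({base}):\n{docstring}\n\n{init}"


# Illustrative input mirroring the text_signature discussed in this PR.
print(render_stub(
    "UnigramTrainer",
    "Trainer",
    "(self, vocab_size=8000, show_progress=True, special_tokens=[])",
    "Trainer capable of training a Unigram model\n\n"
    "Args:\n"
    "    vocab_size (:obj:`int`):\n"
    "        The size of the final vocabulary.\n",
))
```

Because the stub is derived from the signature string, a signature that omits an argument produces a stub that omits it too, however complete the docstring is.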

sgugger (Contributor, Author) commented Aug 9, 2021

Oh, I didn't know that. I added the same there.

///
/// n_sub_iterations (:obj:`int`):
/// The number of iterations of the EM algorithm to perform before
/// pruning the vocabulary.
#[pyclass(extends=PyTrainer, module = "tokenizers.trainers", name=UnigramTrainer)]
#[text_signature = "(self, vocab_size=8000, show_progress=True, special_tokens= [])"]
Review comment (Collaborator):

I think this needs editing too (there should be a test that detects this; let's see if it kicks in correctly).
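Here is one possible form for a check that catches a `text_signature` lagging behind its docs. This is a hypothetical illustration and not the repository's actual test. It assumes arguments are documented with the `name (:obj:`type`):` convention used in the snippet, and it compares those names against the parameters in the signature string. The helper names and the sample docstring are assumptions.

```python
import ast
import re
from typing import Set


def signature_params(text_signature: str) -> Set[str]:
    """Parameter names (minus self) from a signature string like "(self, a=1)"."""
    # Parse the signature as the header of a dummy function definition.
    func = ast.parse(f"def _f{text_signature}: pass").body[0]
    args = func.args
    names = {a.arg for a in args.posonlyargs + args.args + args.kwonlyargs}
    return names - {"self"}


def documented_params(doc: str) -> Set[str]:
    """Names documented with the 'name (:obj:`type`):' convention."""
    return set(re.findall(r"^\s*(\w+) \(:obj:", doc, flags=re.MULTILINE))


def missing_from_signature(text_signature: str, doc: str) -> Set[str]:
    """Documented arguments that the signature does not expose."""
    return documented_params(doc) - signature_params(text_signature)


doc = """
vocab_size (:obj:`int`):
    The size of the final vocabulary.
n_sub_iterations (:obj:`int`):
    The number of iterations of the EM algorithm to perform before
    pruning the vocabulary.
"""
# The text_signature from the reviewed diff, before it was updated.
sig = "(self, vocab_size=8000, show_progress=True, special_tokens= [])"
print(missing_from_signature(sig, doc))  # {'n_sub_iterations'}
```

Parsing the signature with `ast` rather than splitting on commas keeps the check robust to default values such as lists or strings that contain commas.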

Narsil (Collaborator) commented Aug 9, 2021

@sgugger fyi: https://github.com/huggingface/tokenizers/pull/768/files

sgugger (Contributor, Author) commented Aug 9, 2021

I had not seen this PR from @SaulLu. I think this PR is more comprehensive, since it documents all the arguments the UnigramTrainer accepts.

Narsil (Collaborator) commented Aug 9, 2021

#773 (the Rust version update)

n1t0 (Member) left a comment:

LGTM, thank you for taking care of this @sgugger!

n1t0 merged commit 6616e69 into master Aug 12, 2021
n1t0 deleted the unigram_doc branch August 12, 2021 14:12