Expand documentation of UnigramTrainer #770

sgugger · 2021-08-04T09:56:53Z

Currently the documentation of the UnigramTrainer does not contain all the options you can set, and the auto-complete in an IDE does not show them either.

This PR fixes that.

Narsil · 2021-08-09T12:07:03Z

@sgugger I think we should update bindings/python/src/trainers.rs instead.

The .pyi files are generated automatically with python stub.py (so we have consistent doc across Rust and Python).
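As a rough illustration of why editing the Rust source is enough, here is a minimal sketch of turning a doc comment and a text signature into a .pyi-style stub. It is not the actual stub.py from the repository: `render_stub` is a hypothetical helper, and the docstring and signature are copied from the diff in this thread.

```python
import inspect
import textwrap


def render_stub(name: str, text_signature: str, doc: str) -> str:
    """Render a minimal .pyi-style class stub (hypothetical helper, not stub.py)."""
    # Normalize indentation of the docstring, then indent it into the class body.
    body = textwrap.indent(inspect.cleandoc(doc), " " * 4)
    # Drop the outer parentheses of "(self, a=1, ...)" to reuse it in __init__.
    init_args = text_signature.strip()[1:-1]
    return (
        f"class {name}:\n"
        f'    """\n{body}\n    """\n\n'
        f"    def __init__({init_args}):\n"
        f"        pass\n"
    )


# Inputs taken from the diff shown in this thread.
DOC = """
    n_sub_iterations (:obj:`int`):
        The number of iterations of the EM algorithm to perform before
        pruning the vocabulary.
"""
SIGNATURE = "(self, vocab_size=8000, show_progress=True, special_tokens= [])"

stub = render_stub("UnigramTrainer", SIGNATURE, DOC)
print(stub)
```

Because both the docstring and the signature come from the same Rust attributes, a generator along these lines keeps the Python stubs in step with the source.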

sgugger · 2021-08-09T12:38:30Z

Oh, I didn't know that. I added the same there.

Narsil · 2021-08-09T12:40:33Z

bindings/python/src/trainers.rs

+///
+///     n_sub_iterations (:obj:`int`):
+///         The number of iterations of the EM algorithm to perform before
+///         pruning the vocabulary.
 #[pyclass(extends=PyTrainer, module = "tokenizers.trainers", name=UnigramTrainer)]
 #[text_signature = "(self, vocab_size=8000, show_progress=True, special_tokens= [])"]


I think the text_signature needs editing too. (There should be a test to detect that; let's see if it kicks in correctly.)
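A check of this kind could look roughly like the sketch below. It is an assumption, not the test that exists in the repository: `documented_args` and `signature_args` are hypothetical helpers, and the `name (:obj:` pattern is inferred from the docstring in the diff above.

```python
import ast
import re


def documented_args(doc: str) -> list:
    """Return names from lines shaped like 'name (:obj:`type`):' (assumed format)."""
    return re.findall(r"^[ \t]*(\w+) \(:obj:", doc, flags=re.MULTILINE)


def signature_args(text_signature: str) -> list:
    """Return parameter names from a '(self, a=1, b=[])' style string."""
    # The signature string is valid Python, so let the parser split it.
    func = ast.parse(f"def _f{text_signature}: pass").body[0]
    args = func.args.posonlyargs + func.args.args + func.args.kwonlyargs
    return [a.arg for a in args if a.arg != "self"]


# Inputs taken from the diff shown in this thread.
DOC = """
    n_sub_iterations (:obj:`int`):
        The number of iterations of the EM algorithm to perform before
        pruning the vocabulary.
"""
SIGNATURE = "(self, vocab_size=8000, show_progress=True, special_tokens= [])"

# Documented arguments that the signature does not expose.
missing = [a for a in documented_args(DOC) if a not in signature_args(SIGNATURE)]
print(missing)  # ['n_sub_iterations']
```

With the signature as it stands in the diff, the newly documented argument is reported as missing, which is the mismatch flagged here.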

Narsil · 2021-08-09T13:29:10Z

@sgugger fyi: https://github.com/huggingface/tokenizers/pull/768/files

sgugger · 2021-08-09T13:31:10Z

I had not seen this PR from @SaulLu. I think this PR is more comprehensive as it adds all the arguments the UnigramTrainer can accept.

Narsil · 2021-08-09T13:53:21Z

#773 (the Rust version update)

n1t0

LGTM, thank you for taking care of this @sgugger!

sgugger requested review from Narsil and n1t0 August 4, 2021 09:56

Narsil reviewed Aug 9, 2021

sgugger added 3 commits August 11, 2021 14:44

Expand documentation of UnigramTrainer

7772039

Put doc at the source

9046fd7

Add signature

413b646

Narsil force-pushed the unigram_doc branch from b9c4fc4 to 413b646 Compare August 11, 2021 12:44

make style

ac6e7af

n1t0 approved these changes Aug 12, 2021

n1t0 merged commit 6616e69 into master Aug 12, 2021

n1t0 deleted the unigram_doc branch August 12, 2021 14:12

n1t0 mentioned this pull request Aug 12, 2021

Add unk_token argument in UnigramTrainer documentation #768

Closed
