
Make docstring match args #4711

Merged 1 commit into huggingface:master on Jun 1, 2020
Conversation

sgugger (Collaborator) commented on Jun 1, 2020

While replying to #4698, I realized that some language model docstrings document arguments that are not present in the function signatures. This PR addresses that (at least for all the ones I found).

The alternative would be to change the argument names in the function signatures (if it makes the various model APIs more consistent).
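
For readers skimming the thread, a minimal sketch of the kind of mismatch being fixed; the argument names are illustrative, not taken from the actual diff:

```python
# Before: the docstring documents `lm_labels`, which is not in the signature.
class BeforeFix:
    def forward(self, input_ids, labels=None):
        """
        Args:
            input_ids: Indices of input sequence tokens in the vocabulary.
            lm_labels: Documented here, but absent from the signature above.
        """


# After: the docstring is edited to match the real argument name.
class AfterFix:
    def forward(self, input_ids, labels=None):
        """
        Args:
            input_ids: Indices of input sequence tokens in the vocabulary.
            labels: Matches the actual argument in the signature.
        """
```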

codecov-commenter commented
Codecov Report

Merging #4711 into master will decrease coverage by 0.18%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #4711      +/-   ##
==========================================
- Coverage   77.32%   77.14%   -0.19%     
==========================================
  Files         128      128              
  Lines       21071    21071              
==========================================
- Hits        16294    16256      -38     
- Misses       4777     4815      +38     
| Impacted Files | Coverage Δ |
|---|---|
| src/transformers/modeling_bart.py | 96.79% <ø> (ø) |
| src/transformers/modeling_gpt2.py | 72.11% <ø> (-14.11%) ⬇️ |
| src/transformers/modeling_openai.py | 80.41% <ø> (-1.38%) ⬇️ |
| src/transformers/modeling_transfo_xl.py | 77.00% <ø> (ø) |
| src/transformers/modeling_xlm.py | 89.46% <ø> (ø) |
| src/transformers/modeling_utils.py | 89.94% <0.00%> (-0.24%) ⬇️ |
| src/transformers/file_utils.py | 73.85% <0.00%> (+0.41%) ⬆️ |
| src/transformers/modeling_tf_utils.py | 88.66% <0.00%> (+1.80%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6449c49...e7d6cb9.

LysandreJik (Member) left a comment

Nice, thanks @sgugger! At some point we would like to have labels on every model, rather than labels on some, masked_lm_labels on others, and lm_labels on yet others. We would need to deprecate the current lm_labels and masked_lm_labels while keeping them in the signature to preserve backwards compatibility.

For reference: #4055, #4198 (comment) (cc @thomwolf, @patrickvonplaten, @julien-c)
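
For context, a common backwards-compatible deprecation pattern this could use; a sketch only, with an illustrative class name, not necessarily the approach the maintainers will ship:

```python
import warnings


class MaskedLMModel:
    """Illustrative stand-in for a model class such as BertForMaskedLM."""

    def forward(self, input_ids, labels=None, masked_lm_labels=None):
        # Keep the legacy keyword in the signature for backwards
        # compatibility, but warn and route its value into the unified
        # `labels` argument.
        if masked_lm_labels is not None:
            warnings.warn(
                "`masked_lm_labels` is deprecated and will be removed in a "
                "future version; use `labels` instead.",
                FutureWarning,
            )
            if labels is None:
                labels = masked_lm_labels
        # ... compute the masked-LM loss against `labels` as before ...
        return labels
```

Old call sites such as `model.forward(ids, masked_lm_labels=y)` keep working but emit a FutureWarning until the keyword is eventually removed.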

LysandreJik merged commit 7677936 into huggingface:master on Jun 1, 2020
sgugger (Collaborator, Author) commented on Jun 1, 2020

I can work on this if no one is on it yet. Quick question, though: what about the models that have both lm_labels and masked_lm_labels? The encoder-decoder model is one of them, for instance; I don't know if there are more.

sgugger deleted the doc_typos branch on June 1, 2020 at 19:26
LysandreJik (Member) commented
Yes, that's the case for BertForMaskedLM for example. I don't really know the best way to handle this.

Since with this update we're trying to have the exact same API for all models, so that training/inference code can be model-agnostic, I'd say we should look for the most natural choice on a case-by-case basis.

In the BertForMaskedLM case, for example, I believe labels should be the masked_lm_labels, as BERT is meant to be used for MLM rather than CLM.

patrickvonplaten (Contributor) commented on Jun 2, 2020

As far as I know, BertForMaskedLM does not really use lm_labels at the moment. I think it was added to support a causal Bert in an encoder-decoder setting, so that the decoder can be trained with a causal mask using the language modeling objective. Since the encoder-decoder framework is not really released yet, I think we could also add a new BertWithLMHead class so that each class only has one labels argument. It would be a breaking change in terms of the class name for people who have already implemented Bert2Bert models, but I think it's worth it for consistency. What do you think?

@sgugger - In the encoder-decoder model I added both lm_labels and masked_lm_labels because Bert has both lm_labels and masked_lm_labels. Normally, encoder-decoder models are trained with a CLM objective, so I'm not sure we even need masked_lm_labels for the encoder-decoder model wrapper.
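
A rough sketch of the split described above; `BertWithLMHead` is the name proposed in this thread, not an existing class, and the method bodies are elided:

```python
class BertForMaskedLM:
    def forward(self, input_ids, labels=None):
        # `labels` always means the masked-LM objective here; positions that
        # should not contribute to the loss are conventionally set to -100.
        ...


class BertWithLMHead:
    def forward(self, input_ids, labels=None):
        # `labels` always means the causal-LM objective here, e.g. for the
        # decoder side of a Bert2Bert encoder-decoder setup.
        ...
```

Each class then exposes a single `labels` argument tied to one unambiguous training objective.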

thomwolf (Member) commented on Jun 2, 2020

@patrickvonplaten good for me

sgugger mentioned this pull request on Jun 2, 2020