Switch return_dict to True by default. (huggingface#8530)
* Use the CI to identify failing tests

* Remove from all examples and tests

* More default switch

* Fixes

* More test fixes

* More fixes

* Last fixes hopefully

* Run on the real suite

* Fix slow tests
sgugger authored and Zhylkaaa committed Nov 17, 2020
1 parent d63af9c commit 198cb8c
Showing 106 changed files with 138 additions and 234 deletions.
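In practical terms, the new default means a model's forward pass now returns a `ModelOutput` with named attributes instead of a plain tuple, which is why the explicit `return_dict=True` arguments are removed throughout this diff. A minimal sketch of the before/after calling convention (assuming `transformers` at this commit; the MRPC checkpoint is the one used in docs/source/task_summary.rst below):

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc")

inputs = tokenizer("The company is based in Paris.",
                   "Paris is where the company is based.", return_tensors="pt")

# New default: the forward pass returns a ModelOutput with named attributes.
outputs = model(**inputs)
paraphrase_logits = outputs.logits

# The old tuple behaviour is still one keyword argument away.
legacy_outputs = model(**inputs, return_dict=False)
paraphrase_logits = legacy_outputs[0]
```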
2 changes: 1 addition & 1 deletion docs/source/model_doc/bertgeneration.rst
@@ -40,7 +40,7 @@ Usage:
labels = tokenizer('This is a short summary', return_tensors="pt").input_ids
# train...
- loss = bert2bert(input_ids=input_ids, decoder_input_ids=labels, labels=labels, return_dict=True).loss
+ loss = bert2bert(input_ids=input_ids, decoder_input_ids=labels, labels=labels).loss
loss.backward()
4 changes: 2 additions & 2 deletions docs/source/model_doc/t5.rst
@@ -64,7 +64,7 @@ token. T5 can be trained / fine-tuned both in a supervised and unsupervised fash
input_ids = tokenizer('The <extra_id_0> walks in <extra_id_1> park', return_tensors='pt').input_ids
labels = tokenizer('<extra_id_0> cute dog <extra_id_1> the <extra_id_2>', return_tensors='pt').input_ids
# the forward function automatically creates the correct decoder_input_ids
- loss = model(input_ids=input_ids, labels=labels, return_dict=True).loss
+ loss = model(input_ids=input_ids, labels=labels).loss
- Supervised training

@@ -77,7 +77,7 @@ token. T5 can be trained / fine-tuned both in a supervised and unsupervised fash
input_ids = tokenizer('translate English to German: The house is wonderful.', return_tensors='pt').input_ids
labels = tokenizer('Das Haus ist wunderbar.', return_tensors='pt').input_ids
# the forward function automatically creates the correct decoder_input_ids
- loss = model(input_ids=input_ids, labels=labels, return_dict=True).loss
+ loss = model(input_ids=input_ids, labels=labels).loss
T5Config
32 changes: 16 additions & 16 deletions docs/source/task_summary.rst
@@ -89,7 +89,7 @@ each other. The process is the following:
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
- >>> model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc", return_dict=True)
+ >>> model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc")
>>> classes = ["not paraphrase", "is paraphrase"]
@@ -122,7 +122,7 @@ each other. The process is the following:
>>> import tensorflow as tf
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
- >>> model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc", return_dict=True)
+ >>> model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc")
>>> classes = ["not paraphrase", "is paraphrase"]
@@ -211,7 +211,7 @@ Here is an example of question answering using a model and a tokenizer. The proc
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
- >>> model = AutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad", return_dict=True)
+ >>> model = AutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
>>> text = r"""
... 🤗 Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose
@@ -253,7 +253,7 @@ Here is an example of question answering using a model and a tokenizer. The proc
>>> import tensorflow as tf
>>> tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
- >>> model = TFAutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad", return_dict=True)
+ >>> model = TFAutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
>>> text = r"""
... 🤗 Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose
@@ -373,7 +373,7 @@ Here is an example of doing masked language modeling using a model and a tokeniz
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")
- >>> model = AutoModelWithLMHead.from_pretrained("distilbert-base-cased", return_dict=True)
+ >>> model = AutoModelWithLMHead.from_pretrained("distilbert-base-cased")
>>> sequence = f"Distilled models are smaller than the models they mimic. Using them instead of the large versions would help {tokenizer.mask_token} our carbon footprint."
@@ -389,7 +389,7 @@ Here is an example of doing masked language modeling using a model and a tokeniz
>>> import tensorflow as tf
>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")
- >>> model = TFAutoModelWithLMHead.from_pretrained("distilbert-base-cased", return_dict=True)
+ >>> model = TFAutoModelWithLMHead.from_pretrained("distilbert-base-cased")
>>> sequence = f"Distilled models are smaller than the models they mimic. Using them instead of the large versions would help {tokenizer.mask_token} our carbon footprint."
@@ -437,7 +437,7 @@ of tokens.
>>> from torch.nn import functional as F
>>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
- >>> model = AutoModelWithLMHead.from_pretrained("gpt2", return_dict=True)
+ >>> model = AutoModelWithLMHead.from_pretrained("gpt2")
>>> sequence = f"Hugging Face is based in DUMBO, New York City, and "
@@ -461,7 +461,7 @@ of tokens.
>>> import tensorflow as tf
>>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
- >>> model = TFAutoModelWithLMHead.from_pretrained("gpt2", return_dict=True)
+ >>> model = TFAutoModelWithLMHead.from_pretrained("gpt2")
>>> sequence = f"Hugging Face is based in DUMBO, New York City, and "
@@ -520,7 +520,7 @@ Here is an example of text generation using ``XLNet`` and its tokenizer.
>>> ## PYTORCH CODE
>>> from transformers import AutoModelWithLMHead, AutoTokenizer
- >>> model = AutoModelWithLMHead.from_pretrained("xlnet-base-cased", return_dict=True)
+ >>> model = AutoModelWithLMHead.from_pretrained("xlnet-base-cased")
>>> tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
>>> # Padding text helps XLNet with short prompts - proposed by Aman Rusia in https://github.com/rusiaaman/XLNet-gen#methodology
@@ -545,7 +545,7 @@ Here is an example of text generation using ``XLNet`` and its tokenizer.
>>> ## TENSORFLOW CODE
>>> from transformers import TFAutoModelWithLMHead, AutoTokenizer
- >>> model = TFAutoModelWithLMHead.from_pretrained("xlnet-base-cased", return_dict=True)
+ >>> model = TFAutoModelWithLMHead.from_pretrained("xlnet-base-cased")
>>> tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
>>> # Padding text helps XLNet with short prompts - proposed by Aman Rusia in https://github.com/rusiaaman/XLNet-gen#methodology
@@ -664,7 +664,7 @@ Here is an example of doing named entity recognition, using a model and a tokeni
>>> from transformers import AutoModelForTokenClassification, AutoTokenizer
>>> import torch
- >>> model = AutoModelForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english", return_dict=True)
+ >>> model = AutoModelForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
>>> label_list = [
@@ -692,7 +692,7 @@ Here is an example of doing named entity recognition, using a model and a tokeni
>>> from transformers import TFAutoModelForTokenClassification, AutoTokenizer
>>> import tensorflow as tf
- >>> model = TFAutoModelForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english", return_dict=True)
+ >>> model = TFAutoModelForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
>>> label_list = [
@@ -790,7 +790,7 @@ CNN / Daily Mail), it yields very good results.
>>> ## PYTORCH CODE
>>> from transformers import AutoModelWithLMHead, AutoTokenizer
- >>> model = AutoModelWithLMHead.from_pretrained("t5-base", return_dict=True)
+ >>> model = AutoModelWithLMHead.from_pretrained("t5-base")
>>> tokenizer = AutoTokenizer.from_pretrained("t5-base")
>>> # T5 uses a max_length of 512 so we cut the article to 512 tokens.
@@ -799,7 +799,7 @@ CNN / Daily Mail), it yields very good results.
>>> ## TENSORFLOW CODE
>>> from transformers import TFAutoModelWithLMHead, AutoTokenizer
- >>> model = TFAutoModelWithLMHead.from_pretrained("t5-base", return_dict=True)
+ >>> model = TFAutoModelWithLMHead.from_pretrained("t5-base")
>>> tokenizer = AutoTokenizer.from_pretrained("t5-base")
>>> # T5 uses a max_length of 512 so we cut the article to 512 tokens.
@@ -843,15 +843,15 @@ Here is an example of doing translation using a model and a tokenizer. The proce
>>> ## PYTORCH CODE
>>> from transformers import AutoModelWithLMHead, AutoTokenizer
- >>> model = AutoModelWithLMHead.from_pretrained("t5-base", return_dict=True)
+ >>> model = AutoModelWithLMHead.from_pretrained("t5-base")
>>> tokenizer = AutoTokenizer.from_pretrained("t5-base")
>>> inputs = tokenizer.encode("translate English to German: Hugging Face is a technology company based in New York and Paris", return_tensors="pt")
>>> outputs = model.generate(inputs, max_length=40, num_beams=4, early_stopping=True)
>>> ## TENSORFLOW CODE
>>> from transformers import TFAutoModelWithLMHead, AutoTokenizer
- >>> model = TFAutoModelWithLMHead.from_pretrained("t5-base", return_dict=True)
+ >>> model = TFAutoModelWithLMHead.from_pretrained("t5-base")
>>> tokenizer = AutoTokenizer.from_pretrained("t5-base")
>>> inputs = tokenizer.encode("translate English to German: Hugging Face is a technology company based in New York and Paris", return_tensors="tf")
2 changes: 1 addition & 1 deletion docs/source/training.rst
@@ -39,7 +39,7 @@ head on top of the encoder with an output size of 2. Models are initialized in `
.. code-block:: python
from transformers import BertForSequenceClassification
- model = BertForSequenceClassification.from_pretrained('bert-base-uncased', return_dict=True)
+ model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
model.train()
This is useful because it allows us to make use of the pre-trained BERT encoder and easily train it on whatever
4 changes: 1 addition & 3 deletions examples/lxmert/demo.ipynb
@@ -210,7 +210,6 @@
" visual_feats=features,\n",
" visual_pos=normalized_boxes,\n",
" token_type_ids=inputs.token_type_ids,\n",
" return_dict=True,\n",
" output_attentions=False,\n",
" )\n",
" output_vqa = lxmert_vqa(\n",
@@ -219,7 +218,6 @@
" visual_feats=features,\n",
" visual_pos=normalized_boxes,\n",
" token_type_ids=inputs.token_type_ids,\n",
" return_dict=True,\n",
" output_attentions=False,\n",
" )\n",
" # get prediction\n",
@@ -266,4 +264,4 @@
},
"nbformat": 4,
"nbformat_minor": 4
-}
+}
2 changes: 1 addition & 1 deletion examples/question-answering/run_squad.py
@@ -321,7 +321,7 @@ def evaluate(args, model, tokenizer, prefix=""):
eval_feature = features[feature_index.item()]
unique_id = int(eval_feature.unique_id)

- output = [to_list(output[i]) for output in outputs]
+ output = [to_list(output[i]) for output in outputs.to_tuple()]

# Some models (XLNet, XLM) use 5 arguments for their predictions, while the other "simpler"
# models only use two.
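The switch to `outputs.to_tuple()` above is the pattern for code that still indexes outputs positionally: under the new default the forward pass returns a `ModelOutput`, and `to_tuple()` recovers the old tuple view. A rough sketch with illustrative names (not the exact variables from run_squad.py):

```python
# `model` and `batch` are placeholders; any question-answering model behaves the same way.
outputs = model(**batch)              # a QuestionAnsweringModelOutput under the new default
start_logits = outputs.start_logits   # attribute access (new style)
# to_tuple() drops None fields, so with no labels/loss index 0 is start_logits again
start_logits = outputs.to_tuple()[0]  # positional access (old style)
```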
2 changes: 1 addition & 1 deletion examples/rag/eval_rag.py
@@ -95,7 +95,7 @@ def strip_title(title):
truncation=True,
)["input_ids"].to(args.device)

- question_enc_outputs = rag_model.rag.question_encoder(retriever_input_ids, return_dict=True)
+ question_enc_outputs = rag_model.rag.question_encoder(retriever_input_ids)
question_enc_pool_output = question_enc_outputs.pooler_output

result = rag_model.retriever(
1 change: 0 additions & 1 deletion examples/rag/finetune.py
@@ -204,7 +204,6 @@ def _step(self, batch: dict) -> Tuple:
decoder_input_ids=decoder_input_ids,
use_cache=False,
labels=lm_labels,
- return_dict=True,
**rag_kwargs,
)

2 changes: 1 addition & 1 deletion examples/rag/use_own_knowledge_dataset.py
@@ -47,7 +47,7 @@ def embed(documents: dict, ctx_encoder: DPRContextEncoder, ctx_tokenizer: DPRCon
input_ids = ctx_tokenizer(
documents["title"], documents["text"], truncation=True, padding="longest", return_tensors="pt"
)["input_ids"]
- embeddings = ctx_encoder(input_ids.to(device=device), return_dict=True).pooler_output
+ embeddings = ctx_encoder(input_ids.to(device=device)).pooler_output
return {"embeddings": embeddings.detach().cpu().numpy()}


3 changes: 0 additions & 3 deletions examples/seq2seq/distillation.py
@@ -153,7 +153,6 @@ def _step(self, batch: dict) -> tuple:
output_hidden_states=self.do_calc_hidden_loss,
output_attentions=False,
use_cache=False,
- return_dict=True,
)
lm_logits = student_outputs.logits

@@ -179,7 +178,6 @@ def zero_tensor():
input_ids,
attention_mask=src_mask,
output_hidden_states=self.do_calc_hidden_loss,
- return_dict=True,
)
if self.different_base_models:
teacher_enc_outputs = all_teacher_encoder_outputs.last_hidden_state
@@ -199,7 +197,6 @@
decoder_input_ids=decoder_input_ids,
output_hidden_states=self.do_calc_hidden_loss,
use_cache=False, # since we are not passing labels, never let this default to True
- return_dict=True,
)
dec_mask = decoder_input_ids.ne(pad_token_id)
loss_ce = self.calc_ce_loss(dec_mask, lm_logits, teacher_outputs.logits)
2 changes: 1 addition & 1 deletion examples/seq2seq/test_seq2seq_examples.py
@@ -185,7 +185,7 @@ def test_distill_checkpointing_with_teacher(self):

@require_torch_non_multi_gpu_but_fix_me
def test_loss_fn(self):
- model = AutoModelForSeq2SeqLM.from_pretrained(BART_TINY, return_dict=True)
+ model = AutoModelForSeq2SeqLM.from_pretrained(BART_TINY)
input_ids, mask = model.dummy_inputs["input_ids"], model.dummy_inputs["attention_mask"]
target_ids = torch.tensor([[0, 4, 8, 2], [0, 8, 2, 1]], dtype=torch.long, device=model.device)
decoder_input_ids = target_ids[:, :-1].contiguous() # Why this line?
2 changes: 1 addition & 1 deletion model_cards/microsoft/prophetnet-large-uncased/README.md
@@ -23,7 +23,7 @@ target_str = "us rejects charges against its ambassador in bolivia"
input_ids = tokenizer(input_str, return_tensors="pt").input_ids
labels = tokenizer(target_str, return_tensors="pt").input_ids

- loss = model(input_ids, labels=labels, return_dict=True).loss
+ loss = model(input_ids, labels=labels).loss
```

### Citation
@@ -26,7 +26,7 @@ target_str = "us rejects charges against its ambassador in bolivia"
input_ids = tokenizer(input_str, return_tensors="pt").input_ids
labels = tokenizer(target_str, return_tensors="pt").input_ids

- loss = model(input_ids, labels=labels, return_dict=True).loss
+ loss = model(input_ids, labels=labels).loss
```

Note that since this model is a multi-lingual model it can be fine-tuned on all kinds of other languages.
@@ -45,7 +45,7 @@ from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import numpy as np
tokenizer = AutoTokenizer.from_pretrained('mrm8488/codebert-base-finetuned-detect-insecure-code')
- model = AutoModelForSequenceClassification.from_pretrained('mrm8488/codebert-base-finetuned-detect-insecure-code', return_dict=True)
+ model = AutoModelForSequenceClassification.from_pretrained('mrm8488/codebert-base-finetuned-detect-insecure-code')

inputs = tokenizer("your code here", return_tensors="pt", truncation=True, padding='max_length')
labels = torch.tensor([1]).unsqueeze(0) # Batch size 1
2 changes: 1 addition & 1 deletion model_cards/sentence-transformers/LaBSE/README.md
@@ -13,7 +13,7 @@ sentences = ["Hello World", "Hallo Welt"]
encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=64, return_tensors='pt')

with torch.no_grad():
- model_output = model(**encoded_input, return_dict=True)
+ model_output = model(**encoded_input)

embeddings = model_output.pooler_output
embeddings = torch.nn.functional.normalize(embeddings)
2 changes: 1 addition & 1 deletion scripts/fsmt/fsmt-make-super-tiny-model.py
@@ -59,7 +59,7 @@

# Test
batch = tokenizer.prepare_seq2seq_batch(["Making tiny model"])
- outputs = tiny_model(**batch, return_dict=True)
+ outputs = tiny_model(**batch)

print("test output:", len(outputs.logits[0]))

2 changes: 1 addition & 1 deletion scripts/fsmt/fsmt-make-tiny-model.py
@@ -30,7 +30,7 @@

# Test
batch = tokenizer.prepare_seq2seq_batch(["Making tiny model"])
- outputs = tiny_model(**batch, return_dict=True)
+ outputs = tiny_model(**batch)

print("test output:", len(outputs.logits[0]))

4 changes: 2 additions & 2 deletions src/transformers/configuration_utils.py
@@ -55,7 +55,7 @@ class PretrainedConfig(object):
Whether or not the model should returns all attentions.
use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether or not the model should return the last key/values attentions (not used by all models).
- return_dict (:obj:`bool`, `optional`, defaults to :obj:`False`):
+ return_dict (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether or not the model should return a :class:`~transformers.file_utils.ModelOutput` instead of a plain
tuple.
is_encoder_decoder (:obj:`bool`, `optional`, defaults to :obj:`False`):
@@ -163,7 +163,7 @@ class PretrainedConfig(object):

def __init__(self, **kwargs):
# Attributes with defaults
- self.return_dict = kwargs.pop("return_dict", False)
+ self.return_dict = kwargs.pop("return_dict", True)
self.output_hidden_states = kwargs.pop("output_hidden_states", False)
self.output_attentions = kwargs.pop("output_attentions", False)
self.use_cache = kwargs.pop("use_cache", True) # Not used by all models
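With the default flipped here in `PretrainedConfig`, callers that still want plain tuples have to opt out explicitly, either when loading the model or per forward call. A minimal sketch under that assumption, reusing the gpt2 checkpoint that appears in task_summary.rst above:

```python
from transformers import AutoModelWithLMHead, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
inputs = tokenizer("Hugging Face is based in", return_tensors="pt")

# Opt out at load time: bakes return_dict=False into the model's config.
model = AutoModelWithLMHead.from_pretrained("gpt2", return_dict=False)
tuple_out = model(**inputs)                   # plain tuple, pre-#8530 behaviour

# Or override per call, regardless of the config default.
dict_out = model(**inputs, return_dict=True)
print(dict_out.logits.shape)
```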