Issue: Adding new tokens to bert tokenizer in QA #11200

andreabac3 · 2021-04-12T10:23:12Z

WARNING: This issue is a replica of this other issue open by me, I ask you sorry if I have open it in the wrong place.

Hello Huggingface's team (@sgugger , @joeddav, @LysandreJik)
I have a problem with this code base
notebooks/examples/question_answering.ipynb - link
ENV: Google Colab - transformers Version: 4.5.0; datasets Version: 1.5.0; torch Version: 1.8.1+cu101;
I am trying to add some domain tokens in the bert-base-cased tokenizer

model_checkpoint = 'bert-base-cased'
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
list_of_domain_tokens = ["token1", "token2", "token3"]
tokenizer.add_tokens(list_of_domain_tokens)
...
...
model = AutoModelForQuestionAnswering.from_pretrained(model_checkpoint)
print(model.device)  # cpu
model.resize_token_embeddings(len(tokenizer))
trainer = Trainer(...)

Then during the trainer.fit() call it report the attached error.
Can you please tell me where I'm wrong?
The tokenizer output is the usual bert inputs expressed in the form of List[List[int]] eg inputs_ids and attention_mask.
So I can't figure out where the problem is with the device
Input, output and indices must be on the current device

Kind Regards,
Andrea

The text was updated successfully, but these errors were encountered:

sgugger · 2021-04-12T15:16:04Z

I am unable to reproduce: the notebook with your added code works smoothly on my side.

andreabac3 · 2021-04-12T16:34:25Z

Thank you @sgugger.
I want ask you sorry, I can't figure out on what's going on my side.
Now I have cloned again the notebook and the example works.
In the next days I want test it again and I will tell you more about it.

Thank you again for your help
Kind Regards,
Andrea

andreabac3 · 2021-04-16T07:36:11Z

Hi @sgugger,
I worked on the notebook and I found the problem.
I have not yet had the opportunity to test it with the original squad dataset but this happens to me both on colab and on my machine.
I warn you it seems an absurd and paradoxical situation, moreover I in no way manage the device.
I can provide you with a video while running the notebook.
As you can see from the screenshot I am forced to keep two versions of the training args, one original from the notebook and one customized by me.

If I perform these operations I get the error

I instantiate my training args
I instantiate the Trainer
I run trainer.fit
I get the error Input, output and indices must be on the current device

To solve I have to:
Instantiate the original training args of the notebook, instantiate the trainer, perform the fit to check that it has started and then do it all over again with the training args I customized.

Kind regards,
Andrea

andreabac3 · 2021-04-17T08:40:59Z

Hi @sgugger,
I can confirm, the same bug happens in the original notebook with this TrainingArguments (I have tested with squad v2), the temporary fix is to start the train with the original one, stop it and then run with the customized args.

sgugger · 2021-04-19T15:22:08Z

It looks like a bug in colab (from the screenshots I assume that is what you are using for training?) since I didn't get any error on my side by executing this as a notebook.

andreabac3 · 2021-04-19T15:37:47Z

Hi @sgugger
Do you have tested the notebook replacing the trainer args with the following?

args = TrainingArguments(
    f"my-experiment",
    evaluation_strategy = "epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=250,
    num_train_epochs=2,
    weight_decay=0.01,
    fp16=True,
    gradient_accumulation_steps=2,
    eval_accumulation_steps=2,
    fp16_opt_level='O2',
    fp16_full_eval=True,
    save_strategy='epoch',
    metric_for_best_model='eval_loss',
    logging_strategy='epoch'
)

Because I encountered the same issue on my machine.
Can you kindly test with it? Please
To test it: remove the old trainer args use the attached one and run the trainer.fit

Kind regards,
Andrea

sgugger · 2021-04-19T15:56:46Z

Ah you're right, I must have made a mistake. This comes from the option fp16_full_eval=True.

@stas00 I'm not sure what the best place is for fixing this but if someone uses fp16_full_eval=True with training, the model is never sent to the proper device and training fails.

stas00 · 2021-04-19T16:36:51Z

But there is no do_train in the args at #11200 (comment)

The logic is very explicit to not place on the device only for non-train whenfp16_full_eval=True is used:

        if (
            self.is_model_parallel
            or (args.deepspeed and args.do_train)
            or (args.fp16_full_eval and not args.do_train)
            or (self.sharded_ddp in [ShardedDDPOption.ZERO_DP_2, ShardedDDPOption.ZERO_DP_3])
        ):
            self.place_model_on_device = False

You need to add do_train=True to your TrainingArguments, otherwise it defaults to eval only because you have evaluation_strategy set.

andreabac3 · 2021-04-19T16:46:38Z

Hi @stas00 & @sgugger,

You need to add do_train=True to your TrainingArguments, otherwise it defaults to eval only because you have evaluation_strategy set.

Ok so do_train=True is also compatible with fp16_full_eval=True?
My objective is to train the model and pick the best one at the lowest point of eval loss.

Regarding the notebook, can I use the same Trainer object for fit and predict? Because these Booleans are never set in the notebook. I mean when I am doing trainer.predict() is obvious for the trainer to set model.eval() and torch.no_grad()?

Thank you both,
Andrea

stas00 · 2021-04-19T16:52:15Z

Ok so do_train=True is also compatible with fp16_full_eval=True?

Why did you think it shouldn't be compatible?

The only reason there is a special case for non-training is to avoid placing the full model on device before it was half()'ed - as it might not fit in its full size, but might fit in half().

Regarding the notebook, can I use the same Trainer object for fit and predict? Because these Booleans are never set in the notebook. I mean when I am doing trainer.predict() is obvious for the trainer to set model.eval() and torch.no_grad()?

Of course. It was designed for you to pass all the init args at once and then you can call all its functions.

andreabac3 · 2021-04-19T16:58:46Z

@stas00 Ok clear, I have just checked and the trainer works perfectly.
What do you think to place a warning to alert the user when call trainer.fit having the trainer.do_train = False?

Because it's clear in the point of view of performance as you said but the documentation don't bring out this things for this reason I have open then issue.

Kind regards,
Andrea

stas00 · 2021-04-19T17:07:49Z

Oh, I see. Until recently do_train was sort of optional when using user's custom code and what you're saying we need to then require do_train=True if trainer.train() is called. But we started relying on do_train for more than just knowing to call train() from scripts. This makes sense to me.

@sgugger, do you agree if we add this?

    def train(...):
[...]
    if not self.args.do_train:
        raise ValueError("To use `train` please make sure you set `do_train=True` when instantiating the Trainer object")

sgugger · 2021-04-19T17:16:03Z

I would rather avoid adding this, as users have been used to not have to set that argument to True when not using example scripts. Can we just add the proper line in train to put the model on the device if it was not done already?

(Sorry I didn't catch you were using do_train in the PR you added that test, I should have caught it and commented there.)

stas00 · 2021-04-19T17:46:35Z

We will probably have to rethink the design then, since it's not a simple "put on device if it wasn't already" - there are multiple cases when it shouldn't happen. For now added a hardcoded workaround: #11322

andreabac3 mentioned this issue Apr 12, 2021

Issue: Adding new tokens to bert tokenizer in QA huggingface/notebooks#21

Open

andreabac3 mentioned this issue Apr 19, 2021

QA: fix the missing do_train param in trainer args huggingface/notebooks#24

Closed

stas00 mentioned this issue Apr 19, 2021

[Trainer] fix the placement on device with fp16_full_eval #11322

Merged

stas00 closed this as completed in #11322 Apr 19, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issue: Adding new tokens to bert tokenizer in QA #11200

Issue: Adding new tokens to bert tokenizer in QA #11200

andreabac3 commented Apr 12, 2021 •

edited

Loading

sgugger commented Apr 12, 2021

andreabac3 commented Apr 12, 2021

andreabac3 commented Apr 16, 2021

andreabac3 commented Apr 17, 2021

sgugger commented Apr 19, 2021

andreabac3 commented Apr 19, 2021 •

edited

Loading

sgugger commented Apr 19, 2021

stas00 commented Apr 19, 2021 •

edited

Loading

andreabac3 commented Apr 19, 2021

stas00 commented Apr 19, 2021

andreabac3 commented Apr 19, 2021

stas00 commented Apr 19, 2021 •

edited

Loading

sgugger commented Apr 19, 2021

stas00 commented Apr 19, 2021 •

edited

Loading

Issue: Adding new tokens to bert tokenizer in QA #11200

Issue: Adding new tokens to bert tokenizer in QA #11200

Comments

andreabac3 commented Apr 12, 2021 • edited Loading

sgugger commented Apr 12, 2021

andreabac3 commented Apr 12, 2021

andreabac3 commented Apr 16, 2021

andreabac3 commented Apr 17, 2021

sgugger commented Apr 19, 2021

andreabac3 commented Apr 19, 2021 • edited Loading

sgugger commented Apr 19, 2021

stas00 commented Apr 19, 2021 • edited Loading

andreabac3 commented Apr 19, 2021

stas00 commented Apr 19, 2021

andreabac3 commented Apr 19, 2021

stas00 commented Apr 19, 2021 • edited Loading

sgugger commented Apr 19, 2021

stas00 commented Apr 19, 2021 • edited Loading

andreabac3 commented Apr 12, 2021 •

edited

Loading

andreabac3 commented Apr 19, 2021 •

edited

Loading

stas00 commented Apr 19, 2021 •

edited

Loading

stas00 commented Apr 19, 2021 •

edited

Loading

stas00 commented Apr 19, 2021 •

edited

Loading