
T5 finetune outputting gibberish #7796

Closed · 1 of 3 tasks
jsrozner opened this issue Oct 14, 2020 · 12 comments

@jsrozner (Contributor)

Environment info

  • transformers version: 3.3.1
  • Platform: Linux-4.4.0-116-generic-x86_64-with-glibc2.10
  • Python version: 3.8.5
  • PyTorch version (GPU?): 1.6.0 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: (tried with both 1 and 2 gpus)

Who can help

Summarization: @sshleifer
T5: @patrickvonplaten
examples/seq2seq: @sshleifer

Information

I am trying to finetune on a custom dataset. I posted about my specific use case here in the forums: https://discuss.huggingface.co/t/t5-tips-for-finetuning-on-crossword-clues-clue-answer/1514

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQUaD task: (give the name)
  • [x] my own task or dataset: (give details below)

To reproduce

  • clone transformers from master
  • pip install -e . ; pip install -r requirements.txt
  • cd examples/seq2seq
  • modify the finetune_t5.sh script to run with a local data set (data_set/[val|test|train].[source|target])

(Note that I have changed nothing else)

python finetune.py \
    --model_name_or_path=t5-small \
    --tokenizer_name=t5-small \
    --data_dir=${HOME}/data_set \
    --learning_rate=3e-4 \
    --output_dir=$OUTPUT_DIR \
    --max_source_length=100 \
    --max_target_length=100 \
    --num_train_epochs=300 \
    --train_batch_size=64 \
    --eval_batch_size=64 \
    --gpus=1 \
    --auto_select_gpus=True \
    --save_top_k=3 \
    --do_train \
    --do_predict \
    "$@"

As a baseline "does T5 work?" test, my inputs/outputs are of the form (one example per line):
(one line in train.source): This is a sentence
(the corresponding line in train.target): This

The lines are exactly as above, with a newline after each example but no other punctuation. I have not modified the tokens or the model.

Expected behavior

I expect T5 to learn to output the first word.

Observed

T5 outputs the first word followed by gibberish:

After 300 epochs, here is what we see for the first 5 lines of test.source vs. the test_generations (test.target is just the first word of each line in test.source).
test.source:
We raised a bloom, a monster
I let Satan corrupt and torment
Chapter in play is an old piece
Old skin disease liable to drain confidence
Keep a riot going inside a musical academy

test_generations:
We vsahmoastuosastostassymbossa
Issahrastahmoormentostormentastoshomment
Chapter vshygie'ny-futtahraffahtaftast
Old hygienohmahrastassahuasairtia
Keep'astifiahuassaivrasastoshygiesana

I wonder if any of the following could be affecting this:

  • choice of loss function
  • a corrupted character somewhere in one of the inputs/outputs
  • choice of task (I think it defaults to summarization)
  • need more epochs?
  • some other parameter to change?
@sshleifer (Contributor)

"some other parameter to change?": BINGO.

There is a min_length/max_length parameter you can pass to beam search (in many ways) that is affecting your generations. If you eval offline with min_length=0, max_length=3, it should work.
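
For reference, an offline eval along those lines might look like the sketch below; the checkpoint path is hypothetical, and the point is only the min_length/max_length arguments passed to generate():

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Hypothetical path to a checkpoint saved by finetune.py
ckpt_dir = "output_dir/best_tfmr"

tokenizer = T5Tokenizer.from_pretrained(ckpt_dir)
model = T5ForConditionalGeneration.from_pretrained(ckpt_dir)

inputs = tokenizer("We raised a bloom, a monster", return_tensors="pt")

# Constrain decoding so the model is allowed to stop after the first word
generated = model.generate(
    inputs["input_ids"],
    min_length=0,
    max_length=3,
    num_beams=4,
)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```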

@jsrozner (Contributor, Author)

jsrozner commented Oct 14, 2020

Cool! Sorry for the n00biness.

  1. Is there somewhere I can read about when/why this happens? (Or, in brief, why does it happen?)
  2. min_length and max_length will just limit how long the output sequence can be? Where's the best place to set them? Directly in finetune.py?
  3. Is there a different way to have the model learn when to stop outputting? (i.e., to learn by itself that it should output only one "word", since that's what all the training examples show)

@sshleifer (Contributor)

sshleifer commented Oct 14, 2020

  1. You can read the docstring for generate.
  2. I would edit finetune.py around here.
  3. It should learn good lengths within the hardcoded range; it's simply not allowed to go outside that range.
    If you set min_length=0, max_length=10, I would guess it will learn to always generate the word followed by </s>. (This "eos" symbol is automatically added to input sequences by the T5Tokenizer; see the sanity-check sketch below.)
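
A quick sanity check for the eos point (a sketch; whether T5Tokenizer appends </s> automatically has varied across transformers versions, so it is worth verifying directly):

```python
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")

ids = tokenizer("This")["input_ids"]
tokens = tokenizer.convert_ids_to_tokens(ids)
print(tokens)

# If the last token is not '</s>', the targets never show the model an
# end-of-sequence marker, so it cannot learn when to stop generating.
if tokens[-1] != tokenizer.eos_token:
    print("eos is not appended automatically in this version")
```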

@jsrozner (Contributor, Author)

jsrozner commented Oct 15, 2020

Thanks! I am rerunning with max_length (I didn't see a spot for min_length).

I'm still a little confused about why this happens, though. For example:

  • why doesn't it get penalized for the gibberish? (Is padding somehow affecting what it gets penalized for?)
  • why isn't the gibberish at all linguistic, even? I would expect it to produce at least mostly English-like tokens; these strings seem entirely non-linguistic.

Related: is there an easy flag to change so that I could view part of the validation outputs at each epoch to keep track of when it learns to truncate? Right now I'm just waiting until end of training to look at the test generations.
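
(One way to do this without a dedicated flag would be a small PyTorch Lightning callback along the lines of the sketch below; it assumes the Lightning module used by finetune.py exposes model and tokenizer attributes, which is not confirmed here. It could be registered on the Trainer via its callbacks argument.)

```python
import pytorch_lightning as pl


class PreviewGenerationsCallback(pl.Callback):
    """Print a few generations at the end of every validation epoch."""

    def __init__(self, sample_texts, max_length=10):
        self.sample_texts = sample_texts
        self.max_length = max_length

    def on_validation_epoch_end(self, trainer, pl_module):
        # Assumes pl_module exposes .tokenizer and .model (an assumption here)
        tokenizer, model = pl_module.tokenizer, pl_module.model
        batch = tokenizer(self.sample_texts, return_tensors="pt", padding=True)
        generated = model.generate(
            batch["input_ids"].to(pl_module.device),
            attention_mask=batch["attention_mask"].to(pl_module.device),
            min_length=0,
            max_length=self.max_length,
        )
        for src, out in zip(self.sample_texts, generated):
            decoded = tokenizer.decode(out, skip_special_tokens=True)
            print(f"{src!r} -> {decoded!r}")
```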

@sshleifer (Contributor)

@jsrozner (Contributor, Author)

jsrozner commented Oct 15, 2020

Okay thanks, I will work on these.

I realize these are unrelated T5 issues, but before I file other feature requests/bugs I just wanted to run them by you:

  • auto_lr_find and auto_scale_batch_size (PyTorch Lightning flags) throw errors when used from the finetune.sh script. Should these be usable? (I can debug and figure out why they're not working, but I want to know whether they should be working.)
  • I am unable to get the finetune.sh script to resume from a checkpoint (I played around with this for ~2 hours last night). Should this be supported?

@sshleifer (Contributor)

sshleifer commented Oct 15, 2020

auto*: Would be nice if they worked!
It should work with --resume_from_checkpoint, but that part of Lightning has been very flaky.

I probably won't fix either of these, but I would definitely accept a PR that makes the currently broken clargs work. If you can't fix them, you could also make separate issues for the clargs that don't work, label them "Help Wanted", and see what happens.
If you make issues, make sure to include your PL version.

@danyaljj (Contributor)

@jsrozner did finetune.py work for fine-tuning T5 for you?

We're also having some difficulties and wanted to make sure it has worked for someone else, at least.

@jsrozner (Contributor, Author)

@danyaljj this will be fixed by #8435.

@danyaljj (Contributor)

Thanks, @jsrozner, for the update!
Does this address the issue here? Mainly your observation that:

But even after setting eval_beams=1, eval_max_gen_length=40, it still continues to generate many more tokens than it should

@sshleifer (Contributor)

Did you pass min_length=0 to generate?

@jsrozner (Contributor, Author)

See issue #5142 for the resolution.
