Summarisation example fails to run on given example. Missing positional argument TypeError #18381

SupreethRao99 · 2022-07-31T16:40:48Z

System Info

- `transformers` version: 4.21.0
- Platform: Linux-5.4.188+-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.7.13
- Huggingface_hub version: 0.8.1
- PyTorch version (GPU?): 1.12.0+cu113 (True)
- Tensorflow version (GPU?): 2.8.2 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No

Who can help?

@sgugger @Pati

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

I am trying to fine-tune my own summarisation model based on the example in transformers/examples/pytorch/summarization/run_summarization_no_trainer.py, but the script fails even when I run it on the example given in the repository. Link to Google Colab to reproduce the error

!accelerate launch /content/transformers/examples/pytorch/summarization/run_summarization_no_trainer.py \
    --model_name_or_path t5-small \
    --dataset_name cnn_dailymail \
    --dataset_config "3.0.0" \
    --source_prefix "summarize: " \
    --output_dir ~/tmp/tst-summarization

I'm getting the following error

Traceback (most recent call last):
  File "/content/transformers/examples/pytorch/summarization/run_summarization_no_trainer.py", line 763, in <module>
    main()
  File "/content/transformers/examples/pytorch/summarization/run_summarization_no_trainer.py", line 493, in main
    desc="Running tokenizer on dataset",
  File "/usr/local/lib/python3.7/dist-packages/datasets/dataset_dict.py", line 790, in map
    for k, dataset in self.items()
  File "/usr/local/lib/python3.7/dist-packages/datasets/dataset_dict.py", line 790, in <dictcomp>
    for k, dataset in self.items()
  File "/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py", line 2405, in map
    desc=desc,
  File "/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py", line 557, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py", line 524, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/datasets/fingerprint.py", line 480, in wrapper
    out = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py", line 2779, in _map_single
    offset=offset,
  File "/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py", line 2655, in apply_function_on_filtered_inputs
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
  File "/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py", line 2347, in decorated
    result = f(decorated_item, *args, **kwargs)
  File "/content/transformers/examples/pytorch/summarization/run_summarization_no_trainer.py", line 474, in preprocess_function
    labels = tokenizer(text_target=targets, max_length=max_target_length, padding=padding, truncation=True)
TypeError: __call__() missing 1 required positional argument: 'text'
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 826, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 358, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/transformers/examples/pytorch/summarization/run_summarization_no_trainer.py', '--model_name_or_path', 't5-small', '--dataset_name', 'cnn_dailymail', '--dataset_config', '3.0.0', '--source_prefix', 'summarize: ', '--output_dir', '/root/tmp/tst-summarization']' returned non-zero exit status 1.

Expected behavior

The model should start training


LysandreJik · 2022-08-01T09:40:45Z

Aha, that's one for @sgugger, linked to #18325

sgugger · 2022-08-01T11:54:32Z

You need to use the main version of Transformers to use the main version of the example scripts. You can find the examples for v4.21.0 here.
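For readers hitting the same TypeError, here is a minimal sketch of the mismatch. It uses hypothetical stand-in classes rather than the real tokenizer, and it assumes (consistent with the traceback above, but not checked against the 4.21.0 source here) that the released `__call__` requires `text` and absorbs unknown keywords through `**kwargs`, while the version the main-branch script targets declares `text_target` explicitly. The helper names are illustrative only.

```python
import inspect


class OldStyleTokenizer:
    """Hypothetical stand-in: `text` is required, extra keywords are swallowed."""

    def __call__(self, text, **kwargs):
        return {"input_ids": text}


class NewStyleTokenizer:
    """Hypothetical stand-in: `text_target` is an explicit keyword argument."""

    def __call__(self, text=None, text_target=None, **kwargs):
        return {"input_ids": text if text is not None else text_target}


def supports_text_target(tokenizer):
    """Return True if the callable declares a `text_target` parameter."""
    return "text_target" in inspect.signature(tokenizer).parameters


def call_raises_type_error(tokenizer):
    """Mimic the failing call from preprocess_function in the traceback."""
    try:
        tokenizer(text_target=["a summary"])
    except TypeError:
        return True
    return False
```

Running `supports_text_target(tokenizer)` on a real tokenizer instance before launching the script is one possible way to tell whether the installed library matches the script; installing the library version that matches the script, as suggested above, remains the actual fix.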

SupreethRao99 · 2022-08-01T12:40:55Z

Thank you @sgugger @LysandreJik, it works perfectly now.

SupreethRao99 · 2022-08-01T13:02:54Z

Hey, sorry to bother you again @sgugger, but this is the output I'm getting when I run the script on my own dataset:

All the weights of BartForConditionalGeneration were initialized from the model checkpoint at ainize/bart-base-cnn.
If your task is similar to the task the model of the checkpoint was trained on, you can already use BartForConditionalGeneration for predictions without further training.
Running tokenizer on dataset: 100% 1/1 [00:00<00:00, 215.96ba/s]
Running tokenizer on dataset: 100% 1/1 [00:00<00:00, 342.76ba/s]
08/01/2022 12:58:50 - INFO - __main__ - Sample 27 of the training set: {'input_ids': [0, 6323, 34638, 251, 2788, 2], 'attention_mask': [1, 1, 1, 1, 1, 1], 'labels': [0, 12465, 765, 2788, 2]}.
08/01/2022 12:58:52 - INFO - __main__ - ***** Running training *****
08/01/2022 12:58:52 - INFO - __main__ -   Num examples = 32
08/01/2022 12:58:52 - INFO - __main__ -   Num Epochs = 3
08/01/2022 12:58:52 - INFO - __main__ -   Instantaneous batch size per device = 8
08/01/2022 12:58:52 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 8
08/01/2022 12:58:52 - INFO - __main__ -   Gradient Accumulation steps = 1
08/01/2022 12:58:52 - INFO - __main__ -   Total optimization steps = 12
 33% 4/12 [00:01<00:01,  4.60it/s]08/01/2022 12:58:54 - INFO - absl - Using default tokenizer.
Traceback (most recent call last):
  File "/content/transformers/examples/pytorch/summarization/run_summarization_no_trainer.py", line 764, in <module>
    main()
  File "/content/transformers/examples/pytorch/summarization/run_summarization_no_trainer.py", line 711, in main
    result = {key: value.mid.fmeasure * 100 for key, value in result.items()}
  File "/content/transformers/examples/pytorch/summarization/run_summarization_no_trainer.py", line 711, in <dictcomp>
    result = {key: value.mid.fmeasure * 100 for key, value in result.items()}
AttributeError: 'numpy.float64' object has no attribute 'mid'
 33% 4/12 [00:01<00:03,  2.11it/s]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 826, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 358, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/transformers/examples/pytorch/summarization/run_summarization_no_trainer.py', '--model_name_or_path', 'ainize/bart-base-cnn', '--train_file', '/content/test.csv', '--validation_file', '/content/test.csv', '--summary_column', 'Summary', '--text_column', 'Text', '--output_dir', '/content/model']' returned non-zero exit status 1.

The command I'm using to launch the script is:

!accelerate launch /content/transformers/examples/pytorch/summarization/run_summarization_no_trainer.py \
  --model_name_or_path ainize/bart-base-cnn \
  --train_file /content/test.csv \
  --validation_file /content/test.csv \
  --summary_column Summary \
  --text_column Text \
  --output_dir /content/model

the test.csv file is below
test.csv

sgugger · 2022-08-01T13:53:02Z

Yes, it looks like evaluate decided to break the rouge metric. Sending a fix!
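Until a fix is available, a defensive conversion along these lines may serve as a workaround. This is a sketch, not the change made in the fix: it assumes the metric returns either plain floats (as the `numpy.float64` in the AttributeError suggests) or aggregate objects exposing `.mid.fmeasure` (as line 711 of the script expects). The two namedtuples are stand-ins modeled on that attribute access, and `rouge_to_percent` is a hypothetical helper name.

```python
from collections import namedtuple

# Stand-ins for the older result shape; only the attribute names matter here.
Score = namedtuple("Score", ["precision", "recall", "fmeasure"])
AggregateScore = namedtuple("AggregateScore", ["low", "mid", "high"])


def rouge_to_percent(result):
    """Convert ROUGE results to percentages for both result shapes."""
    percent = {}
    for key, value in result.items():
        # Older shape: aggregate object, take the mid F-measure.
        if hasattr(value, "mid"):
            value = value.mid.fmeasure
        # Newer shape: already a plain float (or numpy float).
        percent[key] = float(value) * 100
    return percent
```

For example, `rouge_to_percent({"rouge1": 0.25})` and a result holding an aggregate whose mid F-measure is 0.25 would both produce the same dictionary.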

SupreethRao99 added the bug label Jul 31, 2022

SupreethRao99 closed this as completed Aug 1, 2022

SupreethRao99 reopened this Aug 1, 2022

sgugger mentioned this issue Aug 1, 2022

Fix ROUGE add example check and update README #18398

Merged

sgugger closed this as completed in #18398 Aug 1, 2022
