Summarisation example fails to run on given example. Missing positional argument TypeError #18381

Closed
2 of 4 tasks
SupreethRao99 opened this issue Jul 31, 2022 · 5 comments · Fixed by #18398

@SupreethRao99

System Info

- `transformers` version: 4.21.0
- Platform: Linux-5.4.188+-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.7.13
- Huggingface_hub version: 0.8.1
- PyTorch version (GPU?): 1.12.0+cu113 (True)
- Tensorflow version (GPU?): 2.8.2 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No

Who can help?

@sgugger @Pati

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I am trying to fine-tune my own summarisation model based on the example in transformers/examples/pytorch/summarization/run_summarization_no_trainer.py, but the script fails even when I run it on the example given in the repository. Link to Google Colab to reproduce the error.

!accelerate launch /content/transformers/examples/pytorch/summarization/run_summarization_no_trainer.py \
    --model_name_or_path t5-small \
    --dataset_name cnn_dailymail \
    --dataset_config "3.0.0" \
    --source_prefix "summarize: " \
    --output_dir ~/tmp/tst-summarization

I'm getting the following error:

Traceback (most recent call last):
  File "/content/transformers/examples/pytorch/summarization/run_summarization_no_trainer.py", line 763, in <module>
    main()
  File "/content/transformers/examples/pytorch/summarization/run_summarization_no_trainer.py", line 493, in main
    desc="Running tokenizer on dataset",
  File "/usr/local/lib/python3.7/dist-packages/datasets/dataset_dict.py", line 790, in map
    for k, dataset in self.items()
  File "/usr/local/lib/python3.7/dist-packages/datasets/dataset_dict.py", line 790, in <dictcomp>
    for k, dataset in self.items()
  File "/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py", line 2405, in map
    desc=desc,
  File "/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py", line 557, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py", line 524, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/datasets/fingerprint.py", line 480, in wrapper
    out = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py", line 2779, in _map_single
    offset=offset,
  File "/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py", line 2655, in apply_function_on_filtered_inputs
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
  File "/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py", line 2347, in decorated
    result = f(decorated_item, *args, **kwargs)
  File "/content/transformers/examples/pytorch/summarization/run_summarization_no_trainer.py", line 474, in preprocess_function
    labels = tokenizer(text_target=targets, max_length=max_target_length, padding=padding, truncation=True)
TypeError: __call__() missing 1 required positional argument: 'text'
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 826, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 358, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/transformers/examples/pytorch/summarization/run_summarization_no_trainer.py', '--model_name_or_path', 't5-small', '--dataset_name', 'cnn_dailymail', '--dataset_config', '3.0.0', '--source_prefix', 'summarize: ', '--output_dir', '/root/tmp/tst-summarization']' returned non-zero exit status 1.

Expected behavior

The model should start training.

@LysandreJik
Member

Aha, that's one for @sgugger, linked to #18325

@sgugger
Collaborator

sgugger commented Aug 1, 2022

You need to use the main version of Transformers to use the main version of the example scripts. You can find the examples for v4.21.0 here.
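For readers hitting the same `TypeError`, here is a minimal sketch of why a script written for a newer tokenizer signature fails this way on an older one, and how a script could detect the mismatch up front. The two tokenizer classes are hypothetical stand-ins, not the real Transformers classes, and their signatures are an assumption about how the old and new `__call__` differ. `accepts_keyword` and `tokenize_targets` are likewise illustrative names, not library functions.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` declares `name` as an explicit parameter.

    A catch-all **kwargs does not count: it would swallow the keyword
    silently instead of handling it.
    """
    param = inspect.signature(func).parameters.get(name)
    return param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )


class OldStyleTokenizer:
    """Hypothetical stand-in: `text` is required, `text_target` is unknown."""

    def __call__(self, text, text_pair=None, **kwargs):
        return {"input_ids": [len(t) for t in text]}


class NewStyleTokenizer:
    """Hypothetical stand-in: `text` is optional, `text_target` is declared."""

    def __call__(self, text=None, text_pair=None, text_target=None, **kwargs):
        source = text if text is not None else text_target
        return {"input_ids": [len(t) for t in source]}


def tokenize_targets(tokenizer, targets):
    """Tokenize targets, failing early with a readable message if unsupported."""
    if not accepts_keyword(tokenizer.__call__, "text_target"):
        raise RuntimeError(
            "This tokenizer does not declare `text_target`; the installed "
            "library is probably older than the example script expects."
        )
    return tokenizer(text_target=targets)
```

Calling `OldStyleTokenizer()(text_target=["..."])` directly raises a `TypeError` about the missing positional argument `text`, because the unknown keyword lands in `**kwargs` while the required `text` is never supplied. This matches the shape of the error in the traceback above.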

@SupreethRao99
Author

Thank you @sgugger @LysandreJik, it works perfectly now.

@SupreethRao99
Author

SupreethRao99 commented Aug 1, 2022

Hey, sorry to bother you again @sgugger, but this is the output I'm getting when I run the script on my own dataset:

All the weights of BartForConditionalGeneration were initialized from the model checkpoint at ainize/bart-base-cnn.
If your task is similar to the task the model of the checkpoint was trained on, you can already use BartForConditionalGeneration for predictions without further training.
Running tokenizer on dataset: 100% 1/1 [00:00<00:00, 215.96ba/s]
Running tokenizer on dataset: 100% 1/1 [00:00<00:00, 342.76ba/s]
08/01/2022 12:58:50 - INFO - __main__ - Sample 27 of the training set: {'input_ids': [0, 6323, 34638, 251, 2788, 2], 'attention_mask': [1, 1, 1, 1, 1, 1], 'labels': [0, 12465, 765, 2788, 2]}.
08/01/2022 12:58:52 - INFO - __main__ - ***** Running training *****
08/01/2022 12:58:52 - INFO - __main__ -   Num examples = 32
08/01/2022 12:58:52 - INFO - __main__ -   Num Epochs = 3
08/01/2022 12:58:52 - INFO - __main__ -   Instantaneous batch size per device = 8
08/01/2022 12:58:52 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 8
08/01/2022 12:58:52 - INFO - __main__ -   Gradient Accumulation steps = 1
08/01/2022 12:58:52 - INFO - __main__ -   Total optimization steps = 12
 33% 4/12 [00:01<00:01,  4.60it/s]08/01/2022 12:58:54 - INFO - absl - Using default tokenizer.
Traceback (most recent call last):
  File "/content/transformers/examples/pytorch/summarization/run_summarization_no_trainer.py", line 764, in <module>
    main()
  File "/content/transformers/examples/pytorch/summarization/run_summarization_no_trainer.py", line 711, in main
    result = {key: value.mid.fmeasure * 100 for key, value in result.items()}
  File "/content/transformers/examples/pytorch/summarization/run_summarization_no_trainer.py", line 711, in <dictcomp>
    result = {key: value.mid.fmeasure * 100 for key, value in result.items()}
AttributeError: 'numpy.float64' object has no attribute 'mid'
 33% 4/12 [00:01<00:03,  2.11it/s]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 826, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 358, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/transformers/examples/pytorch/summarization/run_summarization_no_trainer.py', '--model_name_or_path', 'ainize/bart-base-cnn', '--train_file', '/content/test.csv', '--validation_file', '/content/test.csv', '--summary_column', 'Summary', '--text_column', 'Text', '--output_dir', '/content/model']' returned non-zero exit status 1.

The command I'm using to launch the script is:

!accelerate launch /content/transformers/examples/pytorch/summarization/run_summarization_no_trainer.py \
  --model_name_or_path ainize/bart-base-cnn \
  --train_file /content/test.csv \
  --validation_file /content/test.csv \
  --summary_column Summary \
  --text_column Text \
  --output_dir /content/model

The test.csv file is attached below:
test.csv

@sgugger
Collaborator

sgugger commented Aug 1, 2022

Yes, it looks like evaluate decided to break the rouge metric. Sending a fix!
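Until a fixed script is available, a workaround along these lines might help. This is a sketch only, and not necessarily what the actual fix does. It assumes the metric returns either plain floats (as the `AttributeError` on `numpy.float64` suggests) or aggregate objects exposing `.mid.fmeasure` (as the failing line in the script expects). `rouge_to_percent` is a hypothetical helper, and the two namedtuples are stand-ins for the older return type.

```python
from collections import namedtuple

# Hypothetical stand-ins for the older, nested return type of the metric.
Score = namedtuple("Score", ["precision", "recall", "fmeasure"])
AggregateScore = namedtuple("AggregateScore", ["low", "mid", "high"])


def rouge_to_percent(result, ndigits=4):
    """Convert ROUGE results to percentages for either return format."""
    percent = {}
    for key, value in result.items():
        # Older format: aggregate object, so take the mid F-measure.
        if hasattr(value, "mid"):
            value = value.mid.fmeasure
        # Newer format: the value is already a plain float-like number.
        percent[key] = round(float(value) * 100, ndigits)
    return percent
```

In the script, the dict comprehension at the failing line would then become something like `result = rouge_to_percent(result)`.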
