
bugs in "textattack train” #488

Closed
qiyanjun opened this issue Jul 23, 2021 · 6 comments · Fixed by #653
Labels
bug Something isn't working

Comments

@qiyanjun
Member

qiyanjun commented Jul 23, 2021

1. Training issue, e.g.:

textattack train --model-name-or-path albert-base-v2 --dataset snli --per-device-train-batch-size 8 --epochs 5 --learning-rate 1e-5


Iteration:   0%|                                                                         | 0/68769 [00:00<?, ?it/s]/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:59: ClassNLLCriterion_updateOutput_no_reduce_kernel: block: [0,0,0], thread: [2,0,0] Assertion `cur_target >= 0 && cur_target < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:59: ClassNLLCriterion_updateOutput_no_reduce_kernel: block: [0,0,0], thread: [4,0,0] Assertion `cur_target >= 0 && cur_target < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:59: ClassNLLCriterion_updateOutput_no_reduce_kernel: block: [0,0,0], thread: [7,0,0] Assertion `cur_target >= 0 && cur_target < n_classes` failed.
....
/miniconda3/envs/textattackenv-master/lib/python3.7/site-packages/textattack/trainer.py", line 505, in training_step
    sample_weights[is_adv_sample] *= self.training_args.alpha
RuntimeError: CUDA error: device-side assert triggered


2. Model loading issue, e.g.:
textattack attack --model t5-en-de --recipe seq2sick --num-examples 10


2021-07-22 17:26:22.554522: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Downloading: 5.42kB [00:00, 1.63MB/s]                                                                
Downloading: 3.10kB [00:00, 995kB/s]                                                                 
Downloading and preparing dataset ted_multi_translate/plain_text (download: 335.91 MiB, generated: 754.37 MiB, post-processed: Unknown size, total: 1.06 GiB) to /server/usera/.cache/huggingface/datasets/ted_multi_translate/plain_text/1.0.0/60acc1806272a2861722952467c3b4ac274f1a41d798f7c64b7e36cb590dfc48...
Dataset ted_multi_translate downloaded and prepared to /server/usera/.cache/huggingface/datasets/ted_multi_translate/plain_text/1.0.0/60acc1806272a2861722952467c3b4ac274f1a41d798f7c64b7e36cb590dfc48. Subsequent calls will reuse this data.
Traceback (most recent call last):
  File "/server/usera/miniconda3/envs/textattackenv/bin/textattack", line 33, in <module>
    sys.exit(load_entry_point('textattack', 'console_scripts', 'textattack')())
  File "/net/server/usera/TextAttack/textattack/commands/textattack_cli.py", line 42, in main
    func.run(args)
  File "/net/server/usera/TextAttack/textattack/commands/attack_command.py", line 24, in run
    model_wrapper = ModelArgs._create_model_from_args(attack_args)
  File "/net/server/usera/TextAttack/textattack/model_args.py", line 252, in _create_model_from_args
    model, model.tokenizer
  File "/net/server/usera/TextAttack/textattack/models/wrappers/huggingface_model_wrapper.py", line 20, in __init__
    ), f"`model` must be of type `transformers.PreTrainedModel`, but got type {type(model)}."
AssertionError: `model` must be of type `transformers.PreTrainedModel`, but got type <class 'textattack.models.helpers.t5_for_text_to_text.T5ForTextToText'>.
(textattackenv) 
@srujanjoshi
Contributor

I tried running the above commands on my machine and observed the same errors. (Ubuntu 20.04, Python 3.8.11, TensorFlow 2.6, PyTorch 1.9.1)

@VijayKalmath
Contributor

VijayKalmath commented Jun 2, 2022

@qiyanjun

For the first issue, with the SNLI dataset:

From the HuggingFace SNLI dataset page, regarding the SNLI labels:

label: an integer whose value may be either 0, indicating that the hypothesis entails the premise, 1, indicating that the premise and hypothesis neither entail nor contradict each other, or 2, indicating that the hypothesis contradicts the premise. Dataset instances which don't have any gold label are marked with -1 label. Make sure you filter them before starting the training using datasets.Dataset.filter.

Since the SNLI labels have four possible values [-1, 0, 1, 2], we need to first filter -1 out of the dataset and then set --model-num-labels to 3.

So the textattack command to run should be:

textattack train --model-name-or-path albert-base-v2 --dataset snli --per-device-train-batch-size 8 --epochs 5 \
                 --learning-rate 1e-5 --model-num-labels 3 --filter-train-by-labels 0 1 2

The model trains without errors with the new arguments.
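
For reference, the filtering that --filter-train-by-labels 0 1 2 performs is roughly equivalent to the following standalone snippet (a minimal sketch using the datasets.Dataset.filter API quoted above, not TextAttack internals):

    # Drop the un-annotated SNLI examples (label == -1) so only labels {0, 1, 2} remain.
    from datasets import load_dataset

    snli_train = load_dataset("snli", split="train")
    snli_train = snli_train.filter(lambda example: example["label"] in (0, 1, 2))
    print(sorted(set(snli_train["label"])))  # expected: [0, 1, 2]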

One potential enhancement: in training_args.py we could add checks that warn the user if --model-num-labels does not match the number of labels in the dataset, or if the dataset contains numeric labels < 0, which are not supported by PyTorch.

Do let me know if the enhancement looks good; I can go ahead and open a PR for it.

@VijayKalmath
Contributor

@qiyanjun

For the second issue, with the T5 models:

T5 models have their own wrappers for the model and tokenizer, defined in textattack/models/helpers/t5_for_text_to_text.py and textattack/models/tokenizers/t5_tokenizer.py respectively.

In textattack/models/wrappers/huggingface_model_wrapper.py, we have assert statements which check the model's and tokenizer's classes.

These assert statements need to be updated to also accept the T5ForTextToText and T5Tokenizer classes.

Current assert statements:

        assert isinstance(
            model, transformers.PreTrainedModel
        ), f"`model` must be of type `transformers.PreTrainedModel`, but got type {type(model)}."
        assert isinstance(
            tokenizer,
            (transformers.PreTrainedTokenizer, transformers.PreTrainedTokenizerFast),
        ), f"`tokenizer` must of type `transformers.PreTrainedTokenizer` or `transformers.PreTrainedTokenizerFast`, but got type {type(tokenizer)}."

These assert statements should be updated to:

from textattack.models.tokenizers import T5Tokenizer
from textattack.models.helpers import T5ForTextToText


        assert isinstance(
            model, (transformers.PreTrainedModel, T5ForTextToText)
        ), f"`model` must be of type `transformers.PreTrainedModel` or `T5ForTextToText`, but got type {type(model)}."
        assert isinstance(
            tokenizer,
            (transformers.PreTrainedTokenizer, transformers.PreTrainedTokenizerFast, T5Tokenizer),
        ), f"`tokenizer` must be of type `transformers.PreTrainedTokenizer`, `transformers.PreTrainedTokenizerFast`, or `T5Tokenizer`, but got type {type(tokenizer)}."

@jxmorris12
Do let me know if the changes specified above and the previously mentioned enhancement look good; I can go ahead and open a PR for them.

@jxmorris12
Collaborator

Hey @VijayKalmath.

The changes to T5 look great.

For the SNLI case, can you add a check to make sure all the labels are accounted for, and throw an AssertionError if there are labels unaccounted for? That way we won't get the mysterious "CUDA device-side assert triggered" error.

@VijayKalmath
Contributor

@jxmorris12 Thank you for confirming the T5 changes.

Can you please elaborate on what you mean by "make sure all the labels are accounted for"?

@jxmorris12
Collaborator

She ran this command:
textattack train --model-name-or-path albert-base-v2 --dataset snli --per-device-train-batch-size 8 --epochs 5 --learning-rate 1e-5
and got this error:
RuntimeError: CUDA error: device-side assert triggered

Instead, we should check the number of labels, and if it's not right, throw AssertionError: expected 3 labels, got 4 in this case. And also maybe AssertionError: label -1 not one of [0, 1, 2].
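
A minimal sketch of such a check (hypothetical; the train_dataset and model_num_labels names and the exact placement inside the trainer are assumptions, not a final implementation):

    # Validate the training labels up front instead of letting the loss
    # function hit a CUDA device-side assert.
    # Assumes train_dataset yields (text, label) pairs and model_num_labels
    # comes from the training args.
    observed_labels = {label for _, label in train_dataset}

    assert len(observed_labels) <= model_num_labels, (
        f"expected {model_num_labels} labels, got {len(observed_labels)}"
    )
    allowed_labels = set(range(model_num_labels))
    for label in observed_labels:
        assert (
            label in allowed_labels
        ), f"label {label} not one of {sorted(allowed_labels)}"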
