
bugs in "textattack train” #488

Closed
qiyanjun opened this issue Jul 23, 2021 · 6 comments · Fixed by #653
Labels
bug Something isn't working

Comments

@qiyanjun
Member

qiyanjun commented Jul 23, 2021

1. Training issue, e.g.:

textattack train --model-name-or-path albert-base-v2 --dataset snli --per-device-train-batch-size 8 --epochs 5 --learning-rate 1e-5


Iteration:   0%|                                                                         | 0/68769 [00:00<?, ?it/s]/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:59: ClassNLLCriterion_updateOutput_no_reduce_kernel: block: [0,0,0], thread: [2,0,0] Assertion `cur_target >= 0 && cur_target < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:59: ClassNLLCriterion_updateOutput_no_reduce_kernel: block: [0,0,0], thread: [4,0,0] Assertion `cur_target >= 0 && cur_target < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:59: ClassNLLCriterion_updateOutput_no_reduce_kernel: block: [0,0,0], thread: [7,0,0] Assertion `cur_target >= 0 && cur_target < n_classes` failed.
....
/miniconda3/envs/textattackenv-master/lib/python3.7/site-packages/textattack/trainer.py", line 505, in training_step
    sample_weights[is_adv_sample] *= self.training_args.alpha
RuntimeError: CUDA error: device-side assert triggered


2. Model loading issue, e.g.:
textattack attack --model t5-en-de --recipe seq2sick --num-examples 10


2021-07-22 17:26:22.554522: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Downloading: 5.42kB [00:00, 1.63MB/s]                                                                
Downloading: 3.10kB [00:00, 995kB/s]                                                                 
Downloading and preparing dataset ted_multi_translate/plain_text (download: 335.91 MiB, generated: 754.37 MiB, post-processed: Unknown size, total: 1.06 GiB) to /server/usera/.cache/huggingface/datasets/ted_multi_translate/plain_text/1.0.0/60acc1806272a2861722952467c3b4ac274f1a41d798f7c64b7e36cb590dfc48...
Dataset ted_multi_translate downloaded and prepared to /server/usera/.cache/huggingface/datasets/ted_multi_translate/plain_text/1.0.0/60acc1806272a2861722952467c3b4ac274f1a41d798f7c64b7e36cb590dfc48. Subsequent calls will reuse this data.
Traceback (most recent call last):
  File "/server/usera/miniconda3/envs/textattackenv/bin/textattack", line 33, in <module>
    sys.exit(load_entry_point('textattack', 'console_scripts', 'textattack')())
  File "/net/server/usera/TextAttack/textattack/commands/textattack_cli.py", line 42, in main
    func.run(args)
  File "/net/server/usera/TextAttack/textattack/commands/attack_command.py", line 24, in run
    model_wrapper = ModelArgs._create_model_from_args(attack_args)
  File "/net/server/usera/TextAttack/textattack/model_args.py", line 252, in _create_model_from_args
    model, model.tokenizer
  File "/net/server/usera/TextAttack/textattack/models/wrappers/huggingface_model_wrapper.py", line 20, in __init__
    ), f"`model` must be of type `transformers.PreTrainedModel`, but got type {type(model)}."
AssertionError: `model` must be of type `transformers.PreTrainedModel`, but got type <class 'textattack.models.helpers.t5_for_text_to_text.T5ForTextToText'>.
(textattackenv) 
@srujanjoshi
Contributor

I tried running the above commands on my machine and observed the same errors. (Ubuntu 20.04, Python 3.8.11, TensorFlow 2.6, PyTorch 1.9.1)

@VijayKalmath
Contributor

VijayKalmath commented Jun 2, 2022

@qiyanjun

For the first issue, with the SNLI dataset:

From the HuggingFace SNLI dataset page, regarding the SNLI labels:

label: an integer whose value may be either 0, indicating that the hypothesis entails the premise, 1, indicating that the premise and hypothesis neither entail nor contradict each other, or 2, indicating that the hypothesis contradicts the premise. Dataset instances which don't have any gold label are marked with -1 label. Make sure you filter them before starting the training using datasets.Dataset.filter.

Since the SNLI labels have four possible values [-1, 0, 1, 2], we need to first filter -1 out of the dataset and then set --model-num-labels to 3.

So the textattack command to run should be:

textattack train --model-name-or-path albert-base-v2 --dataset snli --per-device-train-batch-size 8 --epochs 5 \
                 --learning-rate 1e-5 --model-num-labels 3 --filter-train-by-labels 0 1 2

The model trains without errors with the new arguments.
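
For reference, the filtering that --filter-train-by-labels 0 1 2 performs is roughly equivalent to the following standalone snippet (a minimal sketch using the datasets.Dataset.filter API quoted above, not TextAttack internals):

    # Drop the un-annotated SNLI examples (label == -1) so only labels {0, 1, 2} remain.
    from datasets import load_dataset

    snli_train = load_dataset("snli", split="train")
    snli_train = snli_train.filter(lambda example: example["label"] in (0, 1, 2))
    print(sorted(set(snli_train["label"])))  # expected: [0, 1, 2]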

One potential enhancement: in training_args.py we could add checks that warn the user if --model-num-labels does not match the number of labels in the dataset, or if the dataset contains numeric labels < 0, which are not supported by PyTorch.

Do let me know if the enhancement looks good; I can go ahead and open a PR for it.

@VijayKalmath
Contributor

@qiyanjun

For the second issue, with the T5 models:

T5 models have their own wrappers for the model and tokenizer, defined in textattack/models/helpers/t5_for_text_to_text.py and textattack/models/tokenizers/t5_tokenizer.py respectively.

In textattack/models/wrappers/huggingface_model_wrapper.py, we have assert statements which check the model's and tokenizer's classes.

These assert statements need to be updated to also accept the T5ForTextToText and T5Tokenizer classes.

Current assert statements:

        assert isinstance(
            model, transformers.PreTrainedModel
        ), f"`model` must be of type `transformers.PreTrainedModel`, but got type {type(model)}."
        assert isinstance(
            tokenizer,
            (transformers.PreTrainedTokenizer, transformers.PreTrainedTokenizerFast),
        ), f"`tokenizer` must of type `transformers.PreTrainedTokenizer` or `transformers.PreTrainedTokenizerFast`, but got type {type(tokenizer)}."

These assert statements should be updated to:

from textattack.models.tokenizers import T5Tokenizer
from textattack.models.helpers import T5ForTextToText


        assert isinstance(
            model, (transformers.PreTrainedModel, T5ForTextToText)
        ), f"`model` must be of type `transformers.PreTrainedModel` or `T5ForTextToText`, but got type {type(model)}."
        assert isinstance(
            tokenizer,
            (transformers.PreTrainedTokenizer, transformers.PreTrainedTokenizerFast, T5Tokenizer),
        ), f"`tokenizer` must be of type `transformers.PreTrainedTokenizer`, `transformers.PreTrainedTokenizerFast`, or `T5Tokenizer`, but got type {type(tokenizer)}."

@jxmorris12
Do let me know if the changes specified above and the previously mentioned enhancement look good; I can go ahead and open a PR for them.

@jxmorris12
Collaborator

Hey @VijayKalmath.

The changes to T5 look great.

For the SNLI case, can you add a check to make sure all the labels are accounted for, and throw an AssertionError if there are labels unaccounted for? That way we won't get the mysterious "CUDA device-side assert triggered" error.

@VijayKalmath
Contributor

@jxmorris12 Thank you for confirming the T5 changes.

Can you please elaborate on what you mean by "make sure all the labels are accounted for"?

@jxmorris12
Collaborator

She ran this command:
textattack train --model-name-or-path albert-base-v2 --dataset snli --per-device-train-batch-size 8 --epochs 5 --learning-rate 1e-5
and got this error:
RuntimeError: CUDA error: device-side assert triggered

Instead, we should check the number of labels, and if it's not right, throw AssertionError: expected 3 labels, got 4 in this case. And also maybe AssertionError: label -1 not one of [0, 1, 2].
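
A minimal sketch of such a check (hypothetical; the train_dataset and model_num_labels names and the exact placement inside the trainer are assumptions, not a final implementation):

    # Validate the training labels up front instead of letting the loss
    # function hit a CUDA device-side assert.
    # Assumes train_dataset yields (text, label) pairs and model_num_labels
    # comes from the training args.
    observed_labels = {label for _, label in train_dataset}

    assert len(observed_labels) <= model_num_labels, (
        f"expected {model_num_labels} labels, got {len(observed_labels)}"
    )
    allowed_labels = set(range(model_num_labels))
    for label in observed_labels:
        assert (
            label in allowed_labels
        ), f"label {label} not one of {sorted(allowed_labels)}"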
