Assertion srcIndex < srcSelectDimSize failed #24698
Hi @MaggieK410, thanks for reporting this issue. This is typically caused by an indexing issue in the code. Could you follow the issue template and:
Hi, thank you very much for getting back to me! I made a mistake when initializing the tokenizer (I added tokens without resizing the embedding). As the issue is solved, I will close it.
May I know how you solved the problem? Thank you very much in advance!
In another part of the code I added a token but did not change the embedding size, which led to the error above. Since I did not need that token, I simply removed that line and the code worked. If you do need to add the token, look into resizing your embeddings (https://stackoverflow.com/questions/72775559/resize-token-embeddings-on-the-a-pertrained-model-with-different-embedding-size).
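For reference, a minimal sketch of the fix described above, assuming a standard Hugging Face tokenizer and causal LM (the checkpoint name and token string are just placeholders):

from transformers import AutoTokenizer, AutoModelForCausalLM

# Placeholder checkpoint; substitute the model actually being trained.
tokenizer = AutoTokenizer.from_pretrained("medalpaca/medalpaca-7b")
model = AutoModelForCausalLM.from_pretrained("medalpaca/medalpaca-7b")

# Adding tokens grows the tokenizer's vocabulary ...
num_added = tokenizer.add_tokens(["<my_new_token>"])  # placeholder token

# ... so the embedding matrix must grow as well, otherwise the new ids
# index past the end of the embedding table and trigger the
# "srcIndex < srcSelectDimSize" device-side assert.
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))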
Hi,
I am running medalpaca (but the error seems to come from llama) on 4 GPUs with device_map="auto" and the SFTTrainer, and I want to prompt-tune the model. I have written a custom Dataset class:
class DiagnosesDataset(torch.utils.data.Dataset):
    def __init__(self, instances, tokenizer):
        self.instances = instances
        self.tokenizer = tokenizer
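    # --- Illustrative continuation, not from the original post ---
    # A Dataset used for training also needs __len__ and __getitem__;
    # a typical pair might look like this, assuming each instance
    # carries a "text" field (an assumption, not the author's code).
    def __len__(self):
        return len(self.instances)

    def __getitem__(self, idx):
        encoding = self.tokenizer(
            self.instances[idx]["text"],  # assumed field name
            truncation=True,
            max_length=512,               # assumed maximum length
        )
        # DataCollatorForSeq2Seq expects "labels" next to "input_ids".
        encoding["labels"] = encoding["input_ids"].copy()
        return encoding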
The Training Arguments and Peft config:
training_arguments = TrainingArguments(
    output_dir="./falcon_output_dir",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    optim="paged_adamw_32bit",
    save_steps=100,
    logging_steps=10,
    learning_rate=2e-4,
    max_steps=10000,
    fp16=False,
    bf16=False,
    lr_scheduler_type="constant",
    warmup_ratio=0.03,
    group_by_length=True,
    remove_unused_columns=False,
)
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=4,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
    target_modules=["q_proj", "v_proj"],
)
The SFTTrainer I am using looks like this:
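A sketch of such a call, reconstructed from the arguments visible in the traceback below (packing=True, a DataCollatorForSeq2Seq, args=training_arguments); the model, tokenizer, and dataset variable names, as well as the padding multiple, are assumptions rather than the original code:

trainer = SFTTrainer(
    model=model,                  # assumed variable name
    train_dataset=train_dataset,  # assumed variable name
    tokenizer=tokenizer,
    peft_config=peft_config,
    packing=True,
    data_collator=DataCollatorForSeq2Seq(tokenizer, pad_to_multiple_of=8),  # multiple is assumed
    args=training_arguments,
)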
However, when running the model, there seems to be an indexing issue somewhere (similar to the one discussed at https://discuss.pytorch.org/t/solved-assertion-srcindex-srcselectdimsize-failed-on-gpu-for-torch-cat/1804/27).
The error I am getting is this:
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /home/students/kulcsar/Bachelor/for_dataset/10000_diagnoses/falcon_model_pef │
│ t.py:544 in │
│ │
│ 541 │ │
│ 542 │ │
│ 543 │ args=parser.parse_args() │
│ ❱ 544 │ run() │
│ 545 │ #main() │
│ 546 │ │
│ 547 │ #all_data, prompts, golds=preprocess("./dataset.pkl") │
│ │
│ /home/students/kulcsar/Bachelor/for_dataset/10000_diagnoses/falcon_model_pef │
│ t.py:153 in run │
│ │
│ 150 │ │ packing=True, │
│ 151 │ │ data_collator=DataCollatorForSeq2Seq(tokenizer, pad_to_multipl │
│ 152 │ │ args=training_arguments) │
│ ❱ 153 │ trainer.train() │
│ 154 │ │
│ 155 │ logging.info("Run Train loop") │
│ 156 │ #model_updated=train(model, dataset, args.seed, args.batch_size, a │
│ │
│ /home/students/kulcsar/anaconda3/envs/software_bubble_updated_pytorch/lib/py │
│ thon3.9/site-packages/transformers/trainer.py:1537 in train │
│ │
│ 1534 │ │ inner_training_loop = find_executable_batch_size( │
│ 1535 │ │ │ self._inner_training_loop, self._train_batch_size, args.a │
│ 1536 │ │ ) │
│ ❱ 1537 │ │ return inner_training_loop( │
│ 1538 │ │ │ args=args, │
│ 1539 │ │ │ resume_from_checkpoint=resume_from_checkpoint, │
│ 1540 │ │ │ trial=trial, │
│ │
│ /home/students/kulcsar/anaconda3/envs/software_bubble_updated_pytorch/lib/py │
│ thon3.9/site-packages/transformers/trainer.py:1802 in _inner_training_loop │
│ │
│ 1799 │ │ │ │ │ self.control = self.callback_handler.on_step_begi │
│ 1800 │ │ │ │ │
│ 1801 │ │ │ │ with self.accelerator.accumulate(model): │
│ ❱ 1802 │ │ │ │ │ tr_loss_step = self.training_step(model, inputs) │
│ 1803 │ │ │ │ │
│ 1804 │ │ │ │ if ( │
│ 1805 │ │ │ │ │ args.logging_nan_inf_filter │
│ │
│ /home/students/kulcsar/anaconda3/envs/software_bubble_updated_pytorch/lib/py │
│ thon3.9/site-packages/transformers/trainer.py:2647 in training_step │
│ │
│ 2644 │ │ │ return loss_mb.reduce_mean().detach().to(self.args.device │
│ 2645 │ │ │
│ 2646 │ │ with self.compute_loss_context_manager(): │
│ ❱ 2647 │ │ │ loss = self.compute_loss(model, inputs) │
│ 2648 │ │ │
│ 2649 │ │ if self.args.n_gpu > 1: │
│ 2650 │ │ │ loss = loss.mean() # mean() to average on multi-gpu para │
│ │
│ /home/students/kulcsar/anaconda3/envs/software_bubble_updated_pytorch/lib/py │
│ thon3.9/site-packages/transformers/trainer.py:2672 in compute_loss │
│ │
│ 2669 │ │ │ labels = inputs.pop("labels") │
│ 2670 │ │ else: │
│ 2671 │ │ │ labels = None │
│ ❱ 2672 │ │ outputs = model(**inputs) │
│ 2673 │ │ # Save past state if it exists │
│ 2674 │ │ # TODO: this needs to be fixed and made cleaner later. │
│ 2675 │ │ if self.args.past_index >= 0: │
│ │
│ /home/students/kulcsar/anaconda3/envs/software_bubble_updated_pytorch/lib/py │
│ thon3.9/site-packages/torch/nn/modules/module.py:1502 in _wrapped_call_impl │
│ │
│ 1499 │ │ if self._compiled_call_impl is not None: │
│ 1500 │ │ │ return self._compiled_call_impl(*args, **kwargs) # type: │
│ 1501 │ │ else: │
│ ❱ 1502 │ │ │ return self._call_impl(*args, **kwargs) │
│ 1503 │ │
│ 1504 │ def _call_impl(self, *args, **kwargs): │
│ 1505 │ │ forward_call = (self._slow_forward if torch._C._get_tracing_s │
│ │
│ /home/students/kulcsar/anaconda3/envs/software_bubble_updated_pytorch/lib/py │
│ thon3.9/site-packages/torch/nn/modules/module.py:1511 in _call_impl │
│ │
│ 1508 │ │ if not (self._backward_hooks or self._backward_pre_hooks or s │
│ 1509 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hoo │
│ 1510 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks │
│ ❱ 1511 │ │ │ return forward_call(*args, **kwargs) │
│ 1512 │ │ # Do not call functions when jit is used │
│ 1513 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1514 │ │ backward_pre_hooks = [] │
│ │
│ /home/students/kulcsar/anaconda3/envs/software_bubble_updated_pytorch/lib/py │
│ thon3.9/site-packages/peft/peft_model.py:739 in forward │
│ │
│ 736 │ ): │
│ 737 │ │ peft_config = self.active_peft_config │
│ 738 │ │ if not isinstance(peft_config, PromptLearningConfig): │
│ ❱ 739 │ │ │ return self.base_model( │
│ 740 │ │ │ │ input_ids=input_ids, │
│ 741 │ │ │ │ attention_mask=attention_mask, │
│ 742 │ │ │ │ inputs_embeds=inputs_embeds, │
│ │
│ /home/students/kulcsar/anaconda3/envs/software_bubble_updated_pytorch/lib/py │
│ thon3.9/site-packages/torch/nn/modules/module.py:1502 in _wrapped_call_impl │
│ │
│ 1499 │ │ if self._compiled_call_impl is not None: │
│ 1500 │ │ │ return self._compiled_call_impl(*args, **kwargs) # type: │
│ 1501 │ │ else: │
│ ❱ 1502 │ │ │ return self._call_impl(*args, **kwargs) │
│ 1503 │ │
│ 1504 │ def _call_impl(self, *args, **kwargs): │
│ 1505 │ │ forward_call = (self._slow_forward if torch._C._get_tracing_s │
│ │
│ /home/students/kulcsar/anaconda3/envs/software_bubble_updated_pytorch/lib/py │
│ thon3.9/site-packages/torch/nn/modules/module.py:1511 in _call_impl │
│ │
│ 1508 │ │ if not (self._backward_hooks or self._backward_pre_hooks or s │
│ 1509 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hoo │
│ 1510 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks │
│ ❱ 1511 │ │ │ return forward_call(*args, **kwargs) │
│ 1512 │ │ # Do not call functions when jit is used │
│ 1513 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1514 │ │ backward_pre_hooks = [] │
│ │
│ /home/students/kulcsar/anaconda3/envs/software_bubble_updated_pytorch/lib/py │
│ thon3.9/site-packages/accelerate/hooks.py:165 in new_forward │
│ │
│ 162 │ │ │ with torch.no_grad(): │
│ 163 │ │ │ │ output = old_forward(*args, **kwargs) │
│ 164 │ │ else: │
│ ❱ 165 │ │ │ output = old_forward(*args, **kwargs) │
│ 166 │ │ return module.hf_hook.post_forward(module, output) │
│ 167 │ │
│ 168 │ module.forward = new_forward │
│ │
│ /home/students/kulcsar/anaconda3/envs/software_bubble_updated_pytorch/lib/py │
│ thon3.9/site-packages/transformers/models/llama/modeling_llama.py:691 in │
│ forward │
│ │
│ 688 │ │ return_dict = return_dict if return_dict is not None else self │
│ 689 │ │ │
│ 690 │ │ # decoder outputs consists of (dec_features, layer_state, dec │
│ ❱ 691 │ │ outputs = self.model( │
│ 692 │ │ │ input_ids=input_ids, │
│ 693 │ │ │ attention_mask=attention_mask, │
│ 694 │ │ │ position_ids=position_ids, │
│ │
│ /home/students/kulcsar/anaconda3/envs/software_bubble_updated_pytorch/lib/py │
│ thon3.9/site-packages/torch/nn/modules/module.py:1502 in _wrapped_call_impl │
│ │
│ 1499 │ │ if self._compiled_call_impl is not None: │
│ 1500 │ │ │ return self._compiled_call_impl(*args, **kwargs) # type: │
│ 1501 │ │ else: │
│ ❱ 1502 │ │ │ return self._call_impl(*args, **kwargs) │
│ 1503 │ │
│ 1504 │ def _call_impl(self, *args, **kwargs): │
│ 1505 │ │ forward_call = (self._slow_forward if torch._C._get_tracing_s │
│ │
│ /home/students/kulcsar/anaconda3/envs/software_bubble_updated_pytorch/lib/py │
│ thon3.9/site-packages/torch/nn/modules/module.py:1511 in _call_impl │
│ │
│ 1508 │ │ if not (self._backward_hooks or self._backward_pre_hooks or s │
│ 1509 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hoo │
│ 1510 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks │
│ ❱ 1511 │ │ │ return forward_call(*args, **kwargs) │
│ 1512 │ │ # Do not call functions when jit is used │
│ 1513 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1514 │ │ backward_pre_hooks = [] │
│ │
│ /home/students/kulcsar/anaconda3/envs/software_bubble_updated_pytorch/lib/py │
│ thon3.9/site-packages/transformers/models/llama/modeling_llama.py:532 in │
│ forward │
│ │
│ 529 │ │ │ position_ids = position_ids.view(-1, seq_length).long() │
│ 530 │ │ │
│ 531 │ │ if inputs_embeds is None: │
│ ❱ 532 │ │ │ inputs_embeds = self.embed_tokens(input_ids) │
│ 533 │ │ # embed positions │
│ 534 │ │ if attention_mask is None: │
│ 535 │ │ │ attention_mask = torch.ones( │
│ │
│ /home/students/kulcsar/anaconda3/envs/software_bubble_updated_pytorch/lib/py │
│ thon3.9/site-packages/torch/nn/modules/module.py:1502 in _wrapped_call_impl │
│ │
│ 1499 │ │ if self._compiled_call_impl is not None: │
│ 1500 │ │ │ return self._compiled_call_impl(*args, **kwargs) # type: │
│ 1501 │ │ else: │
│ ❱ 1502 │ │ │ return self._call_impl(*args, **kwargs) │
│ 1503 │ │
│ 1504 │ def _call_impl(self, *args, **kwargs): │
│ 1505 │ │ forward_call = (self._slow_forward if torch._C._get_tracing_s │
│ │
│ /home/students/kulcsar/anaconda3/envs/software_bubble_updated_pytorch/lib/py │
│ thon3.9/site-packages/torch/nn/modules/module.py:1511 in _call_impl │
│ │
│ 1508 │ │ if not (self._backward_hooks or self._backward_pre_hooks or s │
│ 1509 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hoo │
│ 1510 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks │
│ ❱ 1511 │ │ │ return forward_call(*args, **kwargs) │
│ 1512 │ │ # Do not call functions when jit is used │
│ 1513 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1514 │ │ backward_pre_hooks = [] │
│ │
│ /home/students/kulcsar/anaconda3/envs/software_bubble_updated_pytorch/lib/py │
│ thon3.9/site-packages/accelerate/hooks.py:165 in new_forward │
│ │
│ 162 │ │ │ with torch.no_grad(): │
│ 163 │ │ │ │ output = old_forward(*args, **kwargs) │
│ 164 │ │ else: │
│ ❱ 165 │ │ │ output = old_forward(*args, **kwargs) │
│ 166 │ │ return module.hf_hook.post_forward(module, output) │
│ 167 │ │
│ 168 │ module.forward = new_forward │
│ │
│ /home/students/kulcsar/anaconda3/envs/software_bubble_updated_pytorch/lib/py │
│ thon3.9/site-packages/torch/nn/modules/sparse.py:162 in forward │
│ │
│ 159 │ │ │ │ self.weight[self.padding_idx].fill_(0) │
│ 160 │ │
│ 161 │ def forward(self, input: Tensor) -> Tensor: │
│ ❱ 162 │ │ return F.embedding( │
│ 163 │ │ │ input, self.weight, self.padding_idx, self.max_norm, │
│ 164 │ │ │ self.norm_type, self.scale_grad_by_freq, self.sparse) │
│ 165 │
│ │
│ /home/students/kulcsar/anaconda3/envs/software_bubble_updated_pytorch/lib/py │
│ thon3.9/site-packages/torch/nn/functional.py:2238 in embedding │
│ │
│ 2235 │ │ # torch.embedding_renorm │
│ 2236 │ │ # remove once script supports set_grad_enabled │
│ 2237 │ │ _no_grad_embedding_renorm_(weight, input, max_norm, norm_type │
│ ❱ 2238 │ return torch.embedding(weight, input, padding_idx, scale_grad_by │
│ 2239 │
│ 2240 │
│ 2241 def embedding_bag( │
╰──────────────────────────────────────────────────────────────────────────────╯
RuntimeError: CUDA error: device-side assert triggered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Does anyone have an idea what might be the issue? Any help would be greatly appreciated!
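For debugging asserts like this one, it usually helps to rerun with CUDA_LAUNCH_BLOCKING=1 (or on CPU) to surface the exact failing operation, and to sanity-check that every token id fits into the embedding table. A general-purpose sketch of such a check (not code from this issue; the helper name is made up):

import torch

def check_token_ids(model, input_ids: torch.Tensor) -> None:
    # Every token id must be a valid row index into the input embedding,
    # otherwise the "srcIndex < srcSelectDimSize" assert fires on GPU.
    vocab_size = model.get_input_embeddings().num_embeddings
    max_id = int(input_ids.max())
    if max_id >= vocab_size:
        raise ValueError(
            f"token id {max_id} >= embedding size {vocab_size}; "
            "did you add tokens without calling model.resize_token_embeddings(len(tokenizer))?"
        )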