
Completion abruptly stopped - RuntimeError: CUDA error: an illegal memory access was encountered #273

Open · Thireus opened this issue Sep 4, 2023 · 1 comment

@Thireus

Thireus commented Sep 4, 2023

The following sometimes happens while a completion is in progress with large context sizes.

  • My context size was: 3,262
  • max_new_tokens was set to: 4,096
Traceback (most recent call last):
  File "/home/username/text-generation-webui/modules/callbacks.py", line 56, in gentask
    ret = self.mfunc(callback=_callback, *args, **self.kwargs)
  File "/home/username/text-generation-webui/modules/text_generation.py", line 321, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/home/username/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/username/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/generation/utils.py", line 1642, in generate
    return self.sample(
  File "/home/username/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/generation/utils.py", line 2724, in sample
    outputs = self(
  File "/home/username/text-generation-webui/modules/exllama_hf.py", line 87, in __call__
    logits = self.ex_model.forward(torch.tensor([seq[-1:]], dtype=torch.long), ex_cache, lora=self.lora).to(input_ids.device)
  File "/home/username/miniconda3/envs/textgen/lib/python3.10/site-packages/exllama/model.py", line 967, in forward
    r = self._forward(input_ids[:, chunk_begin : chunk_end],
  File "/home/username/miniconda3/envs/textgen/lib/python3.10/site-packages/exllama/model.py", line 1053, in _forward
    hidden_states = decoder_layer.forward(hidden_states, cache, buffers[device], lora)
  File "/home/username/miniconda3/envs/textgen/lib/python3.10/site-packages/exllama/model.py", line 530, in forward
    self.self_attn.fused(hidden_states, cache, buffer, self.input_layernorm, lora)
  File "/home/username/miniconda3/envs/textgen/lib/python3.10/site-packages/exllama/model.py", line 376, in fused
    key_states = cache.key_states[self.index].narrow(2, 0, past_len + q_len).narrow(0, 0, bsz)
RuntimeError: start (0) + length (4097) exceeds dimension size (4096).
Exception in thread Thread-172 (gentask):
Traceback (most recent call last):
  File "/home/username/miniconda3/envs/textgen/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
Traceback (most recent call last):
  File "/home/username/text-generation-webui/modules/text_generation.py", line 328, in generate_reply_HF
    yield get_reply_from_output_ids(output, input_ids, original_question, state, is_chat=is_chat)
  File "/home/username/text-generation-webui/modules/text_generation.py", line 206, in get_reply_from_output_ids
    if shared.tokenizer.convert_ids_to_tokens(int(output_ids[-new_tokens])).startswith('▁'):
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
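
For reference, the final narrow failure in the trace can be reproduced in isolation. A minimal sketch (the cache shape below is illustrative; it is not necessarily exllama's exact key-cache layout):

```python
import torch

# Illustrative cache tensor: dim 2 is the sequence axis, allocated at
# max_seq_len = 4096, mirroring how the key cache is sized.
cache = torch.zeros(1, 32, 4096, 128)

past_len, q_len = 4096, 1  # cache already full when the next token arrives

# Tensor.narrow(dim, start, length) requires start + length <= size(dim),
# so 0 + 4097 > 4096 raises the same RuntimeError shown in the traceback.
window = cache.narrow(2, 0, past_len + q_len)
```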
@turboderp (Owner)

According to the error message, it's attempting to generate at position 4097, so it's exceeding the sequence length you've set. I have to assume this is an issue in text-generation-webui. (?)
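
If so, a guard on the caller's side would prevent the overrun; a minimal sketch, assuming the caller knows the prompt length and the model's maximum sequence length (the function and parameter names here are hypothetical, not the webui's actual API):

```python
def clamp_max_new_tokens(max_new_tokens: int, prompt_len: int, max_seq_len: int) -> int:
    """Never request more new tokens than the cache has room for."""
    return max(0, min(max_new_tokens, max_seq_len - prompt_len))

# With the values from this report, 3,262 prompt tokens + 4,096 requested
# tokens overruns a 4,096-token cache, so generation would be capped at
# 4096 - 3262 = 834 new tokens.
print(clamp_max_new_tokens(4096, 3262, 4096))  # -> 834
```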
