The following error sometimes occurs while completion is ongoing with large context sizes.
My context size was: 3,262 tokens
max_new_tokens was set to: 4,096
Traceback (most recent call last):
  File "/home/username/text-generation-webui/modules/callbacks.py", line 56, in gentask
    ret = self.mfunc(callback=_callback, *args, **self.kwargs)
  File "/home/username/text-generation-webui/modules/text_generation.py", line 321, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/home/username/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/username/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/generation/utils.py", line 1642, in generate
    return self.sample(
  File "/home/username/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/generation/utils.py", line 2724, in sample
    outputs = self(
  File "/home/username/text-generation-webui/modules/exllama_hf.py", line 87, in __call__
    logits = self.ex_model.forward(torch.tensor([seq[-1:]], dtype=torch.long), ex_cache, lora=self.lora).to(input_ids.device)
  File "/home/username/miniconda3/envs/textgen/lib/python3.10/site-packages/exllama/model.py", line 967, in forward
    r = self._forward(input_ids[:, chunk_begin : chunk_end],
  File "/home/username/miniconda3/envs/textgen/lib/python3.10/site-packages/exllama/model.py", line 1053, in _forward
    hidden_states = decoder_layer.forward(hidden_states, cache, buffers[device], lora)
  File "/home/username/miniconda3/envs/textgen/lib/python3.10/site-packages/exllama/model.py", line 530, in forward
    self.self_attn.fused(hidden_states, cache, buffer, self.input_layernorm, lora)
  File "/home/username/miniconda3/envs/textgen/lib/python3.10/site-packages/exllama/model.py", line 376, in fused
    key_states = cache.key_states[self.index].narrow(2, 0, past_len + q_len).narrow(0, 0, bsz)
RuntimeError: start (0) + length (4097) exceeds dimension size (4096).
Exception in thread Thread-172 (gentask):
Traceback (most recent call last):
  File "/home/username/miniconda3/envs/textgen/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
Traceback (most recent call last):
  File "/home/username/text-generation-webui/modules/text_generation.py", line 328, in generate_reply_HF
    yield get_reply_from_output_ids(output, input_ids, original_question, state, is_chat=is_chat)
  File "/home/username/text-generation-webui/modules/text_generation.py", line 206, in get_reply_from_output_ids
    if shared.tokenizer.convert_ids_to_tokens(int(output_ids[-new_tokens])).startswith('▁'):
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
According to the error message, generation is attempting to write at position 4,097, which exceeds the 4,096-token sequence length you've set. I have to assume this is an issue in text-generation-webui. (?)
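For what it's worth, the numbers line up with an overflow of exllama's fixed-size KV cache: the prompt alone leaves only 834 tokens of headroom, but max_new_tokens allows 4,096. A minimal sketch of the arithmetic (the variable names here are illustrative, not the actual webui or exllama code):

```python
# Illustrative arithmetic only; these names are assumptions,
# not variables from text-generation-webui or exllama.
max_seq_len = 4096      # exllama's preallocated cache length (the "dimension size")
prompt_len = 3262       # context size from this report
max_new_tokens = 4096   # requested budget for new tokens

# Each generated token advances past_len by one; the narrow() call fails
# the first time past_len + q_len reaches 4097, i.e. after
# max_seq_len - prompt_len tokens have been generated.
headroom = max_seq_len - prompt_len
print(headroom)  # 834

# A defensive clamp in the caller would keep generation inside the cache:
safe_new_tokens = min(max_new_tokens, headroom)
print(safe_new_tokens)  # 834
```

So a workaround until this is fixed would be to cap max_new_tokens at the model's sequence length minus the prompt length.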