I'm getting an error when attempting to use generate_simple inside a Gradio UI. I can run test_inference.py just fine; however, when I put that code into a Gradio UI and attempt to redirect the output to a Chatbot component, I get the error below:
Traceback (most recent call last):
  File "/home/mmealman/miniconda3/envs/exllama/lib/python3.10/site-packages/gradio/routes.py", line 422, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/mmealman/miniconda3/envs/exllama/lib/python3.10/site-packages/gradio/blocks.py", line 1323, in process_api
    result = await self.call_function(
  File "/home/mmealman/miniconda3/envs/exllama/lib/python3.10/site-packages/gradio/blocks.py", line 1051, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/mmealman/miniconda3/envs/exllama/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/mmealman/miniconda3/envs/exllama/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/home/mmealman/miniconda3/envs/exllama/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/home/mmealman/src/exllama/webui/Chatbot.py", line 72, in bot
    bot_message = self.predict(history, user_message)
  File "/home/mmealman/src/exllama/webui/Chatbot.py", line 58, in predict
    return self.textgen.test_generate()
  File "/home/mmealman/src/exllama/TextGenerator.py", line 96, in test_generate
    text = generator.generate_simple(prompt, max_new_tokens = gen_tokens)
  File "/home/mmealman/src/exllama/generator.py", line 176, in generate_simple
    self.gen_begin(ids)
  File "/home/mmealman/src/exllama/generator.py", line 103, in gen_begin
    self.model.forward(self.sequence[:, :-1], self.cache, preprocess_only = True)
  File "/home/mmealman/src/exllama/model.py", line 1153, in forward
    hidden_states = decoder_layer.forward(hidden_states, cache, buffers[device])
  File "/home/mmealman/src/exllama/model.py", line 540, in forward
    hidden_states = self.self_attn.forward(hidden_states, cache, buffer)
  File "/home/mmealman/src/exllama/model.py", line 447, in forward
    query_states = self.q_proj.forward(hidden_states)
  File "/home/mmealman/src/exllama/model.py", line 314, in forward
    out = cuda_ext.ExAutogradMatmul4bitCuda.apply(x, self.qweight, self.scales, self.qzeros, self.groupsize, self.bits, self.maxq)
  File "/home/mmealman/miniconda3/envs/exllama/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/mmealman/miniconda3/envs/exllama/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 106, in decorate_fwd
    return fwd(*args, **kwargs)
  File "/home/mmealman/src/exllama/cuda_ext.py", line 271, in forward
    raise ValueError("Not implemented yet")
ValueError: Not implemented yet
Below is the generation code I'm calling in the Chatbot:
def test_generate(self):
    tokenizer_model_path = "/home/mmealman/src/models/vicuna-13B-1.1-GPTQ-4bit-128g/tokenizer.model"
    model_config_path = "/home/mmealman/src/models/vicuna-13B-1.1-GPTQ-4bit-128g/config.json"
    model_path = "/home/mmealman/src/models/vicuna-13B-1.1-GPTQ-4bit-128g/vicuna-13B-1.1-GPTQ-4bit-128g.safetensors"

    config = ExLlamaConfig(model_config_path)
    config.model_path = model_path
    config.max_seq_len = 2048

    model = ExLlama(config)
    cache = ExLlamaCache(model)
    tokenizer = ExLlamaTokenizer(tokenizer_model_path)

    generator = ExLlamaGenerator(model, tokenizer, cache)
    generator.settings.token_repetition_penalty_max = 1.2
    generator.settings.token_repetition_penalty_sustain = 20
    generator.settings.token_repetition_penalty_decay = 50

    prompt = \
        "On 19 February 1952, Headlam became senior air staff officer (SASO) at Eastern Area Command in Penrith, New South " \
        "Wales. During his term as SASO, the RAAF began re-equipping with English Electric Canberra jet bombers and CAC " \
        "Sabre jet fighters. The Air Force also underwent a major organisational change, as it transitioned from a " \
        "geographically based command-and-control system to one based on function, resulting in the establishment of Home " \
        "(operational), Training, and Maintenance Commands. Eastern Area Command, considered a de facto operational " \
        "headquarters owing to the preponderance of combat units under its control, was reorganised as Home Command in " \
        "October 1953. Headlam was appointed an Officer of the Order of the British Empire (OBE) in the 1954 New Year " \
        "Honours for his \"exceptional ability and devotion to duty\". He was promoted to acting air commodore in May. His " \
        "appointment as aide-de-camp to Queen Elizabeth II was announced on 7 October 1954."

    gen_tokens = 200
    text = generator.generate_simple(prompt, max_new_tokens = gen_tokens)
    return text
ExLlama generation works fine in all other standalone Python scripts. The Gradio UI code has also worked fine in several other projects.
The only place it throws that exception is in the quantized autograd matmul function, after checking torch.is_grad_enabled() == True. So I would assume you're running without torch.no_grad(), which is currently required since backpropagation isn't supported yet.
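In the Gradio case that likely means disabling gradient tracking inside the handler itself, since Gradio dispatches it to a worker thread where no outer no_grad() context applies. A minimal sketch of the workaround, reusing the Chatbot.py method names from the traceback above (the rest of the class wiring is elided and assumed to match the issue's code):

import torch

class Chatbot:
    ...

    def predict(self, history, user_message):
        # Gradio runs this in a worker thread with autograd enabled by
        # default; disabling gradient tracking here keeps the quantized
        # matmul on its inference-only path instead of the unimplemented
        # autograd branch that raises "Not implemented yet".
        with torch.no_grad():
            return self.textgen.test_generate()

The decorator form, @torch.no_grad() on test_generate itself, should work just as well; either way the context only needs to cover the forward pass.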
And it might never be, actually, since it would require a rewrite (or an alternative version) of all the CUDA functions, and it's not clear it would perform any better than the Transformers/GPTQ implementation for training anyway.