
Guidance's gen function is unreasonably slow compared to oobabooga/text-generation-webui #727

Closed
MikoAL opened this issue Mar 31, 2024 · 2 comments

Comments

MikoAL commented Mar 31, 2024

The bug
Guidance's gen function is unreasonably slow compared to oobabooga/text-generation-webui. I'm wondering whether a bug is causing the issue.

guidance_is_slow.mp4

To Reproduce
Run this script:

from guidance import models, gen, select
import guidance
import logging
print(guidance.__version__)

# Load the AWQ-quantized model through the Transformers backend on CUDA
lm = models.Transformers('Ichigo2899/MixTAO-7Bx2-MoE-v8.1-AWQ', device_map="cuda", echo=True)
prompt = f"""\
You are Sarah in this roleplay between Sarah and Me. We're classmates in college and occasionally chat together.

Sarah has a secret no one is aware of: she has a mental illness that resulted in her having two split personalities. 

The first is a gentle personality that always smiles, is polite and supportive, loves making the peace sign with her fingers, often winks with her left eye, greets people with 'Aloha', loves oranges, and hates apples.

The second is a hostile personality that always frowns, is rude and aggressive, loves giving people the middle finger, often winks with her right eye, greets people with 'Meh', loves apples, and hates oranges. 

Sarah wakes up with one personality and keeps it until the next day. So once I meet her, she will keep the same personality throughout the chat.

<START OF ROLEPLAY>

Me: (I arrive early in class and spot Sarah eating an apple.) Hi, Sarah. How's it going?

Sarah:"""

lm + prompt + gen(temperature=0.8)  # unconstrained generation; this call is where the slowdown appears

System info (please complete the following information):

Harsha-Nori (Collaborator) commented

Hi @MikoAL,

Do you know what backend oobabooga/text-generation-webui is using for the model? I wonder if this is just a difference between LlamaCpp with a CUDA backend and Transformers; LlamaCpp tends to be much faster. For the prompt you have, guidance shouldn't be adding any overhead, so I think the speeds you are getting are just the natural rate from Transformers.

We support LlamaCpp in guidance, so you could consider switching to that backend for your model to get faster generations.
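If the text-generation-webui setup was using a GPU-accelerated llama.cpp loader, the rough equivalent in guidance is the LlamaCpp backend. A minimal sketch, assuming a GGUF copy of the model is available locally and llama-cpp-python is installed with CUDA support; the file path, n_gpu_layers, and n_ctx values below are placeholders, and extra keyword arguments are forwarded to llama-cpp-python:

from guidance import models, gen

# Hypothetical path to a GGUF conversion/quantization of the model
lm = models.LlamaCpp(
    "path/to/model.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU (assumes a CUDA build of llama-cpp-python)
    n_ctx=4096,       # context window; adjust to the prompt length you need
    echo=True,
)

prompt = "..."  # same roleplay prompt as in the reproduction script above
lm + prompt + gen(temperature=0.8)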

MikoAL (Author) commented Apr 2, 2024

I was under the impression that the webui was using stock Transformers. Now that you mention it, that doesn't make much sense given that we can choose loaders. I'll look into the issue further; thank you for responding!

MikoAL closed this as completed Apr 4, 2024