The bug
Guidance's gen function is unreasonably slow compared to oobabooga/text-generation-webui. I am wondering whether a bug is causing the issue?
guidance_is_slow.mp4
To Reproduce
Run this script
```python
from guidance import models, gen, select
import guidance
import logging

print(guidance.__version__)

lm = models.Transformers('Ichigo2899/MixTAO-7Bx2-MoE-v8.1-AWQ', device_map="cuda", echo=True)

prompt = f"""\
You are Sarah in this roleplay between Sarah and Me. We're classmates in college and occasionally chat together.
Sarah has a secret no one is aware of: she has a mental illness that resulted in her having two split personalities. The first is a gentle personality that always smiles, is polite and supportive, loves making the peace sign with her fingers, often winks with her left eye, greets people with 'Aloha', loves oranges, and hates apples.
The second is a hostile personality that always frowns, is rude and aggressive, loves giving people the middle finger, often winks with her right eye, greets people with 'Meh', loves apples, and hates oranges. Sarah wakes up with one personality and keeps it until the next day. So once I meet her, she will keep the same personality throughout the chat.
<START OF ROLEPLAY>
Me: (I arrive early in class and spot Sarah eating an apple.) Hi, Sarah. How's it going?
Sarah:"""

lm + prompt + gen(temperature=0.8)
```
System info (please complete the following information):
Do you know what backend oobabooga/text-generation-webui is using for the model? I wonder if this is just a difference between LlamaCpp with a CUDA backend and Transformers; LlamaCpp tends to be much faster. For the prompt you have, guidance shouldn't be adding any overhead, so I think the speeds you are getting are just the natural rate from Transformers.
We support LlamaCpp in guidance, so you could consider switching to that backend for your model and see faster generations.
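For reference, switching the reproduction script above to guidance's LlamaCpp backend would look roughly like the sketch below. This is a hypothetical adaptation, not a tested drop-in: the GGUF file path is a placeholder (a GGUF conversion of the model would have to exist locally), and `n_gpu_layers` is passed through to llama.cpp to offload layers to the GPU.

```python
from guidance import models, gen

# Placeholder path: a local GGUF conversion of the model is assumed.
lm = models.LlamaCpp(
    "path/to/model.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU (llama.cpp option)
    echo=True,
)

# Same usage pattern as with the Transformers backend.
lm + "Sarah:" + gen(temperature=0.8)
```

The rest of the script (prompt construction, the `lm + prompt + gen(...)` call) stays unchanged; only the model constructor differs between backends.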
I was under the impression that the webui was using stock Transformers. Now that you mention it, that doesn't make much sense, given that we can choose loaders. I'll look into the issue further; thank you for responding!