Add support for PETALS to run big models on any device. #3221
Comments
The web UI is all about running all this stuff locally using your own computing power. From what I've heard, Petals is like a torrent, or more like Bitcoin mining, where PCs from around the world are pooled together to run an AI model. So I personally don't see it as a good idea at all.
You are wrong: Petals only delegates running the base model itself, which 95% of users physically cannot launch on their own, and it also gives you the ability to run adapters on your own hardware and train them. Interacting with Petals is structured very similarly to running regular models, and interaction and training via the web UI would be an extremely convenient solution. It works because users can't change the base model; you should be familiar with how Petals works before making such judgments.
Petals also allows you to create a private swarm, e.g. in a lab environment with a bunch of smaller GPUs. It would be worth adding to the web UI if possible. I would love to contribute if only I knew how.
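For reference, a private swarm can be bootstrapped with the Petals server CLI. This is a sketch based on the Petals documentation; the model name, IP address, and peer ID shown are illustrative placeholders, not values from this thread:

```shell
# Start the first server of a new private swarm.
# It prints the multiaddress that other machines use to join.
python -m petals.cli.run_server bigscience/bloom-560m --new_swarm

# On each additional machine, join the same swarm by pointing
# --initial_peers at the address printed by the first server.
python -m petals.cli.run_server bigscience/bloom-560m \
    --initial_peers /ip4/10.0.0.1/tcp/31330/p2p/QmExamplePeerID
```

Each server hosts a slice of the model's transformer blocks, so several small GPUs together can serve a model none of them could hold alone.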
I have a general idea of how it works: it uses an internet connection, and because of that I simply don't see it as a good idea personally, unless it's implemented as an optional extension. I'm scared of ideas that will make the web UI more dependent on an internet connection in the near future, and this idea of yours is the first step.
It's just an option, though, for people who want to run larger models at faster inference speeds. It doesn't need to be the default. I'm currently experimenting with setting up my own private swarm, and I can run inference through it using Python code. Maybe just the inference part could be integrated into the web UI to allow access from the interface. Not sure.
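The client-side inference mentioned here can be sketched with the Petals Python API. This assumes `petals` and `transformers` are installed and a swarm hosting the model is reachable; the model name is an example, not one specified in this thread:

```python
# Sketch: generate text through a Petals swarm from the client side.
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

MODEL = "petals-team/StableBeluga2"  # example model served by the public swarm

tokenizer = AutoTokenizer.from_pretrained(MODEL)
# Only the small client-side parts (embeddings, head) load locally;
# the heavy transformer blocks run on remote swarm servers.
model = AutoDistributedModelForCausalLM.from_pretrained(MODEL)

inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0]))
```

For a private swarm, `from_pretrained` accepts an `initial_peers=[...]` argument so the client connects to your own servers instead of the public network.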
I don't think the developer will simply decide to cut out the ability to run LLMs locally. As far as I understand, with Petals you can only delegate running the large base model, and "large" in this context means not 7 or 13 billion parameters but 65 or 70. Running adapters on top of these models through a convenient web UI sounds more than fine, because most users can only run models that fit into 15 GB of VRAM (and we all know why it's 15 😂). Since obviously no one will force you to run a model through Petals, I think you are just being conservative. I partly understand you, and your conservatism has its reasons, but I still think you shouldn't give up on such an innovation.
Oobabooga is facing a lot of bugs right now that still need to be fixed first, and most of these issues exist because new code is being added all the time, creating more issues as a result. As long as it's an optional extension to the web UI, I don't mind it being implemented, but with so many new features the web UI has become practically unusable for me. I'm using an older build of the web UI right now because of the bugs.
OK, so I read through this issue, and what I want to say is that making it an extension may be awkward; it fits better as another backend, like ExLlama, GPTQ, etc.
And Petals is not an API that runs entirely remotely: your computer also runs part of the model, the small part that fits in your GPU. A new Petals backend could add the option to run any big model without trouble, and you also save disk space and memory, so it is good. It's local, but partially online.
For example, why would we run it on a single non-"local" cloud GPU instance if we can use Petals for free? It would just make these models more accessible than ever.
But yes, we need to wait for the bugs to be fixed first.
Also, in theory Petals can run a smaller 13B or 30B model, one you could run locally, but without quantization; the Petals community wants big models instead, though. Quantization can also cost some performance, if not much. But it is possible.
Also, if you are looking for a higher-performance model (70B) rather than an imitation (7B, 13B), with easy and free entry, Petals is it.
Would like to see Petals support too!
I support this! Please add Petals 🙏
I am writing it as an extension. I guess I will publish an alpha version in a week or two.
I opened a PR: #3784
Haha, great! So I'll have a look at that instead.
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
This would most certainly be a great addition as an alternate backend. You can run larger models on a private swarm.
Description
Check the Petals repository, library, and docs to integrate it into text-generation-webui for torrent-style inference of LLMs.
It is also 10x faster than offloading and saves memory (VRAM, RAM, and disk space).
Additional Context
PETALS GITHUB