Add support for PETALS to run big models on any device. #3221
Comments
The web UI is all about running all this stuff locally using your own computing power. From what I've heard, Petals is like a torrent, or more like Bitcoin mining, where PCs from around the world are pooled together to run an AI model. So I personally don't see it as a good idea at all.
You are wrong: Petals only delegates running the base model itself, which 95% of users physically cannot launch on their own, and it also gives you the ability to run adapters on your own hardware and train them. Interacting with Petals is structured very similarly to running regular models, and interaction and training via the web UI would be an extremely convenient solution. It works because users can't change the base model; you should be familiar with how Petals works before making such judgments.
Petals also allows you to create a private swarm, e.g. in a lab environment with a bunch of smaller GPUs. It would be worth adding to the web UI if possible. I would love to contribute if only I knew how.
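For reference, a private swarm can be bootstrapped with the Petals server CLI. This is a sketch based on the Petals documentation; the model name, IP address, and peer ID shown are illustrative placeholders, not values from this thread:

```shell
# Start the first server of a new private swarm.
# It prints the multiaddress that other machines use to join.
python -m petals.cli.run_server bigscience/bloom-560m --new_swarm

# On each additional machine, join the same swarm by pointing
# --initial_peers at the address printed by the first server.
python -m petals.cli.run_server bigscience/bloom-560m \
    --initial_peers /ip4/10.0.0.1/tcp/31330/p2p/QmExamplePeerID
```

Each server hosts a slice of the model's transformer blocks, so several small GPUs together can serve a model none of them could hold alone.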
I have a general idea of how it works: it uses an internet connection, and because of that I simply don't see it as a good idea personally, unless it's implemented as an optional extension. I'm scared of ideas that will make the web UI more dependent on an internet connection in the near future, and this idea of yours is the first step.
It's just an option, though, for people who want to run larger models at faster inference speeds. It doesn't need to be the default. I'm currently experimenting with setting up my own private swarm, and I can run inference through it using Python code. Maybe just the inference part could be integrated into the web UI to allow access from the interface. Not sure.
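The client-side inference mentioned here can be sketched with the Petals Python API. This assumes `petals` and `transformers` are installed and a swarm hosting the model is reachable; the model name is an example, not one specified in this thread:

```python
# Sketch: generate text through a Petals swarm from the client side.
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

MODEL = "petals-team/StableBeluga2"  # example model served by the public swarm

tokenizer = AutoTokenizer.from_pretrained(MODEL)
# Only the small client-side parts (embeddings, head) load locally;
# the heavy transformer blocks run on remote swarm servers.
model = AutoDistributedModelForCausalLM.from_pretrained(MODEL)

inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0]))
```

For a private swarm, `from_pretrained` accepts an `initial_peers=[...]` argument so the client connects to your own servers instead of the public network.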
I don't think the developer will simply decide to cut out the ability to run LLMs locally. As far as I understand, with Petals you can only delegate running the large base model, and "large" in this context means not 7 or 13 billion parameters but 65 or 70. Running adapters on top of these models through a convenient web UI sounds more than fine, because most users can only run models that fit into 15 GB of VRAM (and we all know why it's 15 😂). Since obviously no one will force you to run a model through Petals, I think you are just being conservative. I partly understand you, and your conservatism has its reasons, but I still think you shouldn't give up on such an innovation.
Oobabooga is facing a lot of bugs right now that still need to be fixed first, and most of these issues exist because new code is being added all the time, creating more issues as a result. As long as it's an optional extension to the web UI, I don't mind it being implemented, but with so many new features the web UI has become practically unusable for me. I'm using an older build of the web UI right now because of the bugs.
OK, so I read through this issue, and what I want to say is that making it an extension may be awkward; it fits better as another backend, like ExLlama, GPTQ, etc.
And Petals is not an API that runs entirely remotely: your computer also runs part of the model, the small part that fits in your GPU. A new Petals backend could add the option to run any big model without trouble, and you also save disk space and memory, so it is good. It's local, but partially online.
For example, why would we run it on a single non-"local" cloud GPU instance if we can use Petals for free? It would just make these models more accessible than ever.
But yes, we need to wait for the bugs to be fixed first.
Also, in theory Petals can run a smaller 13B or 30B model, one you could run locally, but without quantization; the Petals community wants big models instead, though. Quantization can also cost some performance, if not much. But it is possible.
Also, if you are looking for a higher-performance model (70B) rather than an imitation (7B, 13B), with easy and free entry, Petals is it.
Would like to see Petals support too!
I support this! Please add Petals 🙏
I am writing it as an extension. I guess I will publish an alpha version in a week or two.
I opened a PR: #3784
Haha, great! So I'll have a look at that instead.
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
This would most certainly be a great addition as an alternate backend. You can run larger models on a private swarm.
Description
Check the Petals repository, library, and docs to integrate it into text-generation-webui for torrent-style inference of LLMs.
It is also 10x faster than offloading and saves memory (VRAM, RAM, and disk space).
Additional Context
PETALS GITHUB