
Add support for PETALS to run big models on any device. #3221

Closed
tonymacx86PRO opened this issue Jul 20, 2023 · 20 comments
Labels
enhancement New feature or request stale

Comments

@tonymacx86PRO

Description

Check the Petals repository, library, and docs to integrate it into text-generation-webui for torrent-style inference of LLMs.
It is also up to 10x faster than offloading and saves memory (VRAM, RAM, and disk space).

Additional Context
PETALS GITHUB

@tonymacx86PRO tonymacx86PRO added the enhancement New feature or request label Jul 20, 2023

Flanua commented Jul 24, 2023

The web UI is all about running all this stuff locally, using your own computer's power. From what I've heard, Petals is like a torrent, or more like Bitcoin mining, where PCs from around the world are pooled to run an AI model. So I personally don't see it as a good idea at all.

@GordeyTsy

You are wrong: Petals only delegates running the weights of the base model, which 95% of users physically cannot run themselves, and it also gives you the ability to run adapters on your own hardware and train them. Interacting with Petals is very similar to running regular models. Interaction and training via the web UI would be an extremely convenient solution. It works because users can't change the base model; you should be familiar with how Petals works before making such judgments.

@GauravB159
Contributor

> WEB UI is all about running all this stuff locally by using your own computer power. […] So I personally don't see it as a good idea at all.

Petals also allows you to create a private swarm, e.g. in a lab environment with a bunch of smaller GPUs. It would be worth adding to the web UI if possible. I would love to contribute if only I knew how.


Flanua commented Jul 25, 2023

> You are wrong, because petals only delegates the launching of basic scales of the base model […] you should be familiar with how petals work before making such judgments.

I have a general idea of how it works: it uses an internet connection, and because of that I simply don't see it as a good idea personally, unless it's implemented as an optional extension. I'm wary of ideas that will make the web UI more dependent on an internet connection in the near future, and this idea of yours is the first step.

@GauravB159
Contributor

> You are wrong, because petals only delegates the launching of basic scales of the base model […]
>
> I have a general idea how it works […] this idea of yours is the first step.

It's just an option, though, for people who want to run larger models at faster inference speeds. It doesn't need to be the default. I'm currently experimenting with setting up my own private swarm, and I can run inference through it using Python code. Maybe just the inference part could be integrated into the web UI to allow access through the interface. Not sure.
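For reference, the Python inference path mentioned above can be sketched roughly like this. This is a hypothetical sketch based on the public Petals docs, not code from this thread: the model name, helper names, and prompt format are all illustrative assumptions.

```python
# Hypothetical sketch of a Petals inference call for a web-UI backend.
# Model name and helper names are illustrative, not from this thread.

def build_prompt(history, user_message):
    """Flatten a (role, text) chat history into a single prompt string."""
    lines = [f"{role}: {text}" for role, text in history]
    lines.append(f"User: {user_message}")
    lines.append("Assistant:")
    return "\n".join(lines)

def generate_with_petals(prompt, model_name="petals-team/StableBeluga2",
                         max_new_tokens=64):
    """Generate a reply by joining the public (or a private) Petals swarm."""
    # Imported lazily so the backend still loads when the optional
    # petals dependency is not installed.
    from transformers import AutoTokenizer
    from petals import AutoDistributedModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Only a small slice of the model is held locally; the remaining
    # transformer blocks are served by other peers in the swarm.
    model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A private swarm would work the same way on the client side; peers just announce themselves to a private DHT instead of the public one.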

@GordeyTsy

> You are wrong, because petals only delegates the launching of basic scales of the base model […]
>
> I have a general idea how it works […] this idea of yours is the first step.

I don't think the developer will just decide to cut out the ability to run LLMs locally. As far as I understand, with Petals you can only delegate running the large base model, and "large" in this context means not 7 or 13 billion parameters but 65 or 70. Running your own adapters on top of those models through a convenient web UI sounds more than fine, because most users can only run models that fit into 15 GB of VRAM (and we all know why it is 15 😂). Since obviously no one will force you to run a model through Petals, I think you are just being conservative. I partly understand you, and your conservatism has its reasons, but I still think you shouldn't give up on such an innovation.


Flanua commented Jul 25, 2023

Oobabooga is facing a lot of bugs right now that still need to be fixed first, and most of these issues exist because new code is being added all the time, with more issues as a result. As long as it's an optional extension to the web UI, I don't mind it being implemented, but with so many new features the web UI became practically unusable for me. I'm using an older build of the web UI right now because of the bugs.

@tonymacx86PRO
Author

OK, so I read through this issue. What I want to say is that it might be awkward to make it an extension; it should be more like another backend, like ExLlama, GPTQ, etc.

@tonymacx86PRO
Author

And Petals is not an API where nothing runs locally: your computer also runs the model, just the small part that fits in your GPU. A new Petals backend could add the option to run any big model without trouble; you also save disk space and memory, so it is good. It's local, but partially online.

@tonymacx86PRO
Author

For example, why would we run it on one "non-local" cloud GPU instance when we can use Petals for free? It would just make it more accessible than ever.

@tonymacx86PRO
Author

But yes, we need to wait for the bugs to be fixed.

@tonymacx86PRO
Author

tonymacx86PRO commented Jul 28, 2023

Also, in theory Petals can run a small 13B or 30B model that you could run locally, but without quantization; the Petals community wants big models instead, because quantization can also cost some performance, though not much. But it is possible.

@tonymacx86PRO
Author

Also, if you are looking for a more capable model (70B) rather than an imitation of one (7B, 13B), with easy and free entry, Petals is it.

@artemcrum

Would like to see Petals support too!

@denser-ru

I support this! Please add Petals 🙏

@shohamjac

I am writing it as an extension. I guess I will publish an alpha version in a week or two.

@Mathnerd314

I did a PR: #3784

@shohamjac

Haha great! So I'll have a look instead.

@github-actions github-actions bot added the stale label Oct 14, 2023
@github-actions

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

@gaborkukucska

This would most certainly be a great addition as an alternate backend. You can run larger models on private swarm.
