## Introduction
Petals is a library for running large models in a distributed manner, with inference split across multiple servers. This PR adds support for using text-generation-webui as a petals client. I have tested it in a colab notebook both with and without a GPU (petals doesn't currently support TPUs).
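For context, here is a minimal sketch of what a petals client looks like, assuming the `AutoDistributedModelForCausalLM` entry point from recent petals releases; the model name is just an illustrative placeholder.

```python
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

# Illustrative model name; any model served by a petals swarm works the same way.
model_name = "bigscience/bloom-560m"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Only the embeddings/LM head load locally; the transformer blocks run on
# remote petals servers discovered over the network.
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Hello, world!", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0]))
```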
## Code comments
As of the most recent release, the petals API is essentially identical to huggingface's transformers, so I used the same loading codepath (see the sketch below). One complication is that petals usually loads models from the network using the transformers download API and stores them in `~/.cache/huggingface/hub`. I didn't investigate pre-downloading the model manually too deeply; I just patched the checks so that the download can proceed.
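A hedged sketch of how sharing the loading codepath can look; the function and flag names here are illustrative, not the PR's exact code:

```python
from transformers import AutoModelForCausalLM

def load_model(model_name: str, use_petals: bool = False):
    """Load a model through the shared transformers-style codepath."""
    if use_petals:
        # Same from_pretrained() interface as transformers, so the
        # surrounding loader code doesn't need to change.
        from petals import AutoDistributedModelForCausalLM as LoaderClass
    else:
        LoaderClass = AutoModelForCausalLM
    # Either way, weights land in ~/.cache/huggingface/hub via the usual
    # transformers download machinery.
    return LoaderClass.from_pretrained(model_name)
```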
I was getting errors when trying to run webui+petals without `--cpu` on a colab instance without a GPU, so I also moved the CPU check earlier. Another note is that every HTTP request petals made was getting logged, which ended up being a significant amount of output, so I suppressed that.
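The suppression amounts to raising the log level on the offending logger; the logger name below is an assumption, since the actual name depends on which HTTP client petals uses internally:

```python
import logging

# "urllib3" is an assumed logger name; substitute whichever logger the
# per-request messages actually come from.
logging.getLogger("urllib3").setLevel(logging.WARNING)
```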
You'll also notice that my colab notebook uses `%run server.py`, which makes the model-download output much nicer, but it surfaced a warning about `gradio.launch(debug=)`. Researching this, it seems `debug=True` is appropriate in this case.
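For illustration, this is the pattern in question (a toy interface, not the webui's actual launch code):

```python
import gradio as gr

def echo(text):
    return text

demo = gr.Interface(fn=echo, inputs="text", outputs="text")
# debug=True keeps the launch blocking, so logs and errors from the
# running app stream into the %run cell instead of being swallowed.
demo.launch(debug=True)
```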
There is a "session" API in petals, but I investigated it and it is not currently flexible enough to support most of the webui's commands, so I left the session unspecified, meaning a new inference session is set up with the servers every time you hit "generate". This adds a delay of about a second before generation starts on every request, sometimes much longer if petals can't find sufficient servers or a server times out. Hitting the "stop" button does not interrupt the route-finding; I am not familiar enough with gradio to fix this.
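For reference, the session API in question looks roughly like this, assuming petals' documented `inference_session()` context manager (`tokenizer` and `model` as in the loading sketch above); holding one session open across "generate" presses is what would amortize the route-finding delay, but it isn't flexible enough for the webui's commands yet:

```python
# Routes to servers are found once when the session opens, then reused
# for generate() calls inside the block.
with model.inference_session(max_length=512) as sess:
    inputs = tokenizer("Hello, world!", return_tensors="pt")["input_ids"]
    outputs = model.generate(inputs, session=sess, max_new_tokens=8)
    print(tokenizer.decode(outputs[0]))
```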