Allow cancellation of prediction while running prompt #2815

daanrongen · 2023-04-21T19:19:39Z

Open Assistant is great, but sometimes it will predict a long answer where I can spot a misinterpretation right away. Whether this is because my prompt was faulty and I realise this too late, or the model hallucinates. Either way, having to wait for the entire prediction to load significantly reduces the UX (due to waiting time). It would be useful to be able to abort the prediction.

Model OA_SFT_Llama_30B_6 
Top K 50 
Top P 0.95 
Temperature 0.9 
Repetition penalty 1.2 
Max new tokens 1024

The text was updated successfully, but these errors were encountered:

yk · 2023-04-21T20:09:44Z

Note: this requires fairly deep understanding of the inference system

alexn-s · 2023-04-23T22:39:13Z

the issue is similar to #2647. can this be closed?

0xfacade · 2023-05-21T21:26:32Z

I would enjoy working on this, as it encompasses most of the stack and would give me a chance to quickly get to know the entire application. I could put together a proposal for how to implement this over the next week, then work on the implementation the week after that.

Forbu · 2023-05-23T21:49:19Z

Is this really up to OA to do this ? it looks like OA has a big dependency to https://github.com/huggingface/text-generation-inference for the inference. If we want to have a proper way of handling cancellation, we should perhaps make (or wait ...) a contribution on this repo. I can add an issue on their side and try to contribute here.
In the openai api doc they don't have anything for cancellation, but they do have cancellation on their website.

I wonder if this is just a UI thing (they let the prompt generation run on their server).

0xfacade · 2023-05-23T22:56:11Z

I was thinking we could implement the cancellation as a stopping criterion that could be influenced from outside the model. The interface already supports stopping criteria. I haven't spent too much thought yet on that aspect of it, but it seems like a viable solution to me.

I currently see more issues in turning the communication into a bidrectional channel. Currently, the communication happens over server side events which are uni directional. I'm trying to find out whether it could be an option to close the server side communication as an indicator to stop generating.

0xfacade · 2023-05-25T21:04:26Z

As a first step, I would implement only the backend part of this feature. Goal: stop generating tokens when the event stream is closed. We can use the exception that is raised in this case to indicate that generation should be stopped - the communication is uni-directional otherwise, so we can't really send any complex messages from the client to the server.

Necessary steps:

in the oasst_inference_server that communicates directly with the UI, catch the asyncio.CancelledError that indicates that the event stream was closed by the client (see example in documentation of sse-starlette)
propagate the cancellation by closing the stream to the worker defined in basic_hf_server.py
also catch the CancelledError in basic_hf_server.py; when this happens, set a flag that indicates that the inference should be stopped
add a stopping criterion to the model that checks if the flag is set

I've added comments to the necessary places in this MR in my fork: 0xfacade#1

Potential issue: if there's a proxy in between the UI and the server(s), that proxy would have to close the connection immediately when the client closes it. I don't think that is going to be an issue, though.

Let me know what you think. @andreaskoepf

yk · 2023-05-25T21:40:37Z

@0xfacade really nice analysis, thank you very much, I think it's a totally viable plan to move forward!

0xfacade · 2023-05-29T18:35:48Z

Great, thanks. Then I'll implement this over the coming evenings. Shouldn't take long!

…arly) (#3573) In this PR, I clean up the `handle_worker()` method a bit so that I can later extend it (in a future PR). There are no functional changes in this PR. Changes: - collect the many variables in a new class `HandleWorkerContext` that also features methods for initialization and destruction - collect methods to handle updating the session in new class `SessionManager` - move management of futures into a new class `FuturesManager` - extract the logic for handling a work request and a worker response from the main loop into their own respective functions The last change is the most important one for my future changes. In the main loop of `handle_worker()`, we were already waiting for two different types of futures: newly dequeued work requests from the Redis work queue, and responses from the worker received over the websocket. I'll need to add a third type of future next that allows us to listen to requests to stop generating text (#2815). The results of the different futures used to be differentiated based on their return type, which was very hard to read. I've created a decorator in `FuturesManager` that wraps the awaitable in another awaitable that returns a tuple, where the first entry is a `FutureType` enum value, and the second value is the result of awaiting the passed in awaitable. This makes it easy to distinguish what type of result was received. I tested my changes by spinning up the inference server + worker with `docker compose`. Then I used the `text-client` to interface with the server.

…ration early) (#3573)" This reverts commit 0c2fa4c.

…ration early) (#3573)" (#3644) This PR reverts commit 0c2fa4c. We'll investigate why this caused problems

axel7083 · 2023-09-24T12:59:43Z

What is the state of this issues ? A PRs has been merged and revert ?

0xfacade · 2023-10-15T12:07:13Z

Status update:

spent a lot of time analyzing how to implement this
merged a cleanup in preparation for the implementation, which was reverted due to an unidentified bug
have been working on a new implementation since, but it is going very slow and I'm loosing motivation a bit..

0xfacade · 2023-10-15T12:08:28Z

If someone else would like to work on this, I'd be happy to share my insights and plan on how to do this.

DanyRay420 · 2024-05-25T13:47:03Z

GO

DanyRay420 · 2024-05-25T13:48:52Z

Get it

yk added the inference label Apr 21, 2023

yk added this to Open-Assistant Apr 21, 2023

github-project-automation bot moved this to 📫 Triage in Open-Assistant Apr 21, 2023

yk moved this from 📫 Triage to 🛠 Todo in Open-Assistant Apr 21, 2023

olliestanley mentioned this issue Apr 29, 2023

Add a "stop generation" button to chat. #2647

Closed

olliestanley mentioned this issue May 18, 2023

Ability to stop generating? #3193

Closed

notmd assigned 0xfacade May 22, 2023

andreaskoepf moved this from 🛠 Todo to ⚙ In Progress in Open-Assistant May 25, 2023

0xfacade mentioned this issue May 25, 2023

Implementation plan: Allow cancellation of prediction while running prompt 0xfacade/Open-Assistant#1

Open

andreaskoepf removed this from Open-Assistant Jun 5, 2023

andreaskoepf added this to Open-Assistant Jun 16, 2023

github-project-automation bot moved this to 📫 Triage in Open-Assistant Jun 16, 2023

andreaskoepf moved this from 📫 Triage to ⚙ In Progress in Open-Assistant Jun 16, 2023

0xfacade mentioned this issue Jul 15, 2023

Cleanup handle_worker() in preparation for #2815 (Stop generation early) #3573

Merged

3 tasks

yk added a commit that referenced this issue Aug 8, 2023

Revert "Cleanup handle_worker() in preparation for #2815 (Stop gene…

24006f9

…ration early) (#3573)" This reverts commit 0c2fa4c.

andreaskoepf pushed a commit that referenced this issue Aug 8, 2023

Revert "Cleanup handle_worker() in preparation for #2815 (Stop gene…

aae9e5e

…ration early) (#3573)" (#3644) This PR reverts commit 0c2fa4c. We'll investigate why this caused problems

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow cancellation of prediction while running prompt #2815

Allow cancellation of prediction while running prompt #2815

daanrongen commented Apr 21, 2023

yk commented Apr 21, 2023

alexn-s commented Apr 23, 2023

0xfacade commented May 21, 2023

Forbu commented May 23, 2023

0xfacade commented May 23, 2023

0xfacade commented May 25, 2023 •

edited

Loading

yk commented May 25, 2023

0xfacade commented May 29, 2023

axel7083 commented Sep 24, 2023

0xfacade commented Oct 15, 2023

0xfacade commented Oct 15, 2023

DanyRay420 commented May 25, 2024

DanyRay420 commented May 25, 2024

Allow cancellation of prediction while running prompt #2815

Allow cancellation of prediction while running prompt #2815

Comments

daanrongen commented Apr 21, 2023

yk commented Apr 21, 2023

alexn-s commented Apr 23, 2023

0xfacade commented May 21, 2023

Forbu commented May 23, 2023

0xfacade commented May 23, 2023

0xfacade commented May 25, 2023 • edited Loading

yk commented May 25, 2023

0xfacade commented May 29, 2023

axel7083 commented Sep 24, 2023

0xfacade commented Oct 15, 2023

0xfacade commented Oct 15, 2023

DanyRay420 commented May 25, 2024

DanyRay420 commented May 25, 2024

0xfacade commented May 25, 2023 •

edited

Loading