Implementation plan: Allow cancellation of prediction while running prompt #1
Implementation plan for the backend part of "Allow cancellation of prediction while running prompt".
Planned changes: cancel prediction when the event source is closed in the UI.
Necessary steps:
- In `oasst_inference_server`, which communicates directly with the UI, catch the `asyncio.CancelledError` that indicates that the event stream was closed by the client (see the example in the documentation of sse-starlette); we will use this to indicate that the generation should be cancelled (a sketch follows this list).
- In `basic_hf_server.py`, catch the `CancelledError`; when this happens, set a flag that indicates that the inference should be stopped (see the second sketch below).
- Check the diff in this MR for the exact lines where I would change something.
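
For the first step, here is a minimal sketch of catching the client disconnect inside an sse-starlette event generator, assuming FastAPI as the web framework; `generate_tokens` and `notify_worker_cancelled` are hypothetical stand-ins for the server's real token stream and its cancellation message to the worker, not code from this repo:

```python
import asyncio

from fastapi import FastAPI
from sse_starlette.sse import EventSourceResponse

app = FastAPI()


async def generate_tokens():
    """Hypothetical stand-in for the stream of tokens from the worker."""
    for token in ["Hello", " ", "world"]:
        await asyncio.sleep(0.5)
        yield token


async def notify_worker_cancelled():
    """Hypothetical stand-in for telling the worker to stop generating."""


@app.get("/stream")
async def stream():
    async def event_generator():
        try:
            # Forward tokens from the worker to the client as SSE events.
            async for token in generate_tokens():
                yield {"data": token}
        except asyncio.CancelledError:
            # sse-starlette cancels this generator when the client closes
            # the event stream; treat that as the cancellation signal.
            await notify_worker_cancelled()
            raise  # re-raise so the surrounding task shuts down cleanly

    return EventSourceResponse(event_generator())
```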
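For the flag in `basic_hf_server.py`, one way to make a running Hugging Face generation honour such a flag is a custom `StoppingCriteria` that polls a `threading.Event`; the repo may implement the flag differently, so treat this purely as a sketch of the idea:

```python
import threading

import torch
from transformers import StoppingCriteria, StoppingCriteriaList

# Set by the code path that catches the CancelledError (assumed name).
stop_requested = threading.Event()


class CancellationCriteria(StoppingCriteria):
    """Aborts generation as soon as the cancellation flag is set."""

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        return stop_requested.is_set()


# Passed to generate() so the flag is checked after every new token, e.g.:
# model.generate(**inputs, stopping_criteria=StoppingCriteriaList([CancellationCriteria()]))
```

Checking the flag once per token keeps the cancellation latency bounded by a single decoding step, without touching the model's forward pass.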