fix: Avoid thread starvation on many concurrent requests by making use of asyncio to lock llama_proxy context #1798
Supersedes previous MR #1795
The previous implementation creates and locks threads when acquiring `llama_proxy`, which can cause thread starvation under many parallel requests.

It also prevents the call to

```python
await run_in_threadpool(llama.create_chat_completion, **kwargs)
```

from proceeding: all worker threads are stuck awaiting the lock, so no progress can be made. This MR adapts the acquisition of `llama_proxy` to an async pattern that takes advantage of asyncio mechanisms. `ExitStack` is replaced with `AsyncExitStack`, and the improper closing of the `ExitStack` is addressed.
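A minimal sketch of the resulting pattern, assuming a module-level `asyncio.Lock` guarding the shared proxy (the names `llama_lock` and `_llama_proxy` are illustrative, not the PR's exact code):

```python
import asyncio
import contextlib

from starlette.concurrency import run_in_threadpool

# Illustrative module-level state; names are hypothetical, not the PR's exact code.
llama_lock = asyncio.Lock()
_llama_proxy = None  # the shared LlamaProxy, created at server startup


@contextlib.asynccontextmanager
async def get_llama_proxy():
    # Awaiting an asyncio.Lock suspends the coroutine instead of blocking a
    # threadpool worker, so queued requests no longer pin worker threads.
    async with llama_lock:
        yield _llama_proxy


async def create_chat_completion(**kwargs):
    # AsyncExitStack (rather than ExitStack) keeps the proxy acquired for the
    # lifetime of a streaming response and guarantees it is closed afterwards.
    exit_stack = contextlib.AsyncExitStack()
    try:
        llama = await exit_stack.enter_async_context(get_llama_proxy())
        # The blocking model call still runs in the threadpool, but workers
        # are only occupied while computing, never while waiting on the lock.
        return await run_in_threadpool(llama.create_chat_completion, **kwargs)
    finally:
        await exit_stack.aclose()
```

Because waiting happens in the event loop rather than on a threadpool worker, the pool stays free to run `create_chat_completion`, avoiding the starvation described above.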