For the ragflow_stream_output API, with 1, 10, and 100 concurrent requests, the first-token latency was 0.6719 s, 4.7593 s, and 41.9158 s, respectively. Because of the retrieval step, the API's concurrency performance is weak, which makes it poorly suited to large-scale applications. How can I improve the concurrency performance of the ragflow_stream_output API?
Weishaoya changed the title from "[Question]: How can I improve the concurrency performance of the ragflow stream output api?" to "[Question]: How can I improve the concurrency performance of the ragflow_stream_output api?" on Nov 25, 2024
Change the run_simple function in api/ragflow_server.py.
I replaced run_simple with Gunicorn. With workers set to 10, the concurrency performance of ragflow_stream_output improves, but the embedding model is then loaded 10 times on gpu-0. Is there a better way to improve the concurrency performance? Thank you!
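One possible direction, offered as a rough sketch rather than a confirmed fix for RAGFlow: run fewer Gunicorn worker processes, give each one several threads, and pin each worker to a GPU before it loads the model. Everything specific here is an assumption: the file name gunicorn.conf.py, the bind address, the worker and thread counts, the GPU count, and the helper gpu_for_worker are all illustrative. The pinning only helps if the embedding model is loaded after the fork (so preload_app stays off) and CUDA has not been initialized in the master process.

```python
# gunicorn.conf.py -- hypothetical config file; all values are illustrative, not tuned.
import os

bind = "0.0.0.0:9380"      # assumed address/port; adjust to your deployment
workers = 4                # fewer processes means fewer copies of the embedding model
worker_class = "gthread"   # threaded workers: one model copy serves several requests
threads = 8                # concurrent requests handled per worker process
timeout = 120              # allow long-lived streaming responses

NUM_GPUS = 4               # assumption: number of GPUs available on the host


def gpu_for_worker(worker_age: int, num_gpus: int = NUM_GPUS) -> int:
    """Map a Gunicorn worker's age counter (starts at 1) to a GPU index, round-robin."""
    return (worker_age - 1) % num_gpus


def post_fork(server, worker):
    """Gunicorn server hook, called in the child process right after the fork.

    Setting CUDA_VISIBLE_DEVICES here restricts this worker to a single GPU,
    provided the model is loaded later, when the application is imported.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_for_worker(worker.age))
```

With a file like this, Gunicorn would be started with `-c gunicorn.conf.py` plus the application module. Whether threads are enough depends on where the time goes: if most of the first-token latency is spent waiting on retrieval or LLM I/O, threads should overlap well, while CPU-bound steps would still contend inside one process.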