[Question]: How can I improve the concurrency performance of the ragflow_stream_output api? #3641

Open
Weishaoya opened this issue Nov 25, 2024 · 2 comments
Labels
question Further information is requested

Comments

@Weishaoya

Describe your problem

For the ragflow_streaming_output API, with 1, 10, and 100 concurrent requests, the first-token latency was 0.6719 s, 4.7593 s, and 41.9158 s, respectively. Because every request has to pass through the retrieval step, the concurrency performance of the ragflow_stream_output API is weak, which makes it hard to use in large-scale applications. How can I improve the concurrency performance of the ragflow_stream_output API?
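For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of timing the first streamed chunk under N concurrent requests. It is not the script used above. `first_chunk_latency`, `measure`, and `fake_stream` are hypothetical names, and `fake_stream` only simulates a streaming response; it would need to be replaced by a real streaming HTTP call to the deployment.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable, Iterable, List


def first_chunk_latency(open_stream: Callable[[], Iterable[bytes]]) -> float:
    """Seconds from issuing one request until its first streamed chunk arrives."""
    start = time.perf_counter()
    for _chunk in open_stream():
        break  # only the first chunk matters for first-token latency
    return time.perf_counter() - start


def measure(open_stream: Callable[[], Iterable[bytes]], concurrency: int) -> List[float]:
    """Fire `concurrency` requests at once and collect each first-chunk latency."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(first_chunk_latency, open_stream) for _ in range(concurrency)]
        return [f.result() for f in futures]


def fake_stream() -> Iterable[bytes]:
    # Hypothetical stand-in for the real streaming request (assumption):
    # waits 50 ms to mimic retrieval, then yields chunks.
    time.sleep(0.05)
    yield b"first token"
    yield b"rest of the answer"


if __name__ == "__main__":
    for n in (1, 10, 100):
        latencies = measure(fake_stream, n)
        print(f"concurrency={n} mean={mean(latencies):.4f}s max={max(latencies):.4f}s")
```

Reporting the maximum (or a high percentile) next to the mean helps show whether requests are being queued behind each other rather than served in parallel.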

Weishaoya added the question label on Nov 25, 2024
Weishaoya changed the title from "[Question]: How can I improve the concurrency performance of the ragflow stream output api?" to "[Question]: How can I improve the concurrency performance of the ragflow_stream_output api?" on Nov 25, 2024
@KevinHuSh
Collaborator

Change the run_simple function in api/ragflow_server.py.
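The reply does not say what the change should be. One possible reading, sketched below under stated assumptions, is to make sure Werkzeug's development server handles each request in its own thread via the `threaded` parameter of `run_simple`. `demo_app`, `serve`, and the host and port values are placeholders, not the code in api/ragflow_server.py.

```python
from typing import Callable, Iterable


def demo_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Hypothetical WSGI app that streams three chunks (stand-in for the real app)."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return (f"token {i}\n".encode() for i in range(3))


def serve(app, host: str = "0.0.0.0", port: int = 9380) -> None:
    """Run `app` on Werkzeug's development server with one thread per request.

    Host and port are placeholder values (assumption).
    """
    # Imported here so the rest of the sketch works without Werkzeug installed.
    from werkzeug.serving import run_simple

    run_simple(
        hostname=host,
        port=port,
        application=app,
        threaded=True,       # handle concurrent requests in separate threads
        use_reloader=False,
        use_debugger=False,
    )


# To try it locally: serve(demo_app)
```

Even with `threaded=True`, this is still a single-process development server, so CPU-bound work competes for one interpreter. A production WSGI server is the more common choice for heavy concurrency.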

@Weishaoya
Author

Change the run_simple function in api/ragflow_server.py.
I have replaced run_simple with Gunicorn. The concurrency performance of ragflow_stream_output improves when I set workers to 10, but the embedding model then gets loaded 10 times on gpu-0. Can you suggest a better way to improve the concurrency performance? Thank you!
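One possible direction, offered as a sketch and not as a maintainer-confirmed fix: run fewer Gunicorn worker processes with several threads each (the `gthread` worker class), so the model is loaded once per process instead of once per request slot, and pin each process to a GPU in the `post_fork` hook. `NUM_GPUS`, the thread count, and `gpu_for_worker` are assumptions, and it is also an assumption that the embedding model respects `CUDA_VISIBLE_DEVICES`.

```python
# gunicorn.conf.py (hypothetical file)
import os

NUM_GPUS = 2  # assumption: number of GPUs available on the host

workers = NUM_GPUS        # one process, and so one model copy, per GPU
worker_class = "gthread"  # threaded workers: threads share the process's model
threads = 16              # assumption: tune to the expected number of open streams
preload_app = False       # load the app after fork so the hook below takes effect


def gpu_for_worker(worker_age: int, num_gpus: int = NUM_GPUS) -> int:
    """Map Gunicorn's 1-based worker counter to a GPU index, round-robin."""
    return (worker_age - 1) % num_gpus


def post_fork(server, worker):
    # Runs in the new worker process before the application is imported,
    # so CUDA only sees the GPU chosen here. Restarted workers get higher
    # ages, so the spread can become uneven after restarts.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_for_worker(worker.age))
```

This would be started with `gunicorn -c gunicorn.conf.py <module>:<app>`, where the application path is left as a placeholder. With a single GPU, setting `NUM_GPUS = 1` gives one process with 16 threads and a single model copy. Throughput is then bounded by how well the request path releases the GIL during retrieval and inference.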
