When evaluating MMLU, the codebase supports vLLM inference, but it is very slow: a single task takes about 20 minutes, whereas in my experience all tasks together normally finish in roughly 20 minutes.
Thank you for your question!
This is a known issue. The current architecture implements the BaseInference class for both deepspeed and vllm in the same Python file, so importing the deepspeed-related dependencies prevents vllm from starting properly. As a workaround, I set distributed_executor_backend="ray" when starting vllm, which does significantly hurt efficiency.
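For reference, here is a minimal sketch (not the repo's actual code) of how the executor backend is selected when constructing a vLLM engine in recent vLLM versions; the model name and sampling settings are placeholders:

```python
from vllm import LLM, SamplingParams

# Current workaround: force the Ray executor so vLLM can start even after
# deepspeed has been imported in the same process. Ray adds scheduling
# overhead, which is where much of the slowdown comes from.
llm = LLM(
    model="meta-llama/Llama-2-7b-hf",        # placeholder model
    distributed_executor_backend="ray",      # workaround used today
    tensor_parallel_size=1,
)

# Once the backends are decoupled, the default multiprocessing executor
# can be used instead, which typically starts and runs faster:
# llm = LLM(model="meta-llama/Llama-2-7b-hf", distributed_executor_backend="mp")

outputs = llm.generate(["The capital of France is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```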
In the next version we will refactor the framework to completely decouple the two backends and restore vllm's full inference speed.
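One possible shape of that decoupling is lazy, per-backend imports, so that loading one backend never pulls in the other's dependencies. The sketch below is illustrative only; the class and function names are hypothetical and do not reflect the project's planned layout:

```python
from abc import ABC, abstractmethod


class BaseInference(ABC):
    """Backend-agnostic interface; no deepspeed or vllm imports at module level."""

    @abstractmethod
    def generate(self, prompts: list[str]) -> list[str]:
        ...


def build_vllm_backend(model: str) -> BaseInference:
    # vllm is imported only here, so deepspeed never enters this code path.
    from vllm import LLM, SamplingParams

    class VLLMInference(BaseInference):
        def __init__(self) -> None:
            self.llm = LLM(model=model)  # default executor, no Ray required

        def generate(self, prompts: list[str]) -> list[str]:
            outputs = self.llm.generate(prompts, SamplingParams(max_tokens=64))
            return [o.outputs[0].text for o in outputs]

    return VLLMInference()


def build_deepspeed_backend(model: str) -> BaseInference:
    # deepspeed is imported only here, keeping it out of the vLLM code path.
    import deepspeed  # noqa: F401

    raise NotImplementedError("DeepSpeed backend omitted from this sketch")
```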