Feature: Serving Embedding and Reranking Models Using vLLM on Xeon and Gaudi
Description:
Integrate vLLM as a serving framework to enhance the performance and scalability of embedding and reranking models. This feature involves:
Leveraging vLLM's high-throughput serving capabilities to efficiently handle embedding and reranking requests.
Integrating the vLLM-served embedding and reranking services with the ChatQnA pipeline.
Optimizing the vLLM configuration for use cases involving embeddings and reranking, ensuring lower latency and better resource utilization.
Comparing vLLM's performance against the current TEI (Text Embeddings Inference) serving to determine the best setup for production.
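For the comparison step, a minimal sketch of a backend-agnostic timing harness might look like the following. It is not part of the proposed implementation: `benchmark` and its parameters are hypothetical names, and `embed_fn` stands in for whatever client function sends one batch to a vLLM or TEI endpoint. It measures client-side latency only, so a real comparison would also need concurrent load and identical models on both backends.

```python
import math
import statistics
import time
from typing import Callable, Sequence


def benchmark(embed_fn: Callable[[Sequence[str]], object],
              batches: Sequence[Sequence[str]],
              warmup: int = 1) -> dict:
    """Time sequential calls to embed_fn, one call per batch.

    embed_fn is a hypothetical client callable; it is assumed to block
    until the serving backend has returned its response.
    """
    if not batches:
        raise ValueError("at least one batch is required")

    # Warm-up calls are sent but not timed (model load, cache fill, etc.).
    for batch in batches[:warmup]:
        embed_fn(batch)

    latencies = []
    for batch in batches:
        start = time.perf_counter()
        embed_fn(batch)
        latencies.append(time.perf_counter() - start)

    total_s = sum(latencies)
    n_texts = sum(len(batch) for batch in batches)

    # Nearest-rank 95th percentile over the timed requests.
    ordered = sorted(latencies)
    p95_index = min(len(ordered) - 1, max(0, math.ceil(0.95 * len(ordered)) - 1))

    return {
        "requests": len(latencies),
        "mean_latency_s": statistics.mean(latencies),
        "p95_latency_s": ordered[p95_index],
        # Guard against a zero total on very coarse timers.
        "texts_per_s": n_texts / max(total_s, 1e-9),
    }
```

Running the same `batches` through one `embed_fn` per backend gives two result dictionaries that can be compared directly.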
Expected Outcome:
An additional serving framework for embedding and reranking models, with better performance expected on Gaudi.
Improved throughput for embedding and reranking services.
Enhanced flexibility to switch between serving frameworks based on specific requirements.
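As an illustration of switching between serving frameworks, the sketch below builds the HTTP request for either backend from one call site. The helper names and the `Request` type are hypothetical. The endpoint paths and payload shapes are assumptions based on TEI's `/embed` and `/rerank` routes and vLLM's OpenAI-compatible `/v1/embeddings` and `/v1/rerank` routes; they should be checked against the deployed versions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Request:
    """URL and JSON body for one POST request (hypothetical helper type)."""
    url: str
    payload: dict


def embedding_request(backend: str, base_url: str,
                      texts: Sequence[str], model: str = "") -> Request:
    """Build an embedding request for the chosen backend."""
    base = base_url.rstrip("/")
    if backend == "tei":
        # Assumed TEI route: POST /embed with an "inputs" list.
        return Request(f"{base}/embed", {"inputs": list(texts)})
    if backend == "vllm":
        # Assumed vLLM route: OpenAI-compatible POST /v1/embeddings.
        return Request(f"{base}/v1/embeddings",
                       {"model": model, "input": list(texts)})
    raise ValueError(f"unknown backend: {backend!r}")


def rerank_request(backend: str, base_url: str, query: str,
                   documents: Sequence[str], model: str = "") -> Request:
    """Build a rerank request for the chosen backend."""
    base = base_url.rstrip("/")
    if backend == "tei":
        # Assumed TEI route: POST /rerank with "query" and "texts".
        return Request(f"{base}/rerank",
                       {"query": query, "texts": list(documents)})
    if backend == "vllm":
        # Assumed vLLM route: POST /v1/rerank with "query" and "documents".
        return Request(f"{base}/v1/rerank",
                       {"model": model, "query": query,
                        "documents": list(documents)})
    raise ValueError(f"unknown backend: {backend!r}")
```

A caller would pick the backend from configuration and send the result with any HTTP client, for example `requests.post(req.url, json=req.payload)`, so the rest of the pipeline does not depend on which framework is serving.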
The microservice itself will be implemented under the GenAIComps feature opea-project/GenAIComps#956. If embedding/reranking performance when served by vLLM is better than with TEI, we will update the ChatQnA example.
Priority: P1-Stopper
OS type: Ubuntu
Hardware type: Xeon-GNR
Running nodes: Single Node