Just Curious - Can we use it to deploy rag-based chatbots ? #80
Shivansh12t started this conversation in General
Hi everyone,
We are currently building a Retrieval-Augmented Generation (RAG) chatbot for Saturnalia, a large-scale event where we expect a significant surge in user traffic on the main day. We're exploring options to accelerate inference on CPUs, and we are considering models like Mistral 7B or LLaMA 7B for this deployment.
Given that Microsoft BitNet can help optimize LLM performance on CPU, I wanted to ask for advice on the CPU specifications we should consider for this setup.
We're fairly new to RAG and LLMs, and we're curious about the future scope and implications of deploying large models on CPUs. While our focus is on this particular event, we're excited about the potential and about how this technology could scale to other scenarios. Any insights on best practices, or even limitations, would be really helpful!
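For concreteness, here's a minimal sketch of the retrieval-and-prompting side of what we have in mind. Everything in it is a placeholder, not our actual stack: the toy bag-of-words similarity stands in for a real embedding model and vector store, the sample documents are invented, and the final prompt would be handed to a locally running CPU model (e.g. via llama.cpp or bitnet.cpp bindings, not shown here):

```python
# Toy sketch of a RAG retrieval + prompt-assembly step.
# Assumptions: bag-of-words cosine similarity instead of real embeddings,
# hard-coded sample docs, and no generation step (a CPU-hosted model such
# as a BitNet or llama.cpp build would consume `prompt` downstream).
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy "embedding": lowercase bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Stuff the retrieved passages into a grounded-answer prompt.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Saturnalia is the annual techno-cultural fest.",
    "The main stage opens at 6 pm on the final day.",
    "Registration closes a week before the event.",
]
print(build_prompt("When does the main stage open?", docs))
```

The per-request cost on CPU is dominated by the generation step, so the retrieval side above can stay simple; it's mainly the token throughput of the quantized model that we'd need to size the CPUs for.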
Thanks in advance for your guidance!