You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
How much will this system cost in terms of compute overhead and APIs?
Context: personally I think we probably want the best embeddings we can get, even at very high cost — at least I am happy to throw money to achieve a few percentage points of better quality. But then, for example, we will probably want to regenerate embeddings regularly and throw a lot of those expensive API calls away. So, iterating with a RAG system could add up. I have never built a system like this before that might depend so heavily on external models.
Eventually we will do fine-tuning and potentially even model training, but for now I'm just trying to get my head around the costs of a well-prepared RAG system.
For inference / runtime costs, if we had x users and x queries per session, how does it pencil out?
For training, how many round-trips requests do we need to get our metadata refined, split, summarize, vectorize, tag, etc. to arrive at the system that is ready for inference? I assume we will use OpenAI to generate embeddings, and there will be pre-processing steps needed to get quality embeddings.
For other compute costs, hosting and indexes etc, we probably need a spreadsheet with all of our SaaS tools, APIs and costs.
The text was updated successfully, but these errors were encountered:
Gut check: Doing napkin math it could be easily $3k/month in OpenAI costs alone, maybe as high as $10k/month? Does that sound right? It's unclear to me if we could even get a rate limit that high.
To clarify I am not shy about costs — I am happy to pay premium for quality and speed of iteration in the first 6 weeks especially. Later we can talk about reducing cost of operation over time eg. by swapping in cheaper endpoints.
How much will this system cost in terms of compute overhead and APIs?
Context: personally I think we probably want the best embeddings we can get, even at very high cost — at least I am happy to throw money to achieve a few percentage points of better quality. But then, for example, we will probably want to regenerate embeddings regularly and throw a lot of those expensive API calls away. So, iterating with a RAG system could add up. I have never built a system like this before that might depend so heavily on external models.
Eventually we will do fine-tuning and potentially even model training, but for now I'm just trying to get my head around the costs of a well-prepared RAG system.
For inference / runtime costs, if we had x users and x queries per session, how does it pencil out?
For training, how many round-trips requests do we need to get our metadata refined, split, summarize, vectorize, tag, etc. to arrive at the system that is ready for inference? I assume we will use OpenAI to generate embeddings, and there will be pre-processing steps needed to get quality embeddings.
For other compute costs, hosting and indexes etc, we probably need a spreadsheet with all of our SaaS tools, APIs and costs.
The text was updated successfully, but these errors were encountered: