Estimate costs #16

Open
cbfrance opened this issue Sep 6, 2023 · 2 comments

Comments

@cbfrance
Contributor

cbfrance commented Sep 6, 2023

How much will this system cost in terms of compute overhead and APIs?

Context: personally, I think we want the best embeddings we can get, even at very high cost; I am happy to spend money to gain a few percentage points of quality. But we will probably also want to regenerate embeddings regularly, which means throwing away the results of a lot of those expensive API calls. So iterating on a RAG system could add up. I have never built a system that might depend this heavily on external models.

Eventually we will do fine-tuning and potentially even model training, but for now I'm just trying to get my head around the costs of a well-prepared RAG system.

  • For inference / runtime costs, if we had x users and y queries per session, how does it pencil out?

  • For training, how many round-trip requests do we need to refine, split, summarize, vectorize, and tag our metadata (etc.) to arrive at a system that is ready for inference? I assume we will use OpenAI to generate embeddings, and that pre-processing steps will be needed to get quality embeddings.

  • For other compute costs (hosting, indexes, etc.), we probably need a spreadsheet with all of our SaaS tools, APIs, and costs.
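A rough way to make the three questions above concrete is a small calculator. This is only a sketch under stated assumptions: every input (users, queries, token counts, number of LLM passes, re-index frequency, fixed SaaS cost) and every price is a placeholder to be replaced with real measurements and current vendor pricing. None of the numbers are quoted OpenAI prices, and the function and field names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Prices:
    """Hypothetical prices in USD per 1,000 tokens (placeholders, not real quotes)."""
    embedding_per_1k: float
    llm_input_per_1k: float
    llm_output_per_1k: float


def runtime_cost(users, sessions_per_user, queries_per_session,
                 prompt_tokens, completion_tokens, query_embed_tokens, prices):
    """Monthly inference cost, assuming each query embeds the question
    once and then makes a single LLM call with the retrieved context."""
    queries = users * sessions_per_user * queries_per_session
    per_query = (
        query_embed_tokens / 1000 * prices.embedding_per_1k
        + prompt_tokens / 1000 * prices.llm_input_per_1k
        + completion_tokens / 1000 * prices.llm_output_per_1k
    )
    return queries * per_query


def ingestion_cost(chunks, tokens_per_chunk, llm_passes, output_tokens_per_pass,
                   reindexes_per_month, prices):
    """Monthly preparation cost, assuming each chunk goes through
    `llm_passes` LLM round trips (summarize, tag, ...) and one embedding
    call, and the whole corpus is reprocessed `reindexes_per_month` times."""
    per_chunk = (
        llm_passes * (
            tokens_per_chunk / 1000 * prices.llm_input_per_1k
            + output_tokens_per_pass / 1000 * prices.llm_output_per_1k
        )
        + tokens_per_chunk / 1000 * prices.embedding_per_1k
    )
    return chunks * per_chunk * reindexes_per_month


if __name__ == "__main__":
    # All values below are illustrative placeholders.
    prices = Prices(embedding_per_1k=0.001, llm_input_per_1k=0.01, llm_output_per_1k=0.02)
    runtime = runtime_cost(users=500, sessions_per_user=10, queries_per_session=5,
                           prompt_tokens=3000, completion_tokens=500,
                           query_embed_tokens=50, prices=prices)
    ingestion = ingestion_cost(chunks=100_000, tokens_per_chunk=800, llm_passes=3,
                               output_tokens_per_pass=200, reindexes_per_month=4,
                               prices=prices)
    fixed_saas = 500.0  # hosting, vector index, etc. (placeholder)
    print(f"runtime:   ${runtime:,.2f}/month")
    print(f"ingestion: ${ingestion:,.2f}/month")
    print(f"total:     ${runtime + ingestion + fixed_saas:,.2f}/month")
```

Keeping runtime and ingestion separate makes it easier to see which one dominates: runtime scales with users and queries, while ingestion scales with corpus size and how often embeddings are regenerated.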

@cbfrance
Contributor Author

cbfrance commented Sep 6, 2023

Gut check: doing napkin math, it could easily be $3k/month in OpenAI costs alone, maybe as high as $10k/month. Does that sound right? It's unclear to me whether we could even get a rate limit that high.
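One way to sanity-check a figure like that is to invert it: given a monthly budget and a blended price per 1,000 tokens, how many tokens does the budget buy, and what sustained tokens-per-minute rate would that imply against a rate limit? The sketch below assumes a 30-day month and perfectly even usage; the blended price in the example is a placeholder, not actual OpenAI pricing, and the helper names are hypothetical.

```python
def tokens_for_budget(budget_usd: float, price_per_1k_tokens: float) -> float:
    """Number of tokens a monthly budget buys at a blended per-1k-token price."""
    return budget_usd / price_per_1k_tokens * 1000


def sustained_tokens_per_minute(total_tokens: float, days: int = 30) -> float:
    """Average tokens per minute if usage were spread evenly over the month."""
    return total_tokens / (days * 24 * 60)


if __name__ == "__main__":
    for budget in (3_000, 10_000):
        tokens = tokens_for_budget(budget, price_per_1k_tokens=0.01)  # placeholder price
        tpm = sustained_tokens_per_minute(tokens)
        print(f"${budget}/month -> {tokens:,.0f} tokens, about {tpm:,.0f} tokens/minute sustained")
```

Real traffic is bursty, so the peak rate needed would be higher than this average; the comparison against the account's actual tokens-per-minute limit should use the peak, not the mean.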

@cbfrance
Contributor Author

cbfrance commented Sep 6, 2023

To clarify, I am not shy about costs. I am happy to pay a premium for quality and speed of iteration, especially in the first 6 weeks. Later we can talk about reducing the cost of operation over time, e.g. by swapping in cheaper endpoints.
