Some AI providers have rate limits on top of the model context limit. These rate limits can be:

- max number of requests per minute
- max requests per day
- max tokens per minute
- max images per minute
- and probably others.
This is the case for OpenAI and Anthropic providers.
These limits depend heavily on the plan the user has subscribed to. The problem with these limits is that they can be lower than the actual model context limit. As a consequence, depending on how you use gptme, you can easily hit the token rate limit while still being far below the model context limit, especially because gptme can chain multiple requests quickly. It would be great to handle these limits gracefully, even when the limit is low.
There is already a process to truncate the messages when the context limit is exceeded, but as said before, the per-minute rate can be relatively low on some plans (30,000 tokens on the first paid plan, for instance). It would be great to use this process, or something similar, for the rate limit.
I think with gptme the most common situations are:

- hitting the max tokens per minute because we sent multiple requests in a row
- hitting the max tokens per minute because the log is bigger than the limit
Here are some thoughts on how to solve this issue in gptme.
Solution 1 - catch the exception and retry
We could catch the RateLimitError and, when it happens, retry with exponential backoff. This would solve the case where you exceed the rate limit by sending requests too fast, but it doesn't work when the message log is too big for a single request. This is the solution described by OpenAI.
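A minimal sketch of what this could look like with the openai Python client (the helper name and retry/backoff parameters are just placeholders for illustration, not part of gptme):

```python
import random
import time

import openai


def complete_with_backoff(client: openai.OpenAI, max_retries: int = 5, **request_kwargs):
    """Call the chat completions endpoint, retrying on RateLimitError
    with exponential backoff and jitter (hypothetical helper)."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**request_kwargs)
        except openai.RateLimitError:
            if attempt == max_retries - 1:
                raise
            # wait 2^attempt seconds plus jitter before retrying
            time.sleep(2**attempt + random.uniform(0, 1))


# usage:
# client = openai.OpenAI()
# response = complete_with_backoff(
#     client,
#     model="gpt-4o",
#     messages=[{"role": "user", "content": "hello"}],
# )
```

Note that, as far as I know, the official openai and anthropic Python clients already retry rate-limited requests a couple of times by default (configurable via `max_retries`), but that alone doesn't help when the log itself is bigger than the per-minute token budget.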
Solution 2 - Track the limit with the response headers
OpenAI and Anthropic return the current rate limits in response headers. This seems to be the recommended way to track token consumption. We could keep the current limits in a shared context (maybe on the current model?) and check them while preparing the message, to make the right decision: either wait before sending the request, or reduce the log, depending on which limit is exceeded.
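For reference, a rough sketch of reading those headers with the openai client's raw-response helpers (the header names are the `x-ratelimit-*` ones OpenAI documents; Anthropic has `anthropic-ratelimit-*` equivalents; the model name and how gptme would store the values are assumptions):

```python
import openai

client = openai.OpenAI()

# with_raw_response exposes the underlying HTTP response (and thus the
# headers), while .parse() still returns the usual completion object.
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "hello"}],
)
completion = raw.parse()

# OpenAI reports the current limits in x-ratelimit-* headers;
# Anthropic uses anthropic-ratelimit-* equivalents.
remaining_requests = int(raw.headers["x-ratelimit-remaining-requests"])
remaining_tokens = int(raw.headers["x-ratelimit-remaining-tokens"])
reset_tokens = raw.headers["x-ratelimit-reset-tokens"]  # e.g. "6m0s"

# gptme could stash these values in a shared context and, before the next
# request, compare the estimated token count of the log against
# remaining_tokens: wait until the reset if only the rate budget is low,
# or truncate/reduce the log if the log itself is larger than the budget.
print(remaining_requests, remaining_tokens, reset_tokens)
```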