feat: Add OpenAI Rate limiting #1805
Conversation
Looks like this is adding rate limiting to both the OpenAI model and the Bedrock model. Why not add to all models?
@axiomofjoy I checked the other models and we were not catching any kind of rate limiting error in their implementations, so I think it's out of scope to try and add that functionality in this PR.
LGTM. I am wondering whether, long-term, the retry behavior would more naturally belong on the executor and be handled via a priority queue. An issue for another day.
* Implement adaptive rate limiter for OpenAI
* Add adaptive rate limiter to Bedrock model
* Use a sensible default maximum request rate
* Ruff 🐶
* Mark test as xfail after llama_index update
* Do not retry on rate limit errors with tenacity
* Remove xfail after llama_index version lock
* Use events and locks instead of nesting asyncio.run
* Ensure that events are always set after rate limit handling
* Retry on httpx ReadTimeout errors
* Update rate limiters with verbose generation info
* Improve end of queue handling in AsyncExecutor
* improve types to remove the need for casts (#1817)
* Improve interrupt handling
* Exit early from queue.join on termination events
* Properly cancel running tasks
* Add pytest-asyncio to hatch env
* Do not await cancelled tasks
* Improve task_done marking logic
* Increase default concurrency

Co-authored-by: Xander Song <axiomofjoy@gmail.com>
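Two of the commits above adjust retry behavior: rate limit errors are deliberately not retried by tenacity (so the adaptive limiter can react to them), while transient httpx ReadTimeout errors are retried. A minimal sketch of that split, using a hypothetical `call_model` helper rather than the PR's actual code, might look like:

```python
# Illustrative sketch only: retry transient read timeouts with tenacity,
# but let rate limit errors propagate to the adaptive rate limiter.
import httpx
import openai
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential


@retry(
    # Only httpx.ReadTimeout triggers a retry; openai.RateLimitError is
    # not listed, so it propagates and the limiter can lower its rate.
    retry=retry_if_exception_type(httpx.ReadTimeout),
    wait=wait_exponential(multiplier=1, max=30),
    stop=stop_after_attempt(5),
)
def call_model(client: openai.OpenAI, prompt: str) -> str:
    # Hypothetical helper; model name is an arbitrary example.
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```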
resolves #1663
Implements an adaptive rate limiter that gradually increases the request submission rate until a rate limit error is encountered; it then lowers the rate and blocks until the rate-limited request can complete.
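As a rough sketch of that behavior (the class and parameter names below are illustrative, not the PR's actual implementation), the limiter can be thought of as a paced clock whose rate grows multiplicatively while requests succeed and is cut on a rate limit error:

```python
# Hypothetical sketch of the adaptive strategy described above.
import time


class AdaptiveRateLimiter:
    def __init__(
        self,
        initial_rate: float = 1.0,  # requests per second
        max_rate: float = 10.0,     # sensible default maximum request rate
        growth: float = 1.05,       # multiplicative probe on each success
        backoff: float = 0.5,       # multiplicative cut on a rate limit error
    ) -> None:
        self.rate = initial_rate
        self.max_rate = max_rate
        self.growth = growth
        self.backoff = backoff
        self._next_slot = time.monotonic()

    def acquire(self) -> None:
        """Block until the next request slot opens at the current rate."""
        now = time.monotonic()
        if now < self._next_slot:
            time.sleep(self._next_slot - now)
        self._next_slot = max(now, self._next_slot) + 1.0 / self.rate

    def on_success(self) -> None:
        # Gradually increase the submission rate while requests succeed.
        self.rate = min(self.rate * self.growth, self.max_rate)

    def on_rate_limit_error(self) -> None:
        # Cut the rate (with a small floor) and push the next slot out,
        # so the rate-limited request blocks until it can be retried.
        self.rate = max(self.rate * self.backoff, 0.01)
        self._next_slot = time.monotonic() + 1.0 / self.rate
```

A caller would invoke `acquire()` before each request, `on_success()` after a successful response, and `on_rate_limit_error()` when catching a rate limit exception before retrying the blocked request.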