Decoupling throttling retry logic & throttling backoff logic #34
I am copy-pasting my response from your other issue for anyone else who reads this issue:

Hey @sushaanttb, have a look at #21 and the discussion within. You will see some more info there about what is going on with different retry settings, and how MAX_RETRY_SECONDS affects the outputs. That PR has not been merged and it doesn't look like it will be, so if you want the behaviour you suggest, I would recommend using my fork, which fixes this issue and uses a more suitable value of 60 seconds for MAX_RETRY_SECONDS.
Hi @michaeltremeer, your PR has now been merged, after I raised those issues. Looking at the latest code and the discussion you referred to: even with your PR, the issue remains that requests are not retried at all when backoff is not enabled. Should we not give that control to the user to decide? Attaching a screenshot for reference, as per the latest code.

Basically, the suggestion is to introduce another argument/parameter so that the user can retry using the "retry header" strategy, instead of that also being governed by the same "backoff" check.
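For illustration, the extra control could be a command-line argument along these lines. This is only a sketch: the flag name `--retry-strategy` and its choices are hypothetical and not part of the tool.

```python
import argparse

# Hypothetical CLI sketch: the flag name and choices below are illustrative,
# not the tool's actual arguments.
parser = argparse.ArgumentParser(description="load test sketch")
parser.add_argument(
    "--retry-strategy",
    choices=["none", "header", "backoff"],
    default="none",
    help=(
        "none: fail immediately on 429; "
        "header: honour the retry-after-ms response header; "
        "backoff: use exponential backoff independently of the header"
    ),
)
args = parser.parse_args()
print(f"selected retry strategy: {args.retry_strategy}")
```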
Hey @sushaanttb, you're spot on that the behaviour here doesn't make sense, and if you follow the logic then the actual outcome for a throttled request on PTU-M will look something like this:
There's one other important idea to explore though. While in the real world we are going to see spiky usage across the endpoint (which is where the 429 behaviour of PTU-M can be of huge benefit), as of right now this load testing tool delivers a consistent, linear call rate. Because it is linear, if the endpoint can only process 10 RPM but you are sending 15, you will always end up in a situation where the queue of new requests builds up until every client has an open request. At that point all requests are retrying regularly, and with enough clients (which create a longer queue of waiting requests), almost all requests will end up hitting the 60+60 seconds timeout and failing. For this reason the whole idea of having retries in this tool is a bit of a waste of time, since you either have a rate low enough that you aren't throttling and don't need to retry, or the load is too high and you are guaranteeing an ever-increasing queue of requests. This means that the only benefit of being able to manually set
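The queue build-up point can be made concrete with some simple arithmetic, using the 10 vs 15 RPM numbers from above (a deliberately simplified model that ignores jitter and per-request latency):

```python
# Simplified illustration: with a constant arrival rate of 15 RPM against an
# endpoint that can only serve 10 RPM, the backlog of outstanding requests
# grows by 5 every minute, so waiting times keep rising until every request
# exceeds any fixed retry timeout (e.g. 60 + 60 seconds).
ARRIVAL_RPM = 15   # requests sent per minute (number assumed from the example above)
SERVICE_RPM = 10   # requests the endpoint can actually process per minute

backlog = 0
for minute in range(1, 11):
    backlog += ARRIVAL_RPM - SERVICE_RPM          # net queue growth per minute
    wait_minutes = backlog / SERVICE_RPM          # time needed to drain the current queue
    print(f"minute {minute:2d}: backlog={backlog:3d} requests, "
          f"~{wait_minutes * 60:.0f}s wait before a new request is served")
```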
In both of the situations above, there is a 'circuit-breaker' that can shed load to a backup resource when load exceeds the resource's capacity. The tool in its current state assumes this backup resource doesn't exist, which leads to unrealistic results that should never occur in production (if your architecture is designed correctly). On your various points across these issues:
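A minimal sketch of such a circuit breaker, with entirely hypothetical names (this is not how the tool or any SDK implements it): after a run of 429s from the primary resource, new calls are routed to a backup resource instead of joining the retry queue.

```python
import time

# Hypothetical sketch of a circuit breaker that sheds load to a backup
# resource; none of these names come from the tool itself.
class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_after_s=30.0):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.consecutive_429s = 0
        self.opened_at = None

    def record(self, status_code):
        if status_code == 429:
            self.consecutive_429s += 1
            if self.consecutive_429s >= self.failure_threshold:
                self.opened_at = time.monotonic()   # open the breaker
        else:
            self.consecutive_429s = 0
            self.opened_at = None                   # close the breaker

    def use_backup(self):
        if self.opened_at is None:
            return False
        if time.monotonic() - self.opened_at > self.reset_after_s:
            self.opened_at = None                   # half-open: try the primary again
            self.consecutive_429s = 0
            return False
        return True

def send_request(breaker, call_primary, call_backup):
    # Route to the backup (e.g. a PAYG deployment) while the breaker is open.
    target = call_backup if breaker.use_backup() else call_primary
    status = target()
    breaker.record(status)
    return status
```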
Thanks @michaeltremeer for the detailed comments! I tested this tool on a non-PTU deployment (as I do not have a PTU-based deployment yet), purposefully building a scenario that hits the rate limits so I could see the throttling numbers, and I wanted to understand the output in a bit more detail.

After your comment I had another look at the PTU documentation, but I couldn't find any note on PTU-M and PTU-C specifically. For reference, these are the documentation links I referred to: Getting started with PTU and What is provisioned throughput?. When selecting the model deployment type, I am also only getting the following option for PTU; it only lists

Hence it is difficult for me to comment on those points before actually knowing about them, but thanks for the details, and it would be helpful if you could share any reference link on PTU-M and PTU-C.

However, I agree with the thought in your next comment that the retry logic doesn't seem very helpful, as it just creates an ever-increasing queue of requests to be retried: once a 429 error is received, even if we park the current request to retry after some time, the other requests still keep going to the service without respecting the fact that the service has already hit its limit, and so they also go through that same retry logic.

Subsequently, thanks for clarifying the reason for keeping

Lastly, I also had a question about the README-
Looking at the code, we have two retry mechanisms for retrying 429 errors:
File: oairequester.py
Class: OAIRequester
Function: _call
Currently, both of these retry mechanisms are coupled together.
This violates the separation-of-concerns principle.
Also, it is not clear that the end user would always want that - i.e. multiple retries of the same request from two different mechanisms.
For example, if the backoff strategy is specified by the user, then the same 429 request that was already retried in the previous code snippet based on RETRY_AFTER_MS_HEADER will be retried again, multiple times (see the sketch below).
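Roughly, the coupling being described looks something like the sketch below. This is a simplified reconstruction for illustration only, not the actual contents of oairequester.py; the names MAX_RETRY_SECONDS and RETRY_AFTER_MS_HEADER come from the discussion above, everything else is assumed.

```python
import time
import backoff  # third-party "backoff" package, used here purely for illustration

MAX_RETRY_SECONDS = 60.0                 # cap on header-driven retries, as discussed above
RETRY_AFTER_MS_HEADER = "retry-after-ms"

# Layer 1: an exponential-backoff decorator that retries the whole call
# whenever it raises.
@backoff.on_exception(backoff.expo, Exception, max_time=MAX_RETRY_SECONDS)
def _call(send_request):
    start = time.time()
    while True:
        response = send_request()        # returns an object with .status_code / .headers
        if response.status_code != 429:
            return response
        # Layer 2: the same function also retries 429s itself, driven by the
        # retry-after-ms header, until MAX_RETRY_SECONDS is exhausted ...
        retry_after_s = float(response.headers.get(RETRY_AFTER_MS_HEADER, 1000)) / 1000.0
        if time.time() - start + retry_after_s >= MAX_RETRY_SECONDS:
            raise RuntimeError("throttled")   # ... at which point layer 1 backs off and retries it all again
        time.sleep(retry_after_s)
```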
Solution Suggestion
It would be a good idea to also parametrize the retry logic (just like backoff) via command-line arguments,
so that the end user can adjust the retry behaviour to their requirements. A rough sketch follows below.
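A minimal sketch of what the suggestion could look like, assuming a hypothetical retry_strategy parameter (none of these names come from the tool): exactly one retry strategy is applied per request, instead of the header-driven retry being nested inside the backoff path.

```python
import random
import time

def call_with_retry(send, retry_strategy="none", max_retry_seconds=60.0):
    """Hypothetical sketch: send() performs one request and returns an object
    with .status_code and .headers; exactly one retry strategy is applied."""
    deadline = time.monotonic() + max_retry_seconds
    attempt = 0
    while True:
        response = send()
        if response.status_code != 429 or retry_strategy == "none":
            return response
        if retry_strategy == "header":
            # Honour only the service's retry-after-ms header.
            delay = float(response.headers.get("retry-after-ms", 1000)) / 1000.0
        else:  # "backoff": exponential backoff with jitter, ignoring the header
            delay = min(2 ** attempt, 30) * random.uniform(0.5, 1.0)
        if time.monotonic() + delay > deadline:
            return response            # give up: report the 429 instead of looping forever
        time.sleep(delay)
        attempt += 1
```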