Tweak dashboard retries functionality for 401/403 errors #22652
Labels: area/dashboard, kind/enhancement (feature request), severity/P1 (major impact on usage or development of the system)
Is your enhancement related to a problem? Please describe
A retries feature has been implemented in the dashboard to retry requests that fail with 401/403 errors:
Here are the results of this fix on the dogfooding cluster. In summary, of 118 failing requests, ~94% succeeded thanks to the retries.
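The mechanism described above (resend a request when it comes back with 401 or 403, up to a fixed limit) can be sketched roughly as follows. This is a minimal illustration, not the dashboard's implementation: `ResponseLike`, `requestWithRetries`, `maxRetries` and `delayMs` are hypothetical names, and the default limit of 3 retries and the 500 ms delay are assumptions.

```typescript
// Hypothetical sketch: names and defaults are illustrative,
// not taken from the dashboard code base.
interface ResponseLike {
  status: number;
}

// 401 Unauthorized and 403 Forbidden are the statuses retried in this issue.
const RETRIABLE_STATUSES: ReadonlySet<number> = new Set([401, 403]);

function isRetriableStatus(status: number): boolean {
  return RETRIABLE_STATUSES.has(status);
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Sends the request once, then resends it up to `maxRetries` more times
// while the response status is 401 or 403. Returns the last response.
async function requestWithRetries<R extends ResponseLike>(
  send: () => Promise<R>,
  maxRetries = 3,
  delayMs = 500,
): Promise<R> {
  let response = await send();
  for (let retry = 1; retry <= maxRetries && isRetriableStatus(response.status); retry++) {
    await sleep(delayMs); // short pause before resending
    response = await send();
  }
  return response;
}
```

A request that keeps failing is sent at most `1 + maxRetries` times; any other status (success or a non-auth error) is returned immediately without retrying.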
My dashboard refresh script was able to reproduce the error for a few hours on Oct 31:
According to the metrics, out of about 2849 dashboard refreshes (this may be too many; I reduced the refresh frequency to 60 times every 10 minutes), about 118 requests required a retry. Note that the sample size (118) is quite small because the 401/403 errors are rare on the dogfooding cluster.
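The percentages quoted in this issue follow directly from the counts: of the 118 retried requests, 111 succeeded and 7 failed. A small helper that recomputes them, rounded to one decimal place (`RetryCounts` and `summarizeRetries` are illustrative names):

```typescript
// Recomputes the success/failure split from raw counts.
interface RetryCounts {
  retried: number;   // requests that hit a 401/403 and were retried
  succeeded: number; // of those, requests that eventually succeeded
}

function summarizeRetries({ retried, succeeded }: RetryCounts) {
  const failed = retried - succeeded;
  // Percentage of `retried`, rounded to one decimal place.
  const pct = (n: number): number => Math.round((n / retried) * 1000) / 10;
  return { failed, successPct: pct(succeeded), failurePct: pct(failed) };
}

console.log(summarizeRetries({ retried: 118, succeeded: 111 }));
// { failed: 7, successPct: 94.1, failurePct: 5.9 }
```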
Out of 118 requests, 111 of them (~94%) succeeded thanks to the retries:
Out of 118 requests, 7 of them (~6%) failed despite the retries:
(**) Retry counts greater than 3 occur only for the provision request, which is currently retried up to 7 times because it is also retried in testBackends().
Describe the solution you'd like
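One factor worth keeping in mind when tweaking the retries is that retry layers multiply. The provision request is retried both by the general retry logic and again in testBackends(). The sketch below assumes, hypothetically, an inner layer with 3 retries and an outer testBackends()-style caller with 1 retry; `retrying`, `provision`, `testBackends` and both limits are illustrative, and the dashboard's real limits may differ.

```typescript
// Hypothetical sketch: nested retry layers multiply the number of sends.
function retrying<T>(fn: () => T, maxRetries: number): () => T {
  return () => {
    for (let retry = 0; ; retry++) {
      try {
        return fn();
      } catch (e) {
        if (retry >= maxRetries) throw e; // give up after maxRetries retries
      }
    }
  };
}

let sent = 0;
// Inner layer: every send fails with 403 and is retried up to 3 times (4 sends).
const provision = retrying((): never => {
  sent++;
  throw new Error("403 Forbidden");
}, 3);
// Outer layer: the caller retries the whole inner call once (2 inner calls).
const testBackends = retrying(provision, 1);

try {
  testBackends();
} catch {
  // Both layers have exhausted their retries.
}
console.log(`sends: ${sent}, retries: ${sent - 1}`); // sends: 8, retries: 7
```

Under these assumed limits, 2 outer calls × 4 inner sends give 8 sends, i.e. 7 retries. In general the maximum number of sends is the product of `(retries + 1)` over all layers, not the sum.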
The ~94% success rate for requests that would otherwise all have failed is encouraging. Some ideas for further tweaking the retries to improve this percentage:
Describe alternatives you've considered
No response
Additional context
No response