Staging - [Alerting] Azure quota usage for west us #1432
Comments
I'm going to investigate why dv4 usage has been so high lately.
I was finally able to get onto the virtual machine scale sets for HelixStaging, and it looks like the reason for this is that every one of our android queues has 1 machine provisioned in it. Because of the quantity of queues, we are "killed with numbers". Will work with Stu on getting this resolved and figuring out how we got here.
I agree with what Ilya found. TLDR, I believe we need to fix #1415 and remove the "unmonitored" state of the queues. The highlights are...
So we just got ourselves into a bit of a spot. I've scaled down all queues with "android" in the name to zero instances. This will give us back the headroom to let PRs run. This might break a PR or scheduled build that tries to test on them (because there is nothing to cause them to scale up again). If this happens, anyone in dnceng can simply scale up that particular queue again. (Make a note here so we have a paper trail.)
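For anyone who needs to repeat or audit this kind of scale-down, here is a minimal sketch of the selection logic only. It assumes you already have a mapping of queue name to current instance count; the queue names, counts, and the helper names `plan_scale_down` and `freed_instances` are hypothetical and not taken from HelixStaging. The scale-down described above was done manually, so this is an illustration, not the procedure that was actually used.

```python
from typing import Dict


def plan_scale_down(capacities: Dict[str, int], needle: str = "android") -> Dict[str, int]:
    """Return {queue_name: 0} for every queue whose name contains `needle`
    (case-insensitive) and that currently has at least one instance."""
    return {
        name: 0
        for name, count in capacities.items()
        if needle in name.lower() and count > 0
    }


def freed_instances(capacities: Dict[str, int], plan: Dict[str, int]) -> int:
    """Count how many instances the plan would release."""
    return sum(capacities[name] - target for name, target in plan.items())


# Hypothetical queue names and instance counts, for illustration only.
example = {
    "example.android.queue-a": 1,
    "example.android.queue-b": 1,
    "example.windows.queue": 3,
}

plan = plan_scale_down(example)
print(plan)
print(freed_instances(example, plan))
```

The plan is deliberately separate from applying it, so it can be pasted into this issue as the paper trail before anything is changed. Applying it would be a separate step with whatever Azure tooling you normally use, for example `az vmss scale --new-capacity 0` per scale set.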
Closing this issue as the quota problem is fixed by the mass manual scale-down.
💔 Metric state changed to alerting
Go to rule
@dotnet/dnceng, please investigate
Automation information below, do not change
Grafana-Automated-Alert-Id-e2be2ec3e22e46d28730bab54ff8fa77