This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

[alert-manager] add gpu*hours info in cluster-utilization cronjob #5294

Merged
suiguoxin merged 12 commits into microsoft:master from suiguoxin:gpu-usage on Feb 22, 2021

Conversation

suiguoxin
Member

No description provided.

@suiguoxin suiguoxin requested a review from Binyang2014 February 8, 2021 06:49
@coveralls

coveralls commented Feb 8, 2021


Coverage increased (+0.01%) to 34.252% when pulling a18d54a on suiguoxin:gpu-usage into f652c99 on microsoft:master.

@suiguoxin suiguoxin mentioned this pull request Feb 8, 2021
55 tasks
@Binyang2014
Contributor

The Resources Occupied value does not seem correct for running jobs. Consider a job that doesn't use gang allocation, with 3 tasks running and 3 tasks waiting, each task using 1 GPU. According to the current code, Resources Occupied will be 6 GPUs * duration, which is not correct, since the 3 waiting tasks hold no GPUs.

@suiguoxin
Member Author

The Resources Occupied value does not seem correct for running jobs. Consider a job that doesn't use gang allocation, with 3 tasks running and 3 tasks waiting, each task using 1 GPU. According to the current code, Resources Occupied will be 6 GPUs * duration, which is not correct, since the 3 waiting tasks hold no GPUs.

Fixed. Resources Occupied is now read from Prometheus instead.
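
For readers following the exchange, the discrepancy raised in the review can be reproduced with a small arithmetic sketch. This is only an illustration, not the code from this pull request: the `Task` class, the state names, and both helper functions are hypothetical, and the merged fix reads occupancy from Prometheus rather than filtering tasks by state as shown here.

```python
from dataclasses import dataclass


@dataclass
class Task:
    state: str  # hypothetical state names: "RUNNING" or "WAITING"
    gpus: int   # GPUs requested by this task


def gpu_hours_all_tasks(tasks, duration_hours):
    """Count every task's GPUs, whether or not the task is running."""
    return sum(t.gpus for t in tasks) * duration_hours


def gpu_hours_running_only(tasks, duration_hours):
    """Count only the GPUs held by tasks that are actually running."""
    return sum(t.gpus for t in tasks if t.state == "RUNNING") * duration_hours


# The scenario from the review: 3 running and 3 waiting tasks, 1 GPU each.
tasks = [Task("RUNNING", 1) for _ in range(3)] + [Task("WAITING", 1) for _ in range(3)]
duration_hours = 2.0  # assumed job duration, for illustration only

naive = gpu_hours_all_tasks(tasks, duration_hours)         # 6 GPUs * 2 h
corrected = gpu_hours_running_only(tasks, duration_hours)  # 3 GPUs * 2 h

print(f"all tasks: {naive} GPU*hours, running only: {corrected} GPU*hours")
```

With these assumed inputs, counting all tasks reports twice the GPU*hours that the running tasks occupy, which is the overcount described in the review.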

@suiguoxin suiguoxin merged commit 0bce39e into microsoft:master Feb 22, 2021
@suiguoxin suiguoxin deleted the gpu-usage branch February 22, 2021 07:27
@suiguoxin suiguoxin mentioned this pull request Mar 9, 2021