
Add resource (CPU, RAM, GPU, thread count) monitoring to AutoML experiments #6320

Closed
andrasfuchs opened this issue Sep 12, 2022 · 2 comments · Fixed by #6520
Labels: enhancement (New feature or request)
Milestone: ML.NET Future

Comments

@andrasfuchs (Contributor)

Is your feature request related to a problem? Please describe.
As others have also experienced, AutoML training is heavy on CPU and RAM, and it can cause slowdowns and crashes (#6175, #6286, #6288, #6297). I sometimes run into trials that run longer than expected, potentially because my system ran out of one of its resources. I have also had a few system crashes, when running AutoML forced Windows to start closing other applications.

Describe the solution you'd like
It would be great to have more information about the running AutoML trials, including how much CPU, RAM, and GPU they are using and on how many threads. Ideally this would be exposed through a new, periodically called method on AutoML's IMonitor interface.
If this were combined with extended experiment control (#5736), we could make clever decisions about a trial or experiment depending on its resource usage. For example, we could pause the experiment if the system is out of resources, or even cancel a trial that uses a suspiciously high amount of RAM to prevent a system failure (as sometimes happens with my experiments).
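
Something like this is what I have in mind; the `IResourceMonitor` interface and `ResourceUsage` type below are made-up names, not an existing Microsoft.ML.AutoML API:

```csharp
using Microsoft.ML.AutoML;

// Hypothetical shape -- none of these types exist in Microsoft.ML.AutoML today.
public interface IResourceMonitor
{
    // Called periodically (say, once per second) while a trial runs.
    void ReportResourceUsage(TrialSettings trial, ResourceUsage usage);
}

public class ResourceUsage
{
    public double CpuPercent { get; set; }        // process CPU load, 0-100
    public double RamMegabytes { get; set; }      // working set in MB
    public double? GpuPercent { get; set; }       // null when no GPU is involved
    public int ThreadCount { get; set; }          // threads used by the trial
}
```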

Describe alternatives you've considered
Well, theoretically I could monitor my system resources constantly on a separate thread, but I still couldn't determine whether AutoML is the cause of elevated CPU, RAM, or GPU usage, or whether something else running on the system, independently of AutoML, is responsible.
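
For reference, this is roughly the workaround I mean; it is a plain process-wide poller, which is exactly why it can't attribute usage to AutoML specifically:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Polls this process's resource usage once per second on a background timer.
// Limitation: these figures cover the whole process, so AutoML's share cannot
// be separated from anything else running alongside it.
var process = Process.GetCurrentProcess();
var lastCpu = process.TotalProcessorTime;

using var timer = new Timer(_ =>
{
    process.Refresh();
    var cpu = process.TotalProcessorTime;
    var cpuPercent = (cpu - lastCpu).TotalMilliseconds
                     / (1000.0 * Environment.ProcessorCount) * 100.0;
    lastCpu = cpu;

    Console.WriteLine($"CPU {cpuPercent:F1}%  " +
                      $"RAM {process.WorkingSet64 / (1024 * 1024)} MB  " +
                      $"threads {process.Threads.Count}");
}, null, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1));

Console.ReadLine(); // keep the timer alive while the experiment runs elsewhere
```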

Additional context
This issue is related to AutoML experiment resource usage limiting (#6061) and AutoML experiment control (#5736).

@andrasfuchs andrasfuchs added the enhancement New feature or request label Sep 12, 2022
@ghost ghost added the untriaged New issue has not been triaged label Sep 12, 2022
@LittleLittleCloud (Contributor)

#6293

Also FYI we just add monitoring of CPU and memory usage in #6305. Monitoring GPU usage is not currently on roadmap since most of ML trainer doesn't run on GPU
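
A minimal sketch of how the per-trial figures could be read from an IMonitor implementation, assuming #6305 surfaces them on TrialResult as PeakCpu / PeakMemoryInMegaByte (verify the exact member names and the current IMonitor signature against the PR):

```csharp
using System;
using Microsoft.ML.AutoML;

// Sketch under the assumptions above -- not a confirmed API surface.
public class ResourceLoggingMonitor : IMonitor
{
    public void ReportCompletedTrial(TrialResult result)
        => Console.WriteLine(
            $"Trial {result.TrialSettings.TrialId}: " +
            $"peak CPU {result.PeakCpu:F2}, peak RAM {result.PeakMemoryInMegaByte:F0} MB");

    public void ReportBestTrial(TrialResult result) { }
    public void ReportFailTrial(TrialSettings settings, Exception exception = null) { }
    public void ReportRunningTrial(TrialSettings settings) { }
}
```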

@dakersnar dakersnar removed the untriaged New issue has not been triaged label Sep 12, 2022
@dakersnar dakersnar added this to the ML.NET Future milestone Sep 12, 2022
@andrasfuchs (Contributor, Author)

Excellent, thank you @LittleLittleCloud!!

@ghost ghost locked as resolved and limited conversation to collaborators Oct 13, 2022