Is your feature request related to a problem? Please describe.
As others have also experienced, AutoML training is heavy on CPU and RAM and can cause slowdowns and crashes (#6175, #6286, #6288, #6297). I sometimes run into trials that run much longer than expected, potentially because my system ran out of one of these resources. I have also had a few system crashes where running AutoML forced Windows to start closing other applications.
Describe the solution you'd like
It would be great to have more information about the running AutoML trials, including how much CPU, RAM, and GPU they are using and on how many threads. Ideally this information would be exposed through a new, periodically called method on AutoML's IMonitor interface.
If this were combined with extended experiment control (#5736), we could make smarter decisions about a trial or experiment based on its resource usage. For example, we could pause the experiment when the system runs out of resources, or cancel a trial that uses a suspiciously high amount of RAM before it takes the whole system down (as sometimes happens with my experiments).
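To make the request a bit more concrete, here is a rough C# sketch of what such a callback and the cancel-on-high-RAM decision could look like. Everything in it (`TrialResourceUsage`, `IResourceAwareMonitor`, `RamGuardMonitor`, the 8 GB threshold) is hypothetical; only `IMonitor` itself exists in Microsoft.ML.AutoML today, and wiring the token into the experiment assumes a `RunAsync` overload that accepts a `CancellationToken`:

```csharp
using System.Threading;

// Hypothetical payload for a periodic resource report (not an existing type).
public class TrialResourceUsage
{
    public int TrialId { get; set; }
    public double CpuUsagePercent { get; set; }   // CPU load attributable to the trial
    public long MemoryUsageBytes { get; set; }    // managed + unmanaged memory in use
    public long GpuMemoryUsageBytes { get; set; } // 0 when no GPU is involved
    public int ThreadCount { get; set; }          // worker threads the trial occupies
}

// Hypothetical extension of AutoML's IMonitor: called periodically
// (e.g. once per second) while a trial is running.
public interface IResourceAwareMonitor
{
    void ReportTrialResourceUsage(TrialResourceUsage usage);
}

// Example of the decision logic this would enable: cancel the experiment
// before the OS starts closing other applications.
public class RamGuardMonitor : IResourceAwareMonitor
{
    private const long RamLimitBytes = 8L * 1024 * 1024 * 1024; // e.g. 8 GB cap
    private readonly CancellationTokenSource _cts;

    public RamGuardMonitor(CancellationTokenSource cts) => _cts = cts;

    public void ReportTrialResourceUsage(TrialResourceUsage usage)
    {
        if (usage.MemoryUsageBytes > RamLimitBytes)
            _cts.Cancel(); // e.g. pass _cts.Token to the experiment's RunAsync
    }
}
```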
Describe alternatives you've considered
Theoretically, I could monitor my system resources constantly on a separate thread, but I still could not tell whether AutoML is responsible for elevated CPU, RAM, or GPU usage, or whether it is caused by something else running on the system independently of AutoML.
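For reference, that workaround would look roughly like the sketch below, built only on `System.Diagnostics.Process`. It also shows the limitation: the numbers are process-wide, so AutoML's share cannot be separated from whatever else the host application is doing, and GPU usage is not visible at all.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Sample the current process once per second on a timer thread.
var process = Process.GetCurrentProcess();
var lastCpu = process.TotalProcessorTime;
var lastSample = DateTime.UtcNow;

var timer = new Timer(_ =>
{
    process.Refresh();
    var now = DateTime.UtcNow;
    var cpu = process.TotalProcessorTime;

    // Approximate CPU usage of the whole process since the last sample.
    var cpuPercent = (cpu - lastCpu).TotalMilliseconds
                     / ((now - lastSample).TotalMilliseconds * Environment.ProcessorCount)
                     * 100.0;
    (lastCpu, lastSample) = (cpu, now);

    Console.WriteLine(
        $"CPU ~{cpuPercent:F1}%  RAM {process.WorkingSet64 / (1024 * 1024)} MB  " +
        $"Threads {process.Threads.Count}");
}, null, dueTime: 0, period: 1000);

// Keep the timer referenced for as long as the experiment runs.
GC.KeepAlive(timer);
```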
Additional context
This issue is related to AutoML experiment resource usage limiting (#6061) and AutoML experiment control (#5736).
Also FYI, we just added monitoring of CPU and memory usage in #6305. Monitoring GPU usage is not currently on the roadmap, since most ML trainers don't run on the GPU.