LightGBM - repeating "bad allocation" warning during trial #6426

Closed
Tracked by #6337
andrasfuchs opened this issue Nov 2, 2022 · 4 comments
Labels
AutoML.NET Automating various steps of the machine learning process

@andrasfuchs
Contributor

System Information (please complete the following information):

  • OS & Version: Windows 11 [Version 10.0.22621.675]
  • ML.NET Version: ML.NET v2.0.0-preview.22551.1
  • .NET Version: .NET 6.0.9

Describe the bug
LightGBM trials fail with a "bad allocation" error message and cause an OutOfMemoryException even when SetMaximumMemoryUsageInMegaByte() is used.

To Reproduce
Steps to reproduce the behavior:

  1. Create a pipeline that includes LightGBM, for example:
var pipeline =
  mlContext.Auto().Featurizer(trainTestData.TrainSet, numericColumns: new[] { "Features" })
      .Append(mlContext.Auto().Regression(useFastTree: false, useLbfgs: true, useSdca: true, useFastForest: false, useLgbm: true));

  2. Create an ML.NET experiment and set its memory limit. I set it to 20 GB, because I usually have ~24 GB available when I start the training.
var experiment = mlContext.Auto().CreateExperiment();

experiment
    .SetPipeline(pipeline)
    .SetTrainingTimeInSeconds(trainingTimeInSeconds)
    .SetRegressionMetric(RegressionMetric.RSquared, labelColumn: "Label")
    .SetDataset(trainTestData.TrainSet, trainTestData.TestSet)
    .SetMonitor(monitor)
    .SetMaximumMemoryUsageInMegaByte(20 * 1024);
  3. Run the experiment for an hour
  4. The experiment will end with OutOfMemoryException

Expected behavior
I would expect AutoML to stop the trial if it exceeds the memory limit and/or throws an exception during its run. It should not abort the whole experiment.

Screenshots, Code, Sample Projects
Here are the logs of three runs I had today:

15:23:11.3 info: Bio Balance Detector Body Monitor v0.9.1.3
15:23:11.3 info: (The current UTC time is 2022-11-02 14:23:11)
15:23:11.6 info: Now listening on: https://0.0.0.0:7061
15:23:11.6 info: Now listening on: http://localhost:5061
15:23:11.6 info: Microsoft.Hosting.Lifetime[0] Application started. Press Ctrl+C to shut down.
15:23:11.6 info: Microsoft.Hosting.Lifetime[0] Hosting environment: Development
15:23:11.6 info: Microsoft.Hosting.Lifetime[0] Content root path: C:\Work\BioBalanceDetector\Software\Source\BBDProto08\BBD.BodyMonitor\BBD.BodyMonitor.API\
15:23:38.1 info: Starting model training for 3600 seconds, using 'BBD_20221102_2022-11-02_MLP14_0p25Hz-6250Hz_IsSession.SegmentedData.Sleep.Level_1098rows.csv' as data source with the 'MLP14' profile.
15:23:51.5 info: Completed Trial #   0 - Pipeline: ReplaceMissingValues=>Concatenate=>FastForestRegression             - Metric:    0,15 - Duration:     13 seconds
15:23:51.5 info:  New Best Trial #   0 - Pipeline: ReplaceMissingValues=>Concatenate=>FastForestRegression             - Metric:    0,15
15:24:01.2 info: Completed Trial #   1 - Pipeline: ReplaceMissingValues=>Concatenate=>LightGbmRegression               - Metric:    0,05 - Duration:     10 seconds
15:24:11.1 info: Completed Trial #   2 - Pipeline: ReplaceMissingValues=>Concatenate=>FastTreeRegression               - Metric:   -2,20 - Duration:     10 seconds
15:50:15.5 info: Completed Trial #   3 - Pipeline: ReplaceMissingValues=>Concatenate=>SdcaRegression                   - Metric:    0,03 - Duration:     1564 seconds
15:50:32.2 info: Completed Trial #   4 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression - Metric:    0,41 - Duration:     17 seconds
15:50:32.2 info:  New Best Trial #   4 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression - Metric:    0,41
15:50:44.3 info: Completed Trial #   5 - Pipeline: ReplaceMissingValues=>Concatenate=>FastForestRegression             - Metric:    0,16 - Duration:     12 seconds
15:50:50.0 info: Completed Trial #   6 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression - Metric:    0,40 - Duration:     6 seconds
15:51:15.3 info: Completed Trial #   7 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression - Metric:    0,43 - Duration:     25 seconds
15:51:15.3 info:  New Best Trial #   7 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression - Metric:    0,43
15:51:43.3 info: Completed Trial #   8 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression - Metric:    0,43 - Duration:     28 seconds
15:51:43.3 info:  New Best Trial #   8 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression - Metric:    0,43
15:51:54.4 info: Completed Trial #   9 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression - Metric:    0,14 - Duration:     11 seconds
[LightGBM] [Warning] bad allocation
[LightGBM] [Warning] bad allocation
[LightGBM] [Warning] bad allocation
[LightGBM] [Warning] bad allocation
[LightGBM] [Warning] bad allocation
17:49:52.6 info: BBD.BodyMonitor.API[0] Bio Balance Detector Body Monitor v0.9.1.3
17:49:52.6 info: BBD.BodyMonitor.API[0] (The current UTC time is 2022-11-02 16:49:52)
17:49:52.9 info: Microsoft.Hosting.Lifetime[14] Now listening on: https://0.0.0.0:7061
17:49:52.9 info: Microsoft.Hosting.Lifetime[14] Now listening on: http://localhost:5061
17:49:52.9 info: Microsoft.Hosting.Lifetime[0] Application started. Press Ctrl+C to shut down.
17:49:52.9 info: Microsoft.Hosting.Lifetime[0] Hosting environment: Development
17:49:52.9 info: Microsoft.Hosting.Lifetime[0] Content root path: C:\Work\BioBalanceDetector\Software\Source\BBDProto08\BBD.BodyMonitor\BBD.BodyMonitor.API\
17:50:07.6 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Starting model training for 3600 seconds, using 'BBD_20221102_2022-11-02_MLP14_0p25Hz-6250Hz_IsSession.SegmentedData.Sleep.Level_1098rows.csv' as data source with the 'MLP14' profile.
17:50:08.0 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #   0 - Pipeline: ReplaceMissingValues=>Concatenate=>FastTreeRegression
17:50:20.6 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #   0 - Pipeline: ReplaceMissingValues=>Concatenate=>FastForestRegression                - Metric:    0,15 - Duration:     13 seconds
17:50:20.6 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]  New Best Trial #   0 - Pipeline: ReplaceMissingValues=>Concatenate=>FastForestRegression                - Metric:    0,15
17:50:20.6 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #   1 - Pipeline: ReplaceMissingValues=>Concatenate=>FastForestRegression
17:50:30.5 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #   1 - Pipeline: ReplaceMissingValues=>Concatenate=>FastTreeRegression                  - Metric:   -2,20 - Duration:     10 seconds
17:50:30.6 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #   2 - Pipeline: ReplaceMissingValues=>Concatenate=>FastTreeRegression
17:50:40.1 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #   2 - Pipeline: ReplaceMissingValues=>Concatenate=>LightGbmRegression                  - Metric:    0,05 - Duration:     10 seconds
17:50:40.1 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #   3 - Pipeline: ReplaceMissingValues=>Concatenate=>LightGbmRegression
18:16:49.1 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #   3 - Pipeline: ReplaceMissingValues=>Concatenate=>SdcaRegression                      - Metric:    0,03 - Duration:     1569 seconds
18:16:49.1 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #   4 - Pipeline: ReplaceMissingValues=>Concatenate=>SdcaRegression
18:17:05.9 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #   4 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,41 - Duration:     17 seconds
18:17:05.9 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]  New Best Trial #   4 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,41
18:17:05.9 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #   5 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression
18:17:17.9 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #   5 - Pipeline: ReplaceMissingValues=>Concatenate=>FastForestRegression                - Metric:    0,21 - Duration:     12 seconds
18:17:17.9 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #   6 - Pipeline: ReplaceMissingValues=>Concatenate=>FastForestRegression
18:17:22.7 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #   6 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,41 - Duration:     5 seconds
18:17:22.7 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #   7 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression
18:17:34.3 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #   7 - Pipeline: ReplaceMissingValues=>Concatenate=>FastForestRegression                - Metric:    0,23 - Duration:     12 seconds
18:17:34.3 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #   8 - Pipeline: ReplaceMissingValues=>Concatenate=>FastForestRegression
18:17:50.1 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #   8 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,36 - Duration:     16 seconds
18:17:50.1 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #   9 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression
18:19:58.1 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #   9 - Pipeline: ReplaceMissingValues=>Concatenate=>FastForestRegression                - Metric:    0,28 - Duration:     128 seconds
18:19:58.1 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  10 - Pipeline: ReplaceMissingValues=>Concatenate=>FastForestRegression
18:21:04.9 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #  10 - Pipeline: ReplaceMissingValues=>Concatenate=>FastForestRegression                - Metric:    0,26 - Duration:     67 seconds
18:21:04.9 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  11 - Pipeline: ReplaceMissingValues=>Concatenate=>FastForestRegression
18:21:18.5 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #  11 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,22 - Duration:     14 seconds
18:21:18.5 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  12 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression
18:21:23.6 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #  12 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,36 - Duration:     5 seconds
18:21:23.6 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  13 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression
18:21:35.5 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #  13 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,40 - Duration:     12 seconds
18:21:35.5 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  14 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression
18:21:58.2 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #  14 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,43 - Duration:     23 seconds
18:21:58.2 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]  New Best Trial #  14 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,43
18:21:58.2 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  15 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression
18:22:14.5 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #  15 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,27 - Duration:     16 seconds
18:22:14.5 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  16 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression
18:25:06.4 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #  16 - Pipeline: ReplaceMissingValues=>Concatenate=>FastForestRegression                - Metric:    0,28 - Duration:     172 seconds
18:25:06.4 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  17 - Pipeline: ReplaceMissingValues=>Concatenate=>FastForestRegression
18:25:11.2 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #  17 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,39 - Duration:     5 seconds
18:25:11.2 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  18 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression
18:25:16.2 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #  18 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,38 - Duration:     5 seconds
18:25:16.2 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  19 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression
18:25:22.0 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #  19 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,03 - Duration:     6 seconds
18:25:22.0 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  20 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression
18:25:32.3 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #  20 - Pipeline: ReplaceMissingValues=>Concatenate=>LightGbmRegression                  - Metric:    0,02 - Duration:     10 seconds
18:25:32.3 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  21 - Pipeline: ReplaceMissingValues=>Concatenate=>LightGbmRegression
18:25:37.4 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #  21 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,37 - Duration:     5 seconds
18:25:37.4 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  22 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression
18:26:03.1 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #  22 - Pipeline: ReplaceMissingValues=>Concatenate=>FastForestRegression                - Metric:    0,29 - Duration:     26 seconds
18:26:03.1 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  23 - Pipeline: ReplaceMissingValues=>Concatenate=>FastForestRegression
18:26:07.4 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #  23 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,04 - Duration:     4 seconds
18:26:07.4 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  24 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression
[LightGBM] [Warning] bad allocation
[LightGBM] [Warning] bad allocation
[LightGBM] [Warning] bad allocation
[LightGBM] [Warning] bad allocation
[LightGBM] [Warning] bad allocation
[LightGBM] [Warning] bad allocation
[LightGBM] [Warning] bad allocation
[LightGBM] [Warning] bad allocation
[LightGBM] [Warning] bad allocation
[LightGBM] [Warning] bad allocation
[LightGBM] [Warning] bad allocation
[LightGBM] [Warning] bad allocation
[LightGBM] [Warning] bad allocation
[LightGBM] [Warning] bad allocation
[LightGBM] [Warning] bad allocation
18:53:28.1 info: BBD.BodyMonitor.API[0] Bio Balance Detector Body Monitor v0.9.1.3
18:53:28.1 info: BBD.BodyMonitor.API[0] (The current UTC time is 2022-11-02 17:53:28)
18:53:28.6 info: Microsoft.Hosting.Lifetime[14] Now listening on: https://0.0.0.0:7061
18:53:28.6 info: Microsoft.Hosting.Lifetime[14] Now listening on: http://localhost:5061
18:53:28.6 info: Microsoft.Hosting.Lifetime[0] Application started. Press Ctrl+C to shut down.
18:53:28.6 info: Microsoft.Hosting.Lifetime[0] Hosting environment: Development
18:53:28.6 info: Microsoft.Hosting.Lifetime[0] Content root path: C:\Work\BioBalanceDetector\Software\Source\BBDProto08\BBD.BodyMonitor\BBD.BodyMonitor.API\
18:53:44.2 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Starting model training for 3600 seconds, using 'BBD_20221102_2022-11-02_MLP14_0p25Hz-6250Hz_IsSession.SegmentedData.Sleep.Level_1098rows.csv' as data source with the 'MLP14' profile.
18:53:44.6 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #   0 - Pipeline: ReplaceMissingValues=>Concatenate=>FastTreeRegression
18:53:58.7 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #   0 - Pipeline: ReplaceMissingValues=>Concatenate=>FastForestRegression                - Metric:    0,15 - Duration:     14 seconds
18:53:58.7 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]  New Best Trial #   0 - Pipeline: ReplaceMissingValues=>Concatenate=>FastForestRegression                - Metric:    0,15
18:53:58.7 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #   1 - Pipeline: ReplaceMissingValues=>Concatenate=>FastForestRegression
18:54:08.8 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #   1 - Pipeline: ReplaceMissingValues=>Concatenate=>LightGbmRegression                  - Metric:    0,05 - Duration:     10 seconds
18:54:08.8 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #   2 - Pipeline: ReplaceMissingValues=>Concatenate=>LightGbmRegression
19:19:50.5 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #   2 - Pipeline: ReplaceMissingValues=>Concatenate=>SdcaRegression                      - Metric:    0,03 - Duration:     1542 seconds
19:19:50.5 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #   3 - Pipeline: ReplaceMissingValues=>Concatenate=>SdcaRegression
19:20:00.8 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #   3 - Pipeline: ReplaceMissingValues=>Concatenate=>FastTreeRegression                  - Metric:   -2,20 - Duration:     10 seconds
19:20:00.8 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #   4 - Pipeline: ReplaceMissingValues=>Concatenate=>FastTreeRegression
19:20:17.6 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #   4 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,41 - Duration:     17 seconds
19:20:17.6 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]  New Best Trial #   4 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,41
19:20:17.6 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #   5 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression
19:20:32.2 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #   5 - Pipeline: ReplaceMissingValues=>Concatenate=>FastForestRegression                - Metric:    0,25 - Duration:     15 seconds
19:20:32.2 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #   6 - Pipeline: ReplaceMissingValues=>Concatenate=>FastForestRegression
19:21:19.0 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #   6 - Pipeline: ReplaceMissingValues=>Concatenate=>FastForestRegression                - Metric:    0,26 - Duration:     47 seconds
19:21:19.0 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #   7 - Pipeline: ReplaceMissingValues=>Concatenate=>FastForestRegression
19:21:41.8 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #   7 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,39 - Duration:     23 seconds
19:21:41.8 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #   8 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression
19:21:46.4 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #   8 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,37 - Duration:     5 seconds
19:21:46.4 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #   9 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression
19:22:04.2 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #   9 - Pipeline: ReplaceMissingValues=>Concatenate=>FastForestRegression                - Metric:    0,23 - Duration:     18 seconds
19:22:04.2 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  10 - Pipeline: ReplaceMissingValues=>Concatenate=>FastForestRegression
19:22:08.9 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #  10 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,41 - Duration:     5 seconds
19:22:08.9 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  11 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression
19:22:26.4 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #  11 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,31 - Duration:     18 seconds
19:22:26.4 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  12 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression
19:22:31.2 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #  12 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,36 - Duration:     5 seconds
19:22:31.2 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  13 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression
19:22:58.5 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #  13 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,35 - Duration:     27 seconds
19:22:58.5 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  14 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression
19:23:04.3 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #  14 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,40 - Duration:     6 seconds
19:23:04.3 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  15 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression
19:23:20.4 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #  15 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,24 - Duration:     16 seconds
19:23:20.4 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  16 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression
19:23:37.6 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #  16 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,41 - Duration:     17 seconds
19:23:37.6 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  17 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression
19:23:46.7 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #  17 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,39 - Duration:     9 seconds
19:23:46.7 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  18 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression
19:24:14.6 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #  18 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,43 - Duration:     28 seconds
19:24:14.6 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]  New Best Trial #  18 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,43
19:24:14.6 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  19 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression
19:24:46.8 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #  19 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,43 - Duration:     32 seconds
19:24:46.8 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  20 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression
19:24:57.1 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #  20 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,41 - Duration:     10 seconds
19:24:57.1 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  21 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression
19:25:14.3 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #  21 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression    - Metric:    0,29 - Duration:     17 seconds
19:25:14.3 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  22 - Pipeline: ReplaceMissingValues=>Concatenate=>LbfgsPoissonRegressionRegression
19:26:55.5 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #  22 - Pipeline: ReplaceMissingValues=>Concatenate=>FastForestRegression                - Metric:    0,27 - Duration:     101 seconds
19:26:55.5 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  23 - Pipeline: ReplaceMissingValues=>Concatenate=>FastForestRegression
19:27:05.0 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0] Completed Trial #  23 - Pipeline: ReplaceMissingValues=>Concatenate=>LightGbmRegression                  - Metric:    0,13 - Duration:     10 seconds
19:27:05.0 info: BBD.BodyMonitor.API.Controllers.MachineLearningController[0]   Running Trial #  24 - Pipeline: ReplaceMissingValues=>Concatenate=>LightGbmRegression
[LightGBM] [Warning] bad allocation
[LightGBM] [Warning] bad allocation
[LightGBM] [Warning] bad allocation

Additional context
My project is open-source so both the source and the data are available. Let me know if you need them for testing.

@ghost ghost added the untriaged New issue has not been triaged label Nov 2, 2022
@michaelgsharp michaelgsharp added this to the ML.NET Future milestone Nov 28, 2022
@ghost ghost removed the untriaged New issue has not been triaged label Nov 28, 2022
@michaelgsharp michaelgsharp added AutoML.NET Automating various steps of the machine learning process untriaged New issue has not been triaged labels Nov 28, 2022
@ghost ghost removed the untriaged New issue has not been triaged label Nov 28, 2022
@michaelgsharp
Member

@LittleLittleCloud familiar with this?

@LittleLittleCloud
Contributor

The warning message most likely comes from LightGBM's unmanaged code, whereas SetMaximumMemoryUsageInMegaByte only restricts memory usage in managed code. So if the out-of-memory condition occurs in unmanaged code, AutoML.NET has no way to cancel that trial.

@andrasfuchs can you share with us the size of your machine?
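
To illustrate the managed/unmanaged distinction described in the comment above, here is a minimal sketch that compares the size of the managed heap with the private memory of the whole process. It is not ML.NET code: the `MemoryProbe` class and its member names are hypothetical and exist only for this illustration. Memory allocated by a native library such as LightGBM would be expected to show up in the process-wide figure but not in the managed one, which is consistent with a managed-only limit missing it.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for illustration only; not part of ML.NET.
public static class MemoryProbe
{
    // Bytes currently thought to be allocated on the managed (GC) heap.
    public static long ManagedBytes() =>
        GC.GetTotalMemory(forceFullCollection: false);

    // Private memory of the whole process; this covers managed and native allocations.
    public static long ProcessPrivateBytes()
    {
        using Process self = Process.GetCurrentProcess();
        return self.PrivateMemorySize64;
    }

    // Rough estimate of memory held outside the managed heap (never negative).
    public static long EstimatedUnmanagedBytes() =>
        Math.Max(0, ProcessPrivateBytes() - ManagedBytes());
}
```

Logging `MemoryProbe.ManagedBytes()` next to `MemoryProbe.ProcessPrivateBytes()` during a LightGBM trial would be one way to check whether the growth happens outside the managed heap. The difference is only an estimate, since the two values are sampled at slightly different moments.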

@andrasfuchs
Contributor Author

andrasfuchs commented Dec 2, 2022

@LittleLittleCloud The critical resource in this regard is the size of the virtual memory in my system. I have 32 GB RAM and ~43 GB of virtual memory. I start to get this bad allocation warning if there is no more free virtual memory left.

I don't know how virtual memory works in other OSs, but I used this Windows-specific method in my IMonitor to monitor it during my trials:

// Requires: using System.Management; (Windows-only WMI access)
if (OperatingSystem.IsWindows())
{
   ObjectQuery wql = new ObjectQuery("SELECT * FROM Win32_OperatingSystem");
   using ManagementObjectSearcher searcher = new ManagementObjectSearcher(wql);
   using ManagementObjectCollection results = searcher.Get();

   foreach (ManagementObject result in results)
   {
       // Win32_OperatingSystem reports all four values in kilobytes.
       ulong totalVisibleMemorySize = (ulong)result["TotalVisibleMemorySize"];
       ulong freePhysicalMemory = (ulong)result["FreePhysicalMemory"];
       ulong totalVirtualMemorySize = (ulong)result["TotalVirtualMemorySize"];
       ulong freeVirtualMemory = (ulong)result["FreeVirtualMemory"];

       // Cancel the trial when less than 2 GB (2 * 1024 * 1024 KB) of virtual memory is free.
       if (freeVirtualMemory < 2 * 1024 * 1024)
       {
           _logger.LogWarning($"{"Resources".PadLeft(10)} Trial #{trialSettings.TrialId.ToString().PadLeft(4)} - Cancelling trial due to low virtual memory");
           trialSettings.CancellationTokenSource.Cancel();
       }
   }
}

(Note: I use this check in my modified ML.NET code, where the IMonitor has a ReportTrialResourceUsage(TrialSettings settings) method that gets called repeatedly during trial runs to allow continuous monitoring.)
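
The threshold comparison in the snippet above relies on WMI reporting `FreeVirtualMemory` in kilobytes. As a small sketch of how the units could be made explicit, the helper below takes the free amount in kilobytes and the threshold in megabytes. The `LowMemoryGuard` class, the `ShouldCancelTrial` method, and its parameter names are hypothetical and are not part of ML.NET or of the modified code mentioned in the note.

```csharp
using System;

// Hypothetical helper for illustration only; not part of ML.NET.
public static class LowMemoryGuard
{
    private const ulong KilobytesPerMegabyte = 1024;

    // freeVirtualMemoryKb: free virtual memory in kilobytes,
    //   as reported by Win32_OperatingSystem.FreeVirtualMemory.
    // thresholdMb: minimum free virtual memory, in megabytes,
    //   below which the running trial should be cancelled.
    public static bool ShouldCancelTrial(ulong freeVirtualMemoryKb, ulong thresholdMb) =>
        freeVirtualMemoryKb < thresholdMb * KilobytesPerMegabyte;
}
```

With this helper, `LowMemoryGuard.ShouldCancelTrial(freeVirtualMemory, 2 * 1024)` would express the same 2 GB limit as the `2 * 1024 * 1024` comparison in the snippet.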

@luisquintanilla
Contributor

Closing this issue since it should've been addressed in #6520.

@github-actions github-actions bot locked and limited conversation to collaborators Jan 7, 2024