
[Issue, ML.net CLI] 330GB csv file of data cause a OutOfMemoryException (1/2) #6288

@wil70

Description

System Information (please complete the following information):

  • OS & Version: Win8, latest version as of this bug entry
  • ML.NET Version: 16.13.9
  • .NET Version: 6.0.303

Describe the bug
When I start ML.NET from the CLI, I get an OutOfMemoryException.
I have 64 GB of RAM and a 330 GB CSV file of data.

I tried with
To Reproduce
Steps to reproduce the behavior:

  1. Generate a 330 GB CSV file with 4209 columns of random data
  2. Open a command prompt
  3. Type the following on the command line:
    mlnet classification --train-time 75600 --name SampleClassification --log-file-path c:\Log_data.txt --has-header true --label-col 4209 --ignore-cols 0,1,4206,4207,4208 --dataset "c:\data.csv" --test-dataset "c:\test_data.csv"
  4. See the error log at the end of this message with the OutOfMemoryException
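
For anyone who wants to try step 1 at a smaller scale, here is a minimal sketch of a generator, not the script used to build the original file. The function name `write_random_csv`, the `col0`... header names, and the binary label in the last column are all assumptions made for illustration. The issue only states that the file has 4209 columns, a header row, and random data.

```python
import csv
import random


def write_random_csv(path, n_rows, n_cols=4209, seed=0):
    """Stream a header plus n_rows random rows to path, one row at a time."""
    rng = random.Random(seed)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # Hypothetical header names; the real file's headers are not given.
        writer.writerow([f"col{i}" for i in range(n_cols)])
        for _ in range(n_rows):
            # Random floats for every column except the last one.
            features = [f"{rng.random():.6f}" for _ in range(n_cols - 1)]
            # Assumption: a binary class label stored in the last column.
            label = rng.randint(0, 1)
            writer.writerow(features + [label])
```

Calling `write_random_csv("small.csv", n_rows=1000)` writes a file of a few tens of megabytes. Rows are written one at a time, so memory use stays flat regardless of `n_rows`, and a much larger `n_rows` would approach the size described above.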

Expected behavior
I expect ML.NET to keep running and process the data as it streams it, so there should be no OutOfMemoryException.
When I monitor the mlnet.exe process in Task Manager, its memory usage stays low, under ~14 GB. So something is not right: I have 64 GB of RAM, and if the data is streamed, the amount of RAM should not matter anyway.

Screenshots, Code, Sample Projects
Additional context
Here is the log
Start Training
start nni training
Experiment output folder: C:\Users\W\AppData\Local\Temp\AutoML-NNI\Experiment-GET3JS
System.FormatException: Parsing failed with an exception: Stream reading encountered exception
---> System.FormatException: Stream reading encountered exception
---> System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at System.Text.StringBuilder.ToString()
at System.IO.StreamReader.ReadLine()
at Microsoft.ML.Data.TextLoader.Cursor.LineReader.ThreadProc()
--- End of inner exception stack trace ---
at Microsoft.ML.Data.TextLoader.Cursor.LineReader.GetBatch()
at Microsoft.ML.Data.TextLoader.Cursor.ParallelState.Parse(Int32 tid)
at Microsoft.ML.Data.TextLoader.Cursor.ParallelState.ThreadProc(Object obj)
--- End of inner exception stack trace ---
at Microsoft.ML.Data.TextLoader.Cursor.ParseParallel(ParallelState state)+MoveNext()
at Microsoft.ML.Data.TextLoader.Cursor.MoveNextCore()
at Microsoft.ML.Data.RootCursorBase.MoveNext()
at Microsoft.ML.ModelBuilder.AutoMLService.Proposer.Controller.CountRows(IDataView data, Int64 maxRows) in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/Proposer/Controller.cs:line 174
at Microsoft.ML.ModelBuilder.AutoMLService.Proposer.Controller.Initialize() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/Proposer/Controller.cs:line 111
at Microsoft.ML.ModelBuilder.AutoMLService.Experiments.LocalAutoMLExperiment.ExecuteAsync(IDataView trainData, IDataView validateData, ColumnInformation columnInformation, CancellationToken cancellationToken, CancellationToken timeout) in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/Experiments/LocalAutoMLExperiment.cs:line 138
at Microsoft.ML.ModelBuilder.AutoMLEngine.StartTrainingAsync(TrainingConfiguration config, PathConfiguration pathConfig, CancellationToken userCancellationToken) in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/AutoMLEngineService/AutoMLEngine.cs:line 160
at Microsoft.ML.CLI.Runners.AutoMLRunner.ExecuteAsync() in /_/src/mlnet/Runners/AutoMLRunner.cs:line 88
at Microsoft.ML.CLI.Program.TrainAsync(TrainingConfiguration trainingConfiguration, PathConfiguration pathConfig, AutoMLServiceLogLevel logLevel) in /_/src/mlnet/Program.cs:line 348
at Microsoft.ML.CLI.Program.AutoMLCommandRunner(AutoMLCommand command, Boolean skipGenerateConsoleApp) in /_/src/mlnet/Program.cs:line 329
at Microsoft.ML.CLI.Program.<>c.<b__4_0>d.MoveNext() in /_/src/mlnet/Program.cs:line 89
--- End of stack trace from previous location ---
at System.CommandLine.Invocation.CommandHandler.GetExitCodeAsync(Object value, InvocationContext context)
at System.CommandLine.Invocation.ModelBindingCommandHandler.InvokeAsync(InvocationContext context)
at System.CommandLine.Invocation.InvocationPipeline.<>c__DisplayClass4_0.<b__0>d.MoveNext()
--- End of stack trace from previous location ---
at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass23_0.<b__0>d.MoveNext()
--- End of stack trace from previous location ---
at Microsoft.ML.CLI.Program.<>c__DisplayClass4_0.<b__9>d.MoveNext() in /_/src/mlnet/Program.cs:line 290
--- End of stack trace from previous location ---
at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<b__24_0>d.MoveNext()
--- End of stack trace from previous location ---
at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass22_0.<b__0>d.MoveNext()
--- End of stack trace from previous location ---
at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass11_0.<b__0>d.MoveNext()
--- End of stack trace from previous location ---
at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<b__10_0>d.MoveNext()
--- End of stack trace from previous location ---
at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass14_0.<b__0>d.MoveNext()
Check out log file for more information: c:\Log_data.txt
Exiting ...

C:\Users\W>'
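
The innermost frames of the trace show the exception being thrown in `StringBuilder.ToString()` inside `StreamReader.ReadLine()`, that is, while a single line was being turned into one string. One possible explanation, which is only an assumption and not something confirmed in this report, is that some line in the file is far longer than expected, for example because of missing or unexpected line terminators. The sketch below is a hypothetical diagnostic helper, not part of ML.NET. It reads the file in fixed-size chunks and reports the longest line in bytes without ever holding a whole line in memory.

```python
def longest_line_bytes(path, chunk_size=1 << 20):
    """Return the byte length of the longest line, reading fixed-size chunks.

    Lengths exclude the trailing "\n" but include a "\r" if one is present.
    """
    longest = 0
    current = 0  # bytes seen since the last newline
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            start = 0
            while True:
                nl = chunk.find(b"\n", start)
                if nl == -1:
                    # No more newlines in this chunk; the line continues.
                    current += len(chunk) - start
                    break
                current += nl - start
                longest = max(longest, current)
                current = 0
                start = nl + 1
    # Account for a final line that has no trailing newline.
    return max(longest, current)
```

Running `longest_line_bytes(r"c:\data.csv")` would show whether any line is unexpectedly large. If the result is in the expected range for 4209 columns, this explanation can be ruled out.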

