Description
System Information (please complete the following information):
- OS & Version: Windows 10
- ML.NET Version: ML.NET 2.0.0
- .NET Version: .NET Framework 4.8 and .NET 6 (same error on both)
- Tooling: Visual Studio 2022 + ML.NET Model Builder 2022 (17.14.4.2312404)
Describe the bug
I have seen a few issues about a similar error, but all of them concern Image Classification. Mine occurs with Data Classification.
Scenario: Data Classification using a SQL Server view as the data source.
To Reproduce
- DataClassification
- Many columns (dates, decimals, ints), 2 varchar columns, and 1 label: training works fine.
- The same columns plus 1 more problematic varchar column (StringCol3): training fails immediately with the error shown in the log.
- If I replace the problematic varchar column with a constant for all rows, e.g. 'abc' AS [StringCol3], training works fine.
I can't attach the original columns because they contain sensitive data.
However, I anonymized and trimmed the data down to a set that still reproduces the problem (attached: SQL scripts to create the table and insert the records).
This should be enough to reproduce this error.
aibug.zip
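To help narrow down what distinguishes StringCol3 from the varchar columns that train fine, a query along the following lines could be run against the attached data. This is only a sketch: it assumes the table is named AIBug and the column StringCol3, as in the attached scripts, and the output aliases are made up for illustration. It does not confirm which property of the column (nulls, empty strings, cardinality, length) actually triggers the failure.

```sql
-- Profile the problematic column in the AIBug table (T-SQL).
-- Assumption: StringCol3 is a varchar column, as described in the repro steps.
SELECT
    COUNT(*)                                            AS TotalRows,
    SUM(CASE WHEN StringCol3 IS NULL THEN 1 ELSE 0 END) AS NullRows,
    -- In T-SQL, '' also equals strings made only of spaces (trailing spaces are ignored in comparisons).
    SUM(CASE WHEN StringCol3 = '' THEN 1 ELSE 0 END)    AS EmptyOrBlankRows,
    COUNT(DISTINCT StringCol3)                          AS DistinctValues,
    MIN(LEN(StringCol3))                                AS MinLength,
    MAX(LEN(StringCol3))                                AS MaxLength
FROM AIBug;
```

Running the same query against one of the varchar columns that trains without error would give a baseline for comparison.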
start multiclass classification
Evaluate Metric: MacroAccuracy
Available Trainers: SDCA,LBFGS,LGBM,FASTTREE,FASTFOREST
Training time in second: 300
[Source=AutoMLExperiment-ChildContext, Kind=Info] [Source=OVA; Fitting, Kind=Info] Training learner 0
[Source=AutoMLExperiment-ChildContext, Kind=Info] [Source=Converter; InitDataset, Kind=Info] Making per-feature arrays
[Source=AutoMLExperiment-ChildContext, Kind=Info] [Source=Converter; InitBoundariesAndLabels, Kind=Info] Changing data from row-wise to column-wise
Splitter/consolidator worker encountered exception while consuming source data
at Microsoft.ML.Data.DataViewUtils.Splitter.Batch.SetAll(OutPipe[] pipes)
at Microsoft.ML.Data.DataViewUtils.Splitter.Cursor.MoveNextCore()
at Microsoft.ML.Data.RootCursorBase.MoveNext()
at Microsoft.ML.Trainers.TrainingCursorBase.MoveNext()
at Microsoft.ML.Trainers.FastTree.DataConverter.MemImpl.MakeBoundariesAndCheckLabels(Int64& missingInstances, Int64& totalInstances)
at Microsoft.ML.Trainers.FastTree.DataConverter.MemImpl..ctor(RoleMappedData data, IHost host, Double[][] binUpperBounds, Single maxLabel, Boolean dummy, Boolean noFlocks, PredictionKind kind, Int32[] categoricalFeatureIndices, Boolean categoricalSplit)
at Microsoft.ML.Trainers.FastTree.DataConverter.Create(RoleMappedData data, IHost host, Int32 maxBins, Single maxLabel, Boolean diskTranspose, Boolean noFlocks, Int32 minDocsPerLeaf, PredictionKind kind, IParallelTraining parallelTraining, Int32[] categoricalFeatureIndices, Boolean categoricalSplit)
at Microsoft.ML.Trainers.FastTree.ExamplesToFastTreeBins.FindBinsAndReturnDataset(RoleMappedData data, PredictionKind kind, IParallelTraining parallelTraining, Int32[] categoricalFeaturIndices, Boolean categoricalSplit)
at Microsoft.ML.Trainers.FastTree.FastTreeTrainerBase`3.ConvertData(RoleMappedData trainData)
at Microsoft.ML.Trainers.FastTree.FastTreeBinaryTrainer.TrainModelCore(TrainContext context)
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
at Microsoft.ML.Trainers.OneVersusAllTrainer.TrainOne(IChannel ch, ITrainerEstimator`2 trainer, RoleMappedData data, Int32 cls)
at Microsoft.ML.Trainers.OneVersusAllTrainer.Fit(IDataView input)
at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
at Microsoft.ML.AutoML.SweepablePipelineRunner.Run(TrialSettings settings)
at Microsoft.ML.AutoML.SweepablePipelineRunner.RunAsync(TrialSettings settings, CancellationToken ct)
at Microsoft.ML.AutoML.AutoMLExperiment.<RunAsync>d__24.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.ML.ModelBuilder.AutoMLService.Experiments.MultiClassificationExperiment.<ExecuteAsync>d__14.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/Experiments/MultiClassificationExperiment.cs:line 123
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.ML.ModelBuilder.AutoMLEngine.<StartTrainingAsync>d__21.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/AutoMLEngineService/AutoMLEngine.cs:line 145
BONUS: if you create a view that replaces one column with empty/null values and use that view for Data Classification:
CREATE OR ALTER VIEW XYZ
AS
SELECT
.....
, RIGHT(StringCol3,0) AS StringCol3
.....
FROM AIBug
you will get a different error:
Schema mismatch for input column 'StringCol3_CharExtractor': expected Expected known-size vector of Single, got Vector<Single>
Parameter name: inputSchema
Expected behavior
No error, or at least a human-readable message explaining what is wrong and how to fix it.