
Splitter/consolidator worker encountered exception while consuming source data (DataClassification AutoML) #6621

@rzechu

System Information (please complete the following information):

  • OS & Version: Windows 10
  • ML.NET Version: ML.NET 2.0.0
  • .NET Version: .NET Framework 4.8 and .NET 6 (same error on both)
    Visual Studio 2022 + ML.NET Model Builder 2022 (17.14.4.2312404)

Describe the bug
I have seen a few issues about a similar error, but all of them concern ImageClassification. Mine concerns DataClassification.

The DataClassification scenario uses a SQL Server view as the data source.

To Reproduce

  1. Start a DataClassification scenario.
  2. With many columns (dates, decimals, ints), 2 varchar columns and 1 label, training is fine.
  3. With the same columns plus 1 more, problematic varchar column (StringCol3), training fails immediately with the error below.
  4. If I replace this problematic varchar column with a constant for all rows, for example 'abc' AS [StringCol3], training is fine.
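
Step 4 can be sketched as a view like the one below. This is only a minimal illustration, not the exact view I used: the view name AIBug_ConstStringCol3 is hypothetical, and the remaining columns are left out because they are not listed in this report.

```sql
-- Hypothetical view name; AIBug is the table created by the attached scripts.
CREATE OR ALTER VIEW AIBug_ConstStringCol3
AS
SELECT
    -- Add the remaining AIBug columns here, unchanged, each followed by a comma.
    'abc' AS [StringCol3]   -- constant value replaces the problematic column
FROM AIBug;
```

As written, the view exposes only the constant StringCol3 column; the other columns have to be added before it is useful as a training source.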

I can't attach the original columns because they contain sensitive data.
However, I anonymized and trimmed the data down to a set that still reproduces the error (attached are SQL scripts that create the table and insert the records).
aibug.zip

start multiclass classification
Evaluate Metric: MacroAccuracy
Available Trainers: SDCA,LBFGS,LGBM,FASTTREE,FASTFOREST
Training time in second: 300
[Source=AutoMLExperiment-ChildContext, Kind=Info] [Source=OVA; Fitting, Kind=Info] Training learner 0
[Source=AutoMLExperiment-ChildContext, Kind=Info] [Source=Converter; InitDataset, Kind=Info] Making per-feature arrays
[Source=AutoMLExperiment-ChildContext, Kind=Info] [Source=Converter; InitBoundariesAndLabels, Kind=Info] Changing data from row-wise to column-wise

Splitter/consolidator worker encountered exception while consuming source data

   at Microsoft.ML.Data.DataViewUtils.Splitter.Batch.SetAll(OutPipe[] pipes)
   at Microsoft.ML.Data.DataViewUtils.Splitter.Cursor.MoveNextCore()
   at Microsoft.ML.Data.RootCursorBase.MoveNext()
   at Microsoft.ML.Trainers.TrainingCursorBase.MoveNext()
   at Microsoft.ML.Trainers.FastTree.DataConverter.MemImpl.MakeBoundariesAndCheckLabels(Int64& missingInstances, Int64& totalInstances)
   at Microsoft.ML.Trainers.FastTree.DataConverter.MemImpl..ctor(RoleMappedData data, IHost host, Double[][] binUpperBounds, Single maxLabel, Boolean dummy, Boolean noFlocks, PredictionKind kind, Int32[] categoricalFeatureIndices, Boolean categoricalSplit)
   at Microsoft.ML.Trainers.FastTree.DataConverter.Create(RoleMappedData data, IHost host, Int32 maxBins, Single maxLabel, Boolean diskTranspose, Boolean noFlocks, Int32 minDocsPerLeaf, PredictionKind kind, IParallelTraining parallelTraining, Int32[] categoricalFeatureIndices, Boolean categoricalSplit)
   at Microsoft.ML.Trainers.FastTree.ExamplesToFastTreeBins.FindBinsAndReturnDataset(RoleMappedData data, PredictionKind kind, IParallelTraining parallelTraining, Int32[] categoricalFeaturIndices, Boolean categoricalSplit)
   at Microsoft.ML.Trainers.FastTree.FastTreeTrainerBase`3.ConvertData(RoleMappedData trainData)
   at Microsoft.ML.Trainers.FastTree.FastTreeBinaryTrainer.TrainModelCore(TrainContext context)
   at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
   at Microsoft.ML.Trainers.OneVersusAllTrainer.TrainOne(IChannel ch, ITrainerEstimator`2 trainer, RoleMappedData data, Int32 cls)
   at Microsoft.ML.Trainers.OneVersusAllTrainer.Fit(IDataView input)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at Microsoft.ML.AutoML.SweepablePipelineRunner.Run(TrialSettings settings)
   at Microsoft.ML.AutoML.SweepablePipelineRunner.RunAsync(TrialSettings settings, CancellationToken ct)
   at Microsoft.ML.AutoML.AutoMLExperiment.<RunAsync>d__24.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.ML.ModelBuilder.AutoMLService.Experiments.MultiClassificationExperiment.<ExecuteAsync>d__14.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/Experiments/MultiClassificationExperiment.cs:line 123
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.ML.ModelBuilder.AutoMLEngine.<StartTrainingAsync>d__21.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/AutoMLEngineService/AutoMLEngine.cs:line 145

Bonus: if you create a view that replaces one column with an empty/null value and use this view for DataClassification:

CREATE OR ALTER VIEW XYZ 
AS
SELECT
 .....
, RIGHT(StringCol3,0) AS StringCol3
.....
FROM AIBug

you will get a different error:

Schema mismatch for input column 'StringCol3_CharExtractor': expected Expected known-size vector of Single, got Vector<Single>
Parameter name: inputSchema
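
Because both errors depend on what StringCol3 contains, a quick profile of that column may help narrow down which values are involved. The query below is only a diagnostic sketch against the AIBug table from the attached scripts; the result column aliases are made up for illustration, and it does not claim to identify the root cause.

```sql
-- Diagnostic sketch: summarize the contents of StringCol3 in AIBug.
SELECT
    COUNT(*)                                            AS TotalRows,
    SUM(CASE WHEN StringCol3 IS NULL THEN 1 ELSE 0 END) AS NullRows,
    -- In T-SQL, trailing spaces are ignored by '=', so this also counts
    -- values that consist only of spaces.
    SUM(CASE WHEN StringCol3 = '' THEN 1 ELSE 0 END)    AS EmptyRows,
    COUNT(DISTINCT StringCol3)                          AS DistinctValues,
    -- LEN does not count trailing spaces.
    MAX(LEN(StringCol3))                                AS MaxLength
FROM AIBug;
```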

Expected behavior
No error, or at least a human-readable message that explains what is wrong and how to fix it.

