Add functional tests for ONNX scenarios #2984

Merged
merged 7 commits into from
Mar 19, 2019
Changes from 2 commits
4 changes: 2 additions & 2 deletions test/Microsoft.ML.Functional.Tests/Common.cs
@@ -84,14 +84,14 @@ public static void AssertTestTypeDatasetsAreEqual(MLContext mlContext, IDataView
/// </summary>
/// <param name="array1">An array of floats.</param>
/// <param name="array2">An array of floats.</param>
- public static void AssertEqual(float[] array1, float[] array2)
+ public static void AssertEqual(float[] array1, float[] array2, int precision = 6)
{
Assert.NotNull(array1);
Assert.NotNull(array2);
Assert.Equal(array1.Length, array2.Length);

for (int i = 0; i < array1.Length; i++)
- Assert.Equal(array1[i], array2[i]);
+ Assert.Equal(array1[i], array2[i], precision: precision);
}

/// <summary>
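For context on the new `precision` parameter, here is a minimal sketch of what a decimal-place comparison amounts to. It assumes that xUnit's `Assert.Equal(expected, actual, precision)` overload rounds both values to `precision` decimal places before comparing. The helper name `EqualAtPrecision` and the sample values are hypothetical and are not part of this PR.

```csharp
using System;

public static class PrecisionCompareSketch
{
    // Hypothetical helper: returns true when both values agree after
    // rounding to `precision` decimal places (assumed to mirror xUnit).
    public static bool EqualAtPrecision(float expected, float actual, int precision)
    {
        double roundedExpected = Math.Round((double)expected, precision);
        double roundedActual = Math.Round((double)actual, precision);
        return roundedExpected == roundedActual;
    }

    public static void Main()
    {
        // Two illustrative scores that differ only in the fifth decimal place.
        float original = 21.50001f;
        float reloaded = 21.50004f;

        // Compare at the precision used by the ONNX tests and at the default.
        Console.WriteLine(EqualAtPrecision(original, reloaded, 4));
        Console.WriteLine(EqualAtPrecision(original, reloaded, 6));
    }
}
```

Under that assumption, a lower `precision` value tolerates small floating-point drift between two scorers, while the default of 6 demands agreement to six decimal places.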
195 changes: 195 additions & 0 deletions test/Microsoft.ML.Functional.Tests/ONNX.cs
@@ -0,0 +1,195 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using System.IO;
using Microsoft.ML.Functional.Tests.Datasets;
using Microsoft.ML.RunTests;
using Microsoft.ML.TestFramework;
using Microsoft.ML.TestFramework.Attributes;
using Microsoft.ML.Trainers;
using Microsoft.ML.Trainers.FastTree;
using Microsoft.ML.Transforms;
using Xunit;
using Xunit.Abstractions;

namespace Microsoft.ML.Functional.Tests
{
public class ONNX : BaseTestClass
{
public ONNX(ITestOutputHelper output) : base(output)
{
}

/// <summary>
/// ONNX: I can save a model to ONNX and reload it and use it in a pipeline.
/// </summary>
[OnnxFactAttribute]
public void SaveOnnxModelLoadAndScoreFastTree()
{
var mlContext = new MLContext(seed: 1);

// Get the dataset.
var data = mlContext.Data.LoadFromTextFile<HousingRegression>(GetDataPath(TestDatasets.housing.trainFilename), hasHeader: true);

// Create a pipeline to train on the housing data.
var pipeline = mlContext.Transforms.Concatenate("Features", HousingRegression.Features)
.Append(mlContext.Transforms.Normalize("Features"))
.AppendCacheCheckpoint(mlContext)
.Append(mlContext.Regression.Trainers.FastTree(
new FastTreeRegressionTrainer.Options { NumberOfThreads = 1, NumberOfTrees = 10 }));
@shmoradims shmoradims Mar 19, 2019

= 1 [](start = 76, length = 3)

Why is the number of threads set to 1? Is there a known issue with multi-threading? #Resolved

Contributor Author

There are no issues with multithreading; this is for convenience. We run these sorts of tests single-threaded because:

  • Many are running at the same time;
  • Some algorithms are non-deterministic when multithreaded (e.g. SDCA), which makes testing harder (boxing vs. exact).

In reply to: 267028362 [](ancestors = 267028362)


// Fit the pipeline.
var model = pipeline.Fit(data);

// Serialize the pipeline to a file.
var modelFileName = "SaveOnnxLoadAndScoreFastTreeModel.onnx";
var modelPath = DeleteOutputPath(modelFileName);
using (var file = File.Create(modelPath))
mlContext.Model.ConvertToOnnx(model, data, file);

// Load the model as a transform.
var onnxEstimator = mlContext.Transforms.ApplyOnnxModel(modelPath);
var onnxModel = onnxEstimator.Fit(data);

// TODO #2980: ONNX outputs don't match the outputs of the model, so we must hand-correct this for now.
// TODO #2981: ONNX models cannot be fit as part of a pipeline, so we must use a workaround like this.
var onnxWorkaroundPipeline = onnxModel.Append(
mlContext.Transforms.CopyColumns("Score", "Score0").Fit(onnxModel.Transform(data)));
Contributor

@zeahmed zeahmed Mar 19, 2019

Score0 [](start = 59, length = 6)

Just a question: where does Score0 come from? Is it produced by the ONNX transform? #Resolved

Contributor Author

@rogancarr rogancarr Mar 19, 2019

I am not sure yet where the '0' suffixes are entering the pipeline. I am digging into it as part of this bug: #2980. I'll triage the three issues exposed by these tests (#2980, #2981, #2982) separately from this PR.


In reply to: 267067407 [](ancestors = 267067407)


// Create prediction engine and test predictions.
var originalPredictionEngine = model.CreatePredictionEngine<HousingRegression, ScoreColumn>(mlContext);
// TODO #2982: ONNX produces vector types and not the original output type.
var onnxPredictionEngine = onnxWorkaroundPipeline.CreatePredictionEngine<HousingRegression, OnnxScoreColumn>(mlContext);

// Take a handful of examples out of the dataset and compute predictions.
var dataEnumerator = mlContext.Data.CreateEnumerable<HousingRegression>(mlContext.Data.TakeRows(data, 5), false);
foreach (var row in dataEnumerator)
{
var originalPrediction = originalPredictionEngine.Predict(row);
var onnxPrediction = onnxPredictionEngine.Predict(row);
// Check that the predictions are identical.
Assert.Equal(originalPrediction.Score, onnxPrediction.Score[0], precision: 4); // Note the low-precision equality!
}
}

/// <summary>
/// ONNX: I can save a model to ONNX and reload it and use it in a pipeline.
/// </summary>
[OnnxFactAttribute]
public void SaveOnnxModelLoadAndScoreKMeans()
{
var mlContext = new MLContext(seed: 1);

// Get the dataset.
var data = mlContext.Data.LoadFromTextFile<HousingRegression>(GetDataPath(TestDatasets.housing.trainFilename), hasHeader: true);

// Create a pipeline to train on the housing data.
var pipeline = mlContext.Transforms.Concatenate("Features", HousingRegression.Features)
.Append(mlContext.Transforms.Normalize("Features"))
.AppendCacheCheckpoint(mlContext)
.Append(mlContext.Clustering.Trainers.KMeans(
new KMeansTrainer.Options { NumberOfThreads = 1, MaximumNumberOfIterations = 10 }));

// Fit the pipeline.
var model = pipeline.Fit(data);

// Serialize the pipeline to a file.
var modelFileName = "SaveOnnxLoadAndScoreKMeansModel.onnx";
var modelPath = DeleteOutputPath(modelFileName);
using (var file = File.Create(modelPath))
mlContext.Model.ConvertToOnnx(model, data, file);

// Load the model as a transform.
var onnxEstimator = mlContext.Transforms.ApplyOnnxModel(modelPath);
var onnxModel = onnxEstimator.Fit(data);

// TODO #2980: ONNX outputs don't match the outputs of the model, so we must hand-correct this for now.
// TODO #2981: ONNX models cannot be fit as part of a pipeline, so we must use a workaround like this.
var onnxWorkaroundPipeline = onnxModel.Append(
mlContext.Transforms.CopyColumns("Score", "Score0").Fit(onnxModel.Transform(data)));

// Create prediction engine and test predictions.
var originalPredictionEngine = model.CreatePredictionEngine<HousingRegression, ClusteringScoreColumn>(mlContext);
// TODO #2982: ONNX produces vector types and not the original output type.
var onnxPredictionEngine = onnxWorkaroundPipeline.CreatePredictionEngine<HousingRegression, ClusteringScoreColumn>(mlContext);

// Take a handful of examples out of the dataset and compute predictions.
var dataEnumerator = mlContext.Data.CreateEnumerable<HousingRegression>(mlContext.Data.TakeRows(data, 5), false);
foreach (var row in dataEnumerator)
{
var originalPrediction = originalPredictionEngine.Predict(row);
var onnxPrediction = onnxPredictionEngine.Predict(row);
// Check that the predictions are identical.
Common.AssertEqual(originalPrediction.Score, onnxPrediction.Score, precision: 4); // Note the low precision!
}
}

/// <summary>
/// ONNX: I can save a model to ONNX and reload it and use it in a pipeline.
/// </summary>
[OnnxFactAttribute]
public void SaveOnnxModelLoadAndScoreSDCA()
{
var mlContext = new MLContext(seed: 1);

// Get the dataset.
var data = mlContext.Data.LoadFromTextFile<HousingRegression>(GetDataPath(TestDatasets.housing.trainFilename), hasHeader: true);

// Create a pipeline to train on the housing data.
var pipeline = mlContext.Transforms.Concatenate("Features", HousingRegression.Features)
.Append(mlContext.Transforms.Normalize("Features"))
.AppendCacheCheckpoint(mlContext)
.Append(mlContext.Regression.Trainers.Sdca(
new SdcaRegressionTrainer.Options { NumberOfThreads = 1, MaximumNumberOfIterations = 10 }));

// Fit the pipeline.
var model = pipeline.Fit(data);

// Serialize the pipeline to a file.
var modelFileName = "SaveOnnxLoadAndScoreSdcaModel.onnx";
var modelPath = DeleteOutputPath(modelFileName);
using (var file = File.Create(modelPath))
mlContext.Model.ConvertToOnnx(model, data, file);

// Load the model as a transform.
var onnxEstimator = mlContext.Transforms.ApplyOnnxModel(modelPath);
var onnxModel = onnxEstimator.Fit(data);

// TODO #2980: ONNX outputs don't match the outputs of the model, so we must hand-correct this for now.
// TODO #2981: ONNX models cannot be fit as part of a pipeline, so we must use a workaround like this.
var onnxWorkaroundPipeline = onnxModel.Append(
mlContext.Transforms.CopyColumns("Score", "Score0").Fit(onnxModel.Transform(data)));

// Create prediction engine and test predictions.
var originalPredictionEngine = model.CreatePredictionEngine<HousingRegression, ScoreColumn>(mlContext);
// TODO #2982: ONNX produces vector types and not the original output type.
var onnxPredictionEngine = onnxWorkaroundPipeline.CreatePredictionEngine<HousingRegression, OnnxScoreColumn>(mlContext);

// Take a handful of examples out of the dataset and compute predictions.
var dataEnumerator = mlContext.Data.CreateEnumerable<HousingRegression>(mlContext.Data.TakeRows(data, 5), false);
foreach (var row in dataEnumerator)
{
var originalPrediction = originalPredictionEngine.Predict(row);
var onnxPrediction = onnxPredictionEngine.Predict(row);
// Check that the predictions are identical.
Assert.Equal(originalPrediction.Score, onnxPrediction.Score[0], precision: 4); // Note the low-precision equality!
}
}

private class ScoreColumn
{
public float Score { get; set; }
}

private class OnnxScoreColumn
Member

@wschin wschin Mar 18, 2019

What's the difference between OnnxScoreColumn and ClusteringScoreColumn? Without specifying the dimension of the Score field, I am not sure the code is safe. Maybe we can do

```
[VectorType(dimension)]
public float[] Score { get; set; }
```
#Resolved

Contributor Author

The ScoreColumn class will be replaced with a general one being added by the ModelFiles PR.

The OnnxScoreColumn is there to make it explicit that we are working around a ?bug? in our ONNX implementation.

The ClusteringScoreColumn I added specifically for clustering.

Having written that out, we can delete the last two and make a ScoreArrayColumn class in the general helper class files (in ModelFiles, but I'll pull it into this PR instead and rebase that one).

On the topic of VectorType, we don't need to specify the dimension. Specifying a dimension just guarantees that the vector will be the same length for each row. That has the downside of making the classes non-reusable, so for helper classes in tests, we usually don't specify this attribute.


In reply to: 266636611 [](ancestors = 266636611)
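
To illustrate the point about `VectorType`, here is a small sketch that assumes the `VectorTypeAttribute` from `Microsoft.ML.Data`. The class names `FixedSizeScore` and `VariableSizeScore` and the dimension 3 are hypothetical, chosen only for illustration, and are not part of this PR.

```csharp
using Microsoft.ML.Data;

// Hypothetical helper classes for illustration only.
public class FixedSizeScore
{
    // Declares that every row carries exactly 3 floats (3 is an arbitrary
    // example value), so the class only fits models with that output size.
    [VectorType(3)]
    public float[] Score { get; set; }
}

public class VariableSizeScore
{
    // No attribute: the array length is not fixed in the schema, so the
    // same class can be reused across models with different output sizes.
    public float[] Score { get; set; }
}
```

The trade-off matches the reply above: the fixed dimension gives a per-row length guarantee, while the unannotated property stays reusable across tests.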

{
public float[] Score { get; set; }
}

private class ClusteringScoreColumn
{
public float[] Score { get; set; }
}
}
}