Add functional tests for ONNX scenarios #2984

Merged
merged 7 commits into from
Mar 19, 2019

Contributor

@rogancarr rogancarr commented Mar 15, 2019

As laid out in #2498, we need scenarios to cover the ONNX functionality we want fully supported in V1.

Scenarios:

  • I can take an existing ONNX model and get predictions from it (as both final output and as input to downstream pipelines)
  • P1: I can export ML.NET models to ONNX (limited to the existing internal functionality) (In Model Files section, but can be fleshed out a bit better with other ONNX tests)

Fixes #2963

Contributor Author

rogancarr commented Mar 15, 2019

These tests roundtrip the main ONNX exportable classes in ML.NET, and show that it's possible to serialize, deserialize, and use them as scorers / transforms. Thus, there are no separate tests for the first scenario.


codecov bot commented Mar 15, 2019

Codecov Report

Merging #2984 into master will increase coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2984      +/-   ##
==========================================
+ Coverage   72.38%    72.4%   +0.01%     
==========================================
  Files         803      803              
  Lines      143569   143655      +86     
  Branches    16162    16165       +3     
==========================================
+ Hits       103924   104012      +88     
+ Misses      35227    35224       -3     
- Partials     4418     4419       +1
| Flag | Coverage Δ |
|---|---|
| #Debug | 72.4% <100%> (+0.01%) ⬆️ |
| #production | 68.09% <ø> (ø) ⬆️ |
| #test | 88.59% <100%> (+0.03%) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| test/Microsoft.ML.Functional.Tests/Common.cs | 98.23% <100%> (ø) ⬆️ |
| ...soft.ML.Functional.Tests/Datasets/CommonColumns.cs | 100% <100%> (ø) |
| test/Microsoft.ML.Functional.Tests/ONNX.cs | 100% <100%> (ø) |
| src/Microsoft.ML.Maml/MAML.cs | 24.75% <0%> (-1.46%) ⬇️ |
| ...StandardTrainers/Standard/LinearModelParameters.cs | 60.05% <0%> (-0.27%) ⬇️ |
| ...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs | 84.9% <0%> (+0.2%) ⬆️ |
| src/Microsoft.ML.Transforms/Text/LdaTransform.cs | 89.89% <0%> (+0.62%) ⬆️ |
| ...Microsoft.ML.OnnxConverter/OnnxExportExtensions.cs | 100% <0%> (+9.09%) ⬆️ |

public float Score { get; set; }
}

private class OnnxScoreColumn
Member

@wschin wschin Mar 18, 2019

What's the difference between OnnxScoreColumn and ClusteringScoreColumn? Without specifying the dimension of Score field, I am not sure the code is safe. Maybe we can do

```
[VectorType(dimension)]
public float[] Score { get; set; }
```

#Resolved

Contributor Author

The ScoreColumn class will be replaced with a general one being added by the ModelFiles PR.

The OnnxScoreColumn is there to make it explicit that we are working around a ?bug? in our ONNX implementation.

The ClusteringScoreColumn I added specifically for clustering.

Having written that out, we can delete the last two and make a ScoreArrayColumn class in the general helper class files (in ModelFiles, but I'll pull it into this PR instead and rebase that one).

On the topic of VectorType, we don't need to specify the dimension. Specifying a dimension just guarantees that the vector will be the same length for each row. That has the downside of making the classes non-reusable, so for helper classes in tests, we usually don't specify this attribute.
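
To make the trade-off concrete, here is a minimal sketch of the two styles of helper class. The class names and the dimension `3` are hypothetical and chosen only for illustration; they are not the classes added in this PR. It assumes the `VectorType` attribute from `Microsoft.ML.Data`.

```csharp
using Microsoft.ML.Data;

// Fixed-size variant (hypothetical): the declared length of 3 means this class
// only fits models that emit exactly three scores per row.
public class FixedSizeScoreColumn
{
    [VectorType(3)]
    public float[] Score { get; set; }
}

// Variable-size variant (hypothetical): no dimension is declared, so the same
// class can be reused for models whose score vectors have different lengths.
public class ReusableScoreColumn
{
    public float[] Score { get; set; }
}
```

The second form is the one that suits shared test helpers, at the cost of not checking the vector length up front.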



.Append(mlContext.Transforms.Normalize("Features"))
.AppendCacheCheckpoint(mlContext)
.Append(mlContext.Regression.Trainers.FastTree(
new FastTreeRegressionTrainer.Options { NumberOfThreads = 1, NumberOfTrees = 10 }));

@shmoradims shmoradims Mar 19, 2019

`= 1`

Why is the number of threads set to 1? Is there a known issue with multi-threading? #Resolved

Contributor Author

There are no issues with multithreading. This is for convenience. Namely, we run these sorts of tests single-threaded because

  • Many are running at the same time;
  • Some algorithms are non-deterministic when multithreaded (e.g. SDCA) and make testing harder (boxing vs exact).
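
One way to picture the second point: floating-point addition is not associative, so if partial results are combined in an order that depends on thread scheduling, the final value can differ slightly between runs, and a test must compare within a tolerance instead of asserting exact equality. The sketch below only illustrates that idea. `ToleranceDemo`, `ApproximatelyEqual`, and `SumInTwoOrders` are hypothetical names, not ML.NET APIs, and the code does not model SDCA itself.

```csharp
using System;

public static class ToleranceDemo
{
    // Hypothetical helper: true when two values agree to within an absolute tolerance.
    public static bool ApproximatelyEqual(double expected, double actual, double tolerance)
        => Math.Abs(expected - actual) <= tolerance;

    // Adds the same three numbers in two different groupings, standing in for
    // partial sums that threads might combine in a different order on each run.
    public static (double LeftFirst, double RightFirst) SumInTwoOrders(double a, double b, double c)
        => ((a + b) + c, a + (b + c));
}
```

For inputs such as `SumInTwoOrders(1e16, 1.0, 1.0)`, the two results can fail an `==` check yet pass `ApproximatelyEqual` with a tolerance of `4.0`. Running single-threaded keeps the combination order fixed, which is what allows a test to assert exact values.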


// TODO #2980: ONNX outputs don't match the outputs of the model, so we must hand-correct this for now.
// TODO #2981: ONNX models cannot be fit as part of a pipeline, so we must use a workaround like this.
var onnxWorkaroundPipeline = onnxModel.Append(
mlContext.Transforms.CopyColumns("Score", "Score0").Fit(onnxModel.Transform(data)));
Contributor

@zeahmed zeahmed Mar 19, 2019

`Score0`

Just a question: where does Score0 come from? Is it produced by the ONNX transform? #Resolved

Contributor Author

@rogancarr rogancarr Mar 19, 2019

I am not sure where the '0' suffixes are entering the pipeline yet. I am digging into it as part of this bug: #2980. I'll triage the three issues #2980, #2981, #2982 exposed by these tests separately from this PR.
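
For readers unfamiliar with the renaming step in the workaround: `CopyColumns` takes the output column name first and the input column name second, so `CopyColumns("Score", "Score0")` exposes the values of `Score0` under the name `Score`. The sketch below shows this on in-memory data with no ONNX model involved. `RawOutput`, `Renamed`, and `CopyColumnsDemo` are hypothetical names for illustration, and the example assumes the ML.NET 1.x API surface.

```csharp
using System.Linq;
using Microsoft.ML;

// Hypothetical row type standing in for the suffixed output of an ONNX model.
public class RawOutput
{
    public float Score0 { get; set; }
}

// Hypothetical row type that reads the copied column back out.
public class Renamed
{
    public float Score { get; set; }
}

public static class CopyColumnsDemo
{
    public static float[] Run()
    {
        var mlContext = new MLContext(seed: 1);
        var data = mlContext.Data.LoadFromEnumerable(new[]
        {
            new RawOutput { Score0 = 0.25f },
            new RawOutput { Score0 = 0.75f },
        });

        // Output column name first ("Score"), input column name second ("Score0").
        var copier = mlContext.Transforms.CopyColumns("Score", "Score0").Fit(data);
        var transformed = copier.Transform(data);

        // Read the new "Score" column back as plain objects.
        return mlContext.Data
            .CreateEnumerable<Renamed>(transformed, reuseRowObject: false)
            .Select(row => row.Score)
            .ToArray();
    }
}
```

The original `Score0` column is still present after the copy; the transform only adds `Score` alongside it.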



Contributor

@zeahmed zeahmed left a comment

Overall LGTM.


@shmoradims shmoradims left a comment

:shipit:

@rogancarr rogancarr merged commit c38f81b into dotnet:master Mar 19, 2019
@rogancarr rogancarr deleted the 2963_onnx_scenario_tests branch March 19, 2019 22:44
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022