Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add OnnxTransform for scoring Onnx 1.2 models - integrates Microsoft.ML.Scoring/Sonoma Library #942

Merged
merged 32 commits into from
Sep 26, 2018

Conversation

jignparm
Copy link
Contributor

@jignparm jignparm commented Sep 19, 2018

Fixes issue #695
Fixes issue #892

This adds a new transform for scoring Onnx v1.2 models, leveraging an updated version of the scoring library at the link below.

https://www.nuget.org/packages/Microsoft.ML.Scoring/

@dnfclas
Copy link

dnfclas commented Sep 19, 2018

CLA assistant check
All CLA requirements met. #Closed

@jignparm jignparm changed the title initial checkin PR - OnnxTransform Sep 19, 2018
@jignparm jignparm changed the title PR - OnnxTransform [WIP] PR - OnnxTransform Sep 19, 2018
@jignparm jignparm changed the title [WIP] PR - OnnxTransform [WIP] Add OnnxTransform for scoring Onnx 1.2 models - integrates Microsoft.ML.Scoring/Sonoma Library Sep 19, 2018

<ItemGroup>
<ProjectReference Include="../Microsoft.ML/Microsoft.ML.nupkgproj" />
<PackageReference Include="Microsoft.ML.Scoring" Version="1.0.4-dev47509" />
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1.0.4-dev47509 [](start = 62, length = 14)

please put it to https://github.com/dotnet/machinelearning/blob/master/build/Dependencies.props #Closed

internal const string LoaderSignature = "OnnxTransform";

public readonly string[] Inputs;
public readonly string[] Outputs;
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you plan to support multiple input /outputs or only one?
It looks weird, you constantly cast single objects to arrays and back. #Closed

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's single input/output transform. I cleaned all the casts.


In reply to: 219612560 [](ancestors = 219612560)

{
public sealed class OnnxTransform : ITransformer, ICanSaveModel
{
public sealed class Arguments : TransformInputBase
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arguments [](start = 28, length = 9)

So far we have tendency to use Arguments class for entry points and command line, and to have separate class ColumnInfo with same functionality, but without all attributes.
And in our constructors we tend to use ColumnInfo class (except constructor for SignatureLoadDataTransform) #Pending

Copy link

@shmoradims shmoradims Sep 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was basing this on TensorFlowTransform which only has Arguments class. I looked at couple other transforms (MutualInformationFeatureSelectionTransform, GroupTransform) and they have Arguments only. Could you please point me to a sample transform that follows above? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Adding Shahab's response]
I was basing this on TensorFlowTransform which only has Arguments class. I looked at couple other transforms (MutualInformationFeatureSelectionTransform, GroupTransform) and they have Arguments only. Could you please point me to a sample transform that follows above?


In reply to: 219613034 [](ancestors = 219613034)

}

// Factory method for SignatureLoadDataTransform
public static IDataTransform Create(IHostEnvironment env, ModelLoadContext ctx, IDataView input)
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public [](start = 8, length = 6)

can make private #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for all factory methods


In reply to: 219613072 [](ancestors = 219613072)

Copy link

@shmoradims shmoradims Sep 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't some of the Create(...) need to be public, b/c they get called from another assembly? #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some, yes, probably one which returns IDataTransform. All other methods, just for our dependency framework, and can be private.


In reply to: 220260318 [](ancestors = 220260318)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made them private


In reply to: 219613072 [](ancestors = 219613072)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks Ivan. We've made them private for now -- all unit tests are passing so far.


In reply to: 220309543 [](ancestors = 220309543,220260318)

ctx.SaveNonEmptyString(_args.OutputColumn);
}

internal sealed class Mapper : IRowMapper
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

internal [](start = 8, length = 8)

why not private? #Closed

Copy link

@shmoradims shmoradims Sep 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made private #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made private


In reply to: 219613222 [](ancestors = 219613222)

/// After adaptation, you'd call GetTensor() on the IdvToTensorAdapter object to get the Tensor equivalent of
/// each row.
/// </summary>
internal sealed class IdvToTensorAdapter
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IdvToTensorAdapter [](start = 30, length = 18)

I would move it to OnnxUtils #Closed

Copy link

@shmoradims shmoradims Sep 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to OnnxUtils #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to OnnxUtils.


In reply to: 219613978 [](ancestors = 219613978)

[Argument(ArgumentType.Required, HelpText = "Path to the onnx model file.", ShortName = "model", SortOrder = 0)]
public string ModelFile;

[Argument(ArgumentType.Multiple | ArgumentType.Required, HelpText = "TBD", SortOrder = 1)]
Copy link
Member

@abgoswam abgoswam Sep 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBD [](start = 81, length = 3)

nit: needs to be updated #Resolved

Copy link

@shmoradims shmoradims Sep 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!


In reply to: 219906640 [](ancestors = 219906640)


public static OnnxModel CreateFromBytes(byte[] modelBytes)
{
var tempModelDir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tempModelDir [](start = 16, length = 12)

when does this tempModelDir get cleaned up ?

public static OnnxModel CreateFromBytes(byte[] modelBytes)
{
var tempModelDir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString());
Directory.CreateDirectory(tempModelDir);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tempModelDir [](start = 38, length = 12)

might consider ACL this temp dir .

Temp directories with models are considered executable code and considered security threat by .NET security team

</para>

<para>
This transform requires the <a href="https://dotnet.myget.org/feed/dotnet-core/package/nuget/Microsoft.ML.TensorFlow/0.5.0-preview-26830-5">Microsoft.ML.TensorFlow</a> nuget to be installed.
Copy link
Member

@abgoswam abgoswam Sep 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Microsoft.ML.TensorFlow [](start = 150, length = 23)

is this a requirement for the OnnxTransform ? #Resolved

Copy link

@shmoradims shmoradims Sep 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jignesh, this should be Sonoma nuget, right? Could you please update it. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The entire doc.xml has been modified. No reference to TF anywhere now.


In reply to: 219920734 [](ancestors = 219920734)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this doc has been rewritten. The version being commented on was obselete.


In reply to: 220279203 [](ancestors = 220279203)

{
_host.CheckNonWhiteSpace(args.ModelFile, nameof(args.ModelFile));
_host.CheckUserArg(File.Exists(args.ModelFile), nameof(args.ModelFile));
_model = new OnnxModel(args.ModelFile);
Copy link
Member

@abgoswam abgoswam Sep 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_model [](start = 16, length = 6)

is there a memory limitation of keeping everything in memory ? i would presume some Onnx models to be quite large
#Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API is what is supported in Sonoma currently. There's no alternative at this point. This can be a design discussion for the Lotus# library. Making as resolved.


In reply to: 219921638 [](ancestors = 219921638)

}

[Fact]
public void OnnxStatic()
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Sep 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this one, but I would love to have test which would ran ML.Net-> Onnx conversion and then score that model, and compare results. #WontFix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, that's a great Uber-test to have. We can probably track that as a separate feature or task (since it's using 2 diff transforms). Let me know if we can close this for now.


In reply to: 219967750 [](ancestors = 219967750)


<PropertyGroup>
<TargetFramework>netstandard2.0</TargetFramework>
<IncludeInPackage>Microsoft.ML.TensorFlow</IncludeInPackage>
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Sep 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Microsoft.ML.TensorFlow [](start = 22, length = 23)

Microsoft.ML.OnnxTransform #Closed

try
{
pipe.Fit(invalidDataWrongVectorSize);
//Assert.False(true);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Sep 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

//Assert.False(true); [](start = 16, length = 21)

why this is commented? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uncommented now.


In reply to: 220398186 [](ancestors = 220398186)

var invalidDataWrongTypes = ComponentCreation.CreateDataView(Env, stringData);
var invalidDataWrongVectorSize = ComponentCreation.CreateDataView(Env, sizeData);
TestEstimatorCore(pipe, dataView, invalidInput: invalidDataWrongNames);
//TestEstimatorCore(pipe, dataView, invalidInput: invalidDataWrongTypes);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Sep 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

//TestEstimatorCore(pipe, dataView, invalidInput: invalidDataWrongTypes); [](start = 12, length = 73)

? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uncommented


In reply to: 220398217 [](ancestors = 220398217)

{
using (var env = new ConsoleEnvironment())
{
Assert.Equal(Maml.Main(new[] { @"showschema loader=Text{col=a:R4:0-3 col=b:R4:0-3} xf=OnnxTransform{inputs=a inputs=b outputs=c model={model_matmul/frozen_saved_model.pb}} in=f:\2.txt" }), (int)0);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Sep 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inputs=a inputs=b outputs=c [](start = 116, length = 27)

in your argument class you have this:
public string InputColumn;
public string OutputColumn

you accept only one column as input and one as output.
also names are different. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Please see upated.


In reply to: 220398640 [](ancestors = 220398640)

@@ -17,6 +17,7 @@
<ProjectReference Include="..\..\src\Microsoft.ML.StandardLearners\Microsoft.ML.StandardLearners.csproj" />
<ProjectReference Include="..\..\src\Microsoft.ML.Onnx\Microsoft.ML.Onnx.csproj" />
<ProjectReference Include="..\..\src\Microsoft.ML.TensorFlow\Microsoft.ML.TensorFlow.csproj" />
<ProjectReference Include="..\..\src\Microsoft.ML.OnnxTransform\Microsoft.ML.OnnxTransform.csproj" />
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Sep 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert whole file. #Pending

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch. Done


In reply to: 220398733 [](ancestors = 220398733)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You still have package reference on Onnx.TestModels.


In reply to: 220411170 [](ancestors = 220411170,220398733)


resultDic[Transformer.Output] = new SchemaShape.Column(Transformer.Output,
Transformer.OutputType.IsKnownSizeVector ? SchemaShape.Column.VectorKind.Vector
: SchemaShape.Column.VectorKind.VariableVector, Transformer.OutputType.ItemType, false);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Sep 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Transformer.OutputType.ItemType [](start = 64, length = 31)

Either modify OnnxUtils.CopyTo to support other types rather than float, or put NumberType.R4 here. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


In reply to: 220400133 [](ancestors = 220400133)

_host.AssertValue(input);
_host.Assert(typeof(T) == _outputItemRawType);

ValueGetter<VBuffer<T>> valuegetter = (ref VBuffer<T> dst) =>
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Sep 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

valuegetter [](start = 40, length = 11)

camelCase please #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


In reply to: 220400214 [](ancestors = 220400214)

<PropertyGroup>
<TargetFramework>netstandard2.0</TargetFramework>
<IncludeInPackage>Microsoft.ML.OnnxTransform</IncludeInPackage>
<DefineConstants>CORECLR</DefineConstants>
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Sep 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is unnecessary #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed


In reply to: 220400768 [](ancestors = 220400768)

<description>The name of each output column should match one of the operations in the Tensorflow graph.</description>
</item>
<item>
<description>Currently, float and double are the only acceptable data types for input/output.</description>
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Sep 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double [](start = 46, length = 6)

You throw exception in case of double type as output column. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this to only 'float'.


In reply to: 220400879 [](ancestors = 220400879)

</list>

The inputs and outputs of a TensorFlow model can be obtained using the <a href="https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms/README.md#inspecting-graphs">
<code>summarize_graph</code> tool
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Sep 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be onnx specific or removed. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replaced with netron tool.


In reply to: 220400976 [](ancestors = 220400976)

@jignparm jignparm changed the title [WIP] Add OnnxTransform for scoring Onnx 1.2 models - integrates Microsoft.ML.Scoring/Sonoma Library Add OnnxTransform for scoring Onnx 1.2 models - integrates Microsoft.ML.Scoring/Sonoma Library Sep 26, 2018
@@ -0,0 +1,404 @@
using Microsoft.ML.Runtime;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: need header

@@ -38,5 +38,6 @@
</ItemGroup>
<ItemGroup>
<PackageReference Include="Microsoft.ML.TensorFlow.TestModels" Version="0.0.3-test" />
<PackageReference Include="Microsoft.ML.Onnx.TestModels" Version="0.0.2-test" />
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[](start = 4, length = 80)

You don't need this line anymore.

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@shauheen shauheen merged commit f6d850f into dotnet:master Sep 26, 2018
@helloguo
Copy link

Is Microsoft.ML.Scoring going to be a part of ML.NET and open source?

@ghost ghost locked as resolved and limited conversation to collaborators Mar 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants