-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add OnnxTransform for scoring Onnx 1.2 models - integrates Microsoft.ML.Scoring/Sonoma Library #942
Conversation
|
||
<ItemGroup> | ||
<ProjectReference Include="../Microsoft.ML/Microsoft.ML.nupkgproj" /> | ||
<PackageReference Include="Microsoft.ML.Scoring" Version="1.0.4-dev47509" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1.0.4-dev47509 [](start = 62, length = 14)
please put it to https://github.com/dotnet/machinelearning/blob/master/build/Dependencies.props #Closed
internal const string LoaderSignature = "OnnxTransform"; | ||
|
||
public readonly string[] Inputs; | ||
public readonly string[] Outputs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you plan to support multiple input /outputs or only one?
It looks weird, you constantly cast single objects to arrays and back. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's single input/output transform. I cleaned all the casts.
In reply to: 219612560 [](ancestors = 219612560)
{ | ||
public sealed class OnnxTransform : ITransformer, ICanSaveModel | ||
{ | ||
public sealed class Arguments : TransformInputBase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arguments [](start = 28, length = 9)
So far we have tendency to use Arguments class for entry points and command line, and to have separate class ColumnInfo with same functionality, but without all attributes.
And in our constructors we tend to use ColumnInfo class (except constructor for SignatureLoadDataTransform) #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was basing this on TensorFlowTransform which only has Arguments class. I looked at couple other transforms (MutualInformationFeatureSelectionTransform, GroupTransform) and they have Arguments only. Could you please point me to a sample transform that follows above? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Adding Shahab's response]
I was basing this on TensorFlowTransform which only has Arguments class. I looked at couple other transforms (MutualInformationFeatureSelectionTransform, GroupTransform) and they have Arguments only. Could you please point me to a sample transform that follows above?
In reply to: 219613034 [](ancestors = 219613034)
} | ||
|
||
// Factory method for SignatureLoadDataTransform | ||
public static IDataTransform Create(IHostEnvironment env, ModelLoadContext ctx, IDataView input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public [](start = 8, length = 6)
can make private #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't some of the Create(...) need to be public, b/c they get called from another assembly? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some, yes, probably one which returns IDataTransform. All other methods, just for our dependency framework, and can be private.
In reply to: 220260318 [](ancestors = 220260318)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks Ivan. We've made them private for now -- all unit tests are passing so far.
In reply to: 220309543 [](ancestors = 220309543,220260318)
ctx.SaveNonEmptyString(_args.OutputColumn); | ||
} | ||
|
||
internal sealed class Mapper : IRowMapper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internal [](start = 8, length = 8)
why not private? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made private #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// After adaptation, you'd call GetTensor() on the IdvToTensorAdapter object to get the Tensor equivalent of | ||
/// each row. | ||
/// </summary> | ||
internal sealed class IdvToTensorAdapter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IdvToTensorAdapter [](start = 30, length = 18)
I would move it to OnnxUtils #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to OnnxUtils #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Argument(ArgumentType.Required, HelpText = "Path to the onnx model file.", ShortName = "model", SortOrder = 0)] | ||
public string ModelFile; | ||
|
||
[Argument(ArgumentType.Multiple | ArgumentType.Required, HelpText = "TBD", SortOrder = 1)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBD [](start = 81, length = 3)
nit: needs to be updated #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
public static OnnxModel CreateFromBytes(byte[] modelBytes) | ||
{ | ||
var tempModelDir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tempModelDir [](start = 16, length = 12)
when does this tempModelDir get cleaned up ?
public static OnnxModel CreateFromBytes(byte[] modelBytes) | ||
{ | ||
var tempModelDir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString()); | ||
Directory.CreateDirectory(tempModelDir); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tempModelDir [](start = 38, length = 12)
might consider ACL this temp dir .
Temp directories with models are considered executable code and considered security threat by .NET security team
</para> | ||
|
||
<para> | ||
This transform requires the <a href="https://dotnet.myget.org/feed/dotnet-core/package/nuget/Microsoft.ML.TensorFlow/0.5.0-preview-26830-5">Microsoft.ML.TensorFlow</a> nuget to be installed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Microsoft.ML.TensorFlow [](start = 150, length = 23)
is this a requirement for the OnnxTransform ? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jignesh, this should be Sonoma nuget, right? Could you please update it. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The entire doc.xml has been modified. No reference to TF anywhere now.
In reply to: 219920734 [](ancestors = 219920734)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this doc has been rewritten. The version being commented on was obselete.
In reply to: 220279203 [](ancestors = 220279203)
{ | ||
_host.CheckNonWhiteSpace(args.ModelFile, nameof(args.ModelFile)); | ||
_host.CheckUserArg(File.Exists(args.ModelFile), nameof(args.ModelFile)); | ||
_model = new OnnxModel(args.ModelFile); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_model [](start = 16, length = 6)
is there a memory limitation of keeping everything in memory ? i would presume some Onnx models to be quite large
#Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API is what is supported in Sonoma currently. There's no alternative at this point. This can be a design discussion for the Lotus# library. Making as resolved.
In reply to: 219921638 [](ancestors = 219921638)
…machinelearning into jignparm/onnxtransform
} | ||
|
||
[Fact] | ||
public void OnnxStatic() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this one, but I would love to have test which would ran ML.Net-> Onnx conversion and then score that model, and compare results. #WontFix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, that's a great Uber-test to have. We can probably track that as a separate feature or task (since it's using 2 diff transforms). Let me know if we can close this for now.
In reply to: 219967750 [](ancestors = 219967750)
|
||
<PropertyGroup> | ||
<TargetFramework>netstandard2.0</TargetFramework> | ||
<IncludeInPackage>Microsoft.ML.TensorFlow</IncludeInPackage> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Microsoft.ML.TensorFlow [](start = 22, length = 23)
Microsoft.ML.OnnxTransform #Closed
try | ||
{ | ||
pipe.Fit(invalidDataWrongVectorSize); | ||
//Assert.False(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//Assert.False(true); [](start = 16, length = 21)
why this is commented? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var invalidDataWrongTypes = ComponentCreation.CreateDataView(Env, stringData); | ||
var invalidDataWrongVectorSize = ComponentCreation.CreateDataView(Env, sizeData); | ||
TestEstimatorCore(pipe, dataView, invalidInput: invalidDataWrongNames); | ||
//TestEstimatorCore(pipe, dataView, invalidInput: invalidDataWrongTypes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//TestEstimatorCore(pipe, dataView, invalidInput: invalidDataWrongTypes); [](start = 12, length = 73)
? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ | ||
using (var env = new ConsoleEnvironment()) | ||
{ | ||
Assert.Equal(Maml.Main(new[] { @"showschema loader=Text{col=a:R4:0-3 col=b:R4:0-3} xf=OnnxTransform{inputs=a inputs=b outputs=c model={model_matmul/frozen_saved_model.pb}} in=f:\2.txt" }), (int)0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inputs=a inputs=b outputs=c [](start = 116, length = 27)
in your argument class you have this:
public string InputColumn;
public string OutputColumn
you accept only one column as input and one as output.
also names are different. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -17,6 +17,7 @@ | |||
<ProjectReference Include="..\..\src\Microsoft.ML.StandardLearners\Microsoft.ML.StandardLearners.csproj" /> | |||
<ProjectReference Include="..\..\src\Microsoft.ML.Onnx\Microsoft.ML.Onnx.csproj" /> | |||
<ProjectReference Include="..\..\src\Microsoft.ML.TensorFlow\Microsoft.ML.TensorFlow.csproj" /> | |||
<ProjectReference Include="..\..\src\Microsoft.ML.OnnxTransform\Microsoft.ML.OnnxTransform.csproj" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert whole file. #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You still have package reference on Onnx.TestModels.
In reply to: 220411170 [](ancestors = 220411170,220398733)
|
||
resultDic[Transformer.Output] = new SchemaShape.Column(Transformer.Output, | ||
Transformer.OutputType.IsKnownSizeVector ? SchemaShape.Column.VectorKind.Vector | ||
: SchemaShape.Column.VectorKind.VariableVector, Transformer.OutputType.ItemType, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Transformer.OutputType.ItemType [](start = 64, length = 31)
Either modify OnnxUtils.CopyTo to support other types rather than float, or put NumberType.R4 here. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_host.AssertValue(input); | ||
_host.Assert(typeof(T) == _outputItemRawType); | ||
|
||
ValueGetter<VBuffer<T>> valuegetter = (ref VBuffer<T> dst) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
valuegetter [](start = 40, length = 11)
camelCase please #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<PropertyGroup> | ||
<TargetFramework>netstandard2.0</TargetFramework> | ||
<IncludeInPackage>Microsoft.ML.OnnxTransform</IncludeInPackage> | ||
<DefineConstants>CORECLR</DefineConstants> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is unnecessary #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<description>The name of each output column should match one of the operations in the Tensorflow graph.</description> | ||
</item> | ||
<item> | ||
<description>Currently, float and double are the only acceptable data types for input/output.</description> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double [](start = 46, length = 6)
You throw exception in case of double type as output column. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
</list> | ||
|
||
The inputs and outputs of a TensorFlow model can be obtained using the <a href="https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms/README.md#inspecting-graphs"> | ||
<code>summarize_graph</code> tool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be onnx specific or removed. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -0,0 +1,404 @@ | |||
using Microsoft.ML.Runtime; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: need header
@@ -38,5 +38,6 @@ | |||
</ItemGroup> | |||
<ItemGroup> | |||
<PackageReference Include="Microsoft.ML.TensorFlow.TestModels" Version="0.0.3-test" /> | |||
<PackageReference Include="Microsoft.ML.Onnx.TestModels" Version="0.0.2-test" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[](start = 4, length = 80)
You don't need this line anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is Microsoft.ML.Scoring going to be a part of ML.NET and open source? |
Fixes issue #695
Fixes issue #892
This adds a new transform for scoring Onnx v1.2 models, leveraging an updated version of the scoring library at the link below.
https://www.nuget.org/packages/Microsoft.ML.Scoring/