ML.Net and Tensorflow integration demo. #3
Conversation
@@ -6,6 +6,10 @@
<DefineConstants>CORECLR</DefineConstants>
</PropertyGroup>

<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|AnyCPU'">
<NoWarn>CS8002;1591</NoWarn>
Interesting that it isn't strong named. That will cause issues on desktop. We could ask for it to be strong named, or build our own copy.
Also, this can be
<NoWarn>$(NoWarn);CS8002</NoWarn>
and placed in the main PropertyGroup @ line 7. #Resolved
/// <summary>
/// Tensorflow session object
/// </summary>
private readonly TFSession _session;
_session
Is TFSession thread-safe? #Closed
TFSession and Graph seem to be thread-safe. I will look into the Run method in the Python TensorFlow code to see how they ensure thread safety.
In reply to: 209305362
return (ref VBuffer<float> dst) =>
{
    var runner = _session.GetRunner();
runner
Can you have multiple runners at the same time from the same session?
It feels to me that the runner should be external to the getter delegate and reused between calls. If this cannot be done, explain why. #Resolved
It seems we cannot have a single runner, mostly because the current implementation of Runner does not allow setting an input value once it has been set (via the AddInput method). This is a limitation of TensorflowSharp.
In reply to: 209305767
We will still need to think about what functionality can be extracted out of the delegate. For example, input.Schema.GetColumnName(info.SrcIndices[i]) is not going to change from call to call.
In reply to: 209413407
You can and should have multiple Runners, the Runner isn't really a heavy object, it's just a holder for the state of a run. As such it doesn't make much sense to try to share it between runs.
In reply to: 209663760
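To illustrate that point, here is a minimal sketch of a per-call Runner inside the getter delegate (inputName, inputTensor, and outputName are placeholder names, not the PR's actual variables):
return (ref VBuffer<float> dst) =>
{
    // A Runner is just a lightweight holder for the state of a single run,
    // so creating one per invocation is cheap and avoids the AddInput
    // "set once" limitation mentioned above.
    var runner = _session.GetRunner();
    runner.AddInput(inputName, inputTensor);
    var outputs = runner.Fetch(outputName).Run();
    // copy outputs[0] into dst here
};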
public override void Save(ModelSaveContext ctx)
{
    throw new NotImplementedException();
throw
Have you given some thought on how we are planning to save/load such transforms? #Closed
I will look into it; I haven't thought about it yet. Maybe I will look at the CNTK implementation in TLC to see what they are doing.
In reply to: 209306191
var status = new TFStatus();

return new TFSession().FromSavedModel(
    sessionOptions,
Is it safe to remove the TF model files after the session has read them? Or does the session defer reading till some later time? #Closed
No, the session loads the file immediately upon calling this method.
In reply to: 209306586
b = new[] { 3.0f, 3.0f,
            3.0f, 3.0f } } }));

var trans = new TensorflowTransform(env, loader, model_location, "c", "a", "b");
I would hope that we don't need to explicitly declare this but can instead read it from the model. We should try to auto-map the IDV column names to the TF input names. Only when we can't map them should we require some user input. #Pending
But this will completely preclude any compile-time validation.
In reply to: 209317833
I don't see how listing strings without type information provides compile-time validation. Sure, if the Pigsty tuples are used then I'd expect those to show up and just flow through the pipeline and work, assuming remappings don't need to happen. I'd only expect a caller to explicitly list column names here if they need to change them from tuples that have already been defined in the data.
In reply to: 209324258
What I meant is, if you make transform work with no parameters whatsoever, and ONLY transform the columns it finds in the model, then Pigsty has nothing to 'attach to'.
In reply to: 209329418
But I think you're right, one doesn't preclude the other.
In reply to: 209374964
@eric, it could be, but it's not straightforward, because there can be many variables in the TensorFlow model and it's difficult to identify which ones are inputs that need values to be fed.
I will investigate further whether it's possible. When saving a model in TensorFlow for serving, it asks for the inputs and outputs; let's see if TFSharp has any convenient way to get them.
In reply to: 209375203
Sonoma does this FYI, you can look at how they did it.
In reply to: 209413718
var result = new float[size];
Marshal.Copy(data, result, 0, size);
return result;
}
Copying output buffer from TFTensor to managed memory. #Resolved
This is OK as a temporary hack; put in a big comment here (or some asserts) about how this only works for row-major float tensors.
In reply to: 209413803
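A hedged sketch of what that comment (and a trivial assert) could look like around the copy shown above; the wording and assert are illustrative, not the PR's actual code:
// NOTE: Temporary hack. This assumes a dense, row-major tensor of floats.
// Marshal.Copy below copies the raw native buffer into a flat float[] and
// would silently produce garbage for any other element type or layout.
Contracts.Assert(size >= 0);
var result = new float[size];
Marshal.Copy(data, result, 0, size);
return result;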
using TensorFlow;

[assembly: LoadableClass(TensorflowTransform.Summary, typeof(TensorflowTransform), typeof(TensorflowTransform.Arguments), typeof(TensorflowTransform),
    TensorflowTransform.UserName, "CopyColumns", "CopyColumnsTransform", TensorflowTransform.ShortName, DocName = "transform/CopyColumnsTransform.md")]
CopyColumns
By the way, you need to fix the attribute. #Resolved
namespace Microsoft.ML.Transforms
{
    public sealed class TensorflowTransform : RowToRowMapperTransformBase, IDisposable
IDisposable
That is incorrect. Our transforms should never be disposable. #Resolved
Took a look at DnnImageFeaturizerTransform and it just lets GC finalize disposable CNTK.Function. Also checked CntkPredictorBase and it does the same for its public Function Model property.
In reply to: 209664938
@YaelD is added to the review. #Closed
@yaeldekel is added to the review. #Closed
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 139, 253, 190, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, | ||
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 190, 253, 70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, | ||
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 35, 241, 225, 160, 108, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 81, 240, 253, 253, 119, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 45, 186, 253, 253, 150, 27, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 16, 93, 252, 253, 187, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 249, 253, 249, 64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 46, 130, 183, 253, 253, 207, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 39, 148, 229, 253, 253, 253, 250, 182, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24, 114, 221, 253, 253, 253, 253, 201, 78, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 66, 213, 253, 253, 253, 253, 198, 81, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 171, 219, 253, 253, 253, 253, 195, 80, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 55, 172, 226, 253, 253, 253, 253, 244, 133, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 136, 253, 253, 253, 212, 135, 132, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 } | ||
}; |
Serialized the TF model in ML.NET, which enables us to use the pipeline for scoring and prediction now... :blush: #Resolved
{
    dims[k] = (int)(shape[k] == -1 ? batchSize : shape[k]);
}
// Output tensor is expected to be of shape [Batch, Data]
// Output tensor is expected to be of shape [Batch, Data]
Not any more. I will remove the comment. #Resolved
/// <summary>
/// Tensorflow session object
/// </summary>
private TFSession _session;
_session
Need to find a way to dispose of this safely. #Resolved
Instead of implementing IDisposable, you can dispose of this resource in the finalizer of the transform.
In reply to: 210042898
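A minimal sketch of that suggestion, assuming the session exposes DeleteSession() as used later in this review (not the final design):
// Finalizer-based cleanup instead of making the transform IDisposable.
~TensorflowTransform()
{
    try
    {
        _session?.DeleteSession();
    }
    catch
    {
        // Never let a finalizer throw.
    }
}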
ctx.SaveBinaryStream("TFModel", w =>
{
    w.WriteByteArray(buffer.ToArray());
});
Check for success here. #Resolved
ToGraphDef might throw exceptions, but otherwise, could this not succeed?
In reply to: 210043097
For me, this comment shows at a different line number after the several commits. Originally, I commented on the loading code below:
byte[] data = null;
ctx.TryLoadBinaryStream("TFModel", r =>
{
    data = r.ReadByteArray();
});
In reply to: 210651836
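For reference, a sketch of the kind of check being asked for on the load side, assuming TryLoadBinaryStream reports whether the stream was found (the exception choice is illustrative only):
byte[] data = null;
if (!ctx.TryLoadBinaryStream("TFModel", r => data = r.ReadByteArray()))
    throw Contracts.Except("The model file does not contain the expected \"TFModel\" stream.");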
if (tfType == TFDataType.Int32)
{
    return CreateTensorValueGetter<DvInt4>(input, type, colIndex, tfShape);
DvInt4
ML.Net types cannot be passed to TF right now. #Pending
There is ongoing work to remove the ML.Net types and use pure .NET types. That will resolve this issue.
In reply to: 210172882
As long as this work is not done yet, we should either not support these types, or use the ML.NET types and convert them to .NET types before passing them to TF. This method gets the getters of the inputs, and will throw an exception if called with Int64, bool, or string.
In reply to: 210456041
{
    var scalar = default(T);
    _srcgetter(ref scalar);
    return (TFTensor)_ctor.Invoke(null, new object[] { scalar });
(TFTensor)_ctor.Invoke(null, new object[] { scalar });
This reflection call is not intended here. We need a method in TFSharp where we can pass generic data to TF. Currently, there are methods in TFSharp that do this, but they are not exposed. @eric and I will work on that. #Resolved
_srcgetter(ref tmpBuf);
var dense = default(VBuffer<T>);
tmpBuf.CopyToDense(ref dense);
return (TFTensor)_method.Invoke(null, new object[] { _tfShape, dense.Values, 0, dense.Length });
return (TFTensor)_method.Invoke(null, new object[] { _tfShape, dense.Values, 0, dense.Length });
This reflection call is not intended here. We need a method in TFSharp where we can pass generic data to TF. Currently, there are methods in TFSharp that do this, but they are not exposed. @eric and I will work on that. #Resolved
This is just a stub to begin iterating on. To follow:
1. Pull over a copy of TF# source.
2. Tree-shake unused code.
3. Rewrite / rename TF# API as appropriate.
4. Come up with a strategy for providing TF bins (using TF#'s copy now, merely as a starting point).
*NOT BUILDING*
I've fixed some minor rule issues and disabled the naming analyzers since those churn the source too much.
For now it is BYOTF (bring your own TensorFlow), so the tests are grabbing the TF bits for TF#.
if (tfType == TFDataType.Int64)
{
    return Utils.MarshalInvoke(MakeGetter<Int64>, outInfo.ItemType.RawType, input, iinfo);
Int64
Do you have any tests testing this (and int, bool and string)? #Pending
Not yet, because ML.Net does not support .NET types; it uses DvInt etc. Once ZeeshanS's changes are in, I will be able to test.
In reply to: 210684078
I don't know when Zeeshan's changes will be done, but for now I don't think this would work. If you don't want to handle these types using DvInt, DvBool, etc., then I think we should not support them until the type system changes go in.
In reply to: 210718189
{
    var inputName = tfInfo.InputColNames[i];
    var type = info.SrcTypes[i];
    runner.AddInput(inputName, srcTensorGetters[i].GetTensor());
GetTensor
Is there a way to make this method not allocate a new array on every call? #Resolved
Also, we need to keep track of the allocations and dispose of them by overriding the RowCursor's Dispose() method (IRowCursors are IDisposable).
In reply to: 210685450
It won't allocate a new array. It takes the handle of the .NET array and passes it on to TF.
In reply to: 210688784
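Roughly what "takes the handle of the .NET array" means, as a hedged sketch (this is not the TensorFlowSharp code itself; dense.Values is the buffer from the snippet quoted earlier):
// GCHandle lives in System.Runtime.InteropServices.
var handle = GCHandle.Alloc(dense.Values, GCHandleType.Pinned);
IntPtr address = handle.AddrOfPinnedObject();
// A TF_NewTensor-style native call wraps 'address' directly, so no managed copy is
// made on input. The buffer must stay pinned until the native tensor is released,
// which is why the allocations need to be tracked and disposed (see the RowCursor
// comment above).
// ... run the session ...
handle.Free(); // release the pin once the tensor no longer references the buffer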
runner.AddInput(inputName, srcTensorGetters[i].GetTensor());
}

var tensors = runner.Fetch(_bindings.OutputColNames[iinfo]).Run();
Run
Same question for the outputs. #Resolved
For the output, an unmanaged array is returned, and we make a copy of it after this line.
In reply to: 210685620
Contracts.Assert(tensors.Length > 0);

var output = FetchData<T>(tensors[0].Data, _bindings.OutputCols[iinfo].VectorSize);
FetchData
FetchData should use the values array of dst instead of creating a new one. (You can check the common pattern in other transforms: if the length of dst.Values is not sufficient, we create a new array.) #Resolved
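The buffer-reuse pattern being referred to, sketched here; it mirrors the code that appears further down in this review, and the exact VectorSize/Indices usage is illustrative:
// Reuse dst.Values when it is large enough; otherwise allocate once.
var size = _bindings.OutputCols[iinfo].VectorSize;
var values = dst.Values;
if (Utils.Size(values) < size)
    values = new T[size];
TensorflowUtils.FetchData<T>(tensors[0].Data, values);
dst = new VBuffer<T>(size, values, dst.Indices);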
return result;
}

private Delegate MakeGetter(IRow input, int iinfo)
MakeGetter
Same comment as for CreateTensorValueGetterVec. #Resolved
runner.AddInput(inputName, srcTensorGetters[i].GetTensor());
}

var tensors = runner.Fetch(_bindings.OutputColNames[iinfo]).Run();
tensors
Does TF know how to output values from multiple layers in one call? If this is the case, we should not make this transform ManyToOne; it should be many-to-many. #Pending
Yes, we can get multiple outputs in one call. I am not sure right now how to do it in ML.Net. This is the next thing I am going to focus on.
In reply to: 210687722
{
    if (!activeInfos(iinfo))
        continue;
    getters[iinfo] = MakeGetter(input, iinfo);
MakeGetter
Add an out parameter of a disposer to MakeGetter, this way it can be used here, and by the RowCursor. #Resolved
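A sketch of the suggested signature change; the disposer wiring shown here is hypothetical and only illustrates the shape:
private Delegate MakeGetter(IRow input, int iinfo, out Action disposer)
{
    var allocated = new List<TFTensor>();
    // The getter adds every TFTensor it allocates to 'allocated'.
    Delegate getter = /* build the getter as before, capturing 'allocated' */ null;
    disposer = () =>
    {
        foreach (var tensor in allocated)
            tensor.Dispose();
    };
    return getter;
}
The call site above would then become getters[iinfo] = MakeGetter(input, iinfo, out disposers[iinfo]), and the RowCursor's Dispose() would invoke the stored disposers.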
namespace Microsoft.ML.Transforms.TensorFlow
{
    /// <summary>
Change the tabs to spaces (in the other files from TFSharp as well). #Resolved
private static DataKind Tf2MlNetType(TFDataType type)
{
    if (type == TFDataType.Float)
if
Can this be changed to a switch statement? #Resolved
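For example, something along these lines; the DataKind mappings and the exception helper shown are assumptions based on the types discussed elsewhere in this review:
private static DataKind Tf2MlNetType(TFDataType type)
{
    switch (type)
    {
        case TFDataType.Float:
            return DataKind.R4;
        case TFDataType.Int32:
            return DataKind.I4;
        case TFDataType.Int64:
            return DataKind.I8;
        case TFDataType.String:
            return DataKind.TX;
        default:
            throw Contracts.ExceptNotSupp("TensorFlow type not supported.");
    }
}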
if (tfType == TFDataType.String)
{
    return Utils.MarshalInvoke(MakeGetter<string>, outInfo.ItemType.RawType, input, iinfo);
string
This should be DvText. #Resolved
if (tfType == TFDataType.String)
{
    return CreateTensorValueGetter<string>(input, type, colIndex, tfShape);
string
Text will continue to be handled using DvText even after we move to .net types. #Resolved
/// <summary>
/// Creates a tensor representing type T[].
/// T[] will be pinned and wrapped in a tensor.
/// This method does not support jagged arrays since those require copies.
This method does not support jagged arrays since those require copies.
This comment is no longer relevant since you only accept single dimensional arrays. #Resolved
/// </summary>
/// <typeparam name="T">.NET type of tensor to create</typeparam>
/// <param name="data">value of tensor</param>
public static TFTensor Create<T>(T data)
Create
Consider changing this to CreateScalar. #Resolved
internal static TFDataType MlNet2TfType(Type type)
{
    if (type == typeof(float))
This could be code-gen'ed by the TT file. #Resolved
Actually there is already a method in TFSharp. I copied it.
In reply to: 210747970
We should still plan on deleting all this TF# code and code-gen'ing it. It's all boiler-plate copy/paste that can be driven off the type mapping.
In reply to: 210758078
@@ -103,8 +103,7 @@ Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Microsoft.ML.CpuMath.UnitTe
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Microsoft.ML.CpuMath.UnitTests.netcoreapp", "test\Microsoft.ML.CpuMath.UnitTests.netcoreapp\Microsoft.ML.CpuMath.UnitTests.netcoreapp.csproj", "{5F81A2A4-73AD-494C-B387-07D605EC8826}"
EndProject

Project("{F2A71F9B-5D33-465A-A702-920D77279786}") = "Microsoft.ML.FSharp.Tests", "test\Microsoft.ML.FSharp.Tests\Microsoft.ML.FSharp.Tests.fsproj", "{802233D6-8CC0-46AD-9F23-FEE1E9AED9B3}"
Project("{6EC3EE1D-3C4E-46DD-8F32-0CC8E7565705}") = "Microsoft.ML.FSharp.Tests", "test\Microsoft.ML.FSharp.Tests\Microsoft.ML.FSharp.Tests.fsproj", "{802233D6-8CC0-46AD-9F23-FEE1E9AED9B3}"
6EC3EE1D-3C4E-46DD-8F32-0CC8E7565705
Why did this change?
namespace Microsoft.ML.Transforms.TensorFlow
{

    /// <summary>
Convert tabs to spaces.
namespace Microsoft.ML.Transforms.TensorFlow
{
    internal static partial class NativeBinding
Convert tabs to spaces
_bindings.Save(ctx);
}

public void DisposeTFSession()
DisposeTFSession
I think this method can be removed.
{
    public sealed class TensorflowTransform : RowToRowMapperTransformBase
    {
        public sealed class Column : ManyToOneColumn
ManyToOneColumn
Consider not using this base class (or the ManyToOneColumnBindingsBase class). These are intended for applying the transform simultaneously to multiple sets of input columns. That is advantageous when the transform requires training (such as the ngram transform), because defining multiple transforms would otherwise require multiple passes over the data.
}
}

public sealed class Arguments : TransformInputBase
Arguments
I would change the structure of these arguments to be an array of strings for the inputs, and a string for the output.
Later we may want to change the arguments to have a mapping from the IDV column names to the model input names, and/or from the model output names to IDV output column names.
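A hedged sketch of that suggested shape; the field names and argument attributes here are hypothetical, not the PR's actual definitions:
public sealed class Arguments : TransformInputBase
{
    [Argument(ArgumentType.Multiple, HelpText = "Names of the input columns fed to the TensorFlow model")]
    public string[] InputColumns;

    [Argument(ArgumentType.AtMostOnce, HelpText = "Name of the output column produced by the model")]
    public string OutputColumn;

    [Argument(ArgumentType.Required, HelpText = "Path of the saved TensorFlow model")]
    public string ModelFile;
}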
public readonly ColumnType[] OutputCols;
public readonly TFDataType[] OutputTFTypes;

public Bindings(Column[] columns, ISchema schemaInput, TensorflowTransform parent)
Bindings
Change this class to have just one private ctor, and two public ctors/create methods that call the private one. This way if we make any changes to this class we won't need to remember to make them in two places.
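In other words, something like the following shape (entirely schematic; the parameter lists and the ReadColumns helper are placeholders, not existing code):
// Single private ctor that does all the real work.
private Bindings(ISchema schemaInput, TensorflowTransform parent, Column[] columns)
{
    // validation, type/shape resolution, and field assignment all live here
}

// Entry point used when constructing the transform from Arguments.
public static Bindings Create(Column[] columns, ISchema schemaInput, TensorflowTransform parent)
    => new Bindings(schemaInput, parent, columns);

// Entry point used when deserializing from a ModelLoadContext.
public static Bindings Create(ModelLoadContext ctx, ISchema schemaInput, TensorflowTransform parent)
    => new Bindings(schemaInput, parent, ReadColumns(ctx));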
_session.DeleteSession();
}

private ValueGetter<T> GetSrcGetter<T>(IRow input, int iinfo, int isrc)
GetSrcGetter
Is this used anywhere?
private ITensorValueGetter CreateTensorValueGetterVec(IRow input, TFDataType tfType, ColumnType columnType, int colIndex, TFShape tfShape)
{
    var type = TFTensor.TypeFromTensorType(tfType);
    if (type != null)
type
We should be able to assert here that type != null.
return new TensorValueGetter<T>(input, colIndex);
}

private ITensorValueGetter CreateTensorValueGetterVec(IRow input, TFDataType tfType, ColumnType columnType, int colIndex, TFShape tfShape)
Vec
Can the "Vec" suffix be removed? I don't see a method that is for "NotVec".
Contracts.Assert(tensors.Length > 0);

var values = dst.Values;
if (Utils.Size(values) != _bindings.OutputCols[iinfo].VectorSize)
!=
Doesn't need to be exactly equal. < is enough here.
    values = new T[_bindings.OutputCols[iinfo].VectorSize];

TensorflowUtils.FetchData<T>(tensors[0].Data, values);
dst = new VBuffer<T>(values.Length, values);
VBuffer
Pass dst.Indices to the new VBuffer as well.
* Added placeholder
* Cleaned up Infos (replaced with ColumnPairs)
* Added ColumnInfo
* Added all the Create() methods.
* Added Mapper
* Commented out the EntryPoint
* Added PcaEstimator2
* PcaWorkout test passes
* Added pigsty api
* Fixed EntryPoint
* Fixed the arguments
* Fixed tests and added pigsty test
* Deleted Wrapped PCA transform
* Float -> float
* Cleaned docstrings
* Removed some unnecessary checks
* Simplified unnecessary code
* Moved some fields to ColumnInfo for simplifications
* Simplified weight columns
* Address PR comments #1
* Addressed PR comments #2
* Moved the static test
* PR comments #3
* Moved schema related information out of ColumnInfo and into Mapper.ColumnSchemaInfo.
* PR comments
* PR comments
* Updated manifest for entrypoint PcaCalculator
* Fixed schema exceptions
In this demo, I created a TensorflowTransform that takes the following as input and creates a column based on the output name provided. The output column can be used as features for further training in the ML.Net pipeline.
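For reference, the usage from the test quoted earlier in this thread; presumably "c" is the requested output column name and "a"/"b" are the model input names, matching the description above:
var trans = new TensorflowTransform(env, loader, model_location, "c", "a", "b");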