Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hide IDataTransform Create methods #2031

Merged

Conversation

Ivanidzo4ka
Copy link
Contributor

Step towards #1995.
Also add proper comments for Image transforms.
Remove Create methods in Onnx and Tensorflow (replace them with proper constructors).


namespace Microsoft.ML.OnnxTransform.StaticPipe
namespace Microsoft.ML.Transforms.StaticPipe
Copy link
Contributor Author

@Ivanidzo4ka Ivanidzo4ka Jan 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Transforms [](start = 23, length = 10)

Reason why I make this change is because I don't like what I have to do this:
new Transforms.OnnxTransform(...);
instead of just
new OnnxTransform(...);
if we have Microsoft.ML in usings. (see OnnxTransformTests.cs)

Shall we just call it Microsoft.ML.StaticPipe without any additional sub namespaces? #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect that static pipe would be something we'd nail down later, but I'd expect that inputs to to staticpipe would just be accomplished by taking a PiplineColumn instance, and applying the transform to it. There should not be any new Transforms.OnnxTransform in the static pipeline at all, since the static pipeline is more declarative. So I don't like either of the things you've described, if indeed that is now it iss set up. If that's not how it was set up, well, that's bad. :) So that would be something to file and issue on and fix. (Though I think static pipeline is something we're going to refine more post v1.0.)


In reply to: 245456891 [](ancestors = 245456891)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, from what I see below, it is set up as an extension method on the columns. So what is the problem?


In reply to: 245458003 [](ancestors = 245458003,245456891)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have your code and you use using Microsoft.ML; in it (which we do all the time), I can't longer just create OnnxTransform object via new OnnxTransform(...), because you will get collision of namespace and object.
So you have to directly specify what your transform is in Transform namespace like this: Transforms.OnnxTransform which I found annoying.


In reply to: 245458074 [](ancestors = 245458074,245458003,245456891)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see. Yes I agree then, I didn't know you were talking about namespace/object collision.

Ultimately all static extensions should be in Microsoft.ML namespace (since we must accept that inclusion of a specific nuget is license to use it, similar to our attitude about MLContext extensions), but not right now.


In reply to: 245458427 [](ancestors = 245458427,245458074,245458003,245456891)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that happens only when ppl use both static and dynamic, right? Our tests do so, but idk how many ppl will mix ant match static and dynamic.


In reply to: 245458713 [](ancestors = 245458713,245458427,245458074,245458003,245456891)

/// <summary>
/// Factory method for SignatureLoadModel.
/// </summary>
// Factory method for SignatureLoadModel.
Copy link
Contributor

@TomFinley TomFinley Jan 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Factory method for SignatureLoadModel. [](start = 8, length = 41)

I see you doing this multiple places, is there any reason for it?

If we instead kept it an XML comment, we could improve this by having a <see tag on it, which overall seems like a good idea. We're not doing this now, but it seems like an unforced regression into a worse, not a better, place. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only reason why we had

tag because it was requirement for public method, and since it's no longer public I just prefer mark this methods as comments since
a) This is most common pattern in our codebase.
b) I think about this as more as internal message for developer: "Hey, don't delete this one, even if no one uses it, we actually call it via our DI system." Summary is more like a description of what is going on in this method, and this is more like a mark/internal comment.


In reply to: 245456969 [](ancestors = 245456969)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tags are a requirement for public methods. That ***does not make them a bad thing for private or internal methods***, especially if those comments explicitly reference other structures in the codebase, as this does.

Why would a developer prefer this comment, to the old comment? If you had instead had a <see tag, as suggested above, that developer could even F12 on it to see what the heck we're talking about. Right?


In reply to: 245457620 [](ancestors = 245457620,245456969)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably just be a simple search-replace, I think, to put it into proper form. At least, so I'd think.


In reply to: 245458497 [](ancestors = 245458497,245457620,245456969)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I can't win argument with you so I reverted my changes to summary comments. Although I'm still adding comments for other places since this is common pattern in our code.
I hope this is not a blocker, since main intent of this PR is to hide IDataTransforms methods.


In reply to: 245459044 [](ancestors = 245459044,245458497,245457620,245456969)

/// </summary>
public static IDataTransform Create(IHostEnvironment env, MeanVarArguments args, IDataView input)
// Factory method corresponding to SignatureDataTransform.
internal static IDataTransform Create(IHostEnvironment env, MeanVarArguments args, IDataView input)
Copy link
Contributor

@TomFinley TomFinley Jan 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

internal [](start = 8, length = 8)

If the comment is correct, then it should suffice for it to be private, right? So is the comment incorrect? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everytime we have Entrypoint for transform I have to mark this method as internal.
As alternative I can refactor conversion of argument class to ColumnInfo array and use it here and in entrypoint, but oh, boy, it would be lot of changes across codebase.


In reply to: 245457063 [](ancestors = 245457063)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh entrypoints. Fair enough.


In reply to: 245457335 [](ancestors = 245457335,245457063)

@@ -15,7 +15,7 @@ public abstract class OneToOneTransformerBase : RowToRowTransformerBase
{
protected readonly (string input, string output)[] ColumnPairs;

protected OneToOneTransformerBase(IHost host, (string input, string output)[] columns) : base(host)
protected OneToOneTransformerBase(IHost host, params (string input, string output)[] columns) : base(host)
Copy link
Contributor

@TomFinley TomFinley Jan 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

params [](start = 54, length = 6)

Is this related? It might be a good idea, just not sure? It doesn't at least at first glance seem to have anything to do with the issue. (Not suggesting we can't do it, just, not sure what the purpose of the change is.) #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I replaced one of Create methods with constructor which accept only one input and output column, and I didn't like idea what I need to cast that pair into array of pairs.
This is why I changed it. Not sure that constructor is still in part of codereview, but I don't think it's in general bad thing to have.


In reply to: 245457186 [](ancestors = 245457186)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right, thanks Ivan.


In reply to: 245457798 [](ancestors = 245457798,245457186)

@@ -100,32 +100,8 @@ private static VersionInfo GetVersionInfo()
loaderAssemblyName: typeof(OnnxTransform).Assembly.FullName);
}

public static IDataTransform Create(IHostEnvironment env, IDataView input, string modelFile, int? gpuDeviceId = null, bool fallbackToCpu = false)
Copy link
Contributor

@TomFinley TomFinley Jan 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public static IDataTransform Create(IHostEnvironment env, IDataView input, string modelFile, int? gpuDeviceId = null, bool fallbackToCpu = false) [](start = 8, length = 145)

This is an "odd" method, so it seems good to remove, so are we instead working via the extension methods and MLContext? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about extension methods, but in tests I switch to using OnnxTransform() constructor with same parameters and then calling Transform method on DataView.
I think it's better to get rid of this Create methods, and use either estimator, or since estimator is trivial, just call directly transform consturctor.


In reply to: 245457717 [](ancestors = 245457717)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, probably.


In reply to: 245458012 [](ancestors = 245458012,245457717)

/// <param name="modelFile">Model file path.</param>
/// <param name="inputColumn">The name of the input data column. Must match model input name.</param>
/// <param name="outputColumn">The output columns to generate. Names must match model specifications. Data types are inferred from model.</param>
public TensorFlowTransform(IHostEnvironment env, string modelFile, string inputColumn, string outputColumn)
Copy link
Contributor

@TomFinley TomFinley Jan 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TensorFlowTransform [](start = 15, length = 19)

The naming I find incomprehensible, similar to the ONNX thing. It is called a Transform, yet it appears to be a Transformer going by the class relationship? Do we know what is going on here?

If it is in fact a transform and not a transformer, as the name suggests, then I'd prefer our public surface area be smaller, not larger? But maybe the name is just bad? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a transformer now


In reply to: 245458201 [](ancestors = 245458201)

@TomFinley
Copy link
Contributor

TomFinley commented Jan 5, 2019

public sealed class OnnxTransform : RowToRowTransformerBase

This naming is very strange... it is called OnnxTransform, but, is it a IDataTransform, or an ITransformer? I know it's not your code, just wondering if we know what was the author's intent. #Resolved


Refers to: src/Microsoft.ML.OnnxTransform/OnnxTransform.cs:58 in 185099d. [](commit_id = 185099d, deletion_comment = False)

@Ivanidzo4ka
Copy link
Contributor Author

public sealed class OnnxTransform : RowToRowTransformerBase

It's a ITransformer, I actually converted it to ITransformer some time ago, but didn't rename it.
I push another iteration with renaming for TensorFlow, Onnx, and Image transforms.


In reply to: 451613124 [](ancestors = 451613124)


Refers to: src/Microsoft.ML.OnnxTransform/OnnxTransform.cs:58 in 185099d. [](commit_id = 185099d, deletion_comment = False)

@@ -39,7 +39,7 @@ public sealed class ColumnCopyingEstimator : TrivialEstimator<ColumnCopyingTrans
{
}

public ColumnCopyingEstimator(IHostEnvironment env, params (string source, string name)[] columns)
Copy link
Member

@sfilipi sfilipi Jan 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ColumnCopyingEstimator [](start = 15, length = 22)

I'll bug the next person about XML comments, or do it:) #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you trust me if I say what I plan to do that in other PR?
This one is purely IDataTransform hiding.


In reply to: 246154879 [](ancestors = 246154879)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep!


In reply to: 246156509 [](ancestors = 246156509,246154879)

// Factory method to create from arguments
public static IDataTransform Create(IHostEnvironment env, Arguments args, IDataView input)
// Factory method corresponding to SignatureDataTransform.
internal static IDataTransform Create(IHostEnvironment env, Arguments args, IDataView input)
Copy link
Member

@sfilipi sfilipi Jan 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

internal [](start = 8, length = 8)

double-checking: can't be private? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's internal, it's mean we have entrypoint somewhere which invokes that create method.
I'm quite confident we have entrypoint for Column Copying.


In reply to: 246155060 [](ancestors = 246155060)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks.


In reply to: 246156795 [](ancestors = 246156795,246155060)

/// <param name="env">The host environment.</param>
/// <param name="input">Name of the input column.</param>
/// <param name="output">Name of the column resulting from the transformation of <paramref name="input"/>.</param>
/// <param name="imageWidth">Widht of resized image.</param>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Widht [](start = 37, length = 5)

width.

int imageWidth, int imageHeight, ImageResizerTransform.ResizingKind resizing = ImageResizerTransform.ResizingKind.IsoCrop, ImageResizerTransform.Anchor cropAnchor = ImageResizerTransform.Anchor.Center)
: this(env, new ImageResizerTransform(env, inputColumn, outputColumn, imageWidth, imageHeight, resizing, cropAnchor))
/// <summary>
/// Estimator which resize images.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resize [](start = 28, length = 6)

resizes

/// <param name="output">Name of the column resulting from the transformation of <paramref name="input"/>.</param>
/// <param name="imageWidth">Widht of resized image.</param>
/// <param name="imageHeight">Height of resized image.</param>
/// <param name="resizing">What resize method to use.</param>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resize method [](start = 40, length = 13)

@@ -115,7 +116,7 @@ public bool TryUnparse(StringBuilder sb)
/// <param name="name">Name of the output column.</param>
/// <param name="source">Name of the column to be transformed. If this is null '<paramref name="name"/>' will be used.</param>
/// <param name="replaceWith">The replacement method to utilize.</param>
public static IDataTransform Create(IHostEnvironment env, IDataView input, string name, string source = null, ReplacementKind replaceWith = ReplacementKind.DefaultValue)
public static IDataView Create(IHostEnvironment env, IDataView input, string name, string source = null, ReplacementKind replaceWith = ReplacementKind.DefaultValue)
Copy link
Member

@sfilipi sfilipi Jan 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public [](start = 8, length = 6)

Internal? Actually, does this need a constructor? it is not a transformer, and not on the catalogs, but it will still pop up if public.

@@ -279,8 +281,10 @@ public TransformApplierParams(TextFeaturizingEstimator parent)
advancedSettings?.Invoke(AdvancedSettings);

_dictionary = null;
_wordFeatureExtractor = new NgramExtractorTransform.NgramExtractorArguments();
_charFeatureExtractor = new NgramExtractorTransform.NgramExtractorArguments() { NgramLength = 3, AllLengths = false };
if (AdvancedSettings.UseWordExtractor)
Copy link
Member

@sfilipi sfilipi Jan 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (AdvancedSettings.UseWordExtractor) [](start = 11, length = 39)

Pr title should have been: Hide the Create methods, change the name of the ImageTransformers to match the conventions used elsewhere, and fix every bug, whitespace and whatnot i see as i work :D #WontFix

Seed = args.Seed,
Ordered = false,
Column = hashColumns.ToArray(),
InvertHash = args.InvertHash
Copy link
Member

@sfilipi sfilipi Jan 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

args.InvertHash [](start = 33, length = 15)

are we losing this and the args.Seed? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 363: column.Seed ?? args.Seed
I use args.Seed


In reply to: 246170035 [](ancestors = 246170035)

}
}
}, data);
data = new ColumnConcatenatingTransformer(Env, "Features", new[] { "Features1", "Features2" }).Transform(data);
Copy link
Member

@sfilipi sfilipi Jan 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data = new ColumnConcatenatingTransformer(Env, "Features", new[] { "Features1", "Features2" }).Transform(data); [](start = 16, length = 111)

me looking at this and thinking that we should make the Transformer constructors internal.

Is there any benefit to leaving them public? (i know we've had the conversation for the Predictors/ModelParams and that made sense.)

If it is more than a yes/no answer, i can create an issue and we can discuss it there. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think idea was if it's a trivial estimator, it's ok to have public constructor for it's transformer.
Estimator is become pointless for them unless you try to build chain of estimators.


In reply to: 246171719 [](ancestors = 246171719)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, thanks.


In reply to: 246172551 [](ancestors = 246172551,246171719)

@@ -28,7 +28,7 @@ private class TestData
[ConditionalFact(typeof(Environment), nameof(Environment.Is64BitProcess))] // TensorFlow is 64-bit only
public void TensorFlowTransformMatrixMultiplicationTest()
{
var model_location = "model_matmul/frozen_saved_model.pb";
Copy link
Member

@sfilipi sfilipi Jan 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

model_location [](start = 16, length = 14)

:) #WontFix

Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Ivanidzo4ka !

@Ivanidzo4ka Ivanidzo4ka merged commit 95f1530 into dotnet:master Jan 10, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 25, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants