-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hide IDataTransform Create methods #2031
Hide IDataTransform Create methods #2031
Conversation
|
||
namespace Microsoft.ML.OnnxTransform.StaticPipe | ||
namespace Microsoft.ML.Transforms.StaticPipe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Transforms [](start = 23, length = 10)
Reason why I make this change is because I don't like what I have to do this:
new Transforms.OnnxTransform(...);
instead of just
new OnnxTransform(...);
if we have Microsoft.ML in usings. (see OnnxTransformTests.cs)
Shall we just call it Microsoft.ML.StaticPipe without any additional sub namespaces? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expect that static pipe would be something we'd nail down later, but I'd expect that inputs to to staticpipe would just be accomplished by taking a PiplineColumn
instance, and applying the transform to it. There should not be any new Transforms.OnnxTransform
in the static pipeline at all, since the static pipeline is more declarative. So I don't like either of the things you've described, if indeed that is now it iss set up. If that's not how it was set up, well, that's bad. :) So that would be something to file and issue on and fix. (Though I think static pipeline is something we're going to refine more post v1.0.)
In reply to: 245456891 [](ancestors = 245456891)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, from what I see below, it is set up as an extension method on the columns. So what is the problem?
In reply to: 245458003 [](ancestors = 245458003,245456891)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have your code and you use using Microsoft.ML;
in it (which we do all the time), I can't longer just create OnnxTransform object via new OnnxTransform(...)
, because you will get collision of namespace and object.
So you have to directly specify what your transform is in Transform namespace like this: Transforms.OnnxTransform
which I found annoying.
In reply to: 245458074 [](ancestors = 245458074,245458003,245456891)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see. Yes I agree then, I didn't know you were talking about namespace/object collision.
Ultimately all static extensions should be in Microsoft.ML
namespace (since we must accept that inclusion of a specific nuget is license to use it, similar to our attitude about MLContext
extensions), but not right now.
In reply to: 245458427 [](ancestors = 245458427,245458074,245458003,245456891)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But that happens only when ppl use both static and dynamic, right? Our tests do so, but idk how many ppl will mix ant match static and dynamic.
In reply to: 245458713 [](ancestors = 245458713,245458427,245458074,245458003,245456891)
/// <summary> | ||
/// Factory method for SignatureLoadModel. | ||
/// </summary> | ||
// Factory method for SignatureLoadModel. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Factory method for SignatureLoadModel. [](start = 8, length = 41)
I see you doing this multiple places, is there any reason for it?
If we instead kept it an XML comment, we could improve this by having a <see
tag on it, which overall seems like a good idea. We're not doing this now, but it seems like an unforced regression into a worse, not a better, place. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only reason why we had
tag because it was requirement for public method, and since it's no longer public I just prefer mark this methods as comments since
a) This is most common pattern in our codebase.
b) I think about this as more as internal message for developer: "Hey, don't delete this one, even if no one uses it, we actually call it via our DI system." Summary is more like a description of what is going on in this method, and this is more like a mark/internal comment.
In reply to: 245456969 [](ancestors = 245456969)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tags are a requirement for public methods. That ***does not make them a bad thing for private or internal methods***, especially if those comments explicitly reference other structures in the codebase, as this does.
Why would a developer prefer this comment, to the old comment? If you had instead had a <see
tag, as suggested above, that developer could even F12 on it to see what the heck we're talking about. Right?
In reply to: 245457620 [](ancestors = 245457620,245456969)
<see
tag, as suggested above, that developer could even F12 on it to see what the heck we're talking about. Right?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably just be a simple search-replace, I think, to put it into proper form. At least, so I'd think.
In reply to: 245458497 [](ancestors = 245458497,245457620,245456969)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know I can't win argument with you so I reverted my changes to summary comments. Although I'm still adding comments for other places since this is common pattern in our code.
I hope this is not a blocker, since main intent of this PR is to hide IDataTransforms methods.
In reply to: 245459044 [](ancestors = 245459044,245458497,245457620,245456969)
/// </summary> | ||
public static IDataTransform Create(IHostEnvironment env, MeanVarArguments args, IDataView input) | ||
// Factory method corresponding to SignatureDataTransform. | ||
internal static IDataTransform Create(IHostEnvironment env, MeanVarArguments args, IDataView input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internal [](start = 8, length = 8)
If the comment is correct, then it should suffice for it to be private
, right? So is the comment incorrect? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everytime we have Entrypoint for transform I have to mark this method as internal.
As alternative I can refactor conversion of argument class to ColumnInfo array and use it here and in entrypoint, but oh, boy, it would be lot of changes across codebase.
In reply to: 245457063 [](ancestors = 245457063)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -15,7 +15,7 @@ public abstract class OneToOneTransformerBase : RowToRowTransformerBase | |||
{ | |||
protected readonly (string input, string output)[] ColumnPairs; | |||
|
|||
protected OneToOneTransformerBase(IHost host, (string input, string output)[] columns) : base(host) | |||
protected OneToOneTransformerBase(IHost host, params (string input, string output)[] columns) : base(host) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
params [](start = 54, length = 6)
Is this related? It might be a good idea, just not sure? It doesn't at least at first glance seem to have anything to do with the issue. (Not suggesting we can't do it, just, not sure what the purpose of the change is.) #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I replaced one of Create methods with constructor which accept only one input and output column, and I didn't like idea what I need to cast that pair into array of pairs.
This is why I changed it. Not sure that constructor is still in part of codereview, but I don't think it's in general bad thing to have.
In reply to: 245457186 [](ancestors = 245457186)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -100,32 +100,8 @@ private static VersionInfo GetVersionInfo() | |||
loaderAssemblyName: typeof(OnnxTransform).Assembly.FullName); | |||
} | |||
|
|||
public static IDataTransform Create(IHostEnvironment env, IDataView input, string modelFile, int? gpuDeviceId = null, bool fallbackToCpu = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public static IDataTransform Create(IHostEnvironment env, IDataView input, string modelFile, int? gpuDeviceId = null, bool fallbackToCpu = false) [](start = 8, length = 145)
This is an "odd" method, so it seems good to remove, so are we instead working via the extension methods and MLContext
? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about extension methods, but in tests I switch to using OnnxTransform()
constructor with same parameters and then calling Transform
method on DataView.
I think it's better to get rid of this Create methods, and use either estimator, or since estimator is trivial, just call directly transform consturctor.
In reply to: 245457717 [](ancestors = 245457717)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// <param name="modelFile">Model file path.</param> | ||
/// <param name="inputColumn">The name of the input data column. Must match model input name.</param> | ||
/// <param name="outputColumn">The output columns to generate. Names must match model specifications. Data types are inferred from model.</param> | ||
public TensorFlowTransform(IHostEnvironment env, string modelFile, string inputColumn, string outputColumn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TensorFlowTransform [](start = 15, length = 19)
The naming I find incomprehensible, similar to the ONNX thing. It is called a Transform
, yet it appears to be a Transformer
going by the class relationship? Do we know what is going on here?
If it is in fact a transform and not a transformer, as the name suggests, then I'd prefer our public surface area be smaller, not larger? But maybe the name is just bad? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This naming is very strange... it is called OnnxTransform, but, is it a Refers to: src/Microsoft.ML.OnnxTransform/OnnxTransform.cs:58 in 185099d. [](commit_id = 185099d, deletion_comment = False) |
It's a In reply to: 451613124 [](ancestors = 451613124) Refers to: src/Microsoft.ML.OnnxTransform/OnnxTransform.cs:58 in 185099d. [](commit_id = 185099d, deletion_comment = False) |
@@ -39,7 +39,7 @@ public sealed class ColumnCopyingEstimator : TrivialEstimator<ColumnCopyingTrans | |||
{ | |||
} | |||
|
|||
public ColumnCopyingEstimator(IHostEnvironment env, params (string source, string name)[] columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ColumnCopyingEstimator [](start = 15, length = 22)
I'll bug the next person about XML comments, or do it:) #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you trust me if I say what I plan to do that in other PR?
This one is purely IDataTransform hiding.
In reply to: 246154879 [](ancestors = 246154879)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Factory method to create from arguments | ||
public static IDataTransform Create(IHostEnvironment env, Arguments args, IDataView input) | ||
// Factory method corresponding to SignatureDataTransform. | ||
internal static IDataTransform Create(IHostEnvironment env, Arguments args, IDataView input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internal [](start = 8, length = 8)
double-checking: can't be private? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's internal, it's mean we have entrypoint somewhere which invokes that create method.
I'm quite confident we have entrypoint for Column Copying.
In reply to: 246155060 [](ancestors = 246155060)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// <param name="env">The host environment.</param> | ||
/// <param name="input">Name of the input column.</param> | ||
/// <param name="output">Name of the column resulting from the transformation of <paramref name="input"/>.</param> | ||
/// <param name="imageWidth">Widht of resized image.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Widht [](start = 37, length = 5)
width.
int imageWidth, int imageHeight, ImageResizerTransform.ResizingKind resizing = ImageResizerTransform.ResizingKind.IsoCrop, ImageResizerTransform.Anchor cropAnchor = ImageResizerTransform.Anchor.Center) | ||
: this(env, new ImageResizerTransform(env, inputColumn, outputColumn, imageWidth, imageHeight, resizing, cropAnchor)) | ||
/// <summary> | ||
/// Estimator which resize images. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resize [](start = 28, length = 6)
resizes
/// <param name="output">Name of the column resulting from the transformation of <paramref name="input"/>.</param> | ||
/// <param name="imageWidth">Widht of resized image.</param> | ||
/// <param name="imageHeight">Height of resized image.</param> | ||
/// <param name="resizing">What resize method to use.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resize method [](start = 40, length = 13)
@@ -115,7 +116,7 @@ public bool TryUnparse(StringBuilder sb) | |||
/// <param name="name">Name of the output column.</param> | |||
/// <param name="source">Name of the column to be transformed. If this is null '<paramref name="name"/>' will be used.</param> | |||
/// <param name="replaceWith">The replacement method to utilize.</param> | |||
public static IDataTransform Create(IHostEnvironment env, IDataView input, string name, string source = null, ReplacementKind replaceWith = ReplacementKind.DefaultValue) | |||
public static IDataView Create(IHostEnvironment env, IDataView input, string name, string source = null, ReplacementKind replaceWith = ReplacementKind.DefaultValue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public [](start = 8, length = 6)
Internal? Actually, does this need a constructor? it is not a transformer, and not on the catalogs, but it will still pop up if public.
@@ -279,8 +281,10 @@ public TransformApplierParams(TextFeaturizingEstimator parent) | |||
advancedSettings?.Invoke(AdvancedSettings); | |||
|
|||
_dictionary = null; | |||
_wordFeatureExtractor = new NgramExtractorTransform.NgramExtractorArguments(); | |||
_charFeatureExtractor = new NgramExtractorTransform.NgramExtractorArguments() { NgramLength = 3, AllLengths = false }; | |||
if (AdvancedSettings.UseWordExtractor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (AdvancedSettings.UseWordExtractor) [](start = 11, length = 39)
Pr title should have been: Hide the Create methods, change the name of the ImageTransformers to match the conventions used elsewhere, and fix every bug, whitespace and whatnot i see as i work :D #WontFix
Seed = args.Seed, | ||
Ordered = false, | ||
Column = hashColumns.ToArray(), | ||
InvertHash = args.InvertHash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
args.InvertHash [](start = 33, length = 15)
are we losing this and the args.Seed? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | ||
} | ||
}, data); | ||
data = new ColumnConcatenatingTransformer(Env, "Features", new[] { "Features1", "Features2" }).Transform(data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
data = new ColumnConcatenatingTransformer(Env, "Features", new[] { "Features1", "Features2" }).Transform(data); [](start = 16, length = 111)
me looking at this and thinking that we should make the Transformer constructors internal.
Is there any benefit to leaving them public? (i know we've had the conversation for the Predictors/ModelParams and that made sense.)
If it is more than a yes/no answer, i can create an issue and we can discuss it there. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think idea was if it's a trivial estimator, it's ok to have public constructor for it's transformer.
Estimator is become pointless for them unless you try to build chain of estimators.
In reply to: 246171719 [](ancestors = 246171719)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -28,7 +28,7 @@ private class TestData | |||
[ConditionalFact(typeof(Environment), nameof(Environment.Is64BitProcess))] // TensorFlow is 64-bit only | |||
public void TensorFlowTransformMatrixMultiplicationTest() | |||
{ | |||
var model_location = "model_matmul/frozen_saved_model.pb"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
model_location [](start = 16, length = 14)
:) #WontFix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Ivanidzo4ka !
Step towards #1995.
Also add proper comments for Image transforms.
Remove Create methods in Onnx and Tensorflow (replace them with proper constructors).