- 
                Notifications
    You must be signed in to change notification settings 
- Fork 1.9k
Normalization API helpers #446
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| view = CompositeDataLoader.Create(env, loader, | ||
| new KeyValuePair<string, SubComponent<IDataTransform, SignatureDataTransform>>(null, component)); | ||
| } | ||
| // REVIEW: This verbose constructor should be replaced with zeahmed's enhancements once #405 is committed. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned, can be made more brief with #405. #Closed
| view = CompositeDataLoader.Create(env, loader, | ||
| new KeyValuePair<string, SubComponent<IDataTransform, SignatureDataTransform>>(null, component)); | ||
| } | ||
| // REVIEW: This verbose constructor should be replaced with zeahmed's enhancements once #405 is committed. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
zeahmed [](start = 76, length = 7)
I though we have tendency to remove aliases from codebase... #Resolved
| 
 do we also need this line? it just if you remove labelInfo, groupInfo looks weird if it remaind untouched. #Resolved Refers to: src/Microsoft.ML.FastTree/FastTree.cs:1361 in 08c9001. [](commit_id = 08c9001, deletion_comment = False) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.

| 
 Yes, I should definitely get rid of it -- it's not actually used for anything. In reply to: 401122953 [](ancestors = 401122953) Refers to: src/Microsoft.ML.FastTree/FastTree.cs:1361 in 08c9001. [](commit_id = 08c9001, deletion_comment = False) | 
| view = CompositeDataLoader.Create(env, loader, | ||
| new KeyValuePair<string, SubComponent<IDataTransform, SignatureDataTransform>>(null, component)); | ||
| } | ||
| IDataView ApplyNormalizer(IHostEnvironment innerEnv, IDataView input) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If my PR goes in first, you will be able to use convenience constructor here...:) #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know, I even had a comment about that. :) But Ivan made me remove it. #Closed
| /// This contains information about a column in an <see cref="IDataView"/>. It is essentially a convenience cache | ||
| /// containing the name, column index, and column type for the column. The intended usage is that users of <see cref="RoleMappedSchema"/> | ||
| /// will have a convenient method of getting the index and type without having to separately query it through the <see cref="ISchema"/>, | ||
| /// since practically the first thing a consumer of a <see cref="RoleMappedSchema"/> will want to do once they get a mappping is | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel comment/summary is not complete here as sentence is ending in is? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. You go first then I'll adjust. This work in some ways will be helped by what you're doing as well, as you have noted. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /// This method will not modify <paramref name="data"/> if the return from that is <c>null</c> or | ||
| /// <c>false</c>.</param> | ||
| /// <returns>True if the normalizer was applied and <paramref name="data"/> was modified</returns> | ||
| public static bool CreateIfNeeded(IHostEnvironment env, ref RoleMappedData data, ITrainer trainer) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! I will be using this method in #424 just before calling train method, correct? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. Also probably before the hypothetical cache-if-needed. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.

| { | ||
| // REVIEW: The role mapped data has the ability to have multiple columns fill the role of features, which is | ||
| // useful in some trainers that are nonetheless parameteric and can therefore benefit from normalization. | ||
| var featInfo = schema.Feature; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) check schema for null? #Resolved
| } | ||
|  | ||
| /// <summary> | ||
| /// Returns whether a column has the <see cref="Kinds.IsNormalized"/> metadata set to true. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor) Do we instead want to document what IsNormalized means?  "Returns whether a column <insert what IsNormalized means>". #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can add a bit more information. Though that information, I might prefer to make that part of the Kinds static class, since the documentation of those are the primary source of truth. This is just something we added as a convenience on top of that.
In reply to: 199596084 [](ancestors = 199596084)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just some minor questions.
d9e0b4d    to
    3f1e454      
    Compare
  
    * API conveniences for the Normalize transform
In which I introduce some helpers for normalization, and generally try to clean up the code-base. Addresses #433 . Hopefully will be used in #424, though I've changed the code here to use some of it where it made sense.
The
Microsoft.ML.Datatransform had a "hidden" dependency onMicrosoft.ML.Transformproject via dependency injection, for its existing "helper" for normalization (the console-app centric version). This has been resolved and replaced with direct instantiation. It required moving the normalizer files, however.Introduction of helpers on
NormalizeTransformfor API-centric operations. (Not necessarily useful directly for console-application/GUI usage.)Some documentation changes on
RoleMappedSchemaandRoleMappedData, though more non-cosmetic changes I'd expected would come with Direct API: RoleMappedSchema/Data Cleanup, Improvement #445 .