Update Data Transforms #9371
| @@ -1,136 +1,185 @@ | |||
| --- | |||
| title: Data transforms in ML.NET | |||
| title: Machine learning data transforms - ML.NET | |||
Are we just looking to reduce the name 'transforms' since we're moving to the estimators API?
But are transforms estimators? That seems confusing.
| | Transform | Definition | | ||
| | --- | --- | | ||
| | <xref:Microsoft.ML.Legacy.Transforms.CombinerByContiguousGroupId> | Groups values of a scalar column into a vector based on a contiguous group ID. | | ||
| | <xref:Microsoft.ML.Transforms.GroupTransform> | Groups values of a scalar column into a vector based on a contiguous group ID. | |
Might recommend:
Short desc:
"Groups multiple contiguous rows of data into a single row by forming a vector type of the original input column"
Long desc:
"The Group transform groups the consecutive rows that share the specified group key (or keys). Both group keys and the aggregated values can be of arbitrary non-vector types. The resulting data will have all the group key columns preserved, and the aggregated columns will become variable-length vectors of the original types."
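The grouping behavior described in the long description can be sketched in a few lines. This is a minimal illustration in plain Python, not the ML.NET API: the function name `group_contiguous` and the column names `GroupId` and `Value` are hypothetical, and rows are modeled as dictionaries purely for readability.

```python
from itertools import groupby


def group_contiguous(rows, key_column, value_column):
    """Collapse each run of consecutive rows sharing a key into one row.

    The key column is preserved; the value column becomes a
    variable-length list of the original values.
    """
    grouped = []
    # groupby only merges *adjacent* items with equal keys, which matches
    # the "consecutive rows" wording of the description.
    for key, run in groupby(rows, key=lambda row: row[key_column]):
        grouped.append({key_column: key, value_column: [row[value_column] for row in run]})
    return grouped


# Hypothetical sample data: group 1 appears in two separate runs.
rows = [
    {"GroupId": 1, "Value": 10},
    {"GroupId": 1, "Value": 20},
    {"GroupId": 2, "Value": 30},
    {"GroupId": 1, "Value": 40},
]
result = group_contiguous(rows, "GroupId", "Value")
```

Because only consecutive rows are merged, the second run of `GroupId` 1 produces its own output row instead of being folded into the first.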
Perhaps we should change the upstream text from which this file is produced?
Yes, that should be filed as an issue on the dotnet/machinelearning repo, as the API reference is auto-generated from the code's XML comments.
| | <xref:Microsoft.ML.Runtime.ImageAnalytics.ImageResizerTransform> | Takes one or more ImageType columns and resizes them to the provided height and width.| | ||
| | <xref:Microsoft.ML.Transforms.Text.LatentDirichletAllocationTransformer> | Implements LightLDA, a state-of-the-art implementation of Latent Dirichlet Allocation.| | ||
| | <xref:Microsoft.ML.Transforms.LoadTransform> | Loads specific transforms from the specified model file. Allows for 'cherry picking' transforms from a serialized chain, or to apply a pre-trained transform to a different (but still compatible) data view. | | ||
| | <xref:Microsoft.ML.Transforms.Text.NgramExtractingTransformer> | Produces a bag of counts of ngrams (sequences of consecutive values of length 1-n) in a given vector of keys. It does so by building a dictionary of ngrams and using the id in the dictionary as the index in the bag. | |
Should be in #Text processing and featurization
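For context on the n-gram description in the row under review, here is a rough sketch of a "bag of counts" indexed by dictionary id. It is plain Python rather than the ML.NET transformer, the name `ngram_bag` is hypothetical, and it builds the dictionary from a single input vector, which is a simplifying assumption.

```python
def ngram_bag(keys, max_n):
    """Count n-grams of length 1..max_n in a sequence of keys.

    Returns the n-gram dictionary (n-gram -> id) and the bag, where
    bag[id] is the number of times that n-gram occurred.
    """
    dictionary = {}
    bag = []
    for n in range(1, max_n + 1):
        for start in range(len(keys) - n + 1):
            gram = tuple(keys[start:start + n])
            if gram not in dictionary:
                # Assign the next free id and open a new slot in the bag.
                dictionary[gram] = len(dictionary)
                bag.append(0)
            bag[dictionary[gram]] += 1
    return dictionary, bag


# Hypothetical key vector, counted up to bigrams.
dictionary, bag = ngram_bag([3, 7, 3], max_n=2)
```

In this example the unigram `(3,)` receives id 0 and a count of 2, while each of the other n-grams is seen once.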
| @@ -1,136 +1,185 @@ | |||
| --- | |||
| title: Data transforms in ML.NET | |||
Also, I'd reorganize the doc by namespace.
E.g.:
OneHotEncodingTransformer should go in the section for Microsoft.ML.Transforms.Categorical, "# Categorical".
Revised based on feedback. Co-Authored-By: JRAlexander <JRAlexander@users.noreply.github.com>
mairaw left a comment
Some quick comments.
| * [Schema](#schema) | ||
| * [Text processing and featurization](#text-processing-and-featurization) | ||
| * [Miscellaneous](#miscellaneous) | ||
| # Machine learning data transforms - ML.NET |
I find it a bit odd to have the - showing up in H1. I'd make the H1 different here and not use -
Maybe the original H1 was good and the non-branded terms are already added to the title metadata for SEO purposes.
| | <xref:Microsoft.ML.Legacy.Transforms.LabelColumnKeyBooleanConverter> | Transforms the label to either key or bool (if needed) to make it suitable for classification. | | ||
| | <xref:Microsoft.ML.Legacy.Transforms.LabelIndicator> | Label remapper used by OVA. | | ||
| | <xref:Microsoft.ML.Transforms.LabelConvertTransform> | Converts labels. | | ||
| | <xref:Microsoft.ML.Transforms.LabelIndicatorTransform> | Remaps multiclass labels to binary T,F labels, primarily for use with OVA.| |
Do people know what T,F labels are?
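One way to make "T,F labels" concrete: in a one-versus-all (OVA) setup, each class gets its own binary view of the label column, true where the row belongs to that class and false otherwise. The sketch below is plain Python under that assumption, not the ML.NET transform; `label_indicator` and the sample class names are hypothetical.

```python
def label_indicator(labels, positive_class):
    """Remap multiclass labels to booleans: True for positive_class, else False."""
    return [label == positive_class for label in labels]


# Hypothetical multiclass label column.
labels = ["cat", "dog", "bird", "dog"]

# One boolean label vector per class, as an OVA trainer would consume them.
indicators = {cls: label_indicator(labels, cls) for cls in sorted(set(labels))}
```

Spelling out "true/false" in the table description instead of "T,F" would likely address the question without needing an example in the doc itself.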
Thanks, @mairaw!
Update Data Transforms for latest release.
Fixes #9476, #9552