@JRAlexander commented Dec 4, 2018

Update Data Transforms for latest release.

Internal Review URL

Fixes #9476, #9552

@JRAlexander mentioned this pull request Dec 7, 2018
@JRAlexander changed the title from "[WIP ]Update Data Transforms" to "Update Data Transforms" Dec 10, 2018
@@ -1,136 +1,185 @@
---
title: Data transforms in ML.NET
title: Machine learning data transforms - ML.NET
Contributor

Are we just looking to reduce use of the term 'transforms' since we're moving to the estimators API?

Contributor Author

But are transforms estimators? That seems confusing.

| Transform | Definition |
| --- | --- |
| <xref:Microsoft.ML.Legacy.Transforms.CombinerByContiguousGroupId> | Groups values of a scalar column into a vector based on a contiguous group ID. |
| <xref:Microsoft.ML.Transforms.GroupTransform> | Groups values of a scalar column into a vector based on a contiguous group ID. |
Contributor

Might recommend:
Short desc:
"Groups multiple contiguous rows of data into a single row by forming a vector type of the original input column"

Long desc:
"The Group transform groups the consecutive rows that share the specified group key (or keys). Both group keys and the aggregated values can be of arbitrary non-vector types. The resulting data will have all the group key columns preserved, and the aggregated columns will become variable-length vectors of the original types."
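
To make the proposed long description concrete, here is a minimal Python sketch of the behavior it describes: consecutive rows that share a key collapse into one row, and the aggregated column becomes a variable-length list. This is not the ML.NET implementation; `group_contiguous`, the column names, and the sample rows are hypothetical and exist only for illustration.

```python
from itertools import groupby


def group_contiguous(rows, key_cols, value_cols):
    """Collapse each run of consecutive rows sharing the same key into one row.

    Hypothetical helper for illustration only; not an ML.NET API.
    """
    grouped = []
    # itertools.groupby only merges *adjacent* items with equal keys,
    # which matches the "contiguous group" wording in the description.
    for key, run in groupby(rows, key=lambda r: tuple(r[c] for c in key_cols)):
        run = list(run)
        out = dict(zip(key_cols, key))  # group key columns are preserved
        for col in value_cols:
            out[col] = [r[col] for r in run]  # values become a variable-length vector
        grouped.append(out)
    return grouped


rows = [
    {"query_id": 1, "score": 0.9},
    {"query_id": 1, "score": 0.4},
    {"query_id": 2, "score": 0.7},
    {"query_id": 1, "score": 0.1},  # same key as the first run, but not contiguous
]
result = group_contiguous(rows, ["query_id"], ["score"])
```

Note that the last row forms its own group even though `query_id` 1 appeared earlier, which is why "contiguous" matters in the wording.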

Contributor

Perhaps we should change the upstream text that this file is produced from?

Contributor Author

Yes, that should be filed as an issue on the dotnet/machinelearning repo, as the API reference is auto-generated from the XML comments in the code.

| <xref:Microsoft.ML.Runtime.ImageAnalytics.ImageResizerTransform> | Takes one or more ImageType columns and resizes them to the provided height and width.|
| <xref:Microsoft.ML.Transforms.Text.LatentDirichletAllocationTransformer> | Implements LightLDA, a state-of-the-art implementation of Latent Dirichlet Allocation.|
| <xref:Microsoft.ML.Transforms.LoadTransform> | Loads specific transforms from the specified model file. Allows for 'cherry picking' transforms from a serialized chain, or to apply a pre-trained transform to a different (but still compatible) data view. |
| <xref:Microsoft.ML.Transforms.Text.NgramExtractingTransformer> | Produces a bag of counts of ngrams (sequences of consecutive values of length 1-n) in a given vector of keys. It does so by building a dictionary of ngrams and using the id in the dictionary as the index in the bag. |
Contributor

Should be in #Text processing and featurization
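
For readers unfamiliar with the n-gram row under discussion, the sketch below illustrates what its definition says: count every run of 1 to n consecutive keys, assign each distinct n-gram an ID from a dictionary, and use that ID as the index into the bag. It is a plain-Python illustration under assumed names (`ngram_bag`, `max_n`), not the ML.NET transformer or its API.

```python
def ngram_bag(keys, max_n, dictionary=None):
    """Return (bag, dictionary) for n-grams of length 1..max_n over `keys`.

    Hypothetical helper for illustration only; not an ML.NET API.
    """
    dictionary = {} if dictionary is None else dictionary
    counts = {}
    for n in range(1, max_n + 1):
        for i in range(len(keys) - n + 1):
            gram = tuple(keys[i:i + n])
            # The ID assigned on first sight becomes the slot index in the bag.
            idx = dictionary.setdefault(gram, len(dictionary))
            counts[idx] = counts.get(idx, 0) + 1
    bag = [0] * len(dictionary)
    for idx, count in counts.items():
        bag[idx] = count
    return bag, dictionary


bag, vocab = ngram_bag([3, 5, 3, 5], max_n=2)
# vocab maps (3,) -> 0, (5,) -> 1, (3, 5) -> 2, (5, 3) -> 3
# bag is [2, 2, 2, 1]
```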

@@ -1,136 +1,185 @@
---
title: Data transforms in ML.NET
Contributor

@justinormont commented Dec 11, 2018

Also, I'd reorganize the doc by namespace.

E.g.:
OneHotEncodingTransformer should go in the section for Microsoft.ML.Transforms.Categorical, "# Categorical".

Revised based on feedback.

Co-Authored-By: JRAlexander <JRAlexander@users.noreply.github.com>
Contributor

@mairaw left a comment

Some quick comments.

* [Schema](#schema)
* [Text processing and featurization](#text-processing-and-featurization)
* [Miscellaneous](#miscellaneous)
# Machine learning data transforms - ML.NET
Contributor

I find it a bit odd to have the "-" showing up in the H1. I'd make the H1 different here and not use "-".

Maybe the original H1 was good and the non-branded terms are already added to the title metadata for SEO purposes.

| <xref:Microsoft.ML.Legacy.Transforms.LabelColumnKeyBooleanConverter> | Transforms the label to either key or bool (if needed) to make it suitable for classification. |
| <xref:Microsoft.ML.Legacy.Transforms.LabelIndicator> | Label remapper used by OVA. |
| <xref:Microsoft.ML.Transforms.LabelConvertTransform> | Converts labels. |
| <xref:Microsoft.ML.Transforms.LabelIndicatorTransform> | Remaps multiclass labels to binary T,F labels, primarily for use with OVA.|
Contributor

Do people know what T,F labels are?
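
For context on the question, "T,F labels" here means true/false labels: in one-versus-all (OVA) training, each class gets its own binary problem in which a row is labeled true if it belongs to that class and false otherwise. The Python sketch below shows that remapping under assumed names (`label_indicator`, the sample labels); it is an illustration of the idea, not the ML.NET transform.

```python
def label_indicator(labels, positive_class):
    """Remap multiclass labels to booleans: True for `positive_class`, else False.

    Hypothetical helper for illustration only; not an ML.NET API.
    """
    return [label == positive_class for label in labels]


labels = ["cat", "dog", "bird", "dog"]

# One binary label column per class, as a one-versus-all trainer would consume.
per_class = {c: label_indicator(labels, c) for c in sorted(set(labels))}
# per_class["dog"] is [False, True, False, True]
```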

@JRAlexander
Contributor Author

Thanks, @mairaw!

@JRAlexander merged commit 249480b into master Dec 13, 2018