Commit 3f5e4d0

pkulikov and JRAlexander authored and committed
ML.NET: update the clustering tutorial (#9620)
1 parent d5b9302 commit 3f5e4d0

2 files changed (+38, -50 lines)

docs/machine-learning/tutorials/iris-clustering.md

Lines changed: 37 additions & 49 deletions
@@ -3,7 +3,7 @@ title: Cluster iris flowers using a clustering learner - ML.NET
 description: Learn how to use ML.NET in a clustering scenario
 author: pkulikov
 ms.author: johalex
-ms.date: 07/02/2018
+ms.date: 12/17/2018
 ms.topic: tutorial
 ms.custom: mvc, seodec18
 #Customer intent: As a developer, I want to use ML.NET so that I can build a model to cluster iris flowers based on its parameters.
@@ -39,7 +39,7 @@ As you don't know to which group each flower belongs to, you choose the [unsuper
 
 ## Create a console application
 
-1. Open Visual Studio 2017. Select **File** > **New** > **Project** from the menu bar. In the **New Project** dialog, select the **Visual C#** node followed by the **.NET Core** node. Then select the **Console App (.NET Core)** project template. In the **Name** text box, type "IrisClustering" and then select the **OK** button.
+1. Open Visual Studio 2017. Select **File** > **New** > **Project** from the menu bar. In the **New Project** dialog, select the **Visual C#** node followed by the **.NET Core** node. Then select the **Console App (.NET Core)** project template. In the **Name** text box, type "IrisFlowerClustering" and then select the **OK** button.
 
 1. Create a directory named *Data* in your project to store the data set and model files:
 
@@ -73,11 +73,11 @@ Create classes for the input data and the predictions:
 1. In the **Add New Item** dialog box, select **Class** and change the **Name** field to *IrisData.cs*. Then, select the **Add** button.
 1. Add the following `using` directive to the new file:
 
-[!code-csharp[Add necessary usings](../../../samples/machine-learning/tutorials/IrisClustering/IrisData.cs#1)]
+[!code-csharp[Add necessary usings](~/samples/machine-learning/tutorials/IrisFlowerClustering/IrisData.cs#Usings)]
 
 Remove the existing class definition and add the following code, which defines the classes `IrisData` and `ClusterPrediction`, to the *IrisData.cs* file:
 
-[!code-csharp[Define data classes](../../../samples/machine-learning/tutorials/IrisClustering/IrisData.cs#2)]
+[!code-csharp[Define data classes](~/samples/machine-learning/tutorials/IrisFlowerClustering/IrisData.cs#ClassDefinitions)]
 
 `IrisData` is the input data class and has definitions for each feature from the data set. Use the [Column](xref:Microsoft.ML.Runtime.Api.ColumnAttribute) attribute to specify the indices of the source columns in the data set file.
 
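For readers new to ML.NET's data-loading model, the following plain C# sketch (no ML.NET dependency) shows what mapping source columns by index, and then concatenating them into a single **Features** vector, amounts to. `IrisRowParser` and `ToFeatureVector` are hypothetical names, and the sketch assumes each row of the data file is comma-separated with the four measurements in columns 0 to 3 and the species label in column 4; check the actual data file and the `IrisData` definition before relying on that layout.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of ML.NET: it mimics what the column-index
// mapping in IrisData and the later "Features" concatenation step achieve.
public static class IrisRowParser
{
    // Assumed layout of a data row: 0 = sepal length, 1 = sepal width,
    // 2 = petal length, 3 = petal width, 4 = species label (unused for clustering).
    private static readonly int[] FeatureColumnIndices = { 0, 1, 2, 3 };

    // Splits a comma-separated row and concatenates the selected numeric
    // columns into a single feature vector.
    public static float[] ToFeatureVector(string line)
    {
        string[] fields = line.Split(',');
        var features = new float[FeatureColumnIndices.Length];
        for (int i = 0; i < FeatureColumnIndices.Length; i++)
        {
            // Invariant culture so "5.1" parses the same way regardless of machine locale.
            features[i] = float.Parse(fields[FeatureColumnIndices[i]], CultureInfo.InvariantCulture);
        }
        return features;
    }
}
```

For example, `IrisRowParser.ToFeatureVector("5.1,3.5,1.4,0.2,Iris-setosa")` yields a four-element `float` array holding 5.1, 3.5, 1.4 and 0.2; the label column is simply not selected.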
@@ -98,103 +98,91 @@ Go back to the *Program.cs* file and add two fields to hold the paths to the dat
 
 Add the following code right above the `Main` method to specify those paths:
 
-[!code-csharp[Initialize paths](../../../samples/machine-learning/tutorials/IrisClustering/Program.cs#1)]
+[!code-csharp[Initialize paths](~/samples/machine-learning/tutorials/IrisFlowerClustering/Program.cs#Paths)]
 
 To make the preceding code compile, add the following `using` directives at the top of the *Program.cs* file:
 
-[!code-csharp[Add usings for paths](../../../samples/machine-learning/tutorials/IrisClustering/Program.cs#2)]
+[!code-csharp[Add usings for paths](~/samples/machine-learning/tutorials/IrisFlowerClustering/Program.cs#UsingsForPaths)]
 
-## Create a learning pipeline
+## Create ML context
 
 Add the following additional `using` directives to the top of the *Program.cs* file:
 
-[!code-csharp[Add Microsoft.ML usings](../../../samples/machine-learning/tutorials/IrisClustering/Program.cs#3)]
-
-In the `Main` method, replace the `Console.WriteLine("Hello World!")` with the following code:
+[!code-csharp[Add Microsoft.ML usings](~/samples/machine-learning/tutorials/IrisFlowerClustering/Program.cs#MLUsings)]
 
-[!code-csharp[Call the Train method](../../../samples/machine-learning/tutorials/IrisClustering/Program.cs#4)]
+In the `Main` method, replace the `Console.WriteLine("Hello World!");` line with the following code:
 
-The `Train` method trains the model. Create that method just below the `Main` method, using the following code:
+[!code-csharp[Create ML context](~/samples/machine-learning/tutorials/IrisFlowerClustering/Program.cs#CreateContext)]
 
-```csharp
-private static PredictionModel<IrisData, ClusterPrediction> Train()
-{
+The <xref:Microsoft.ML.MLContext?displayProperty=nameWithType> class represents the machine learning environment and provides mechanisms for logging and entry points for data loading, model training, prediction, and other tasks. This is comparable conceptually to using `DbContext` in Entity Framework.
 
-}
-```
+## Setup data loading
 
-The learning pipeline loads all of the data and algorithms necessary to train the model. Add the following code into the `Train` method:
+Add the following code to the `Main` method to setup the way to load data:
 
-[!code-csharp[Initialize pipeline](../../../samples/machine-learning/tutorials/IrisClustering/Program.cs#5)]
+[!code-csharp[Create text loader](~/samples/machine-learning/tutorials/IrisFlowerClustering/Program.cs#SetupTextLoader)]
 
-## Load and transform data
+Note that the column names and indices match the schema defined by the `IrisData` class. The <xref:Microsoft.ML.Runtime.Data.DataKind.R4?displayProperty=nameWithType> value specifies the `float` type.
 
-The first step to perform is to load the training data set. In our case, the training data set is stored in the text file with a path defined by the `_dataPath` field. Columns in the file are separated by the comma (","). Add the following code into the `Train` method:
+Use instantiated <xref:Microsoft.ML.Runtime.Data.TextLoader> instance to create an <xref:Microsoft.ML.Runtime.Data.IDataView> instance, which represents the data source for the training data set:
 
-[!code-csharp[Add step to load data](../../../samples/machine-learning/tutorials/IrisClustering/Program.cs#6)]
+[!code-csharp[Create IDataView](~/samples/machine-learning/tutorials/IrisFlowerClustering/Program.cs#CreateDataView)]
 
-The next step is to combine all of the feature columns into the **Features** column using the <xref:Microsoft.ML.Legacy.Transforms.ColumnConcatenator> transformation class. By default, a learning algorithm processes only features from the **Features** column. Add the following code:
-
-[!code-csharp[Add step to concatenate columns](../../../samples/machine-learning/tutorials/IrisClustering/Program.cs#7)]
+## Create a learning pipeline
 
-## Choose a learning algorithm
+For this tutorial, the learning pipeline of the clustering task comprises two following steps:
 
-After adding the data to the pipeline and transforming it into the correct input format, you select a learning algorithm (**learner**). The learner trains the model. ML.NET provides a <xref:Microsoft.ML.Legacy.Trainers.KMeansPlusPlusClusterer> learner that implements [k-means algorithm](https://en.wikipedia.org/wiki/K-means_clustering) with an improved method for choosing the initial cluster centroids.
+- concatenate loaded columns into one **Features** column, which is used by a clustering trainer;
+- use a <xref:Microsoft.ML.Trainers.KMeans.KMeansPlusPlusTrainer> trainer to train the model using the k-means++ clustering algorithm.
 
-Add the following code into the `Train` method following the data processing code added in the previous step:
+Add the following code to the `Main` method:
 
-[!code-csharp[Add a learner step](../../../samples/machine-learning/tutorials/IrisClustering/Program.cs#8)]
+[!code-csharp[Create pipeline](~/samples/machine-learning/tutorials/IrisFlowerClustering/Program.cs#CreatePipeline)]
 
-Use the <xref:Microsoft.ML.Legacy.Trainers.KMeansPlusPlusClusterer.K?displayProperty=nameWithType> property to specify number of clusters. The code above specifies that the data set should be split in three clusters.
+The code specifies that the data set should be split in three clusters.
 
 ## Train the model
 
-The steps added in the preceding sections prepared the pipeline for training, however, none have been executed. The `pipeline.Train<TInput, TOutput>` method produces the model that takes in an instance of the `TInput` type and outputs an instance of the `TOutput` type. Add the following code into the `Train` method:
+The steps added in the preceding sections prepared the pipeline for training, however, none have been executed. Add the following line to the `Main` method to perform data loading and model training:
 
-[!code-csharp[Train the model and return](../../../samples/machine-learning/tutorials/IrisClustering/Program.cs#9)]
+[!code-csharp[Train the model](~/samples/machine-learning/tutorials/IrisFlowerClustering/Program.cs#TrainModel)]
 
 ### Save the model
 
-At this point, you have a model that can be integrated into any of your existing or new .NET applications. To save your model to a .zip file, add the following code to the `Main` method below the call to the `Train` method:
+At this point, you have a model that can be integrated into any of your existing or new .NET applications. To save your model to a .zip file, add the following code to the `Main` method:
 
-[!code-csharp[Save the model](../../../samples/machine-learning/tutorials/IrisClustering/Program.cs#10)]
+[!code-csharp[Save the model](~/samples/machine-learning/tutorials/IrisFlowerClustering/Program.cs#SaveModel)]
 
-Using `await` in the `Main` method means the `Main` method must have the `async` modifier and return a `Task`:
-
-[!code-csharp[Make the Main method async](../../../samples/machine-learning/tutorials/IrisClustering/Program.cs#11)]
-
-You also need to add the following `using` directive at the top of the *Program.cs* file:
-
-[!code-csharp[Add System.Threading.Tasks using](../../../samples/machine-learning/tutorials/IrisClustering/Program.cs#12)]
+## Use the model for predictions
 
-Because the `async Main` method is the feature added in C# 7.1 and the default language version of the project is C# 7.0, you need to change the language version to C# 7.1 or higher. To do that, right-click the project node in **Solution Explorer** and select **Properties**. Select the **Build** tab and select the **Advanced** button. In the dropdown, select **C# 7.1** (or a higher version). Select the **OK** button.
+To make predictions, use the <xref:Microsoft.ML.Runtime.Data.PredictionFunction%602> class that takes instances of the input type through the transformer pipeline and produces instances of the output type. Add the following line to the `Main` method to create an instance of that class:
 
-## Use the model for predictions
+[!code-csharp[Create predictor](~/samples/machine-learning/tutorials/IrisFlowerClustering/Program.cs#Predictor)]
 
 Create the `TestIrisData` class to house test data instances:
 
 1. In **Solution Explorer**, right-click the project, and then select **Add** > **New Item**.
 1. In the **Add New Item** dialog box, select **Class** and change the **Name** field to *TestIrisData.cs*. Then, select the **Add** button.
 1. Modify the class to be static like in the following example:
 
-[!code-csharp[Make class static](../../../samples/machine-learning/tutorials/IrisClustering/TestIrisData.cs#1)]
+[!code-csharp[Make class static](~/samples/machine-learning/tutorials/IrisFlowerClustering/TestIrisData.cs#Static)]
 
 This tutorial introduces one iris data instance within this class. You can add other scenarios to experiment with the model. Add the following code into the `TestIrisData` class:
 
-[!code-csharp[Test data](../../../samples/machine-learning/tutorials/IrisClustering/TestIrisData.cs#2)]
+[!code-csharp[Test data](~/samples/machine-learning/tutorials/IrisFlowerClustering/TestIrisData.cs#TestData)]
 
 To find out the cluster to which the specified item belongs to, go back to the *Program.cs* file and add the following code into the `Main` method:
 
-[!code-csharp[Predict and output results](../../../samples/machine-learning/tutorials/IrisClustering/Program.cs#13)]
+[!code-csharp[Predict and output results](~/samples/machine-learning/tutorials/IrisFlowerClustering/Program.cs#PredictionExample)]
 
-Run the program to see which cluster contains the specified data instance and squared distances from that instance to the cluster centroids. Your results should be similar to the following. As the pipeline processes, it might display warnings or processing messages. These have been removed from the following output for clarity.
+Run the program to see which cluster contains the specified data instance and squared distances from that instance to the cluster centroids. Your results should be similar to the following:
 
 ```text
 Cluster: 2
-Distances: 0.4192338 0.0008847713 0.9660053
+Distances: 11.69127 0.02159119 25.59896
 ```
 
-Congratulations! You've now successfully built a machine learning model for iris clustering and used it to make predictions. You can find the source code for this tutorial at the [dotnet/samples](https://github.com/dotnet/samples/tree/master/machine-learning/tutorials/IrisClustering) GitHub repository.
+Congratulations! You've now successfully built a machine learning model for iris clustering and used it to make predictions. You can find the source code for this tutorial at the [dotnet/samples](https://github.com/dotnet/samples/tree/master/machine-learning/tutorials/IrisFlowerClustering) GitHub repository.
 
 ## Next steps
 
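The updated tutorial trains a `KMeansPlusPlusTrainer` and prints the squared distances from a test instance to each cluster centroid. As a rough illustration of those two ideas, the following plain C# sketch implements k-means++ seeding and nearest-centroid assignment by squared Euclidean distance. It is not ML.NET's implementation: `KMeansSketch` and its methods are hypothetical, and it covers only seeding and assignment, not the iterative centroid updates of a full k-means run.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch only: k-means++ seeding and nearest-centroid assignment.
// This is NOT the ML.NET KMeansPlusPlusTrainer implementation; names are hypothetical.
public static class KMeansSketch
{
    // Squared Euclidean distance between two feature vectors of equal length.
    public static double SquaredDistance(float[] a, float[] b)
    {
        double sum = 0;
        for (int i = 0; i < a.Length; i++)
        {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Returns the 0-based index of the nearest centroid together with the
    // squared distance from the point to every centroid.
    public static (int Cluster, double[] Distances) Assign(float[] point, IReadOnlyList<float[]> centroids)
    {
        double[] distances = centroids.Select(c => SquaredDistance(point, c)).ToArray();
        int nearest = Array.IndexOf(distances, distances.Min());
        return (nearest, distances);
    }

    // k-means++ seeding: pick the first centroid uniformly at random, then pick each
    // further centroid with probability proportional to its squared distance from
    // the nearest centroid chosen so far.
    public static List<float[]> SeedCentroids(IReadOnlyList<float[]> points, int k, Random rng)
    {
        if (points.Count == 0) throw new ArgumentException("At least one point is required.", nameof(points));
        if (k < 1) throw new ArgumentOutOfRangeException(nameof(k));

        var centroids = new List<float[]> { points[rng.Next(points.Count)] };
        while (centroids.Count < k)
        {
            double[] weights = points.Select(p => centroids.Min(c => SquaredDistance(p, c))).ToArray();
            double total = weights.Sum();
            if (total == 0)
            {
                // Every point coincides with a centroid; fall back to a uniform pick.
                centroids.Add(points[rng.Next(points.Count)]);
                continue;
            }

            // Roulette-wheel selection over the cumulative weights.
            double r = rng.NextDouble() * total;
            int chosen = 0;
            double cumulative = weights[0];
            while (cumulative <= r && chosen < points.Count - 1)
            {
                chosen++;
                cumulative += weights[chosen];
            }
            centroids.Add(points[chosen]);
        }
        return centroids;
    }
}
```

For instance, with centroids at (0, 0) and (3, 4), the point (3, 3) has squared distances 18 and 1, so `Assign` returns index 1. The index in this sketch is 0-based, so it may differ by one from a cluster id printed by ML.NET.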
docs/toc.md

Lines changed: 1 addition & 1 deletion
@@ -1177,7 +1177,7 @@
 ## [Tutorials](machine-learning/tutorials/index.md)
 ### [Sentiment analysis (binary classification)](machine-learning/tutorials/sentiment-analysis.md)
 ### [Taxi fare predictor (regression)](machine-learning/tutorials/taxi-fare.md)
-### [Iris petals (clustering)](machine-learning/tutorials/iris-clustering.md)
+### [Iris flowers (clustering)](machine-learning/tutorials/iris-clustering.md)
 ## [How-to guides](machine-learning/how-to-guides/index.md)
 ### [Apply categorical feature engineering ](machine-learning/how-to-guides/train-model-categorical-ml-net.md)
 ### [Apply textual feature engineering ](machine-learning/how-to-guides/train-model-textual-ml-net.md)