docs/machine-learning/tutorials/iris-clustering.md
Lines changed: 37 additions & 49 deletions
@@ -3,7 +3,7 @@ title: Cluster iris flowers using a clustering learner - ML.NET
description: Learn how to use ML.NET in a clustering scenario
author: pkulikov
ms.author: johalex
- ms.date: 07/02/2018
+ ms.date: 12/17/2018
ms.topic: tutorial
ms.custom: mvc, seodec18
#Customer intent: As a developer, I want to use ML.NET so that I can build a model to cluster iris flowers based on its parameters.
@@ -39,7 +39,7 @@ As you don't know to which group each flower belongs to, you choose the [unsuper
## Create a console application

- 1. Open Visual Studio 2017. Select **File** > **New** > **Project** from the menu bar. In the **New Project** dialog, select the **Visual C#** node followed by the **.NET Core** node. Then select the **Console App (.NET Core)** project template. In the **Name** text box, type "IrisClustering" and then select the **OK** button.
+ 1. Open Visual Studio 2017. Select **File** > **New** > **Project** from the menu bar. In the **New Project** dialog, select the **Visual C#** node followed by the **.NET Core** node. Then select the **Console App (.NET Core)** project template. In the **Name** text box, type "IrisFlowerClustering" and then select the **OK** button.

1. Create a directory named *Data* in your project to store the data set and model files:
@@ -73,11 +73,11 @@ Create classes for the input data and the predictions:
1. In the **Add New Item** dialog box, select **Class** and change the **Name** field to *IrisData.cs*. Then, select the **Add** button.
1. Add the following `using` directive to the new file:
Remove the existing class definition and add the following code, which defines the classes `IrisData` and `ClusterPrediction`, to the *IrisData.cs* file:

- [!code-csharp[Define data classes](../../../samples/machine-learning/tutorials/IrisClustering/IrisData.cs#2)]
+ [!code-csharp[Define data classes](~/samples/machine-learning/tutorials/IrisFlowerClustering/IrisData.cs#ClassDefinitions)]

`IrisData` is the input data class and has definitions for each feature from the data set. Use the [Column](xref:Microsoft.ML.Runtime.Api.ColumnAttribute) attribute to specify the indices of the source columns in the data set file.
@@ -98,103 +98,91 @@ Go back to the *Program.cs* file and add two fields to hold the paths to the dat
Add the following code right above the `Main` method to specify those paths:
The <xref:Microsoft.ML.MLContext?displayProperty=nameWithType> class represents the machine learning environment and provides mechanisms for logging and entry points for data loading, model training, prediction, and other tasks. This is comparable conceptually to using `DbContext` in Entity Framework.

- }
- ```
+ ## Setup data loading

- The learning pipeline loads all of the data and algorithms necessary to train the model. Add the following code into the `Train` method:
+ Add the following code to the `Main` method to setup the way to load data:
[!code-csharp[Create text loader](~/samples/machine-learning/tutorials/IrisFlowerClustering/Program.cs#SetupTextLoader)]

- ## Load and transform data
+ Note that the column names and indices match the schema defined by the `IrisData` class. The <xref:Microsoft.ML.Runtime.Data.DataKind.R4?displayProperty=nameWithType> value specifies the `float` type.

- The first step to perform is to load the training data set. In our case, the training data set is stored in the text file with a path defined by the `_dataPath` field. Columns in the file are separated by the comma (","). Add the following code into the `Train` method:
+ Use instantiated <xref:Microsoft.ML.Runtime.Data.TextLoader> instance to create an <xref:Microsoft.ML.Runtime.Data.IDataView> instance, which represents the data source for the training data set:

- [!code-csharp[Add step to load data](../../../samples/machine-learning/tutorials/IrisClustering/Program.cs#6)]
The next step is to combine all of the feature columns into the **Features** column using the <xref:Microsoft.ML.Legacy.Transforms.ColumnConcatenator> transformation class. By default, a learning algorithm processes only features from the **Features** column. Add the following code:
-
- [!code-csharp[Add step to concatenate columns](../../../samples/machine-learning/tutorials/IrisClustering/Program.cs#7)]
+ ## Create a learning pipeline

- ## Choose a learning algorithm
+ For this tutorial, the learning pipeline of the clustering task comprises two following steps:

- After adding the data to the pipeline and transforming it into the correct input format, you select a learning algorithm (**learner**). The learner trains the model. ML.NET provides a <xref:Microsoft.ML.Legacy.Trainers.KMeansPlusPlusClusterer> learner that implements [k-means algorithm](https://en.wikipedia.org/wiki/K-means_clustering) with an improved method for choosing the initial cluster centroids.
+ - concatenate loaded columns into one **Features** column, which is used by a clustering trainer;
+ - use a <xref:Microsoft.ML.Trainers.KMeans.KMeansPlusPlusTrainer> trainer to train the model using the k-means++ clustering algorithm.

- Add the following code into the `Train` method following the data processing code added in the previous step:
+ Add the following code to the `Main` method:

- [!code-csharp[Add a learner step](../../../samples/machine-learning/tutorials/IrisClustering/Program.cs#8)]
Use the <xref:Microsoft.ML.Legacy.Trainers.KMeansPlusPlusClusterer.K?displayProperty=nameWithType> property to specify number of clusters. The code above specifies that the data set should be split in three clusters.
+ The code specifies that the data set should be split in three clusters.

## Train the model

- The steps added in the preceding sections prepared the pipeline for training, however, none have been executed. The `pipeline.Train<TInput, TOutput>` method produces the model that takes in an instance of the `TInput` type and outputs an instance of the `TOutput` type. Add the following code into the `Train` method:
+ The steps added in the preceding sections prepared the pipeline for training, however, none have been executed. Add the following line to the `Main` method to perform data loading and model training:

- [!code-csharp[Train the model and return](../../../samples/machine-learning/tutorials/IrisClustering/Program.cs#9)]
+ [!code-csharp[Train the model](~/samples/machine-learning/tutorials/IrisFlowerClustering/Program.cs#TrainModel)]

### Save the model

- At this point, you have a model that can be integrated into any of your existing or new .NET applications. To save your model to a .zip file, add the following code to the `Main` method below the call to the `Train` method:
+ At this point, you have a model that can be integrated into any of your existing or new .NET applications. To save your model to a .zip file, add the following code to the `Main` method:

- [!code-csharp[Save the model](../../../samples/machine-learning/tutorials/IrisClustering/Program.cs#10)]
+ [!code-csharp[Save the model](~/samples/machine-learning/tutorials/IrisFlowerClustering/Program.cs#SaveModel)]

- Using `await` in the `Main` method means the `Main` method must have the `async` modifier and return a `Task`:
-
- [!code-csharp[Make the Main method async](../../../samples/machine-learning/tutorials/IrisClustering/Program.cs#11)]
-
- You also need to add the following `using` directive at the top of the *Program.cs* file:
Because the `async Main` method is the feature added in C# 7.1 and the default language version of the project is C# 7.0, you need to change the language version to C# 7.1 or higher. To do that, right-click the project node in **Solution Explorer** and select **Properties**. Select the **Build** tab and select the **Advanced** button. In the dropdown, select **C# 7.1** (or a higher version). Select the **OK** button.
+ To make predictions, use the <xref:Microsoft.ML.Runtime.Data.PredictionFunction%602> class that takes instances of the input type through the transformer pipeline and produces instances of the output type. Add the following line to the `Main` method to create an instance of that class:
Create the `TestIrisData` class to house test data instances:

1. In **Solution Explorer**, right-click the project, and then select **Add** > **New Item**.
1. In the **Add New Item** dialog box, select **Class** and change the **Name** field to *TestIrisData.cs*. Then, select the **Add** button.
1. Modify the class to be static like in the following example:

- [!code-csharp[Make class static](../../../samples/machine-learning/tutorials/IrisClustering/TestIrisData.cs#1)]
+ [!code-csharp[Make class static](~/samples/machine-learning/tutorials/IrisFlowerClustering/TestIrisData.cs#Static)]

This tutorial introduces one iris data instance within this class. You can add other scenarios to experiment with the model. Add the following code into the `TestIrisData` class:
To find out the cluster to which the specified item belongs to, go back to the *Program.cs* file and add the following code into the `Main` method:

- [!code-csharp[Predict and output results](../../../samples/machine-learning/tutorials/IrisClustering/Program.cs#13)]
+ [!code-csharp[Predict and output results](~/samples/machine-learning/tutorials/IrisFlowerClustering/Program.cs#PredictionExample)]

- Run the program to see which cluster contains the specified data instance and squared distances from that instance to the cluster centroids. Your results should be similar to the following. As the pipeline processes, it might display warnings or processing messages. These have been removed from the following output for clarity.
+ Run the program to see which cluster contains the specified data instance and squared distances from that instance to the cluster centroids. Your results should be similar to the following:

```text
Cluster: 2
- Distances: 0.4192338 0.0008847713 0.9660053
+ Distances: 11.69127 0.02159119 25.59896
```

- Congratulations! You've now successfully built a machine learning model for iris clustering and used it to make predictions. You can find the source code for this tutorial at the [dotnet/samples](https://github.com/dotnet/samples/tree/master/machine-learning/tutorials/IrisClustering) GitHub repository.
+ Congratulations! You've now successfully built a machine learning model for iris clustering and used it to make predictions. You can find the source code for this tutorial at the [dotnet/samples](https://github.com/dotnet/samples/tree/master/machine-learning/tutorials/IrisFlowerClustering) GitHub repository.
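The updated sample output reports `Cluster: 2` together with the squared distances `11.69127 0.02159119 25.59896`. The added text also says the model is trained with the k-means++ algorithm. The following library-free C# sketch illustrates both ideas under stated assumptions:

- It does not call ML.NET.
- The `KMeansSketch` class and its methods are hypothetical names.
- Distances are plain squared Euclidean distances.
- Cluster IDs are assumed to be 1-based. This is consistent with the sample output, where the smallest distance is the second one, but the diff does not confirm it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, library-free sketch of k-means assignment and k-means++ seeding.
// This is not ML.NET's implementation.
public static class KMeansSketch
{
    // Squared Euclidean distance between two points of the same dimension.
    public static float SquaredDistance(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Points must have the same dimension.");
        float sum = 0f;
        for (int i = 0; i < a.Length; i++)
        {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Assignment step: computes the squared distance to every centroid and returns
    // the nearest one as a 1-based cluster ID (assumed numbering), plus all distances.
    public static (uint ClusterId, float[] Distances) Assign(
        float[] point, IReadOnlyList<float[]> centroids)
    {
        if (centroids.Count == 0)
            throw new ArgumentException("At least one centroid is required.");
        var distances = new float[centroids.Count];
        int best = 0;
        for (int c = 0; c < centroids.Count; c++)
        {
            distances[c] = SquaredDistance(point, centroids[c]);
            if (distances[c] < distances[best])
                best = c;
        }
        return ((uint)(best + 1), distances);
    }

    // k-means++ seeding: the first centroid is drawn uniformly at random; each later
    // centroid is drawn with probability proportional to D(x)^2, the squared distance
    // from x to its nearest already-chosen centroid. Returns fewer than k centroids
    // if every remaining point coincides with a chosen centroid.
    public static List<float[]> SeedPlusPlus(
        IReadOnlyList<float[]> points, int k, Random rng)
    {
        if (k < 1 || k > points.Count)
            throw new ArgumentOutOfRangeException(nameof(k));

        var centroids = new List<float[]> { points[rng.Next(points.Count)] };
        var nearest = new double[points.Count];
        for (int i = 0; i < points.Count; i++)
            nearest[i] = SquaredDistance(points[i], centroids[0]);

        while (centroids.Count < k)
        {
            double total = 0;
            foreach (double d in nearest)
                total += d;
            if (total == 0)
                break;

            // Roulette-wheel selection weighted by D(x)^2; zero-weight points
            // (including already-chosen centroids) are never selected.
            double target = rng.NextDouble() * total;
            int chosen = -1, lastPositive = -1;
            double cumulative = 0;
            for (int i = 0; i < points.Count; i++)
            {
                if (nearest[i] <= 0)
                    continue;
                lastPositive = i;
                cumulative += nearest[i];
                if (cumulative > target)
                {
                    chosen = i;
                    break;
                }
            }
            if (chosen < 0)
                chosen = lastPositive; // guards against floating-point rounding

            centroids.Add(points[chosen]);
            for (int i = 0; i < points.Count; i++)
                nearest[i] = Math.Min(nearest[i], SquaredDistance(points[i], points[chosen]));
        }
        return centroids;
    }
}
```

Example: calling `Assign` on the point `{ 1, 2 }` with centroids `{ 0, 0 }`, `{ 1, 1 }`, and `{ 5, 5 }` gives squared distances 5, 1, and 25, and therefore cluster ID 2. `SeedPlusPlus` only picks the starting centroids. A full k-means run would then alternate assignment and centroid-update steps, which this sketch leaves out.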