Add release notes for ML.NET 0.2 (#301)
* Add release notes for ML.NET 0.2

* Adding release note about TextLoader changes and additional issue/PR references

* Addressing comments: fixing typos, changing formatting, and adding references
GalOshri authored and Shauheen committed Jun 5, 2018
1 parent 62da34e commit edd528a
Showing 1 changed file with 95 additions and 0 deletions.
95 changes: 95 additions & 0 deletions docs/release-notes/0.2/release-0.2.md
@@ -0,0 +1,95 @@
# ML.NET 0.2 Release Notes

We would like to thank the community for its engagement so far and for helping
us shape ML.NET.

Today we are releasing ML.NET 0.2. This release focuses on addressing
questions/issues, adding clustering to the list of supported machine learning
tasks, enabling models to be trained on in-memory data, easier model
validation, and more.

### Installation

ML.NET supports Windows, macOS, and Linux. See [supported OS versions of .NET
Core
2.0](https://github.com/dotnet/core/blob/master/release-notes/2.0/2.0-supported-os.md)
for more details.

You can install the ML.NET NuGet package from the .NET Core CLI using:
```
dotnet add package Microsoft.ML
```

Or from the NuGet Package Manager Console:
```
Install-Package Microsoft.ML
```

### Release Notes

Below are some of the highlights from this release.

* Added clustering to the list of supported machine learning tasks

    * Clustering is an unsupervised learning task that groups sets of items
      based on their features. It identifies which items are more similar to
      each other than to other items. This can be useful in scenarios such as
      organizing news articles into groups based on their topics, segmenting
      users based on their shopping habits, and grouping viewers based on
      their taste in movies.

    * ML.NET 0.2 exposes `KMeansPlusPlusClusterer`, which implements [K-Means++
      clustering](http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf)
      with [Yinyang K-means
      acceleration](https://www.microsoft.com/en-us/research/publication/yinyang-k-means-a-drop-in-replacement-of-the-classic-k-means-with-consistent-speedup/?from=http%3A%2F%2Fresearch.microsoft.com%2Fapps%2Fpubs%2Fdefault.aspx%3Fid%3D252149).
      [This
      test](https://github.com/dotnet/machinelearning/blob/78810563616f3fcb0b63eb8a50b8b2e62d9d65fc/test/Microsoft.ML.Tests/Scenarios/ClusteringTests.cs)
      shows how to use it (from
      [#222](https://github.com/dotnet/machinelearning/pull/222)).

* Train on in-memory data objects with `CollectionDataSource`, in addition to
loading data from a file. ML.NET 0.1 enabled loading data from a delimited
text file. `CollectionDataSource` in ML.NET 0.2 adds the ability to use a
collection of objects as the input to a `LearningPipeline`. See sample usage
[here](https://github.com/dotnet/machinelearning/blob/78810563616f3fcb0b63eb8a50b8b2e62d9d65fc/test/Microsoft.ML.Tests/CollectionDataSourceTests.cs#L133)
(from [#106](https://github.com/dotnet/machinelearning/pull/106)).

* Easier model validation with cross-validation and train-test

    * [Cross-validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics))
      is an approach to estimating how well your model performs statistically.
      It does not require a separate test dataset, but rather uses your
      training data to test your model (it partitions the data so that
      different data is used for training and testing, and it does this
      multiple times).
      [Here](https://github.com/dotnet/machinelearning/blob/78810563616f3fcb0b63eb8a50b8b2e62d9d65fc/test/Microsoft.ML.Tests/Scenarios/SentimentPredictionTests.cs#L51)
      is an example of doing cross-validation (from
      [#212](https://github.com/dotnet/machinelearning/pull/212)).

    * Train-test is a shortcut for testing your model on a separate dataset.
      See example usage
      [here](https://github.com/dotnet/machinelearning/blob/78810563616f3fcb0b63eb8a50b8b2e62d9d65fc/test/Microsoft.ML.Tests/Scenarios/SentimentPredictionTests.cs#L36).

    * Note that the `LearningPipeline` is prepared the same way in both cases.

* Speed improvement for predictions: by not creating a parallel cursor for
dataviews that only have one element, we get a significant speed-up for
predictions (see
[#179](https://github.com/dotnet/machinelearning/issues/179) for a few
measurements).

* Updated `TextLoader` API: the `TextLoader` API is now code-generated and was
updated to take explicit declarations for the columns in the data, which is
required in some scenarios. See
[#142](https://github.com/dotnet/machinelearning/pull/142).

* Added daily NuGet builds of the project: daily NuGet builds of ML.NET are
now available
[here](https://dotnet.myget.org/feed/dotnet-core/package/nuget/Microsoft.ML).
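
To make the clustering highlight above more concrete, here is a minimal,
language-neutral sketch of K-Means++ seeding followed by plain Lloyd iterations,
written in Python with only the standard library. It is not the ML.NET
implementation and does not use the ML.NET API: the function names
(`kmeans_pp_seeds`, `kmeans`), the parameters, and the sample points are all
assumptions made for illustration, and the Yinyang acceleration mentioned above
is not included.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, rng):
    """Pick k initial centers: the first uniformly at random, each later one
    with probability proportional to its squared distance from the nearest
    center chosen so far (the K-Means++ seeding rule)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        if any(weights):
            centers.append(rng.choices(points, weights=weights, k=1)[0])
        else:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
    return centers


def nearest(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)), key=lambda j: squared_distance(point, centers[j]))


def kmeans(points, k, iterations=20, seed=0):
    """Hypothetical helper: K-Means++ seeding plus a fixed number of Lloyd steps."""
    rng = random.Random(seed)
    centers = kmeans_pp_seeds(points, k, rng)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        labels = [nearest(p, centers) for p in points]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centers, [nearest(p, centers) for p in points]


# Illustrative data: two well-separated groups of 2-D points.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(sample, k=2)
print(labels)
```

Items that end up with the same label are the ones the algorithm considers more
similar to each other than to the rest, which is the grouping behavior described
in the clustering highlight.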

Additional issues closed in this milestone can be found [here](https://github.com/dotnet/machinelearning/milestone/1?closed=1).
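
Returning to the model validation highlight above, the difference between
cross-validation and train-test can be sketched in a few lines of
standard-library Python. This is a conceptual illustration only, not the ML.NET
API: `k_fold_splits`, `cross_validate`, `train_test_evaluate`, and the toy
majority-label "model" are hypothetical names introduced here. Both evaluators
take the same `train_fn`, which mirrors the note that the pipeline is prepared
the same way in both cases.

```python
import random
from collections import Counter


def k_fold_splits(rows, num_folds=5, seed=0):
    """Yield (train, test) pairs so that every row is tested exactly once."""
    if not 2 <= num_folds <= len(rows):
        raise ValueError("num_folds must be between 2 and the number of rows")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::num_folds] for i in range(num_folds)]
    for held_out in range(num_folds):
        test = [rows[i] for i in folds[held_out]]
        train = [rows[i] for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


def cross_validate(rows, train_fn, score_fn, num_folds=5, seed=0):
    """Train and score once per fold; return the mean score and per-fold scores."""
    scores = [score_fn(train_fn(train), test)
              for train, test in k_fold_splits(rows, num_folds, seed)]
    return sum(scores) / len(scores), scores


def train_test_evaluate(train_rows, test_rows, train_fn, score_fn):
    """Train once on train_rows and score once on a separate test set."""
    return score_fn(train_fn(train_rows), test_rows)


def train_majority(rows):
    """Toy 'model': always predict the most common label seen in training."""
    return Counter(label for _, label in rows).most_common(1)[0][0]


def accuracy(model, rows):
    """Fraction of rows whose label matches the model's constant prediction."""
    return sum(1 for _, label in rows if label == model) / len(rows)


# Illustrative data: (feature, label) pairs.
data = [(i, "positive" if i % 3 else "negative") for i in range(12)]
mean_score, fold_scores = cross_validate(data, train_majority, accuracy, num_folds=4)
holdout_score = train_test_evaluate(data[:9], data[9:], train_majority, accuracy)
print(mean_score, fold_scores, holdout_score)
```

Cross-validation reuses the training data by rotating which fold is held out,
while train-test needs a second dataset but trains only once.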

### Acknowledgements

Shoutout to tincann, rantri, yamachu, pkulikov, Sorrien, v-tsymbalistyi, Ky7m,
forki, jessebenson, mfaticaearnin, and the ML.NET team for their contributions
as part of this release!
