28 changes: 28 additions & 0 deletions Directory.Build.targets
Original file line number Diff line number Diff line change
@@ -5,5 +5,33 @@
Text="The tools directory [$(ToolsDir)] does not exist. Please run build in the root of the repo to ensure the tools are installed before attempting to build an individual project." />
</Target>

<Target Name="CopyNativeAssemblies"
BeforeTargets="PrepareForRun">

<PropertyGroup>
<LibPrefix Condition="'$(OS)' != 'Windows_NT'">lib</LibPrefix>
<LibExtension Condition="'$(OS)' == 'Windows_NT'">.dll</LibExtension>
<LibExtension Condition="'$(OS)' != 'Windows_NT'">.so</LibExtension>
<LibExtension Condition="$([MSBuild]::IsOSPlatform('osx'))">.dylib</LibExtension>
</PropertyGroup>

<ItemGroup>
<NativeAssemblyReference>
<FullAssemblyPath>$(NativeOutputPath)$(LibPrefix)%(NativeAssemblyReference.Identity)$(LibExtension)</FullAssemblyPath>
</NativeAssemblyReference>
</ItemGroup>

<Copy SourceFiles="@(NativeAssemblyReference->'%(FullAssemblyPath)')"
DestinationFolder="$(OutputPath)"
OverwriteReadOnlyFiles="$(OverwriteReadOnlyFiles)"
Retries="$(CopyRetryCount)"
RetryDelayMilliseconds="$(CopyRetryDelayMilliseconds)"
UseHardlinksIfPossible="$(CreateHardLinksForPublishFilesIfPossible)"
UseSymboliclinksIfPossible="$(CreateSymbolicLinksForPublishFilesIfPossible)">
<Output TaskParameter="DestinationFiles" ItemName="FileWrites"/>
</Copy>

</Target>

<Import Project="$(ToolsDir)/versioning.targets" Condition="Exists('$(ToolsDir)/versioning.targets')" />
</Project>
114 changes: 114 additions & 0 deletions docs/release-notes/0.3/release-0.3.md
@@ -0,0 +1,114 @@
# ML.NET 0.3 Release Notes

Today we are releasing ML.NET 0.3. This release focuses on adding components
to ML.NET from the internal codebase (such as Factorization Machines,
LightGBM, Ensembles, and LightLDA), enabling export to the ONNX model format,
and fixing bugs.

### Installation

ML.NET supports Windows, macOS, and Linux. See [supported OS versions of .NET
Core
2.0](https://github.com/dotnet/core/blob/master/release-notes/2.0/2.0-supported-os.md)
for more details.

You can install the ML.NET NuGet package from the CLI using:
```
dotnet add package Microsoft.ML
```

From the NuGet Package Manager:
```
Install-Package Microsoft.ML
```

### Release Notes

Below are some of the highlights from this release.

* Added Field-Aware Factorization Machines (FFM) as a learner for binary
classification (#383)

* FFM is useful for various large sparse datasets, especially in areas
such as recommendations and click prediction. It has been used to win
various click prediction competitions such as the [Criteo Display
Advertising Challenge on
Kaggle](https://www.kaggle.com/c/criteo-display-ad-challenge). You can
learn more about the winning solution
[here](https://www.csie.ntu.edu.tw/~r01922136/kaggle-2014-criteo.pdf).
* FFM is a streaming learner, so it does not require the entire dataset to
fit in memory.
* You can learn more about FFM
[here](http://www.csie.ntu.edu.tw/~cjlin/papers/ffm.pdf) and some of the
speedup approaches that are used in ML.NET
[here](https://github.com/wschin/fast-ffm/blob/master/fast-ffm.pdf).
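The core of FFM is easy to state outside of any framework: the model score is a sum over all pairs of active features, where each feature keeps a separate latent vector for each *field* of the feature it is paired with. Below is a minimal Python sketch of the scoring function — purely illustrative, with assumed names and data layout; it is not the ML.NET API:

```python
def ffm_score(x, weights, k):
    """Field-aware factorization machine score for one sparse sample.

    x: list of (field, feature, value) triples for the active features.
    weights: dict mapping (feature, partner_field) -> latent vector of length k.
    """
    score = 0.0
    for i in range(len(x)):
        f1, j1, v1 = x[i]
        for j in range(i + 1, len(x)):
            f2, j2, v2 = x[j]
            # Feature j1 uses the latent vector it keeps for j2's field, and vice versa.
            w1 = weights.get((j1, f2), [0.0] * k)
            w2 = weights.get((j2, f1), [0.0] * k)
            score += sum(a * b for a, b in zip(w1, w2)) * v1 * v2
    return score
```

Training fits the latent vectors by stochastic gradient descent on a logistic loss over this score; because each sample only touches its own active features, the learner can stream over the data without holding it all in memory.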

* Added [LightGBM](https://github.com/Microsoft/LightGBM) as a learner for
binary classification, multiclass classification, and regression (#392)

* LightGBM is a tree-based gradient boosting machine. It is under the
umbrella of the [DMTK](http://github.com/microsoft/dmtk) project at
Microsoft.
* The LightGBM repository shows various [comparison
experiments](https://github.com/Microsoft/LightGBM/blob/6488f319f243f7ff679a8e388a33e758c5802303/docs/Experiments.rst#comparison-experiment)
that show good accuracy and speed, so it is a great learner to try out.
It has also been used in winning solutions in various [ML
challenges](https://github.com/Microsoft/LightGBM/blob/a6e878e2fc6e7f545921cbe337cc511fbd1f500d/examples/README.md).
* This addition wraps LightGBM and exposes it in ML.NET.
* Note that LightGBM can also be used for ranking, but the ranking
evaluator is not yet exposed in ML.NET.
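The boosting loop that this family of learners shares can be sketched in a few lines: each new tree is fit to the residuals of the ensemble built so far. A toy Python version with one-feature regression stumps and squared loss — purely illustrative and unrelated to the actual LightGBM implementation:

```python
def fit_stump(xs, residuals):
    """Pick the threshold split on a single feature that minimizes squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, rounds=30, lr=0.5):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    pred = [0.0] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, pred)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        pred = [p + lr * stump(x) for x, p in zip(xs, pred)]
    return lambda x: sum(lr * s(x) for s in stumps)
```

LightGBM layers histogram-based split finding, leaf-wise tree growth, and many other optimizations on top of this basic loop.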

* Added Ensemble learners for binary classification, multiclass
classification, and regression (#379)

* [Ensemble learners](https://en.wikipedia.org/wiki/Ensemble_learning)
enable using multiple learners in one model. As an example, the Ensemble
learner could train both `FastTree` and `AveragedPerceptron` and average
their predictions to get the final prediction.
* Combining multiple models of similar statistical performance may lead to
better performance than using each model separately.
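The averaging combiner in the example above is simple to sketch (illustrative Python; the trained models are stand-ins, not ML.NET types):

```python
def ensemble_average(models, features):
    """Combine trained models by averaging their raw predictions."""
    scores = [predict(features) for predict in models]
    return sum(scores) / len(scores)

# Stand-ins for a trained FastTree model and a trained AveragedPerceptron model.
fast_tree = lambda features: 0.9
averaged_perceptron = lambda features: 0.5
```

Real ensemble learners also support other combiners (such as voting or stacking) and may train the base learners on different subsets of the data.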

* Added LightLDA transform for topic modeling (#377)

* LightLDA is an implementation of [Latent Dirichlet
Allocation](https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation)
which infers topical structure from text data.
* The implementation of LightLDA in ML.NET is based on [this
paper](https://arxiv.org/abs/1412.1576). There is a distributed
implementation of LightLDA
[here](https://github.com/Microsoft/lightlda).

* Added One-Versus-All (OVA) learner for multiclass classification (#363)

* [OVA](https://en.wikipedia.org/wiki/Multiclass_classification#One-vs.-rest)
(sometimes known as One-Versus-Rest) is an approach to using binary
classifiers in multiclass classification problems.
* While some binary classification learners in ML.NET natively support
multiclass classification (e.g. Logistic Regression), there are others
that do not (e.g. Averaged Perceptron). OVA enables using the latter
group for multiclass classification as well.
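The OVA reduction itself fits in a few lines: train one binary scorer per class against all the rest, then predict the class whose scorer is most confident. Illustrative Python, not the ML.NET API; the centroid scorer is a toy stand-in for a real binary learner such as Averaged Perceptron:

```python
class OneVersusAll:
    """Multiclass classification from binary scorers: one scorer per class,
    each trained to separate its class from all the rest."""

    def __init__(self, make_binary_learner):
        self.make_binary_learner = make_binary_learner  # returns a scoring function
        self.scorers = {}

    def fit(self, samples, labels):
        for cls in sorted(set(labels)):
            # Relabel the data: the current class is positive, everything else negative.
            binary = [(x, y == cls) for x, y in zip(samples, labels)]
            self.scorers[cls] = self.make_binary_learner(binary)
        return self

    def predict(self, x):
        # The class whose binary scorer is most confident wins.
        return max(self.scorers, key=lambda cls: self.scorers[cls](x))


def centroid_scorer(binary):
    """Toy stand-in for a binary learner: score by closeness to the positive-class mean."""
    positives = [x for x, is_pos in binary if is_pos]
    center = sum(positives) / len(positives)
    return lambda x: -abs(x - center)
```

Any binary learner that produces a comparable raw score can be plugged in, which is exactly what makes learners like Averaged Perceptron usable for multiclass problems.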

* Enabled export of ML.NET models to the [ONNX](https://onnx.ai/) format
(#248)

* ONNX is a common format for representing deep learning models (also
supporting certain other types of models), which enables developers to
move models between different ML toolkits.
* ONNX models can be used in [Windows
ML](https://docs.microsoft.com/en-us/windows/uwp/machine-learning/overview)
which enables evaluating models on Windows 10 devices and taking
advantage of capabilities like hardware acceleration.
* Currently, only a subset of ML.NET components can be used in a model
that is converted to ONNX.

Additional issues closed in this milestone can be found
[here](https://github.com/dotnet/machinelearning/milestone/2?closed=1).

### Acknowledgements

Shoutout to [pkulikov](https://github.com/pkulikov),
[veikkoeeva](https://github.com/veikkoeeva),
[ross-p-smith](https://github.com/ross-p-smith),
[jwood803](https://github.com/jwood803),
[Nepomuceno](https://github.com/Nepomuceno), and the ML.NET team for their
contributions as part of this release!
2 changes: 1 addition & 1 deletion src/Microsoft.ML.Console/Console.cs
@@ -8,4 +8,4 @@ public static class Console
{
public static int Main(string[] args) => Maml.Main(args);
}
}
}
27 changes: 22 additions & 5 deletions src/Microsoft.ML.Console/Microsoft.ML.Console.csproj
@@ -3,17 +3,34 @@
<PropertyGroup>
<AllowUnsafeBlocks>true</AllowUnsafeBlocks>
<DefineConstants>CORECLR</DefineConstants>
<TargetFramework>netcoreapp2.0</TargetFramework>
<OutputType>Exe</OutputType>
<AssemblyName>MML</AssemblyName>
<StartupObject>Microsoft.ML.Runtime.Tools.Console.Console</StartupObject>
<TargetFramework>netcoreapp2.0</TargetFramework>
<OutputType>Exe</OutputType>
<AssemblyName>MML</AssemblyName>
<StartupObject>Microsoft.ML.Runtime.Tools.Console.Console</StartupObject>
</PropertyGroup>

<ItemGroup>
<ProjectReference Include="..\Microsoft.ML.Core\Microsoft.ML.Core.csproj" />
<ProjectReference Include="..\Microsoft.ML.CpuMath\Microsoft.ML.CpuMath.csproj" />
<ProjectReference Include="..\Microsoft.ML.Data\Microsoft.ML.Data.csproj" />
<ProjectReference Include="..\Microsoft.ML.Ensemble\Microsoft.ML.Ensemble.csproj" />
<ProjectReference Include="..\Microsoft.ML.FastTree\Microsoft.ML.FastTree.csproj" />
<ProjectReference Include="..\Microsoft.ML.InternalStreams\Microsoft.ML.InternalStreams.csproj" />
<ProjectReference Include="..\Microsoft.ML.KMeansClustering\Microsoft.ML.KMeansClustering.csproj" />
<ProjectReference Include="..\Microsoft.ML.LightGBM\Microsoft.ML.LightGBM.csproj" />
<ProjectReference Include="..\Microsoft.ML.Maml\Microsoft.ML.Maml.csproj" />
<ProjectReference Include="..\Microsoft.ML.PCA\Microsoft.ML.PCA.csproj" />
<ProjectReference Include="..\Microsoft.ML.PipelineInference\Microsoft.ML.PipelineInference.csproj" />
</ItemGroup>
<ProjectReference Include="..\Microsoft.ML.ResultProcessor\Microsoft.ML.ResultProcessor.csproj" />
<ProjectReference Include="..\Microsoft.ML.StandardLearners\Microsoft.ML.StandardLearners.csproj" />
<ProjectReference Include="..\Microsoft.ML.Sweeper\Microsoft.ML.Sweeper.csproj" />
<ProjectReference Include="..\Microsoft.ML.Transforms\Microsoft.ML.Transforms.csproj" />
<ProjectReference Include="..\Microsoft.ML.UniversalModelFormat\Microsoft.ML.UniversalModelFormat.csproj" />

<NativeAssemblyReference Include="FastTreeNative" />
<NativeAssemblyReference Include="CpuMathNative" />
<NativeAssemblyReference Include="FactorizationMachineNative" />
<NativeAssemblyReference Include="LdaNative" />
</ItemGroup>

</Project>
5 changes: 5 additions & 0 deletions src/Microsoft.ML.Core/EntryPoints/ModuleArgs.cs
@@ -527,6 +527,11 @@ public sealed class EntryPointAttribute : Attribute
/// Short name of the Entry Point
/// </summary>
public string ShortName { get; set; }

/// <summary>
/// Remarks on the Entry Point, for more extensive XML documentation in the C# API
/// </summary>
public string Remarks { get; set; }
}

/// <summary>
2 changes: 2 additions & 0 deletions src/Microsoft.ML.Core/EntryPoints/ModuleCatalog.cs
@@ -44,6 +44,7 @@ public sealed class EntryPointInfo
public readonly string Description;
public readonly string ShortName;
public readonly string FriendlyName;
public readonly string Remarks;
public readonly MethodInfo Method;
public readonly Type InputType;
public readonly Type OutputType;
@@ -63,6 +64,7 @@ internal EntryPointInfo(IExceptionContext ectx, MethodInfo method,
Method = method;
ShortName = attribute.ShortName;
FriendlyName = attribute.UserName;
Remarks = attribute.Remarks;
ObsoleteAttribute = obsoleteAttribute;

// There are supposed to be 2 parameters, env and input for non-macro nodes.
8 changes: 4 additions & 4 deletions src/Microsoft.ML.Data/Evaluators/EvaluatorUtils.cs
@@ -653,10 +653,10 @@ public static void ReconcileKeyValuesWithNoNames(IHostEnvironment env, IDataView
ValueMapper<uint, uint> mapper =
(ref uint src, ref uint dst) =>
{
if (src == 0 || src > keyCount)
if (src > keyCount)
dst = 0;
else
dst = src + 1;
dst = src;
};
views[i] = LambdaColumnMapper.Create(env, "ReconcileKeyValues", views[i], columnName, columnName,
views[i].Schema.GetColumnType(index), keyType, mapper);
@@ -866,7 +866,7 @@ private static IDataView AppendPerInstanceDataViews(IHostEnvironment env, string
}
else if (dvNumber == 0 && dv.Schema.HasKeyNames(i, type.KeyCount))
firstDvKeyWithNamesColumns.Add(name);
else if (type.KeyCount > 0 && name != labelColName)
else if (type.KeyCount > 0 && name != labelColName && !dv.Schema.HasKeyNames(i, type.KeyCount))
{
// For any other key column (such as GroupId) we do not reconcile the key values, we only convert to U4.
if (!firstDvKeyNoNamesColumns.ContainsKey(name))
@@ -901,7 +901,7 @@ private static IDataView AppendPerInstanceDataViews(IHostEnvironment env, string
Func<IDataView, int, IDataView> keyToValue =
(idv, i) =>
{
foreach (var keyCol in firstDvVectorKeyColumns.Prepend(labelColName))
foreach (var keyCol in firstDvVectorKeyColumns.Concat(firstDvKeyWithNamesColumns).Prepend(labelColName))
{
if (keyCol == labelColName && labelColKeyValuesType == null)
continue;
25 changes: 25 additions & 0 deletions src/Microsoft.ML.FastTree/FastTree.cs
@@ -82,6 +82,31 @@ public abstract class FastTreeTrainerBase<TArgs, TPredictor> :

protected string InnerArgs => CmdParser.GetSettings(Host, Args, new TArgs());

internal const string Remarks = @"<remarks>
<para>FastTree is an efficient implementation of the <a href='https://arxiv.org/abs/1505.01866'>MART</a> gradient boosting algorithm.
Gradient boosting is a machine learning technique for regression problems.
It builds each regression tree in a step-wise fashion, using a predefined loss function to measure the error at each step and correct for it in the next.
So this prediction model is actually an ensemble of weaker prediction models. In regression problems, boosting builds a series of such trees in a step-wise fashion and then selects the optimal tree using an arbitrary differentiable loss function.
</para>
<para>
MART learns an ensemble of regression trees, which is a decision tree with scalar values in its leaves.
A decision (or regression) tree is a binary tree-like flow chart, where at each interior node one decides which of the two child nodes to continue to based on one of the feature values from the input.
At each leaf node, a value is returned. In the interior nodes, the decision is based on the test 'x <= v' where x is the value of the feature in the input sample and v is one of the possible values of this feature.
The functions that can be produced by a regression tree are all the piece-wise constant functions.
</para>
<para>
The ensemble of trees is produced by computing, in each step, a regression tree that approximates the gradient of the loss function, and adding it to the previous tree with coefficients that minimize the loss of the new tree.
The output of the ensemble produced by MART on a given instance is the sum of the tree outputs.
</para>
<list type='bullet'>
<item>In case of a binary classification problem, the output is converted to a probability by using some form of calibration.</item>
<item>In case of a regression problem, the output is the predicted value of the function.</item>
<item>In case of a ranking problem, the instances are ordered by the output value of the ensemble.</item>
</list>
<a href='https://en.wikipedia.org/wiki/Gradient_boosting#Gradient_tree_boosting'>Wikipedia: Gradient boosting (Gradient tree boosting)</a>.
<a href='http://projecteuclid.org/DPubS?service=UI&amp;version=1.0&amp;verb=Display&amp;handle=euclid.aos/1013203451'>Greedy function approximation: A gradient boosting machine</a>.
</remarks>";

public override bool NeedNormalization => false;

public override bool WantCaching => false;
6 changes: 5 additions & 1 deletion src/Microsoft.ML.FastTree/FastTreeClassification.cs
@@ -338,7 +338,11 @@ public void AdjustTreeOutputs(IChannel ch, RegressionTree tree,

public static partial class FastTree
{
[TlcModule.EntryPoint(Name = "Trainers.FastTreeBinaryClassifier", Desc = FastTreeBinaryClassificationTrainer.Summary, UserName = FastTreeBinaryClassificationTrainer.UserNameValue, ShortName = FastTreeBinaryClassificationTrainer.ShortName)]
[TlcModule.EntryPoint(Name = "Trainers.FastTreeBinaryClassifier",
Desc = FastTreeBinaryClassificationTrainer.Summary,
Remarks = FastTreeBinaryClassificationTrainer.Remarks,
UserName = FastTreeBinaryClassificationTrainer.UserNameValue,
ShortName = FastTreeBinaryClassificationTrainer.ShortName)]
public static CommonOutputs.BinaryClassificationOutput TrainBinary(IHostEnvironment env, FastTreeBinaryClassificationTrainer.Arguments input)
{
Contracts.CheckValue(env, nameof(env));
6 changes: 5 additions & 1 deletion src/Microsoft.ML.FastTree/FastTreeRanking.cs
@@ -1096,7 +1096,11 @@ public static FastTreeRankingPredictor Create(IHostEnvironment env, ModelLoadCon

public static partial class FastTree
{
[TlcModule.EntryPoint(Name = "Trainers.FastTreeRanker", Desc = FastTreeRankingTrainer.Summary, UserName = FastTreeRankingTrainer.UserNameValue, ShortName = FastTreeRankingTrainer.ShortName)]
[TlcModule.EntryPoint(Name = "Trainers.FastTreeRanker",
Desc = FastTreeRankingTrainer.Summary,
Remarks = FastTreeRankingTrainer.Remarks,
UserName = FastTreeRankingTrainer.UserNameValue,
ShortName = FastTreeRankingTrainer.ShortName)]
public static CommonOutputs.RankingOutput TrainRanking(IHostEnvironment env, FastTreeRankingTrainer.Arguments input)
{
Contracts.CheckValue(env, nameof(env));
6 changes: 5 additions & 1 deletion src/Microsoft.ML.FastTree/FastTreeRegression.cs
@@ -448,7 +448,11 @@ public static FastTreeRegressionPredictor Create(IHostEnvironment env, ModelLoad

public static partial class FastTree
{
[TlcModule.EntryPoint(Name = "Trainers.FastTreeRegressor", Desc = FastTreeRegressionTrainer.Summary, UserName = FastTreeRegressionTrainer.UserNameValue, ShortName = FastTreeRegressionTrainer.ShortName)]
[TlcModule.EntryPoint(Name = "Trainers.FastTreeRegressor",
Desc = FastTreeRegressionTrainer.Summary,
Remarks = FastTreeRegressionTrainer.Remarks,
UserName = FastTreeRegressionTrainer.UserNameValue,
ShortName = FastTreeRegressionTrainer.ShortName)]
public static CommonOutputs.RegressionOutput TrainRegression(IHostEnvironment env, FastTreeRegressionTrainer.Arguments input)
{
Contracts.CheckValue(env, nameof(env));
12 changes: 9 additions & 3 deletions src/Microsoft.ML.FastTree/FastTreeTweedie.cs
@@ -36,8 +36,11 @@ public sealed partial class FastTreeTweedieTrainer : BoostingFastTreeTrainerBase
{
public const string LoadNameValue = "FastTreeTweedieRegression";
public const string UserNameValue = "FastTree (Boosted Trees) Tweedie Regression";
public const string Summary = "Trains gradient boosted decision trees to fit target values using a Tweedie loss function. This learner " +
"is a generalization of Poisson, compound Poisson, and gamma regression.";
public const string Summary = "Trains gradient boosted decision trees to fit target values using a Tweedie loss function. This learner is a generalization of Poisson, compound Poisson, and gamma regression.";
new public const string Remarks = @"<remarks>
<a href='https://en.wikipedia.org/wiki/Gradient_boosting#Gradient_tree_boosting'>Wikipedia: Gradient boosting (Gradient tree boosting)</a>
<a href='http://projecteuclid.org/DPubS?service=UI&amp;version=1.0&amp;verb=Display&amp;handle=euclid.aos/1013203451'>Greedy function approximation: A gradient boosting machine</a>
</remarks>";

public const string ShortName = "fttweedie";

@@ -460,7 +463,10 @@ protected override void Map(ref VBuffer<float> src, ref float dst)

public static partial class FastTree
{
[TlcModule.EntryPoint(Name = "Trainers.FastTreeTweedieRegressor", Desc = FastTreeTweedieTrainer.Summary, UserName = FastTreeTweedieTrainer.UserNameValue, ShortName = FastTreeTweedieTrainer.ShortName)]
[TlcModule.EntryPoint(Name = "Trainers.FastTreeTweedieRegressor",
Desc = FastTreeTweedieTrainer.Summary,
UserName = FastTreeTweedieTrainer.UserNameValue,
ShortName = FastTreeTweedieTrainer.ShortName)]
public static CommonOutputs.RegressionOutput TrainTweedieRegression(IHostEnvironment env, FastTreeTweedieTrainer.Arguments input)
{
Contracts.CheckValue(env, nameof(env));