XML docs for trainers and minor infrastructure changes (dotnet#455)
* updating the C# API generator to append the Remarks XML to the generated Summary class XML.
Adding documentation details and references for another batch of the trainers.
* correcting test trigger condition
* typo
* More documentation
* replacing the <see> and <seealso> tags with <a> tags, since there is no official documentation on using href with those tags.
* addressing PR comments.
Fixing new line character in the ep_list and manifest.
* Fixing the list and code generation merges from master
<para>FastTrees is an efficient implementation of the <a href='https://arxiv.org/abs/1505.01866'>MART</a> gradient boosting algorithm.
Gradient boosting is a machine learning technique for regression problems.
It builds each regression tree in a step-wise fashion, using a predefined loss function to measure the error at each step and correct for it in the next.
So this prediction model is actually an ensemble of weaker prediction models. In regression problems, boosting builds a series of such trees in a step-wise fashion and then selects the optimal tree using an arbitrary differentiable loss function.
</para>
<para>
MART learns an ensemble of regression trees, where each regression tree is a decision tree with scalar values in its leaves.
A decision (or regression) tree is a binary tree-like flow chart, where at each interior node one decides which of the two child nodes to continue to based on one of the feature values from the input.
At each leaf node, a value is returned. In the interior nodes, the decision is based on the test 'x &lt;= v', where x is the value of the feature in the input sample and v is one of the possible values of this feature.
The functions that can be produced by a regression tree are all the piece-wise constant functions.
</para>
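The evaluation procedure just described is easy to sketch in code. The following is a minimal, self-contained C# illustration of the interior-node test and leaf lookup; the TreeNode type and field layout are hypothetical, not the ML.NET implementation.

// Minimal sketch of the regression tree described above: each interior
// node tests one feature against a threshold ('x <= v'), and each leaf
// stores the scalar value to return.
public sealed class TreeNode
{
    public int FeatureIndex;       // index of the feature x to test
    public double Threshold;       // the value v in the test 'x <= v'
    public TreeNode Left, Right;   // child nodes; both null at a leaf
    public double LeafValue;       // scalar output stored at a leaf

    public double Evaluate(double[] features)
    {
        if (Left == null)          // reached a leaf: return its value
            return LeafValue;
        return features[FeatureIndex] <= Threshold
            ? Left.Evaluate(features)    // test holds: descend left
            : Right.Evaluate(features);  // test fails: descend right
    }
}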
<para>
The ensemble of trees is produced by computing, in each step, a regression tree that approximates the gradient of the loss function, and adding it to the previous tree with coefficients that minimize the loss of the new tree.
The output of the ensemble produced by MART on a given instance is the sum of the tree outputs.
</para>
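Continuing the hypothetical TreeNode sketch above, the ensemble's output on an instance is simply the sum of its trees' outputs:

using System.Collections.Generic;
using System.Linq;

public static class MartEnsemble
{
    // MART's prediction for one instance: the sum of all tree outputs
    // (using the TreeNode sketch from above).
    public static double Evaluate(IReadOnlyList<TreeNode> trees, double[] features)
        => trees.Sum(t => t.Evaluate(features));
}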
<list type='bullet'>
<item>In case of a binary classification problem, the output is converted to a probability by using some form of calibration, as sketched after this list.</item>
<item>In case of a regression problem, the output is the predicted value of the function.</item>
<item>In case of a ranking problem, the instances are ordered by the output value of the ensemble.</item>
</list>
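For the binary classification case above, a common calibration choice is a Platt-style sigmoid fit to held-out scores; this PR does not specify the exact method, so the formula here is illustrative:

\[ P(y = 1 \mid s) = \frac{1}{1 + e^{a s + b}} \]

where s is the raw ensemble score and a, b are learned on calibration data.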
<a href='https://en.wikipedia.org/wiki/Gradient_boosting#Gradient_tree_boosting'>Wikipedia: Gradient boosting (Gradient tree boosting)</a>.
<a href='http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.aos/1013203451'>Greedy function approximation: A gradient boosting machine</a>.
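For orientation, here is a hedged usage sketch of the FastTree regression trainer. It uses the present-day MLContext API, which postdates the entry-point-era code in this PR; the file name, column layout, and hyperparameter values are illustrative assumptions, not taken from this change.

using Microsoft.ML;
using Microsoft.ML.Data;

public sealed class DataPoint
{
    [LoadColumn(0)] public float Label;                           // target value
    [LoadColumn(1, 10), VectorType(10)] public float[] Features;  // assumed 10-feature layout
}

public static class FastTreeExample
{
    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);

        // Hypothetical tab-separated training file.
        IDataView trainData = mlContext.Data.LoadFromTextFile<DataPoint>("train.tsv");

        // Boosted-tree regression: 100 trees, 20 leaves each, shrinkage 0.2.
        var trainer = mlContext.Regression.Trainers.FastTree(
            labelColumnName: "Label",
            featureColumnName: "Features",
            numberOfTrees: 100,
            numberOfLeaves: 20,
            learningRate: 0.2);

        var model = trainer.Fit(trainData);
    }
}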
-        public const string Summary = "Trains gradient boosted decision trees to fit target values using a Tweedie loss function. This learner " +
-            "is a generalization of Poisson, compound Poisson, and gamma regression.";
+        public const string Summary = "Trains gradient boosted decision trees to fit target values using a Tweedie loss function. This learner is a generalization of Poisson, compound Poisson, and gamma regression.";
+        new public const string Remarks = @"<remarks>
+            <a href='https://en.wikipedia.org/wiki/Gradient_boosting#Gradient_tree_boosting'>Wikipedia: Gradient boosting (Gradient tree boosting)</a>
+            <a href='http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.aos/1013203451'>Greedy function approximation: A gradient boosting machine</a>
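As background on the "generalization of Poisson, compound Poisson, and gamma regression" claim (a standard Tweedie-family fact, not text from this PR): Tweedie models have a power mean-variance relationship, and the named cases correspond to particular powers.

\[ \operatorname{Var}(Y) = \phi\, \mu^{p}, \qquad p = 1 \ \text{(Poisson)}, \quad 1 < p < 2 \ \text{(compound Poisson-gamma)}, \quad p = 2 \ \text{(gamma)}. \]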
src/Microsoft.ML.FastTree/RandomForest.cs: 22 additions & 0 deletions
@@ -12,6 +12,28 @@ public abstract class RandomForestTrainerBase<TArgs, TPredictor> : FastTreeTrain
         where TArgs : FastForestArgumentsBase, new()
         where TPredictor : IPredictorProducing<Float>
     {
+        new internal const string Remarks = @"<remarks>
+            Decision trees are non-parametric models that perform a sequence of simple tests on inputs.
+            This decision procedure maps the inputs to outputs found in the training dataset whose inputs were similar to the instance being processed.
+            A decision is made at each node of the binary tree data structure based on a measure of similarity that maps each instance recursively through the branches of the tree until the appropriate leaf node is reached and the output decision returned.
+            <para>Decision trees have several advantages:</para>
+            <list type='bullet'>
+                <item>They are efficient in both computation and memory usage during training and prediction.</item>
+                <item>They can represent non-linear decision boundaries.</item>
+                <item>They perform integrated feature selection and classification.</item>
+                <item>They are resilient in the presence of noisy features.</item>
+            </list>
+            Fast forest is a random forest implementation.
+            The model consists of an ensemble of decision trees. Each tree in a decision forest outputs a Gaussian distribution as its prediction.
+            An aggregation is performed over the ensemble of trees to find a Gaussian distribution closest to the combined distribution for all trees in the model.
+            Generally, ensemble models provide better coverage and accuracy than single decision trees.
+            <a href='http://en.wikipedia.org/wiki/Random_forest'>Wikipedia: Random forest</a>
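The Gaussian aggregation described in these remarks can be illustrated with a small sketch. It assumes the aggregation amounts to moment matching (the single Gaussian closest, in KL divergence, to the uniform mixture of the trees' Gaussians matches the mixture's mean and variance); the types here are hypothetical, not the ML.NET implementation.

using System.Linq;

// One Gaussian prediction per tree, summarized by its first two moments.
public readonly record struct Gaussian(double Mean, double Variance);

public static class ForestAggregation
{
    public static Gaussian Aggregate(Gaussian[] treeOutputs)
    {
        // Mean of the mixture: average of the per-tree means.
        double mean = treeOutputs.Average(g => g.Mean);

        // Second moment of the mixture: average of each tree's E[X^2].
        double secondMoment = treeOutputs.Average(g => g.Variance + g.Mean * g.Mean);

        // Variance of the mixture: E[X^2] - (E[X])^2.
        return new Gaussian(mean, secondMoment - mean * mean);
    }
}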