diff --git a/docs/machine-learning/resources/glossary.md b/docs/machine-learning/resources/glossary.md index 3b14503eb61e7..da4d9a2db7f4a 100644 --- a/docs/machine-learning/resources/glossary.md +++ b/docs/machine-learning/resources/glossary.md @@ -3,7 +3,7 @@ title: Machine Learning Glossary description: A glossary of machine learning terms. author: jralexander ms.author: johalex -ms.date: 05/15/2018 +ms.date: 05/20/2018 ms.topic: conceptual ms.prod: dotnet-ml ms.devlang: dotnet @@ -15,11 +15,11 @@ The following list is a compilation of important machine learning terms that are ## Accuracy -The proportion of true results to total cases. Ranges from 0 (least accurate) to 1 (most accurate). Accuracy is only one evaluation measure used to score performance of your model and should be considered in conjunction with [precision](#precision) and [recall](#recall). +In [classification](#classification), accuracy is the number of correctly classified items divided by the total number of items in the test set. Ranges from 0 (least accurate) to 1 (most accurate). Accuracy is one of the evaluation metrics of your model's performance. Consider it in conjunction with [precision](#precision), [recall](#recall), and [F-score](#f-score). ## Area under the curve (AUC) -A value that represents the area under the curve when false positives are plotted on the x-axis and true positives are plotted on the y-axis. Ranges from 0.5 (worst) to 1 (best). +In [binary classification](#binary-classification), an evaluation metric that is the value of the area under the curve that plots the true positive rate (on the y-axis) against the false positive rate (on the x-axis). Ranges from 0.5 (worst) to 1 (best). Also known as the area under the ROC curve, i.e., the receiver operating characteristic curve. For more information, see the [Receiver operating characteristic](https://en.wikipedia.org/wiki/Receiver_operating_characteristic) article on Wikipedia. 
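The accuracy definition above is easy to compute directly. A minimal sketch in Python (illustrative only, not the ML.NET API; the `accuracy` helper is a name invented for this example):

```python
def accuracy(predicted, actual):
    """Accuracy as defined above: the number of correctly classified items
    divided by the total number of items in the test set.
    Ranges from 0 (least accurate) to 1 (most accurate)."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

# Toy binary classification test set: 4 of the 5 predictions match the labels.
predicted = ["cat", "dog", "cat", "cat", "dog"]
actual    = ["cat", "dog", "dog", "cat", "dog"]
print(accuracy(predicted, actual))  # 0.8
```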
## Binary classification @@ -27,27 +27,27 @@ A [classification](#classification) case where the [label](#label) is only one o ## Classification -When the data are being used to predict a category, [supervised learning](#supervised-learning) is also called classification. [Binary classification](#binary-classification) refers to predicting only two categories (for example assigning an image as a picture of either a 'cat' or a 'dog'). [Multiclass classification](#multiclass-classification) refers to predicting multiple categories (for example, when classifying an image as a specific breed of dog). +When the data is used to predict a category, [supervised learning](#supervised-learning) is also called classification. [Binary classification](#binary-classification) refers to predicting only two categories (for example, classifying an image as a picture of either a 'cat' or a 'dog'). [Multiclass classification](#multiclass-classification) refers to predicting multiple categories (for example, when classifying an image as a picture of a specific breed of dog). ## Coefficient of determination -A single number that indicates how well data fits a model. A value of 1 means that the model exactly matches the data. A value of 0 means that the data is random or otherwise cannot be fit to the model. This is often referred to as r2, R2, or r-squared. +In [regression](#regression), an evaluation metric that indicates how well data fits a model. Ranges from 0 to 1. A value of 0 means that the data is random or otherwise cannot be fit to the model. A value of 1 means that the model exactly matches the data. This is often referred to as r2, R2, or r-squared. ## Feature -A measurable property of the phenomenon being measured, typically a numeric (double value). Multiple features are referred to as a **Feature vector** and typically stored as `double[]`. Features define the important characteristics about the phenomenon being measured. 
For more information see the [Feature](https://en.wikipedia.org/wiki/Feature_(machine_learning)) article on Wikipedia. +A measurable property of the phenomenon being measured, typically a numeric (double) value. Multiple features are referred to as a **Feature vector** and typically stored as `double[]`. Features define the important characteristics of the phenomenon being measured. For more information, see the [Feature](https://en.wikipedia.org/wiki/Feature_(machine_learning)) article on Wikipedia. ## Feature engineering -Feature engineering is the process of developing software that converts other data types (records, objects, …) into feature vectors. The resulting software performs Feature Extraction. For more information see the [Feature engineering](https://en.wikipedia.org/wiki/Feature_engineering) article on Wikipedia. +Feature engineering is the process that involves defining a set of [features](#feature) and developing software that produces feature vectors from available phenomenon data, i.e., feature extraction. For more information, see the [Feature engineering](https://en.wikipedia.org/wiki/Feature_engineering) article on Wikipedia. ## F-score -An evaluation metric that balances [precision](#precision) and [recall](#recall). +In [classification](#classification), an evaluation metric that balances [precision](#precision) and [recall](#recall). ## Hyperparameter -Parameters of machine learning algorithms. Examples include the number of trees to learn in a decision forest or the step size in a gradient descent algorithm. These parameters are called *Hyperparameters* because the process of learning is the process of identifying the right parameters of the prediction function. For example, the coefficients in a linear model or the comparison points in a tree. The process of finding those parameters is governed by the Hyperparameters. For more information see the [Hyperparameter](https://en.wikipedia.org/wiki/Hyperparameter) article on Wikipedia. 
+A parameter of a machine learning algorithm. Examples include the number of trees to learn in a decision forest or the step size in a gradient descent algorithm. Values of *Hyperparameters* are set before training the model and govern the process of finding the parameters of the prediction function, for example, the comparison points in a decision tree or the weights in a linear regression model. For more information, see the [Hyperparameter](https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)) article on Wikipedia. ## Label @@ -55,31 +55,31 @@ The element to be predicted with the machine learning model. For example, the br ## Log loss -Loss refers to an algorithm and task-specific measure of accuracy of the model on the training data. Log loss is the logarithm of the same quantity. +In [classification](#classification), an evaluation metric that characterizes the accuracy of a classifier. The smaller log loss is, the more accurate a classifier is. ## Mean absolute error (MAE) -An evaluation metric that averages all the model errors, where error is the predicted value distance from the true value. +In [regression](#regression), an evaluation metric that is the average of all the model errors, where model error is the distance between the predicted [label](#label) value and the correct label value. ## Model -Traditionally, the parameters for the prediction function. For example, the weights in a linear model or the split points in a tree. In ML.NET, a model contains all the information necessary to predict the label of a domain object (for example, image or text). This means that ML.NET models include the featurization steps necessary as well as the parameters for the prediction function. +Traditionally, the parameters for the prediction function. For example, the weights in a linear regression model or the split points in a decision tree. 
In ML.NET, a model contains all the information necessary to predict the [label](#label) of a domain object (for example, image or text). This means that ML.NET models include the featurization steps necessary as well as the parameters for the prediction function. ## Multiclass classification A [classification](#classification) case where the [label](#label) is one out of three or more classes. For more information, see the [Multiclass classification](https://en.wikipedia.org/wiki/Multiclass_classification) article on Wikipedia. -## N-grams +## N-gram -A feature extraction scheme for text data. Any sequence of N words turns into a [feature](#feature). +A feature extraction scheme for text data: any sequence of N words turns into a [feature](#feature) value. -## Numerical feature vectors +## Numerical feature vector -A feature vector consisting only of numerical values. This is similar to `double[]`. +A [feature](#feature) vector consisting only of numerical values. This is similar to `double[]`. ## Pipeline -All of the operations needed to fit a model to a dataset. A pipeline consists of data import, transformation, featurization, and learning steps. Once a pipeline is trained, it turns into a model. +All of the operations needed to fit a model to a data set. A pipeline consists of data import, transformation, featurization, and learning steps. Once a pipeline is trained, it turns into a model. ## Precision @@ -91,32 +91,32 @@ In [classification](#classification), the recall for a class is the number of it ## Regression -A supervised machine learning task where the output is a real value, for example, double. Examples include forecasting and predicting stock prices. +A [supervised machine learning](#supervised-learning) task where the output is a real value, for example, double. An example is predicting stock prices. ## Relative absolute error -An evaluation metric that represents the error as a percentage of the true value. 
+In [regression](#regression), an evaluation metric that is the sum of all absolute errors divided by the sum of distances between correct [label](#label) values and the average of all correct label values. ## Relative squared error -An evaluation metric that normalizes the total squared error by dividing by the predicted values' total squared error. +In [regression](#regression), an evaluation metric that is the sum of all squared errors divided by the sum of squared distances between correct [label](#label) values and the average of all correct label values. ## Root of mean squared error (RMSE) -An evaluation metric that measures the average of the squares of the errors, and then takes the root of that value. +In [regression](#regression), an evaluation metric that is the square root of the average of the squares of the errors. ## Supervised machine learning -A subclass of machine learning in which a model is desired which predicts the label for yet-unseen data. Examples include classification, regression, and structured prediction. For more information see the [Supervised learning](https://en.wikipedia.org/wiki/Supervised_learning) article on Wikipedia. +A subclass of machine learning in which a desired model predicts the label for yet-unseen data. Examples include classification, regression, and structured prediction. For more information, see the [Supervised learning](https://en.wikipedia.org/wiki/Supervised_learning) article on Wikipedia. ## Training -The process of identifying a model for a given training data set. For a linear model, this means finding the weights. For a tree, it involves the identifying the split points. +The process of identifying a [model](#model) for a given training data set. For a linear model, this means finding the weights. For a tree, it involves identifying the split points. ## Transform -A pipeline component that transforms data. For example, from text to vector of numbers. 
+A [pipeline](#pipeline) component that transforms data. For example, from text to a vector of numbers. ## Unsupervised machine learning -A subclass of machine learning in which a model is desired which finds hidden (or latent) structure in the data. Examples include clustering, topic modeling, and dimensionality reduction. For more information see the [Unsupervised learning](https://en.wikipedia.org/wiki/Unsupervised_learning) article on Wikipedia. +A subclass of machine learning in which a desired model finds hidden (or latent) structure in data. Examples include clustering, topic modeling, and dimensionality reduction. For more information, see the [Unsupervised learning](https://en.wikipedia.org/wiki/Unsupervised_learning) article on Wikipedia.
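The regression metrics defined above (mean absolute error, root of mean squared error, and relative absolute error) follow directly from their definitions. A minimal Python sketch (illustrative only, not ML.NET code; the helper names are invented for this example):

```python
import math

def mae(predicted, actual):
    # Mean absolute error: the average of the distances between
    # predicted label values and correct label values.
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    # Root of mean squared error: the square root of the average
    # of the squares of the errors.
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def relative_absolute_error(predicted, actual):
    # Sum of all absolute errors divided by the sum of distances between
    # correct label values and the average of all correct label values.
    mean = sum(actual) / len(actual)
    return (sum(abs(p - a) for p, a in zip(predicted, actual))
            / sum(abs(a - mean) for a in actual))

# Toy regression test set: predicted vs. correct label values.
predicted = [2.5, 0.0, 2.0, 8.0]
actual    = [3.0, -0.5, 2.0, 7.0]
print(mae(predicted, actual))   # 0.5
print(rmse(predicted, actual))  # ~0.612
```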