50 changes: 25 additions & 25 deletions docs/machine-learning/resources/glossary.md
@@ -3,7 +3,7 @@ title: Machine Learning Glossary
description: A glossary of machine learning terms.
author: jralexander
ms.author: johalex
ms.date: 05/15/2018
ms.date: 05/20/2018
ms.topic: conceptual
ms.prod: dotnet-ml
ms.devlang: dotnet
@@ -15,71 +15,71 @@ The following list is a compilation of important machine learning terms that are

## Accuracy

The proportion of true results to total cases. Ranges from 0 (least accurate) to 1 (most accurate). Accuracy is only one evaluation measure used to score performance of your model and should be considered in conjunction with [precision](#precision) and [recall](#recall).
In [classification](#classification), accuracy is the number of correctly classified items divided by the total number of items in the test set. Ranges from 0 (least accurate) to 1 (most accurate). Accuracy is only one of the evaluation metrics for the performance of your model. Consider it in conjunction with [precision](#precision), [recall](#recall), and [F-score](#f-score).
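
As an informal sketch of the arithmetic (plain Python, illustrative only; this is not the ML.NET API, and the function name is ours):

```python
def accuracy(y_true, y_pred):
    """Fraction of items whose predicted class matches the correct class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# 3 of the 4 predictions match the correct labels.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```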

Contributor:

This is fine for now, but we'll need to break this down into accuracy for binary classification and micro and macro accuracy for multi-classification later.

## Area under the curve (AUC)

A value that represents the area under the curve when false positives are plotted on the x-axis and true positives are plotted on the y-axis. Ranges from 0.5 (worst) to 1 (best).
In [binary classification](#binary-classification), an evaluation metric that is the value of the area under the curve that plots the true positive rate (on the y-axis) against the false positive rate (on the x-axis). Ranges from 0.5 (worst) to 1 (best). Also known as the area under the ROC curve, i.e., receiver operating characteristic curve. For more information, see the [Receiver operating characteristic](https://en.wikipedia.org/wiki/Receiver_operating_characteristic) article on Wikipedia.
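
As a hedged illustration (not part of ML.NET), AUC can also be computed from its rank interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal Python sketch:

```python
def auc(labels, scores):
    """AUC via its rank interpretation: the chance that a random positive
    example scores higher than a random negative one (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```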

Contributor:

LGTM

## Binary classification

A [classification](#classification) case where the [label](#label) is only one out of two classes. For more information, see the [Binary classification](https://en.wikipedia.org/wiki/Binary_classification) article on Wikipedia.

## Classification

When the data are being used to predict a category, [supervised learning](#supervised-learning) is also called classification. [Binary classification](#binary-classification) refers to predicting only two categories (for example assigning an image as a picture of either a 'cat' or a 'dog'). [Multiclass classification](#multiclass-classification) refers to predicting multiple categories (for example, when classifying an image as a specific breed of dog).
When the data is used to predict a category, [supervised learning](#supervised-learning) is also called classification. [Binary classification](#binary-classification) refers to predicting only two categories (for example, classifying an image as a picture of either a 'cat' or a 'dog'). [Multiclass classification](#multiclass-classification) refers to predicting multiple categories (for example, when classifying an image as a picture of a specific breed of dog).

Contributor:

LGTM

## Coefficient of determination

A single number that indicates how well data fits a model. A value of 1 means that the model exactly matches the data. A value of 0 means that the data is random or otherwise cannot be fit to the model. This is often referred to as r<sup>2</sup>, R<sup>2</sup>, or r-squared.
In [regression](#regression), an evaluation metric that indicates how well data fits a model. Ranges from 0 to 1. A value of 0 means that the data is random or otherwise cannot be fit to the model. A value of 1 means that the model exactly matches the data. This is often referred to as r<sup>2</sup>, R<sup>2</sup>, or r-squared.
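
Spelled out as a small Python sketch (illustrative only, not the ML.NET API), r-squared is one minus the ratio of the residual sum of squares to the total sum of squares:

```python
def r_squared(y_true, y_pred):
    """1 minus the ratio of residual to total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```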

Contributor:

LGTM

## Feature

A measurable property of the phenomenon being measured, typically a numeric (double value). Multiple features are referred to as a **Feature vector** and typically stored as `double[]`. Features define the important characteristics about the phenomenon being measured. For more information see the [Feature](https://en.wikipedia.org/wiki/Feature_(machine_learning)) article on Wikipedia.
A measurable property of the phenomenon being measured, typically a numeric (double) value. Multiple features are referred to as a **Feature vector** and typically stored as `double[]`. Features define the important characteristics of the phenomenon being measured. For more information, see the [Feature](https://en.wikipedia.org/wiki/Feature_(machine_learning)) article on Wikipedia.

Contributor:

LGTM

## Feature engineering

Feature engineering is the process of developing software that converts other data types (records, objects, …) into feature vectors. The resulting software performs Feature Extraction. For more information see the [Feature engineering](https://en.wikipedia.org/wiki/Feature_engineering) article on Wikipedia.
Feature engineering is the process that involves defining a set of [features](#feature) and developing software that produces feature vectors from available phenomenon data, i.e., feature extraction. For more information, see the [Feature engineering](https://en.wikipedia.org/wiki/Feature_engineering) article on Wikipedia.

Contributor:

This is definitely better than what we have now, but I think we'll need to write a little more on this eventually.

## F-score

An evaluation metric that balances [precision](#precision) and [recall](#recall).
In [classification](#classification), an evaluation metric that balances [precision](#precision) and [recall](#recall).
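
The most common variant is F1, the harmonic mean of precision and recall. A minimal sketch in plain Python (illustrative only; the function name is ours):

```python
def f1_score(precision, recall):
    """F1: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```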

Contributor:

LGTM

## Hyperparameter

Parameters of machine learning algorithms. Examples include the number of trees to learn in a decision forest or the step size in a gradient descent algorithm. These parameters are called *Hyperparameters* because the process of learning is the process of identifying the right parameters of the prediction function. For example, the coefficients in a linear model or the comparison points in a tree. The process of finding those parameters is governed by the Hyperparameters. For more information see the [Hyperparameter](https://en.wikipedia.org/wiki/Hyperparameter) article on Wikipedia.
A parameter of a machine learning algorithm. Examples include the number of trees to learn in a decision forest or the step size in a gradient descent algorithm. Values of *Hyperparameters* are set before training the model and govern the process of finding the parameters of the prediction function, for example, the comparison points in a decision tree or the weights in a linear regression model. For more information, see the [Hyperparameter](https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)) article on Wikipedia.

## Label

The element to be predicted with the machine learning model. For example, the breed of dog or a future stock price.

## Log loss

Loss refers to an algorithm and task-specific measure of accuracy of the model on the training data. Log loss is the logarithm of the same quantity.
In [classification](#classification), an evaluation metric that characterizes the accuracy of a classifier. The smaller the log loss, the more accurate the classifier.
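
For binary classification, log loss is the mean negative log-likelihood of the correct labels under the predicted probabilities. A minimal Python sketch (illustrative only, not the ML.NET API):

```python
import math

def log_loss(y_true, probs):
    """Mean negative log-likelihood of the correct labels under the
    predicted probabilities of the positive class."""
    return -sum(math.log(p) if y == 1 else math.log(1.0 - p)
                for y, p in zip(y_true, probs)) / len(y_true)
```

Note that a confident correct prediction (probability 0.9) yields a smaller loss than a hesitant one (probability 0.6).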

Contributor:

LGTM

## Mean absolute error (MAE)

An evaluation metric that averages all the model errors, where error is the predicted value distance from the true value.
In [regression](#regression), an evaluation metric that is the average of all the model errors, where model error is the distance between the predicted [label](#label) value and the correct label value.
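
As an informal sketch (plain Python, not the ML.NET API):

```python
def mean_absolute_error(y_true, y_pred):
    """Average distance between predicted and correct label values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```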

Contributor:

LGTM

## Model

Traditionally, the parameters for the prediction function. For example, the weights in a linear model or the split points in a tree. In ML.NET, a model contains all the information necessary to predict the label of a domain object (for example, image or text). This means that ML.NET models include the featurization steps necessary as well as the parameters for the prediction function.
Traditionally, the parameters for the prediction function. For example, the weights in a linear regression model or the split points in a decision tree. In ML.NET, a model contains all the information necessary to predict the [label](#label) of a domain object (for example, image or text). This means that ML.NET models include the featurization steps necessary as well as the parameters for the prediction function.

Contributor:

LGTM.

## Multiclass classification

A [classification](#classification) case where the [label](#label) is one out of three or more classes. For more information, see the [Multiclass classification](https://en.wikipedia.org/wiki/Multiclass_classification) article on Wikipedia.

## N-grams
## N-gram

A feature extraction scheme for text data. Any sequence of N words turns into a [feature](#feature).
A feature extraction scheme for text data: any sequence of N words turns into a [feature](#feature) value.
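
For word-based N-grams, the idea can be sketched in a few lines of Python (illustrative only; the function name is ours, and it ignores tokenization details such as punctuation and casing):

```python
def word_ngrams(text, n):
    """Every run of n consecutive words becomes one feature value."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

print(word_ngrams("the quick brown fox", 2))
# ['the quick', 'quick brown', 'brown fox']
```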

## Numerical feature vectors
## Numerical feature vector

A feature vector consisting only of numerical values. This is similar to `double[]`.
A [feature](#feature) vector consisting only of numerical values. This is similar to `double[]`.

## Pipeline

All of the operations needed to fit a model to a dataset. A pipeline consists of data import, transformation, featurization, and learning steps. Once a pipeline is trained, it turns into a model.
All of the operations needed to fit a model to a data set. A pipeline consists of data import, transformation, featurization, and learning steps. Once a pipeline is trained, it turns into a model.

## Precision

@@ -91,32 +91,32 @@ In [classification](#classification), the recall for a class is the number of it

## Regression

A supervised machine learning task where the output is a real value, for example, double. Examples include forecasting and predicting stock prices.
A [supervised machine learning](#supervised-learning) task where the output is a real value, for example, double. An example is predicting stock prices.

## Relative absolute error

An evaluation metric that represents the error as a percentage of the true value.
In [regression](#regression), an evaluation metric that is the sum of all absolute errors divided by the sum of distances between correct [label](#label) values and the average of all correct label values.
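
The wording above translates into a short Python sketch (illustrative only, not the ML.NET API): the model's total absolute error is normalized by the total absolute error of a baseline that always predicts the mean of the correct labels.

```python
def relative_absolute_error(y_true, y_pred):
    """Total absolute error normalized by the total absolute error of
    always predicting the mean of the correct labels."""
    mean = sum(y_true) / len(y_true)
    total_error = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    baseline_error = sum(abs(t - mean) for t in y_true)
    return total_error / baseline_error
```

A value of 0 means a perfect fit; a value of 1 means the model does no better than always predicting the mean.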

Contributor:

I think for these evaluation metrics long term we should probably put in equations, as trying to parse through the wording (although accurate) can get confusing.

Contributor Author (@pkulikov, May 23, 2018):

@aditidugar that's true. I've also thought about inserting the link to related ML.NET API, so folks know immediately what to use from the library for an evaluation metric. And now, API remarks already contain equations (though not in pretty LaTeX format, but fine). If that's a good idea, I'll make a separate PR with such an update.

Contributor Author:

Though, I haven't found API for the relative absolute error, but other evaluation metrics (RMS, RSquared, recall, etc) are supported by the library.

Contributor:

@pkulikov yes, I think that's a good idea! Although, maybe I am not looking in the right place, but I don't seem to see a lot of the equations laid out in the API reference either (for example, https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.models.classificationmetrics.accuracymacro?view=ml-dotnet).

Either location (API ref or glossary itself) seems fine for the equation, but we should make sure all of them are there.

Contributor Author:

OK, then I'll make another PR with API links.
As for the equations, we'll find later a good place for them.

## Relative squared error

An evaluation metric that normalizes the total squared error by dividing by the predicted values' total squared error.
In [regression](#regression), an evaluation metric that is the sum of all squared errors divided by the sum of squared distances between correct [label](#label) values and the average of all correct label values.
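
A minimal Python sketch of this definition (illustrative only, not the ML.NET API):

```python
def relative_squared_error(y_true, y_pred):
    """Total squared error normalized by the total squared deviation of
    the correct labels from their mean."""
    mean = sum(y_true) / len(y_true)
    total_error = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    baseline_error = sum((t - mean) ** 2 for t in y_true)
    return total_error / baseline_error
```

Note that this quantity equals 1 minus the [coefficient of determination](#coefficient-of-determination).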

## Root of mean squared error (RMSE)

An evaluation metric that measures the average of the squares of the errors, and then takes the root of that value.
In [regression](#regression), an evaluation metric that is the square root of the average of the squares of the errors.
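
As an informal sketch (plain Python, not the ML.NET API):

```python
import math

def rmse(y_true, y_pred):
    """Square root of the mean of the squared errors."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)
```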

## Supervised machine learning

A subclass of machine learning in which a model is desired which predicts the label for yet-unseen data. Examples include classification, regression, and structured prediction. For more information see the [Supervised learning](https://en.wikipedia.org/wiki/Supervised_learning) article on Wikipedia.
A subclass of machine learning in which a desired model predicts the label for yet-unseen data. Examples include classification, regression, and structured prediction. For more information, see the [Supervised learning](https://en.wikipedia.org/wiki/Supervised_learning) article on Wikipedia.

## Training

The process of identifying a model for a given training data set. For a linear model, this means finding the weights. For a tree, it involves the identifying the split points.
The process of identifying a [model](#model) for a given training data set. For a linear model, this means finding the weights. For a tree, it involves identifying the split points.

## Transform

A pipeline component that transforms data. For example, from text to vector of numbers.
A [pipeline](#pipeline) component that transforms data. For example, from text to vector of numbers.

## Unsupervised machine learning

A subclass of machine learning in which a model is desired which finds hidden (or latent) structure in the data. Examples include clustering, topic modeling, and dimensionality reduction. For more information see the [Unsupervised learning](https://en.wikipedia.org/wiki/Unsupervised_learning) article on Wikipedia.
A subclass of machine learning in which a desired model finds hidden (or latent) structure in data. Examples include clustering, topic modeling, and dimensionality reduction. For more information, see the [Unsupervised learning](https://en.wikipedia.org/wiki/Unsupervised_learning) article on Wikipedia.