diff --git a/docs/machine-learning/resources/glossary.md b/docs/machine-learning/resources/glossary.md
index 3b14503eb61e7..da4d9a2db7f4a 100644
--- a/docs/machine-learning/resources/glossary.md
+++ b/docs/machine-learning/resources/glossary.md
@@ -3,7 +3,7 @@ title: Machine Learning Glossary
 description: A glossary of machine learning terms.
 author: jralexander
 ms.author: johalex
-ms.date: 05/15/2018
+ms.date: 05/20/2018
 ms.topic: conceptual
 ms.prod: dotnet-ml
 ms.devlang: dotnet
@@ -15,11 +15,11 @@ The following list is a compilation of important machine learning terms that are
 
 ## Accuracy
 
-The proportion of true results to total cases. Ranges from 0 (least accurate) to 1 (most accurate). Accuracy is only one evaluation measure used to score performance of your model and should be considered in conjunction with [precision](#precision) and [recall](#recall).
+In [classification](#classification), accuracy is the number of correctly classified items divided by the total number of items in the test set. Ranges from 0 (least accurate) to 1 (most accurate). Accuracy is one of the evaluation metrics for your model's performance. Consider it in conjunction with [precision](#precision), [recall](#recall), and [F-score](#f-score).
 
 ## Area under the curve (AUC)
 
-A value that represents the area under the curve when false positives are plotted on the x-axis and true positives are plotted on the y-axis. Ranges from 0.5 (worst) to 1 (best).
+In [binary classification](#binary-classification), an evaluation metric that is the area under the curve that plots the true positive rate (on the y-axis) against the false positive rate (on the x-axis). In practice, ranges from 0.5 (no better than random guessing) to 1 (best). Also known as the area under the ROC (receiver operating characteristic) curve. For more information, see the [Receiver operating characteristic](https://en.wikipedia.org/wiki/Receiver_operating_characteristic) article on Wikipedia.
 
 ## Binary classification
 
@@ -27,27 +27,27 @@ A [classification](#classification) case where the [label](#label) is only one o
 
 ## Classification
 
-When the data are being used to predict a category, [supervised learning](#supervised-learning) is also called classification. [Binary classification](#binary-classification) refers to predicting only two categories (for example assigning an image as a picture of either a 'cat' or a 'dog'). [Multiclass classification](#multiclass-classification) refers to predicting multiple categories (for example, when classifying an image as a specific breed of dog).
+When the data is used to predict a category, [supervised learning](#supervised-machine-learning) is also called classification. [Binary classification](#binary-classification) refers to predicting only two categories (for example, classifying an image as a picture of either a 'cat' or a 'dog'). [Multiclass classification](#multiclass-classification) refers to predicting multiple categories (for example, classifying an image as a picture of a specific breed of dog).
 
 ## Coefficient of determination
 
-A single number that indicates how well data fits a model. A value of 1 means that the model exactly matches the data. A value of 0 means that the data is random or otherwise cannot be fit to the model. This is often referred to as r<sup>2</sup>, R<sup>2</sup>, or r-squared.
+In [regression](#regression), an evaluation metric that indicates how well data fits a model. Typically ranges from 0 to 1. A value of 0 means that the data is random or otherwise cannot be fit to the model. A value of 1 means that the model exactly matches the data. This is often referred to as r<sup>2</sup>, R<sup>2</sup>, or r-squared.
 
 ## Feature
 
-A measurable property of the phenomenon being measured, typically a numeric (double value). Multiple features are referred to as a **Feature vector** and typically stored as `double[]`. Features define the important characteristics about the phenomenon being measured. For more information see the [Feature](https://en.wikipedia.org/wiki/Feature_(machine_learning)) article on Wikipedia.
+A measurable property of the phenomenon being measured, typically a numeric (double) value. Multiple features are referred to as a **Feature vector** and typically stored as `double[]`. Features define the important characteristics of the phenomenon being measured. For more information, see the [Feature](https://en.wikipedia.org/wiki/Feature_(machine_learning)) article on Wikipedia.
 
 ## Feature engineering
 
-Feature engineering is the process of developing software that converts other data types (records, objects, …) into feature vectors. The resulting software performs Feature Extraction. For more information see the [Feature engineering](https://en.wikipedia.org/wiki/Feature_engineering) article on Wikipedia.
+Feature engineering is the process of defining a set of [features](#feature) and developing software that produces feature vectors from the available data about the phenomenon (that is, feature extraction). For more information, see the [Feature engineering](https://en.wikipedia.org/wiki/Feature_engineering) article on Wikipedia.
 
 ## F-score
 
-An evaluation metric that balances [precision](#precision) and [recall](#recall).
+In [classification](#classification), an evaluation metric that balances [precision](#precision) and [recall](#recall).
 
 ## Hyperparameter
 
-Parameters of machine learning algorithms. Examples include the number of trees to learn in a decision forest or the step size in a gradient descent algorithm. These parameters are called *Hyperparameters* because the process of learning is the process of identifying the right parameters of the prediction function. For example, the coefficients in a linear model or the comparison points in a tree. The process of finding those parameters is governed by the Hyperparameters. For more information see the [Hyperparameter](https://en.wikipedia.org/wiki/Hyperparameter) article on Wikipedia.
+A parameter of a machine learning algorithm. Examples include the number of trees to learn in a decision forest or the step size in a gradient descent algorithm. Hyperparameter values are set before training the model and govern the process of finding the parameters of the prediction function, for example, the comparison points in a decision tree or the weights in a linear regression model. For more information, see the [Hyperparameter](https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)) article on Wikipedia.
 
 ## Label
 
@@ -55,31 +55,31 @@ The element to be predicted with the machine learning model. For example, the br
 
 ## Log loss
 
-Loss refers to an algorithm and task-specific measure of accuracy of the model on the training data. Log loss is the logarithm of the same quantity.
+In [classification](#classification), an evaluation metric that characterizes the accuracy of a classifier. The smaller the log loss, the more accurate the classifier.
 
 ## Mean absolute error (MAE)
 
-An evaluation metric that averages all the model errors, where error is the predicted value distance from the true value.
+In [regression](#regression), an evaluation metric that is the average of all the model errors, where model error is the distance between the predicted [label](#label) value and the correct label value.
 
 ## Model
 
-Traditionally, the parameters for the prediction function. For example, the weights in a linear model or the split points in a tree. In ML.NET, a model contains all the information necessary to predict the label of a domain object (for example, image or text). This means that ML.NET models include the featurization steps necessary as well as the parameters for the prediction function.
+Traditionally, the parameters for the prediction function. For example, the weights in a linear regression model or the split points in a decision tree. In ML.NET, a model contains all the information necessary to predict the [label](#label) of a domain object (for example, image or text). This means that ML.NET models include the featurization steps necessary as well as the parameters for the prediction function.
 
 ## Multiclass classification
 
 A [classification](#classification) case where the [label](#label) is one out of three or more classes. For more information, see the [Multiclass classification](https://en.wikipedia.org/wiki/Multiclass_classification) article on Wikipedia.
 
-## N-grams
+## N-gram
 
-A feature extraction scheme for text data. Any sequence of N words turns into a [feature](#feature).
+A feature extraction scheme for text data: any sequence of N words turns into a [feature](#feature) value.
 
-## Numerical feature vectors
+## Numerical feature vector
 
-A feature vector consisting only of numerical values. This is similar to `double[]`.
+A [feature](#feature) vector consisting only of numerical values. This is similar to `double[]`.
 
 ## Pipeline
 
-All of the operations needed to fit a model to a dataset. A pipeline consists of data import, transformation, featurization, and learning steps. Once a pipeline is trained, it turns into a model.
+All of the operations needed to fit a model to a data set. A pipeline consists of data import, transformation, featurization, and learning steps. Once a pipeline is trained, it turns into a model.
 
 ## Precision
 
@@ -91,32 +91,32 @@ In [classification](#classification), the recall for a class is the number of it
 
 ## Regression
 
-A supervised machine learning task where the output is a real value, for example, double. Examples include forecasting and predicting stock prices.
+A [supervised machine learning](#supervised-machine-learning) task where the output is a real value, for example, double. An example is predicting stock prices.
 
 ## Relative absolute error
 
-An evaluation metric that represents the error as a percentage of the true value.
+In [regression](#regression), an evaluation metric that is the sum of all absolute errors divided by the sum of distances between correct [label](#label) values and the average of all correct label values.
 
 ## Relative squared error
 
-An evaluation metric that normalizes the total squared error by dividing by the predicted values' total squared error.
+In [regression](#regression), an evaluation metric that is the sum of all squared errors divided by the sum of squared distances between correct [label](#label) values and the average of all correct label values.
 
 ## Root of mean squared error (RMSE)
 
-An evaluation metric that measures the average of the squares of the errors, and then takes the root of that value.
+In [regression](#regression), an evaluation metric that is the square root of the average of the squares of the errors.
 
 ## Supervised machine learning
 
-A subclass of machine learning in which a model is desired which predicts the label for yet-unseen data. Examples include classification, regression, and structured prediction. For more information see the  [Supervised learning](https://en.wikipedia.org/wiki/Supervised_learning) article on Wikipedia.
+A subclass of machine learning in which the goal is a model that predicts the label for yet-unseen data. Examples include classification, regression, and structured prediction. For more information, see the [Supervised learning](https://en.wikipedia.org/wiki/Supervised_learning) article on Wikipedia.
 
 ## Training
 
-The process of identifying a model for a given training data set. For a linear model, this means finding the weights. For a tree, it involves the identifying the split points.
+The process of identifying a [model](#model) for a given training data set. For a linear model, this means finding the weights. For a tree, it involves identifying the split points.
 
 ## Transform
 
-A pipeline component that transforms data. For example, from text to vector of numbers.
+A [pipeline](#pipeline) component that transforms data. For example, a transform can convert text to a vector of numbers.
 
 ## Unsupervised machine learning
 
-A subclass of machine learning in which a model is desired which finds hidden (or latent) structure in the data. Examples include clustering, topic modeling, and dimensionality reduction. For more information see the [Unsupervised learning](https://en.wikipedia.org/wiki/Unsupervised_learning) article on Wikipedia.
+A subclass of machine learning in which the goal is a model that finds hidden (or latent) structure in data. Examples include clustering, topic modeling, and dimensionality reduction. For more information, see the [Unsupervised learning](https://en.wikipedia.org/wiki/Unsupervised_learning) article on Wikipedia.
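
Review note: the regression metric definitions in this change (mean absolute error, RMSE, relative absolute error, and relative squared error) can be sanity-checked with a minimal Python sketch. The function name `regression_metrics`, the dictionary keys, and the sample values are hypothetical, chosen only for illustration; this is not ML.NET code and may differ in detail from how ML.NET computes these metrics.

```python
import math


def regression_metrics(predicted, actual):
    """Compute MAE, RMSE, relative absolute error, and relative squared error.

    Hypothetical helper that follows the glossary wording; not an ML.NET API.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(actual)
    mean_actual = sum(actual) / n

    # Per-item errors: distance between predicted and correct label values.
    abs_errors = [abs(p - a) for p, a in zip(predicted, actual)]
    sq_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]

    # Baselines: distances between correct labels and their average.
    # Both are zero when all correct labels are equal, in which case the
    # relative metrics are undefined and this sketch raises ZeroDivisionError.
    abs_baseline = sum(abs(a - mean_actual) for a in actual)
    sq_baseline = sum((a - mean_actual) ** 2 for a in actual)

    return {
        "mae": sum(abs_errors) / n,
        "rmse": math.sqrt(sum(sq_errors) / n),
        "rae": sum(abs_errors) / abs_baseline,
        "rse": sum(sq_errors) / sq_baseline,
    }


# Illustrative values only.
metrics = regression_metrics(
    predicted=[2.0, 4.0, 7.0, 7.0],
    actual=[1.0, 4.0, 6.0, 9.0],
)
print(metrics)
```

For these sample values the errors are 1, 0, 1, and -2, so the mean absolute error is 1.0 and the relative absolute error is 4 / 10 = 0.4. Lower values are better for all four metrics.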