Gradient descent is a first-order optimization algorithm. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.
Source: Wikipedia
See the algorithm in action here.
- media/: media files
- gradient-descent.css: CSS stylesheet
- gradient-descent.js: JavaScript file with the source code for the algorithm visualization
- index.html: webpage demonstrating the algorithm
- README.md: README file that appears on the project's GitHub page
Data are randomly generated (x, y) coordinates produced by the function `slope * x + intercept + error()`, where `error()` samples a zero-mean normal distribution with a standard deviation of 0.9.
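A minimal sketch of how such data could be generated, assuming a Box-Muller helper for the normal noise; the function and variable names below are illustrative and not taken from gradient-descent.js:

```js
// error(): one sample from a zero-mean normal distribution (std dev 0.9 by default),
// drawn via the Box-Muller transform.
function error(stdDev = 0.9) {
  const u1 = 1 - Math.random(); // shift away from 0 so Math.log never sees 0
  const u2 = Math.random();
  return stdDev * Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// generateData(): n noisy points along y = slope * x + intercept + error().
// The x range [0, 1) is an assumption made for this sketch.
function generateData(n, slope, intercept) {
  const points = [];
  for (let i = 0; i < n; i++) {
    const x = Math.random();
    const y = slope * x + intercept + error();
    points.push({ x, y });
  }
  return points;
}
```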
The hypothesis function has the general form of a straight line:

`h(x) = intercept + slope * x`
The accuracy of the hypothesis function can be measured with a cost function, which averages the squared differences between the hypothesis outputs and the actual outputs over all m data points:

`J(intercept, slope) = 1 / (2 * m) * sum((h(x) - y) ^ 2)`

This function is also known as the mean squared error.
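A small sketch of the hypothesis and cost function as they might look in JavaScript; the names hypothesis and cost are assumptions, and gradient-descent.js may organize this differently:

```js
// hypothesis(): straight-line prediction h(x) = intercept + slope * x.
function hypothesis(intercept, slope, x) {
  return intercept + slope * x;
}

// cost(): J(intercept, slope) = 1 / (2 * m) * sum((h(x) - y) ^ 2) over all m points.
function cost(intercept, slope, points) {
  const m = points.length;
  const sumSquaredError = points.reduce(
    (sum, p) => sum + (hypothesis(intercept, slope, p.x) - p.y) ** 2,
    0
  );
  return sumSquaredError / (2 * m);
}
```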
1. The algorithm is initialized with n random points generated by `y = slope * x + intercept + error()`. The cost function arguments intercept and slope are both initialized to 0.
2. Intercept and slope are simultaneously updated as:
   - intercept := intercept - learning rate * partial derivative of the cost function with respect to intercept
   - slope := slope - learning rate * partial derivative of the cost function with respect to slope
3. Step 2 is repeated until convergence. Each step makes the regression function more accurate. Convergence is declared when the cost (mean squared error) changes by less than 0.0001 between two consecutive steps.
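Putting the steps together, here is a sketch of the full loop, reusing the generateData, hypothesis, and cost helpers from the sketches above. The partial derivatives follow from the cost function J: dJ/d(intercept) = 1/m * sum(h(x) - y) and dJ/d(slope) = 1/m * sum((h(x) - y) * x). The learning rate value and the iteration cap are assumptions for this sketch, not the repository's exact settings:

```js
// gradientDescent(): repeat the simultaneous update until the cost changes
// by less than the tolerance between two consecutive steps.
function gradientDescent(points, learningRate = 0.1, tolerance = 0.0001, maxSteps = 100000) {
  let intercept = 0;
  let slope = 0;
  let previousCost = cost(intercept, slope, points);
  const m = points.length;

  for (let step = 0; step < maxSteps; step++) {
    // Compute both partial derivatives before updating either parameter,
    // so the update is simultaneous.
    let gradIntercept = 0;
    let gradSlope = 0;
    for (const p of points) {
      const residual = hypothesis(intercept, slope, p.x) - p.y;
      gradIntercept += residual / m;
      gradSlope += (residual * p.x) / m;
    }

    intercept -= learningRate * gradIntercept;
    slope -= learningRate * gradSlope;

    const currentCost = cost(intercept, slope, points);
    if (Math.abs(previousCost - currentCost) < tolerance) {
      break; // converged
    }
    previousCost = currentCost;
  }
  return { intercept, slope };
}

// Example: fit noisy data generated with a true slope of 2 and intercept of 1.
const points = generateData(100, 2, 1);
const fit = gradientDescent(points);
console.log(fit.intercept, fit.slope); // should land near 1 and 2
```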