diff --git a/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise4/Data/ex4data1.mat b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise4/Data/ex4data1.mat
new file mode 100644
index 000000000..371bd0c09
Binary files /dev/null and b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise4/Data/ex4data1.mat differ
diff --git a/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise4/Data/ex4weights.mat b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise4/Data/ex4weights.mat
new file mode 100644
index 000000000..ace2a090d
Binary files /dev/null and b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise4/Data/ex4weights.mat differ
diff --git a/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise4/Figures/ex4-backpropagation.png b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise4/Figures/ex4-backpropagation.png
new file mode 100644
index 000000000..62e1861f3
Binary files /dev/null and b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise4/Figures/ex4-backpropagation.png differ
diff --git a/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise4/Figures/neural_network.png b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise4/Figures/neural_network.png
new file mode 100644
index 000000000..140fdb012
Binary files /dev/null and b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise4/Figures/neural_network.png differ
diff --git a/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise4/exercise4.ipynb b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise4/exercise4.ipynb
new file mode 100644
index 000000000..d8ebee0cf
--- /dev/null
+++ b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise4/exercise4.ipynb
@@ -0,0 +1,924 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Programming Exercise 4: Neural Networks Learning\n",
+ "\n",
+ "## Introduction\n",
+ "\n",
+ "In this exercise, you will implement the backpropagation algorithm for neural networks and apply it to the task of hand-written digit recognition. Before starting on the programming exercise, we strongly recommend watching the video lectures and completing the review questions for the associated topics.\n",
+ "\n",
+ "\n",
+ "All the information you need for solving this assignment is in this notebook, and all the code you will be implementing will take place within this notebook. The assignment can be promptly submitted to the coursera grader directly from this notebook (code and instructions are included below).\n",
+ "\n",
+ "Before we begin with the exercises, we need to import all libraries required for this programming exercise. Throughout the course, we will be using [`numpy`](http://www.numpy.org/) for all arrays and matrix operations, [`matplotlib`](https://matplotlib.org/) for plotting, and [`scipy`](https://docs.scipy.org/doc/scipy/reference/) for scientific and numerical computation functions and tools. You can find instructions on how to install required libraries in the README file in the [github repository](https://github.com/dibgerge/ml-coursera-python-assignments)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# used for manipulating directory paths\n",
+ "import os\n",
+ "\n",
+ "# Scientific and vector computation for python\n",
+ "import numpy as np\n",
+ "\n",
+ "# Plotting library\n",
+ "from matplotlib import pyplot\n",
+ "\n",
+ "# Optimization module in scipy\n",
+ "from scipy import optimize\n",
+ "\n",
+ "# will be used to load MATLAB mat datafile format\n",
+ "from scipy.io import loadmat\n",
+ "\n",
+ "# library written for this exercise providing additional functions for assignment submission, and others\n",
+ "import utils\n",
+ "\n",
+ "# define the submission/grader object for this exercise\n",
+ "grader = utils.Grader()\n",
+ "\n",
+ "# tells matplotlib to embed plots within the notebook\n",
+ "%matplotlib inline"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Submission and Grading\n",
+ "\n",
+ "\n",
+ "After completing each part of the assignment, be sure to submit your solutions to the grader. The following is a breakdown of how each part of this exercise is scored.\n",
+ "\n",
+ "\n",
+    "| Section | Part | Submission function | Points |\n",
+    "| :- | :- | :- | :-: |\n",
+    "| 1 | [Feedforward and Cost Function](#section1) | [`nnCostFunction`](#nnCostFunction) | 30 |\n",
+    "| 2 | [Regularized Cost Function](#section2) | [`nnCostFunction`](#nnCostFunction) | 15 |\n",
+    "| 3 | [Sigmoid Gradient](#section3) | [`sigmoidGradient`](#sigmoidGradient) | 5 |\n",
+    "| 4 | [Neural Net Gradient Function (Backpropagation)](#section4) | [`nnCostFunction`](#nnCostFunction) | 40 |\n",
+    "| 5 | [Regularized Gradient](#section5) | [`nnCostFunction`](#nnCostFunction) | 10 |\n",
+    "| | Total Points | | 100 |\n",
+ "\n",
+ "\n",
+ "You are allowed to submit your solutions multiple times, and we will take only the highest score into consideration.\n",
+ "\n",
+ "At the end of each section in this notebook, we have a cell which contains code for submitting the solutions thus far to the grader. Execute the cell to see your score up to the current section. For all your work to be submitted properly, you must execute those cells at least once.\n",
+    ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Neural Networks\n",
+ "\n",
+ "In the previous exercise, you implemented feedforward propagation for neural networks and used it to predict handwritten digits with the weights we provided. In this exercise, you will implement the backpropagation algorithm to learn the parameters for the neural network.\n",
+ "\n",
+ "We start the exercise by first loading the dataset. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# training data stored in arrays X, y\n",
+ "data = loadmat(os.path.join('Data', 'ex4data1.mat'))\n",
+ "X, y = data['X'], data['y'].ravel()\n",
+ "\n",
+ "# set the zero digit to 0, rather than its mapped 10 in this dataset\n",
+ "# This is an artifact due to the fact that this dataset was used in \n",
+ "# MATLAB where there is no index 0\n",
+ "y[y == 10] = 0\n",
+ "\n",
+ "# Number of training examples\n",
+ "m = y.size"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 1.1 Visualizing the data\n",
+ "\n",
+ "You will begin by visualizing a subset of the training set, using the function `displayData`, which is the same function we used in Exercise 3. It is provided in the `utils.py` file for this assignment as well. The dataset is also the same one you used in the previous exercise.\n",
+ "\n",
+ "There are 5000 training examples in `ex4data1.mat`, where each training example is a 20 pixel by 20 pixel grayscale image of the digit. Each pixel is represented by a floating point number indicating the grayscale intensity at that location. The 20 by 20 grid of pixels is “unrolled” into a 400-dimensional vector. Each\n",
+ "of these training examples becomes a single row in our data matrix $X$. This gives us a 5000 by 400 matrix $X$ where every row is a training example for a handwritten digit image.\n",
+ "\n",
+ "$$ X = \\begin{bmatrix} - \\left(x^{(1)} \\right)^T - \\\\\n",
+ "- \\left(x^{(2)} \\right)^T - \\\\\n",
+ "\\vdots \\\\\n",
+ "- \\left(x^{(m)} \\right)^T - \\\\\n",
+ "\\end{bmatrix}\n",
+ "$$\n",
+ "\n",
+ "The second part of the training set is a 5000-dimensional vector `y` that contains labels for the training set. \n",
+ "The following cell randomly selects 100 images from the dataset and plots them."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Randomly select 100 data points to display\n",
+ "rand_indices = np.random.choice(m, 100, replace=False)\n",
+ "sel = X[rand_indices, :]\n",
+ "\n",
+ "utils.displayData(sel)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 1.2 Model representation\n",
+ "\n",
+ "Our neural network is shown in the following figure.\n",
+    "\n",
+    "![Neural network](Figures/neural_network.png)\n",
+    "\n",
+ "It has 3 layers - an input layer, a hidden layer and an output layer. Recall that our inputs are pixel values\n",
+ "of digit images. Since the images are of size $20 \\times 20$, this gives us 400 input layer units (not counting the extra bias unit which always outputs +1). The training data was loaded into the variables `X` and `y` above.\n",
+ "\n",
+ "You have been provided with a set of network parameters ($\\Theta^{(1)}, \\Theta^{(2)}$) already trained by us. These are stored in `ex4weights.mat` and will be loaded in the next cell of this notebook into `Theta1` and `Theta2`. The parameters have dimensions that are sized for a neural network with 25 units in the second layer and 10 output units (corresponding to the 10 digit classes)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Setup the parameters you will use for this exercise\n",
+ "input_layer_size = 400 # 20x20 Input Images of Digits\n",
+ "hidden_layer_size = 25 # 25 hidden units\n",
+ "num_labels = 10 # 10 labels, from 0 to 9\n",
+ "\n",
+ "# Load the weights into variables Theta1 and Theta2\n",
+ "weights = loadmat(os.path.join('Data', 'ex4weights.mat'))\n",
+ "\n",
+ "# Theta1 has size 25 x 401\n",
+ "# Theta2 has size 10 x 26\n",
+ "Theta1, Theta2 = weights['Theta1'], weights['Theta2']\n",
+ "\n",
+    "# roll the rows of Theta2 so that the last row (the '10' class used for digit 0 in MATLAB)\n",
+    "# becomes the first row, since ex4weights.mat was saved with MATLAB's 1-based indexing\n",
+ "Theta2 = np.roll(Theta2, 1, axis=0)\n",
+ "\n",
+ "# Unroll parameters \n",
+ "nn_params = np.concatenate([Theta1.ravel(), Theta2.ravel()])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "<a id=\"section1\"></a>\n",
+ "### 1.3 Feedforward and cost function\n",
+ "\n",
+ "Now you will implement the cost function and gradient for the neural network. First, complete the code for the function `nnCostFunction` in the next cell to return the cost.\n",
+ "\n",
+ "Recall that the cost function for the neural network (without regularization) is:\n",
+ "\n",
+ "$$ J(\\theta) = \\frac{1}{m} \\sum_{i=1}^{m}\\sum_{k=1}^{K} \\left[ - y_k^{(i)} \\log \\left( \\left( h_\\theta \\left( x^{(i)} \\right) \\right)_k \\right) - \\left( 1 - y_k^{(i)} \\right) \\log \\left( 1 - \\left( h_\\theta \\left( x^{(i)} \\right) \\right)_k \\right) \\right]$$\n",
+ "\n",
+ "where $h_\\theta \\left( x^{(i)} \\right)$ is computed as shown in the neural network figure above, and K = 10 is the total number of possible labels. Note that $h_\\theta(x^{(i)})_k = a_k^{(3)}$ is the activation (output\n",
+ "value) of the $k^{th}$ output unit. Also, recall that whereas the original labels (in the variable y) were 0, 1, ..., 9, for the purpose of training a neural network, we need to encode the labels as vectors containing only values 0 or 1, so that\n",
+ "\n",
+ "$$ y = \n",
+ "\\begin{bmatrix} 1 \\\\ 0 \\\\ 0 \\\\\\vdots \\\\ 0 \\end{bmatrix}, \\quad\n",
+ "\\begin{bmatrix} 0 \\\\ 1 \\\\ 0 \\\\ \\vdots \\\\ 0 \\end{bmatrix}, \\quad \\cdots \\quad \\text{or} \\qquad\n",
+ "\\begin{bmatrix} 0 \\\\ 0 \\\\ 0 \\\\ \\vdots \\\\ 1 \\end{bmatrix}.\n",
+ "$$\n",
+ "\n",
+ "For example, if $x^{(i)}$ is an image of the digit 5, then the corresponding $y^{(i)}$ (that you should use with the cost function) should be a 10-dimensional vector with $y_5 = 1$, and the other elements equal to 0.\n",
+ "\n",
+ "You should implement the feedforward computation that computes $h_\\theta(x^{(i)})$ for every example $i$ and sum the cost over all examples. **Your code should also work for a dataset of any size, with any number of labels** (you can assume that there are always at least $K \\ge 3$ labels).\n",
+ "\n",
+ "\n",
+    "**Implementation Note:** The matrix $X$ contains the examples in rows (i.e., X[i,:] is the i-th training example $x^{(i)}$, expressed as an $n \\times 1$ vector.) When you complete the code in `nnCostFunction`, you will need to add the column of 1’s to the X matrix. The parameters for each unit in the neural network are represented in Theta1 and Theta2 as one row. Specifically, the first row of Theta1 corresponds to the first hidden unit in the second layer. You can use a for-loop over the examples to compute the cost.\n",
+    ""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def nnCostFunction(nn_params,\n",
+ " input_layer_size,\n",
+ " hidden_layer_size,\n",
+ " num_labels,\n",
+ " X, y, lambda_=0.0):\n",
+ " \"\"\"\n",
+ " Implements the neural network cost function and gradient for a two layer neural \n",
+ " network which performs classification. \n",
+ " \n",
+ " Parameters\n",
+ " ----------\n",
+ " nn_params : array_like\n",
+ " The parameters for the neural network which are \"unrolled\" into \n",
+ " a vector. This needs to be converted back into the weight matrices Theta1\n",
+ " and Theta2.\n",
+ " \n",
+ " input_layer_size : int\n",
+ " Number of features for the input layer. \n",
+ " \n",
+ " hidden_layer_size : int\n",
+ " Number of hidden units in the second layer.\n",
+ " \n",
+ " num_labels : int\n",
+ " Total number of labels, or equivalently number of units in output layer. \n",
+ " \n",
+ " X : array_like\n",
+ " Input dataset. A matrix of shape (m x input_layer_size).\n",
+ " \n",
+ " y : array_like\n",
+ " Dataset labels. A vector of shape (m,).\n",
+ " \n",
+ " lambda_ : float, optional\n",
+ " Regularization parameter.\n",
+ " \n",
+ " Returns\n",
+ " -------\n",
+ " J : float\n",
+ " The computed value for the cost function at the current weight values.\n",
+ " \n",
+ " grad : array_like\n",
+    "        An \"unrolled\" vector of the partial derivatives of the concatenation of\n",
+ " neural network weights Theta1 and Theta2.\n",
+ " \n",
+ " Instructions\n",
+ " ------------\n",
+ " You should complete the code by working through the following parts.\n",
+ " \n",
+ " - Part 1: Feedforward the neural network and return the cost in the \n",
+ " variable J. After implementing Part 1, you can verify that your\n",
+ " cost function computation is correct by verifying the cost\n",
+ " computed in the following cell.\n",
+ " \n",
+ " - Part 2: Implement the backpropagation algorithm to compute the gradients\n",
+ " Theta1_grad and Theta2_grad. You should return the partial derivatives of\n",
+ " the cost function with respect to Theta1 and Theta2 in Theta1_grad and\n",
+ " Theta2_grad, respectively. After implementing Part 2, you can check\n",
+ " that your implementation is correct by running checkNNGradients provided\n",
+ " in the utils.py module.\n",
+ " \n",
+ " Note: The vector y passed into the function is a vector of labels\n",
+ " containing values from 0..K-1. You need to map this vector into a \n",
+ " binary vector of 1's and 0's to be used with the neural network\n",
+ " cost function.\n",
+ " \n",
+ " Hint: We recommend implementing backpropagation using a for-loop\n",
+ " over the training examples if you are implementing it for the \n",
+ " first time.\n",
+ " \n",
+ " - Part 3: Implement regularization with the cost function and gradients.\n",
+ " \n",
+ " Hint: You can implement this around the code for\n",
+ " backpropagation. That is, you can compute the gradients for\n",
+ " the regularization separately and then add them to Theta1_grad\n",
+ " and Theta2_grad from Part 2.\n",
+ " \n",
+ " Note \n",
+ " ----\n",
+ " We have provided an implementation for the sigmoid function in the file \n",
+ " `utils.py` accompanying this assignment.\n",
+ " \"\"\"\n",
+ " # Reshape nn_params back into the parameters Theta1 and Theta2, the weight matrices\n",
+ " # for our 2 layer neural network\n",
+ " Theta1 = np.reshape(nn_params[:hidden_layer_size * (input_layer_size + 1)],\n",
+ " (hidden_layer_size, (input_layer_size + 1)))\n",
+ "\n",
+ " Theta2 = np.reshape(nn_params[(hidden_layer_size * (input_layer_size + 1)):],\n",
+ " (num_labels, (hidden_layer_size + 1)))\n",
+ "\n",
+ " # Setup some useful variables\n",
+ " m = y.size\n",
+ " \n",
+ " # You need to return the following variables correctly \n",
+ " J = 0\n",
+ " Theta1_grad = np.zeros(Theta1.shape)\n",
+ " Theta2_grad = np.zeros(Theta2.shape)\n",
+ "\n",
+ " # ====================== YOUR CODE HERE ======================\n",
+ "\n",
+ " \n",
+ " \n",
+ " # ================================================================\n",
+ " # Unroll gradients\n",
+ " grad = np.concatenate([Theta1_grad.ravel(), Theta2_grad.ravel()])\n",
+ "\n",
+ " return J, grad"
+ ]
+ },
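+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For reference only, here is one *possible* vectorized sketch of the unregularized feedforward cost, written as it might appear inside `nnCostFunction` (it assumes the local variables `Theta1`, `Theta2`, `X`, `y`, `num_labels` and `m` from the function skeleton, plus the provided `utils.sigmoid`). It is a sketch under those assumptions, not the required implementation; the for-loop approach described above is equally valid.\n",
+    "\n",
+    "```python\n",
+    "# one-hot encode the labels: y_matrix has shape (m, num_labels)\n",
+    "y_matrix = np.eye(num_labels)[y]\n",
+    "\n",
+    "# feedforward pass, adding the bias column of ones at each layer\n",
+    "a1 = np.concatenate([np.ones((m, 1)), X], axis=1)    # (m, 401)\n",
+    "a2 = utils.sigmoid(a1.dot(Theta1.T))                 # (m, 25)\n",
+    "a2 = np.concatenate([np.ones((m, 1)), a2], axis=1)   # (m, 26)\n",
+    "a3 = utils.sigmoid(a2.dot(Theta2.T))                 # (m, 10), the hypothesis h(x)\n",
+    "\n",
+    "# unregularized cross-entropy cost, summed over examples and classes\n",
+    "J = (-1 / m) * np.sum(y_matrix * np.log(a3) + (1 - y_matrix) * np.log(1 - a3))\n",
+    "```"
+   ]
+  },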
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "<a id=\"nnCostFunction\"></a>\n",
+    "Use the following links to go back to the different parts of this exercise that require you to modify the function `nnCostFunction`.\n",
+ "\n",
+ "Back to:\n",
+ "- [Feedforward and cost function](#section1)\n",
+ "- [Regularized cost](#section2)\n",
+ "- [Neural Network Gradient (Backpropagation)](#section4)\n",
+ "- [Regularized Gradient](#section5)\n",
+    ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Once you are done, call your `nnCostFunction` using the loaded set of parameters for `Theta1` and `Theta2`. You should see that the cost is about 0.287629."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "lambda_ = 0\n",
+ "J, _ = nnCostFunction(nn_params, input_layer_size, hidden_layer_size,\n",
+ " num_labels, X, y, lambda_)\n",
+ "print('Cost at parameters (loaded from ex4weights): %.6f ' % J)\n",
+ "print('The cost should be about : 0.287629.')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "*You should now submit your solutions.*"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "grader = utils.Grader()\n",
+ "grader[1] = nnCostFunction\n",
+ "grader.grade()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "<a id=\"section2\"></a>\n",
+ "### 1.4 Regularized cost function\n",
+ "\n",
+ "The cost function for neural networks with regularization is given by:\n",
+ "\n",
+ "\n",
+ "$$ J(\\theta) = \\frac{1}{m} \\sum_{i=1}^{m}\\sum_{k=1}^{K} \\left[ - y_k^{(i)} \\log \\left( \\left( h_\\theta \\left( x^{(i)} \\right) \\right)_k \\right) - \\left( 1 - y_k^{(i)} \\right) \\log \\left( 1 - \\left( h_\\theta \\left( x^{(i)} \\right) \\right)_k \\right) \\right] + \\frac{\\lambda}{2 m} \\left[ \\sum_{j=1}^{25} \\sum_{k=1}^{400} \\left( \\Theta_{j,k}^{(1)} \\right)^2 + \\sum_{j=1}^{10} \\sum_{k=1}^{25} \\left( \\Theta_{j,k}^{(2)} \\right)^2 \\right] $$\n",
+ "\n",
+    "You can assume that the neural network will only have 3 layers - an input layer, a hidden layer and an output layer. However, your code should work for any number of input units, hidden units and output units. While we\n",
+ "have explicitly listed the indices above for $\\Theta^{(1)}$ and $\\Theta^{(2)}$ for clarity, do note that your code should in general work with $\\Theta^{(1)}$ and $\\Theta^{(2)}$ of any size. Note that you should not be regularizing the terms that correspond to the bias. For the matrices `Theta1` and `Theta2`, this corresponds to the first column of each matrix. You should now add regularization to your cost function. Notice that you can first compute the unregularized cost function $J$ using your existing `nnCostFunction` and then later add the cost for the regularization terms.\n",
+ "\n",
+ "[Click here to go back to `nnCostFunction` for editing.](#nnCostFunction)"
+ ]
+ },
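+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For illustration, the regularization term can be added on top of the unregularized cost along these lines (a minimal sketch, assuming `Theta1`, `Theta2`, `lambda_`, `m` and the unregularized cost `J` are already available inside `nnCostFunction`; note that the first, bias-related column of each weight matrix is excluded):\n",
+    "\n",
+    "```python\n",
+    "# sum of squared weights, skipping the first (bias) column of each matrix\n",
+    "reg = (lambda_ / (2 * m)) * (np.sum(Theta1[:, 1:] ** 2) + np.sum(Theta2[:, 1:] ** 2))\n",
+    "J = J + reg\n",
+    "```"
+   ]
+  },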
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Once you are done, the next cell will call your `nnCostFunction` using the loaded set of parameters for `Theta1` and `Theta2`, and $\\lambda = 1$. You should see that the cost is about 0.383770."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Weight regularization parameter (we set this to 1 here).\n",
+ "lambda_ = 1\n",
+ "J, _ = nnCostFunction(nn_params, input_layer_size, hidden_layer_size,\n",
+ " num_labels, X, y, lambda_)\n",
+ "\n",
+ "print('Cost at parameters (loaded from ex4weights): %.6f' % J)\n",
+ "print('This value should be about : 0.383770.')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "*You should now submit your solutions.*"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "grader[2] = nnCostFunction\n",
+ "grader.grade()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 2 Backpropagation\n",
+ "\n",
+ "In this part of the exercise, you will implement the backpropagation algorithm to compute the gradient for the neural network cost function. You will need to update the function `nnCostFunction` so that it returns an appropriate value for `grad`. Once you have computed the gradient, you will be able to train the neural network by minimizing the cost function $J(\\theta)$ using an advanced optimizer such as `scipy`'s `optimize.minimize`.\n",
+ "You will first implement the backpropagation algorithm to compute the gradients for the parameters for the (unregularized) neural network. After you have verified that your gradient computation for the unregularized case is correct, you will implement the gradient for the regularized neural network."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "<a id=\"section3\"></a>\n",
+ "### 2.1 Sigmoid Gradient\n",
+ "\n",
+ "To help you get started with this part of the exercise, you will first implement\n",
+ "the sigmoid gradient function. The gradient for the sigmoid function can be\n",
+ "computed as\n",
+ "\n",
+ "$$ g'(z) = \\frac{d}{dz} g(z) = g(z)\\left(1-g(z)\\right) $$\n",
+ "\n",
+ "where\n",
+ "\n",
+ "$$ \\text{sigmoid}(z) = g(z) = \\frac{1}{1 + e^{-z}} $$\n",
+ "\n",
+ "Now complete the implementation of `sigmoidGradient` in the next cell.\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def sigmoidGradient(z):\n",
+ " \"\"\"\n",
+ " Computes the gradient of the sigmoid function evaluated at z. \n",
+    "    This should work regardless of whether z is a matrix or a vector.\n",
+ " In particular, if z is a vector or matrix, you should return\n",
+ " the gradient for each element.\n",
+ " \n",
+ " Parameters\n",
+ " ----------\n",
+ " z : array_like\n",
+ " A vector or matrix as input to the sigmoid function. \n",
+ " \n",
+ " Returns\n",
+ " --------\n",
+ " g : array_like\n",
+ " Gradient of the sigmoid function. Has the same shape as z. \n",
+ " \n",
+ " Instructions\n",
+ " ------------\n",
+ " Compute the gradient of the sigmoid function evaluated at\n",
+ " each value of z (z can be a matrix, vector or scalar).\n",
+ " \n",
+ " Note\n",
+ " ----\n",
+ " We have provided an implementation of the sigmoid function \n",
+ " in `utils.py` file accompanying this assignment.\n",
+ " \"\"\"\n",
+ "\n",
+ " g = np.zeros(z.shape)\n",
+ "\n",
+ " # ====================== YOUR CODE HERE ======================\n",
+ "\n",
+ "\n",
+ "\n",
+ " # =============================================================\n",
+ " return g"
+ ]
+ },
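+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Since the derivative is just the sigmoid times one minus the sigmoid, the body of `sigmoidGradient` reduces to an element-wise computation built on the provided `utils.sigmoid`; a minimal sketch:\n",
+    "\n",
+    "```python\n",
+    "s = utils.sigmoid(z)\n",
+    "g = s * (1 - s)\n",
+    "```"
+   ]
+  },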
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "When you are done, the following cell calls `sigmoidGradient` on a given vector `z`. Try testing a few values by calling `sigmoidGradient(z)`. For large values (both positive and negative) of z, the gradient should be close to 0. When $z = 0$, the gradient should be exactly 0.25. Your code should also work with vectors and matrices. For a matrix, your function should compute the sigmoid gradient for every element."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "z = np.array([-1, -0.5, 0, 0.5, 1])\n",
+ "g = sigmoidGradient(z)\n",
+ "print('Sigmoid gradient evaluated at [-1 -0.5 0 0.5 1]:\\n ')\n",
+ "print(g)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "*You should now submit your solutions.*"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "grader[3] = sigmoidGradient\n",
+ "grader.grade()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 2.2 Random Initialization\n",
+ "\n",
+ "When training neural networks, it is important to randomly initialize the parameters for symmetry breaking. One effective strategy for random initialization is to randomly select values for $\\Theta^{(l)}$ uniformly in the range $[-\\epsilon_{init}, \\epsilon_{init}]$. You should use $\\epsilon_{init} = 0.12$. This range of values ensures that the parameters are kept small and makes the learning more efficient.\n",
+ "\n",
+ "\n",
+    "One effective strategy for choosing $\\epsilon_{init}$ is to base it on the number of units in the network. A good choice of $\\epsilon_{init}$ is $\\epsilon_{init} = \\frac{\\sqrt{6}}{\\sqrt{L_{in} + L_{out}}}$ where $L_{in} = s_l$ and $L_{out} = s_{l+1}$ are the number of units in the layers adjacent to $\\Theta^{(l)}$.\n",
+    "\n",
+ "\n",
+ "Your job is to complete the function `randInitializeWeights` to initialize the weights for $\\Theta$. Modify the function by filling in the following code:\n",
+ "\n",
+ "```python\n",
+ "# Randomly initialize the weights to small values\n",
+ "W = np.random.rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init\n",
+ "```\n",
+ "Note that we give the function an argument for $\\epsilon$ with default value `epsilon_init = 0.12`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def randInitializeWeights(L_in, L_out, epsilon_init=0.12):\n",
+ " \"\"\"\n",
+ " Randomly initialize the weights of a layer in a neural network.\n",
+ " \n",
+ " Parameters\n",
+ " ----------\n",
+ " L_in : int\n",
+    "        Number of incoming connections.\n",
+ " \n",
+ " L_out : int\n",
+ " Number of outgoing connections. \n",
+ " \n",
+ " epsilon_init : float, optional\n",
+ " Range of values which the weight can take from a uniform \n",
+ " distribution.\n",
+ " \n",
+ " Returns\n",
+ " -------\n",
+ " W : array_like\n",
+    "        The weights, initialized to random values. Note that W should\n",
+    "        be set to a matrix of size (L_out, 1 + L_in), as\n",
+ " the first column of W handles the \"bias\" terms.\n",
+ " \n",
+ " Instructions\n",
+ " ------------\n",
+ " Initialize W randomly so that we break the symmetry while training\n",
+ " the neural network. Note that the first column of W corresponds \n",
+ " to the parameters for the bias unit.\n",
+ " \"\"\"\n",
+ "\n",
+ " # You need to return the following variables correctly \n",
+ " W = np.zeros((L_out, 1 + L_in))\n",
+ "\n",
+ " # ====================== YOUR CODE HERE ======================\n",
+ "\n",
+ "\n",
+ "\n",
+ " # ============================================================\n",
+ " return W"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "*You do not need to submit any code for this part of the exercise.*\n",
+ "\n",
+ "Execute the following cell to initialize the weights for the 2 layers in the neural network using the `randInitializeWeights` function."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "print('Initializing Neural Network Parameters ...')\n",
+ "\n",
+ "initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size)\n",
+ "initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels)\n",
+ "\n",
+ "# Unroll parameters\n",
+ "initial_nn_params = np.concatenate([initial_Theta1.ravel(), initial_Theta2.ravel()], axis=0)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "<a id=\"section4\"></a>\n",
+    "### 2.3 Backpropagation\n",
+    "\n",
+    "![Backpropagation](Figures/ex4-backpropagation.png)\n",
+    "\n",
+ "Now, you will implement the backpropagation algorithm. Recall that the intuition behind the backpropagation algorithm is as follows. Given a training example $(x^{(t)}, y^{(t)})$, we will first run a “forward pass” to compute all the activations throughout the network, including the output value of the hypothesis $h_\\theta(x)$. Then, for each node $j$ in layer $l$, we would like to compute an “error term” $\\delta_j^{(l)}$ that measures how much that node was “responsible” for any errors in our output.\n",
+ "\n",
+ "For an output node, we can directly measure the difference between the network’s activation and the true target value, and use that to define $\\delta_j^{(3)}$ (since layer 3 is the output layer). For the hidden units, you will compute $\\delta_j^{(l)}$ based on a weighted average of the error terms of the nodes in layer $(l+1)$. In detail, here is the backpropagation algorithm (also depicted in the figure above). You should implement steps 1 to 4 in a loop that processes one example at a time. Concretely, you should implement a for-loop `for t in range(m)` and place steps 1-4 below inside the for-loop, with the $t^{th}$ iteration performing the calculation on the $t^{th}$ training example $(x^{(t)}, y^{(t)})$. Step 5 will divide the accumulated gradients by $m$ to obtain the gradients for the neural network cost function.\n",
+ "\n",
+    "1. Set the input layer’s values $(a^{(1)})$ to the $t^{th}$ training example $x^{(t)}$. Perform a feedforward pass, computing the activations $(z^{(2)}, a^{(2)}, z^{(3)}, a^{(3)})$ for layers 2 and 3. Note that you need to add a `+1` term to ensure that the vectors of activations for layers $a^{(1)}$ and $a^{(2)}$ also include the bias unit. In `numpy`, if `a_1` is a matrix whose rows are examples, adding the bias column corresponds to `a_1 = np.concatenate([np.ones((m, 1)), a_1], axis=1)`.\n",
+ "\n",
+ "1. For each output unit $k$ in layer 3 (the output layer), set \n",
+ "$$\\delta_k^{(3)} = \\left(a_k^{(3)} - y_k \\right)$$\n",
+ "where $y_k \\in \\{0, 1\\}$ indicates whether the current training example belongs to class $k$ $(y_k = 1)$, or if it belongs to a different class $(y_k = 0)$. You may find logical arrays helpful for this task (explained in the previous programming exercise).\n",
+ "\n",
+ "1. For the hidden layer $l = 2$, set \n",
+ "$$ \\delta^{(2)} = \\left( \\Theta^{(2)} \\right)^T \\delta^{(3)} * g'\\left(z^{(2)} \\right)$$\n",
+ "Note that the symbol $*$ performs element wise multiplication in `numpy`.\n",
+ "\n",
+    "1. Accumulate the gradient from this example using the following formula. Note that you should skip or remove $\\delta_0^{(2)}$. In `numpy`, removing $\\delta_0^{(2)}$ corresponds to `delta_2 = delta_2[1:]`.\n",
+    "$$ \\Delta^{(l)} = \\Delta^{(l)} + \\delta^{(l+1)} \\left( a^{(l)} \\right)^T $$\n",
+    "\n",
+    "1. Obtain the (unregularized) gradient for the neural network cost function by multiplying the accumulated gradients by $\\frac{1}{m}$:\n",
+ "$$ \\frac{\\partial}{\\partial \\Theta_{ij}^{(l)}} J(\\Theta) = D_{ij}^{(l)} = \\frac{1}{m} \\Delta_{ij}^{(l)}$$\n",
+ "\n",
+ "\n",
+    "**Python/Numpy tip**: You should implement the backpropagation algorithm only after you have successfully completed the feedforward and cost functions. While implementing the backpropagation algorithm, it is often useful to print out the shapes of the variables you are working with (using the `shape` attribute) if you run into dimension mismatch errors.\n",
+    "\n",
+ "\n",
+ "[Click here to go back and update the function `nnCostFunction` with the backpropagation algorithm](#nnCostFunction)."
+ ]
+ },
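+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To make the five steps above concrete, here is one hedged per-example sketch of the accumulation loop (not the only valid layout). It assumes the variables available inside `nnCostFunction` (`Theta1`, `Theta2`, `X`, `y`, `num_labels`, `m`) and reuses the provided `utils.sigmoid` together with the `sigmoidGradient` you implemented in section 2.1:\n",
+    "\n",
+    "```python\n",
+    "y_matrix = np.eye(num_labels)[y]   # one-hot labels, shape (m, num_labels)\n",
+    "Delta1 = np.zeros(Theta1.shape)    # (25, 401) accumulator\n",
+    "Delta2 = np.zeros(Theta2.shape)    # (10, 26) accumulator\n",
+    "\n",
+    "for t in range(m):\n",
+    "    # step 1: forward pass for example t, prepending the bias unit at each layer\n",
+    "    a_1 = np.concatenate([[1], X[t]])                 # (401,)\n",
+    "    z_2 = Theta1.dot(a_1)                             # (25,)\n",
+    "    a_2 = np.concatenate([[1], utils.sigmoid(z_2)])   # (26,)\n",
+    "    z_3 = Theta2.dot(a_2)                             # (10,)\n",
+    "    a_3 = utils.sigmoid(z_3)                          # (10,)\n",
+    "\n",
+    "    # step 2: output-layer error\n",
+    "    delta_3 = a_3 - y_matrix[t]                       # (10,)\n",
+    "\n",
+    "    # step 3: hidden-layer error (the bias column of Theta2 is dropped)\n",
+    "    delta_2 = Theta2[:, 1:].T.dot(delta_3) * sigmoidGradient(z_2)   # (25,)\n",
+    "\n",
+    "    # step 4: accumulate the gradient contributions as outer products\n",
+    "    Delta1 += np.outer(delta_2, a_1)\n",
+    "    Delta2 += np.outer(delta_3, a_2)\n",
+    "\n",
+    "# step 5: average over the m training examples\n",
+    "Theta1_grad = Delta1 / m\n",
+    "Theta2_grad = Delta2 / m\n",
+    "```"
+   ]
+  },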
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "After you have implemented the backpropagation algorithm, we will proceed to run gradient checking on your implementation. The gradient check will allow you to increase your confidence that your code is\n",
+ "computing the gradients correctly.\n",
+ "\n",
+ "### 2.4 Gradient checking \n",
+ "\n",
+    "In your neural network, you are minimizing the cost function $J(\\Theta)$. To perform gradient checking on your parameters, you can imagine “unrolling” the parameters $\\Theta^{(1)}$, $\\Theta^{(2)}$ into a long vector $\\theta$. By doing so, you can think of the cost function as being $J(\\theta)$ instead and use the following gradient checking procedure.\n",
+ "\n",
+ "Suppose you have a function $f_i(\\theta)$ that purportedly computes $\\frac{\\partial}{\\partial \\theta_i} J(\\theta)$; you’d like to check if $f_i$ is outputting correct derivative values.\n",
+ "\n",
+ "$$\n",
+ "\\text{Let } \\theta^{(i+)} = \\theta + \\begin{bmatrix} 0 \\\\ 0 \\\\ \\vdots \\\\ \\epsilon \\\\ \\vdots \\\\ 0 \\end{bmatrix}\n",
+ "\\quad \\text{and} \\quad \\theta^{(i-)} = \\theta - \\begin{bmatrix} 0 \\\\ 0 \\\\ \\vdots \\\\ \\epsilon \\\\ \\vdots \\\\ 0 \\end{bmatrix}\n",
+ "$$\n",
+ "\n",
+ "So, $\\theta^{(i+)}$ is the same as $\\theta$, except its $i^{th}$ element has been incremented by $\\epsilon$. Similarly, $\\theta^{(i−)}$ is the corresponding vector with the $i^{th}$ element decreased by $\\epsilon$. You can now numerically verify $f_i(\\theta)$’s correctness by checking, for each $i$, that:\n",
+ "\n",
+ "$$ f_i\\left( \\theta \\right) \\approx \\frac{J\\left( \\theta^{(i+)}\\right) - J\\left( \\theta^{(i-)} \\right)}{2\\epsilon} $$\n",
+ "\n",
+ "The degree to which these two values should approximate each other will depend on the details of $J$. But assuming $\\epsilon = 10^{-4}$, you’ll usually find that the left- and right-hand sides of the above will agree to at least 4 significant digits (and often many more).\n",
+ "\n",
+ "We have implemented the function to compute the numerical gradient for you in `computeNumericalGradient` (within the file `utils.py`). While you are not required to modify the file, we highly encourage you to take a look at the code to understand how it works.\n",
+ "\n",
+ "In the next cell we will run the provided function `checkNNGradients` which will create a small neural network and dataset that will be used for checking your gradients. If your backpropagation implementation is correct,\n",
+ "you should see a relative difference that is less than 1e-9.\n",
+ "\n",
+ "\n",
+ "**Practical Tip**: When performing gradient checking, it is much more efficient to use a small neural network with a relatively small number of input units and hidden units, thus having a relatively small number\n",
+ "of parameters. Each dimension of $\\theta$ requires two evaluations of the cost function and this can be expensive. In the function `checkNNGradients`, our code creates a small random model and dataset which is used with `computeNumericalGradient` for gradient checking. Furthermore, after you are confident that your gradient computations are correct, you should turn off gradient checking before running your learning algorithm.\n",
+    "\n",
+ "\n",
+ "\n",
+ "**Practical Tip:** Gradient checking works for any function where you are computing the cost and the gradient. Concretely, you can use the same `computeNumericalGradient` function to check if your gradient implementations for the other exercises are correct too (e.g., logistic regression’s cost function).\n",
+    ""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "utils.checkNNGradients(nnCostFunction)"
+ ]
+ },
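+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Gradient checking is not specific to neural networks: `utils.computeNumericalGradient` works with any function that returns a `(cost, gradient)` pair. A hypothetical sanity check on a simple quadratic whose exact gradient is known:\n",
+    "\n",
+    "```python\n",
+    "# f(theta) = 0.5 * sum(theta**2) has gradient exactly equal to theta\n",
+    "quad = lambda t: (0.5 * np.sum(t ** 2), t)\n",
+    "theta_test = np.array([1.0, -2.0, 3.0])\n",
+    "print(utils.computeNumericalGradient(quad, theta_test))  # expect approximately [1. -2. 3.]\n",
+    "```"
+   ]
+  },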
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "*Once your cost function passes the gradient check for the (unregularized) neural network cost function, you should submit the neural network gradient function (backpropagation).*"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "grader[4] = nnCostFunction\n",
+ "grader.grade()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "<a id=\"section5\"></a>\n",
+ "### 2.5 Regularized Neural Network\n",
+ "\n",
+ "After you have successfully implemented the backpropagation algorithm, you will add regularization to the gradient. To account for regularization, it turns out that you can add this as an additional term *after* computing the gradients using backpropagation.\n",
+ "\n",
+ "Specifically, after you have computed $\\Delta_{ij}^{(l)}$ using backpropagation, you should add regularization using\n",
+ "\n",
+ "$$ \\begin{align} \n",
+ "& \\frac{\\partial}{\\partial \\Theta_{ij}^{(l)}} J(\\Theta) = D_{ij}^{(l)} = \\frac{1}{m} \\Delta_{ij}^{(l)} & \\qquad \\text{for } j = 0 \\\\\n",
+ "& \\frac{\\partial}{\\partial \\Theta_{ij}^{(l)}} J(\\Theta) = D_{ij}^{(l)} = \\frac{1}{m} \\Delta_{ij}^{(l)} + \\frac{\\lambda}{m} \\Theta_{ij}^{(l)} & \\qquad \\text{for } j \\ge 1\n",
+ "\\end{align}\n",
+ "$$\n",
+ "\n",
+ "Note that you should *not* be regularizing the first column of $\\Theta^{(l)}$ which is used for the bias term. Furthermore, in the parameters $\\Theta_{ij}^{(l)}$, $i$ is indexed starting from 1, and $j$ is indexed starting from 0. Thus, \n",
+ "\n",
+ "$$\n",
+ "\\Theta^{(l)} = \\begin{bmatrix}\n",
+    "\\Theta_{1,0}^{(l)} & \\Theta_{1,1}^{(l)} & \\cdots \\\\\n",
+    "\\Theta_{2,0}^{(l)} & \\Theta_{2,1}^{(l)} & \\cdots \\\\\n",
+ "\\vdots & ~ & \\ddots\n",
+ "\\end{bmatrix}\n",
+ "$$\n",
+ "\n",
+ "[Now modify your code that computes grad in `nnCostFunction` to account for regularization.](#nnCostFunction)\n",
+ "\n",
+ "After you are done, the following cell runs gradient checking on your implementation. If your code is correct, you should expect to see a relative difference that is less than 1e-9."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Check gradients by running checkNNGradients\n",
+ "lambda_ = 3\n",
+ "utils.checkNNGradients(nnCostFunction, lambda_)\n",
+ "\n",
+ "# Also output the costFunction debugging values\n",
+ "debug_J, _ = nnCostFunction(nn_params, input_layer_size,\n",
+ " hidden_layer_size, num_labels, X, y, lambda_)\n",
+ "\n",
+ "print('\\n\\nCost at (fixed) debugging parameters (w/ lambda = %f): %f ' % (lambda_, debug_J))\n",
+ "print('(for lambda = 3, this value should be about 0.576051)')"
+ ]
+ },
+ {
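+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For reference, one possible way to fold the regularization into the gradients produced by backpropagation (a sketch, assuming `Theta1_grad` and `Theta2_grad` hold the unregularized gradients inside `nnCostFunction`): build the penalty term and zero out its first column, so the bias weights stay unregularized.\n",
+    "\n",
+    "```python\n",
+    "Theta1_reg = (lambda_ / m) * Theta1\n",
+    "Theta1_reg[:, 0] = 0   # do not regularize the bias column\n",
+    "Theta2_reg = (lambda_ / m) * Theta2\n",
+    "Theta2_reg[:, 0] = 0\n",
+    "\n",
+    "Theta1_grad = Theta1_grad + Theta1_reg\n",
+    "Theta2_grad = Theta2_grad + Theta2_reg\n",
+    "```"
+   ]
+  },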
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "grader[5] = nnCostFunction\n",
+ "grader.grade()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 2.6 Learning parameters using `scipy.optimize.minimize`\n",
+ "\n",
+ "After you have successfully implemented the neural network cost function\n",
+    "and gradient computation, the next step is to use `scipy`'s `optimize.minimize` function to learn a good set of parameters."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# After you have completed the assignment, change the maxiter to a larger\n",
+ "# value to see how more training helps.\n",
+ "options= {'maxiter': 100}\n",
+ "\n",
+ "# You should also try different values of lambda\n",
+ "lambda_ = 1\n",
+ "\n",
+ "# Create \"short hand\" for the cost function to be minimized\n",
+ "costFunction = lambda p: nnCostFunction(p, input_layer_size,\n",
+ " hidden_layer_size,\n",
+ " num_labels, X, y, lambda_)\n",
+ "\n",
+ "# Now, costFunction is a function that takes in only one argument\n",
+ "# (the neural network parameters)\n",
+ "res = optimize.minimize(costFunction,\n",
+ " initial_nn_params,\n",
+ " jac=True,\n",
+ " method='TNC',\n",
+ " options=options)\n",
+ "\n",
+ "# get the solution of the optimization\n",
+ "nn_params = res.x\n",
+ " \n",
+ "# Obtain Theta1 and Theta2 back from nn_params\n",
+ "Theta1 = np.reshape(nn_params[:hidden_layer_size * (input_layer_size + 1)],\n",
+ " (hidden_layer_size, (input_layer_size + 1)))\n",
+ "\n",
+ "Theta2 = np.reshape(nn_params[(hidden_layer_size * (input_layer_size + 1)):],\n",
+ " (num_labels, (hidden_layer_size + 1)))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "After the training completes, we will proceed to report the training accuracy of your classifier by computing the percentage of examples it got correct. If your implementation is correct, you should see a reported\n",
+ "training accuracy of about 95.3% (this may vary by about 1% due to the random initialization). It is possible to get higher training accuracies by training the neural network for more iterations. We encourage you to try\n",
+ "training the neural network for more iterations (e.g., set `maxiter` to 400) and also vary the regularization parameter $\\lambda$. With the right learning settings, it is possible to get the neural network to perfectly fit the training set."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "pred = utils.predict(Theta1, Theta2, X)\n",
+ "print('Training Set Accuracy: %f' % (np.mean(pred == y) * 100))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 3 Visualizing the Hidden Layer\n",
+ "\n",
+    "One way to understand what your neural network is learning is to visualize the representations captured by the hidden units. Informally, given a particular hidden unit, one way to visualize what it computes is to find an input $x$ that will cause it to activate (that is, to have an activation value \n",
+ "($a_i^{(l)}$) close to 1). For the neural network you trained, notice that the $i^{th}$ row of $\\Theta^{(1)}$ is a 401-dimensional vector that represents the parameter for the $i^{th}$ hidden unit. If we discard the bias term, we get a 400 dimensional vector that represents the weights from each input pixel to the hidden unit.\n",
+ "\n",
+ "Thus, one way to visualize the “representation” captured by the hidden unit is to reshape this 400 dimensional vector into a 20 × 20 image and display it (It turns out that this is equivalent to finding the input that gives the highest activation for the hidden unit, given a “norm” constraint on the input (i.e., $||x||_2 \\le 1$)). \n",
+ "\n",
+ "The next cell does this by using the `displayData` function and it will show you an image with 25 units,\n",
+    "each corresponding to one hidden unit in the network. In your trained network, you should find that the hidden units correspond roughly to detectors that look for strokes and other patterns in the input."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "utils.displayData(Theta1[:, 1:])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 3.1 Optional (ungraded) exercise\n",
+ "\n",
+    "In this part of the exercise, you will get to try out different learning settings for the neural network to see how the performance of the neural network varies with the regularization parameter $\\lambda$ and number of training steps (the `maxiter` option when using `scipy.optimize.minimize`). Neural networks are very powerful models that can form highly complex decision boundaries. Without regularization, it is possible for a neural network to “overfit” a training set so that it obtains close to 100% accuracy on the training set but does not do as well on new examples that it has not seen before. You can set the regularization $\\lambda$ to a smaller value and the `maxiter` parameter to a higher number of iterations to see this for yourself."
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.4"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise4/utils.py b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise4/utils.py
new file mode 100644
index 000000000..6b7c3bdc1
--- /dev/null
+++ b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise4/utils.py
@@ -0,0 +1,226 @@
+import sys
+import numpy as np
+from matplotlib import pyplot
+
+sys.path.append('..')
+from submission import SubmissionBase
+
+
+def displayData(X, example_width=None, figsize=(10, 10)):
+ """
+ Displays 2D data stored in X in a nice grid.
+ """
+ # Compute rows, cols
+ if X.ndim == 2:
+ m, n = X.shape
+ elif X.ndim == 1:
+ n = X.size
+ m = 1
+ X = X[None] # Promote to a 2 dimensional array
+ else:
+ raise IndexError('Input X should be 1 or 2 dimensional.')
+
+ example_width = example_width or int(np.round(np.sqrt(n)))
+ example_height = n / example_width
+
+ # Compute number of items to display
+ display_rows = int(np.floor(np.sqrt(m)))
+ display_cols = int(np.ceil(m / display_rows))
+
+ fig, ax_array = pyplot.subplots(display_rows, display_cols, figsize=figsize)
+ fig.subplots_adjust(wspace=0.025, hspace=0.025)
+
+ ax_array = [ax_array] if m == 1 else ax_array.ravel()
+
+ for i, ax in enumerate(ax_array):
+ # Display Image
+ h = ax.imshow(X[i].reshape(example_width, example_width, order='F'),
+ cmap='Greys', extent=[0, 1, 0, 1])
+ ax.axis('off')
+
+
+def predict(Theta1, Theta2, X):
+ """
+ Predict the label of an input given a trained neural network
+ Outputs the predicted label of X given the trained weights of a neural
+ network(Theta1, Theta2)
+ """
+ # Useful values
+ m = X.shape[0]
+ num_labels = Theta2.shape[0]
+
+ # You need to return the following variables correctly
+ p = np.zeros(m)
+ h1 = sigmoid(np.dot(np.concatenate([np.ones((m, 1)), X], axis=1), Theta1.T))
+ h2 = sigmoid(np.dot(np.concatenate([np.ones((m, 1)), h1], axis=1), Theta2.T))
+ p = np.argmax(h2, axis=1)
+ return p
+
+
+def debugInitializeWeights(fan_out, fan_in):
+ """
+    Initialize the weights of a layer with fan_in incoming connections and fan_out outgoing
+    connections using a fixed strategy. This will help you later in debugging.
+
+    Note that W is set to a matrix of size (fan_out, 1 + fan_in), as the first column of W handles
+    the "bias" terms.
+
+ Parameters
+ ----------
+ fan_out : int
+ The number of outgoing connections.
+
+ fan_in : int
+ The number of incoming connections.
+
+ Returns
+ -------
+    W : array_like (fan_out, 1 + fan_in)
+ The initialized weights array given the dimensions.
+ """
+ # Initialize W using "sin". This ensures that W is always of the same values and will be
+ # useful for debugging
+ W = np.sin(np.arange(1, 1 + (1+fan_in)*fan_out))/10.0
+ W = W.reshape(fan_out, 1+fan_in, order='F')
+ return W
+
+
+def computeNumericalGradient(J, theta, e=1e-4):
+ """
+ Computes the gradient using "finite differences" and gives us a numerical estimate of the
+ gradient.
+
+ Parameters
+ ----------
+ J : func
+ The cost function which will be used to estimate its numerical gradient.
+
+ theta : array_like
+ The one dimensional unrolled network parameters. The numerical gradient is computed at
+ those given parameters.
+
+ e : float (optional)
+ The value to use for epsilon for computing the finite difference.
+
+ Notes
+ -----
+ The following code implements numerical gradient checking, and
+ returns the numerical gradient. It sets `numgrad[i]` to (a numerical
+ approximation of) the partial derivative of J with respect to the
+ i-th input argument, evaluated at theta. (i.e., `numgrad[i]` should
+    be (approximately) the partial derivative of J with respect
+ to theta[i].)
+ """
+ numgrad = np.zeros(theta.shape)
+ perturb = np.diag(e * np.ones(theta.shape))
+ for i in range(theta.size):
+ loss1, _ = J(theta - perturb[:, i])
+ loss2, _ = J(theta + perturb[:, i])
+ numgrad[i] = (loss2 - loss1)/(2*e)
+ return numgrad
+
+
+def checkNNGradients(nnCostFunction, lambda_=0):
+ """
+ Creates a small neural network to check the backpropagation gradients. It will output the
+ analytical gradients produced by your backprop code and the numerical gradients
+ (computed using computeNumericalGradient). These two gradient computations should result in
+ very similar values.
+
+ Parameters
+ ----------
+ nnCostFunction : func
+ A reference to the cost function implemented by the student.
+
+ lambda_ : float (optional)
+ The regularization parameter value.
+ """
+ input_layer_size = 3
+ hidden_layer_size = 5
+ num_labels = 3
+ m = 5
+
+ # We generate some 'random' test data
+ Theta1 = debugInitializeWeights(hidden_layer_size, input_layer_size)
+ Theta2 = debugInitializeWeights(num_labels, hidden_layer_size)
+
+ # Reusing debugInitializeWeights to generate X
+ X = debugInitializeWeights(m, input_layer_size - 1)
+ y = np.arange(1, 1+m) % num_labels
+ # Unroll parameters
+ nn_params = np.concatenate([Theta1.ravel(), Theta2.ravel()])
+
+ # short hand for cost function
+ costFunc = lambda p: nnCostFunction(p, input_layer_size, hidden_layer_size,
+ num_labels, X, y, lambda_)
+ cost, grad = costFunc(nn_params)
+ numgrad = computeNumericalGradient(costFunc, nn_params)
+
+    # Visually examine the two gradient computations. The two columns you get should be very similar.
+ print(np.stack([numgrad, grad], axis=1))
+ print('The above two columns you get should be very similar.')
+ print('(Left-Your Numerical Gradient, Right-Analytical Gradient)\n')
+
+    # Evaluate the norm of the difference between the two solutions. If you have a correct
+ # implementation, and assuming you used e = 0.0001 in computeNumericalGradient, then diff
+ # should be less than 1e-9.
+ diff = np.linalg.norm(numgrad - grad)/np.linalg.norm(numgrad + grad)
+
+ print('If your backpropagation implementation is correct, then \n'
+ 'the relative difference will be small (less than 1e-9). \n'
+ 'Relative Difference: %g' % diff)
+
+
+def sigmoid(z):
+ """
+ Computes the sigmoid of z.
+ """
+ return 1.0 / (1.0 + np.exp(-z))
+
+
+class Grader(SubmissionBase):
+ X = np.reshape(3 * np.sin(np.arange(1, 31)), (3, 10), order='F')
+ Xm = np.reshape(np.sin(np.arange(1, 33)), (16, 2), order='F') / 5
+ ym = np.arange(1, 17) % 4
+ t1 = np.sin(np.reshape(np.arange(1, 25, 2), (4, 3), order='F'))
+ t2 = np.cos(np.reshape(np.arange(1, 41, 2), (4, 5), order='F'))
+ t = np.concatenate([t1.ravel(), t2.ravel()], axis=0)
+
+ def __init__(self):
+ part_names = ['Feedforward and Cost Function',
+ 'Regularized Cost Function',
+ 'Sigmoid Gradient',
+ 'Neural Network Gradient (Backpropagation)',
+ 'Regularized Gradient']
+ super().__init__('neural-network-learning', part_names)
+
+ def __iter__(self):
+ for part_id in range(1, 6):
+ try:
+ func = self.functions[part_id]
+
+ # Each part has different expected arguments/different function
+ if part_id == 1:
+ res = func(self.t, 2, 4, 4, self.Xm, self.ym, 0)[0]
+ elif part_id == 2:
+ res = func(self.t, 2, 4, 4, self.Xm, self.ym, 1.5)
+ elif part_id == 3:
+ res = func(self.X, )
+ elif part_id == 4:
+ J, grad = func(self.t, 2, 4, 4, self.Xm, self.ym, 0)
+ grad1 = np.reshape(grad[:12], (4, 3))
+ grad2 = np.reshape(grad[12:], (4, 5))
+ grad = np.concatenate([grad1.ravel('F'), grad2.ravel('F')])
+ res = np.hstack([J, grad]).tolist()
+ elif part_id == 5:
+ J, grad = func(self.t, 2, 4, 4, self.Xm, self.ym, 1.5)
+ grad1 = np.reshape(grad[:12], (4, 3))
+ grad2 = np.reshape(grad[12:], (4, 5))
+ grad = np.concatenate([grad1.ravel('F'), grad2.ravel('F')])
+ res = np.hstack([J, grad]).tolist()
+ else:
+ raise KeyError
+ yield part_id, res
+ except KeyError:
+ yield part_id, 0
diff --git a/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Data/ex4data1.mat b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Data/ex4data1.mat
new file mode 100644
index 000000000..371bd0c09
Binary files /dev/null and b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Data/ex4data1.mat differ
diff --git a/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Data/ex4weights.mat b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Data/ex4weights.mat
new file mode 100644
index 000000000..ace2a090d
Binary files /dev/null and b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Data/ex4weights.mat differ
diff --git a/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Data/ex5data1.mat b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Data/ex5data1.mat
new file mode 100644
index 000000000..5a17abdbb
Binary files /dev/null and b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Data/ex5data1.mat differ
diff --git a/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Figures/cross_validation.png b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Figures/cross_validation.png
new file mode 100644
index 000000000..e6a8f28ff
Binary files /dev/null and b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Figures/cross_validation.png differ
diff --git a/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Figures/learning_curve.png b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Figures/learning_curve.png
new file mode 100644
index 000000000..c4d3e1fb7
Binary files /dev/null and b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Figures/learning_curve.png differ
diff --git a/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Figures/learning_curve_random.png b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Figures/learning_curve_random.png
new file mode 100644
index 000000000..ee965256e
Binary files /dev/null and b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Figures/learning_curve_random.png differ
diff --git a/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Figures/linear_fit.png b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Figures/linear_fit.png
new file mode 100644
index 000000000..826912f6d
Binary files /dev/null and b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Figures/linear_fit.png differ
diff --git a/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Figures/polynomial_learning_curve.png b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Figures/polynomial_learning_curve.png
new file mode 100644
index 000000000..39e4af465
Binary files /dev/null and b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Figures/polynomial_learning_curve.png differ
diff --git a/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Figures/polynomial_learning_curve_reg_1.png b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Figures/polynomial_learning_curve_reg_1.png
new file mode 100644
index 000000000..01b52b042
Binary files /dev/null and b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Figures/polynomial_learning_curve_reg_1.png differ
diff --git a/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Figures/polynomial_regression.png b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Figures/polynomial_regression.png
new file mode 100644
index 000000000..530ae53eb
Binary files /dev/null and b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Figures/polynomial_regression.png differ
diff --git a/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Figures/polynomial_regression_reg_1.png b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Figures/polynomial_regression_reg_1.png
new file mode 100644
index 000000000..e27bb13d8
Binary files /dev/null and b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Figures/polynomial_regression_reg_1.png differ
diff --git a/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Figures/polynomial_regression_reg_100.png b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Figures/polynomial_regression_reg_100.png
new file mode 100644
index 000000000..cb060bc96
Binary files /dev/null and b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/Figures/polynomial_regression_reg_100.png differ
diff --git a/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/exercise5.ipynb b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/exercise5.ipynb
new file mode 100644
index 000000000..66c4500ee
--- /dev/null
+++ b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/exercise5.ipynb
@@ -0,0 +1,927 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Programming Exercise 5:\n",
+ "# Regularized Linear Regression and Bias vs Variance\n",
+ "\n",
+ "## Introduction\n",
+ "\n",
+ "In this exercise, you will implement regularized linear regression and use it to study models with different bias-variance properties. Before starting on the programming exercise, we strongly recommend watching the video lectures and completing the review questions for the associated topics.\n",
+ "\n",
+ "All the information you need for solving this assignment is in this notebook, and all the code you will be implementing will take place within this notebook. The assignment can be promptly submitted to the coursera grader directly from this notebook (code and instructions are included below).\n",
+ "\n",
+ "Before we begin with the exercises, we need to import all libraries required for this programming exercise. Throughout the course, we will be using [`numpy`](http://www.numpy.org/) for all arrays and matrix operations, [`matplotlib`](https://matplotlib.org/) for plotting, and [`scipy`](https://docs.scipy.org/doc/scipy/reference/) for scientific and numerical computation functions and tools. You can find instructions on how to install required libraries in the README file in the [github repository](https://github.com/dibgerge/ml-coursera-python-assignments)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": true
+ },
+ "outputs": [],
+ "source": [
+ "# used for manipulating directory paths\n",
+ "import os\n",
+ "\n",
+ "# Scientific and vector computation for python\n",
+ "import numpy as np\n",
+ "\n",
+ "# Plotting library\n",
+ "from matplotlib import pyplot\n",
+ "\n",
+ "# Optimization module in scipy\n",
+ "from scipy import optimize\n",
+ "\n",
+ "# will be used to load MATLAB mat datafile format\n",
+ "from scipy.io import loadmat\n",
+ "\n",
+ "# library written for this exercise providing additional functions for assignment submission, and others\n",
+ "import utils\n",
+ "\n",
+ "# define the submission/grader object for this exercise\n",
+ "grader = utils.Grader()\n",
+ "\n",
+ "# tells matplotlib to embed plots within the notebook\n",
+ "%matplotlib inline"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Submission and Grading\n",
+ "\n",
+ "\n",
+ "After completing each part of the assignment, be sure to submit your solutions to the grader. The following is a breakdown of how each part of this exercise is scored.\n",
+ "\n",
+ "\n",
+ "| Section | Part | Submitted Function | Points |\n",
+ "| :- |:- |:- | :-: |\n",
+ "| 1 | [Regularized Linear Regression Cost Function](#section1) | [`linearRegCostFunction`](#linearRegCostFunction) | 25 |\n",
+ "| 2 | [Regularized Linear Regression Gradient](#section2) | [`linearRegCostFunction`](#linearRegCostFunction) |25 |\n",
+ "| 3 | [Learning Curve](#section3) | [`learningCurve`](#func2) | 20 |\n",
+ "| 4 | [Polynomial Feature Mapping](#section4) | [`polyFeatures`](#polyFeatures) | 10 |\n",
+ "| 5 | [Cross Validation Curve](#section5) | [`validationCurve`](#validationCurve) | 20 |\n",
+ "| | Total Points | |100 |\n",
+ "\n",
+ "\n",
+ "You are allowed to submit your solutions multiple times, and we will take only the highest score into consideration.\n",
+ "\n",
+ "\n",
+ "At the end of each section in this notebook, we have a cell which contains code for submitting the solutions thus far to the grader. Execute the cell to see your score up to the current section. For all your work to be submitted properly, you must execute those cells at least once.\n",
+ "
"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n",
+ "## 1 Regularized Linear Regression\n",
+ "\n",
+ "In the first half of the exercise, you will implement regularized linear regression to predict the amount of water flowing out of a dam using the change of water level in a reservoir. In the next half, you will go through some diagnostics of debugging learning algorithms and examine the effects of bias v.s.\n",
+ "variance. \n",
+ "\n",
+ "### 1.1 Visualizing the dataset\n",
+ "\n",
+ "We will begin by visualizing the dataset containing historical records on the change in the water level, $x$, and the amount of water flowing out of the dam, $y$. This dataset is divided into three parts:\n",
+ "\n",
+ "- A **training** set that your model will learn on: `X`, `y`\n",
+ "- A **cross validation** set for determining the regularization parameter: `Xval`, `yval`\n",
+ "- A **test** set for evaluating performance. These are “unseen” examples which your model did not see during training: `Xtest`, `ytest`\n",
+ "\n",
+ "Run the next cell to plot the training data. In the following parts, you will implement linear regression and use that to fit a straight line to the data and plot learning curves. Following that, you will implement polynomial regression to find a better fit to the data."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Load from ex5data1.mat, where all variables will be store in a dictionary\n",
+ "data = loadmat(os.path.join('Data', 'ex5data1.mat'))\n",
+ "\n",
+ "# Extract train, test, validation data from dictionary\n",
+ "# and also convert y's form 2-D matrix (MATLAB format) to a numpy vector\n",
+ "X, y = data['X'], data['y'][:, 0]\n",
+ "Xtest, ytest = data['Xtest'], data['ytest'][:, 0]\n",
+ "Xval, yval = data['Xval'], data['yval'][:, 0]\n",
+ "\n",
+ "# m = Number of examples\n",
+ "m = y.size\n",
+ "\n",
+ "# Plot training data\n",
+ "pyplot.plot(X, y, 'ro', ms=10, mec='k', mew=1)\n",
+ "pyplot.xlabel('Change in water level (x)')\n",
+ "pyplot.ylabel('Water flowing out of the dam (y)');"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 1.2 Regularized linear regression cost function\n",
+ "\n",
+ "Recall that regularized linear regression has the following cost function:\n",
+ "\n",
+ "$$ J(\\theta) = \\frac{1}{2m} \\left( \\sum_{i=1}^m \\left( h_\\theta\\left( x^{(i)} \\right) - y^{(i)} \\right)^2 \\right) + \\frac{\\lambda}{2m} \\left( \\sum_{j=1}^n \\theta_j^2 \\right)$$\n",
+ "\n",
+ "where $\\lambda$ is a regularization parameter which controls the degree of regularization (thus, help preventing overfitting). The regularization term puts a penalty on the overall cost J. As the magnitudes of the model parameters $\\theta_j$ increase, the penalty increases as well. Note that you should not regularize\n",
+ "the $\\theta_0$ term.\n",
+ "\n",
+ "You should now complete the code in the function `linearRegCostFunction` in the next cell. Your task is to calculate the regularized linear regression cost function. If possible, try to vectorize your code and avoid writing loops.\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": true
+ },
+ "outputs": [],
+ "source": [
+ "def linearRegCostFunction(X, y, theta, lambda_=0.0):\n",
+ " \"\"\"\n",
+ " Compute cost and gradient for regularized linear regression \n",
+ " with multiple variables. Computes the cost of using theta as\n",
+ " the parameter for linear regression to fit the data points in X and y. \n",
+ " \n",
+ " Parameters\n",
+ " ----------\n",
+ " X : array_like\n",
+ " The dataset. Matrix with shape (m x n + 1) where m is the \n",
+ " total number of examples, and n is the number of features \n",
+ " before adding the bias term.\n",
+ " \n",
+ " y : array_like\n",
+ " The functions values at each datapoint. A vector of\n",
+ " shape (m, ).\n",
+ " \n",
+ " theta : array_like\n",
+ " The parameters for linear regression. A vector of shape (n+1,).\n",
+ " \n",
+ " lambda_ : float, optional\n",
+ " The regularization parameter.\n",
+ " \n",
+ " Returns\n",
+ " -------\n",
+ " J : float\n",
+ " The computed cost function. \n",
+ " \n",
+ " grad : array_like\n",
+ " The value of the cost function gradient w.r.t theta. \n",
+ " A vector of shape (n+1, ).\n",
+ " \n",
+ " Instructions\n",
+ " ------------\n",
+ " Compute the cost and gradient of regularized linear regression for\n",
+ " a particular choice of theta.\n",
+ " You should set J to the cost and grad to the gradient.\n",
+ " \"\"\"\n",
+ " # Initialize some useful values\n",
+ " m = y.size # number of training examples\n",
+ "\n",
+ " # You need to return the following variables correctly \n",
+ " J = 0\n",
+ " grad = np.zeros(theta.shape)\n",
+ "\n",
+ " # ====================== YOUR CODE HERE ======================\n",
+ "\n",
+ "\n",
+ "\n",
+ " # ============================================================\n",
+ " return J, grad"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "When you are finished, the next cell will run your cost function using `theta` initialized at `[1, 1]`. You should expect to see an output of 303.993."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "theta = np.array([1, 1])\n",
+ "J, _ = linearRegCostFunction(np.concatenate([np.ones((m, 1)), X], axis=1), y, theta, 1)\n",
+ "\n",
+ "print('Cost at theta = [1, 1]:\\t %f ' % J)\n",
+ "print('This value should be about 303.993192)\\n' % J)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "After completing a part of the exercise, you can submit your solutions for grading by first adding the function you modified to the submission object, and then sending your function to Coursera for grading. \n",
+ "\n",
+ "The submission script will prompt you for your login e-mail and submission token. You can obtain a submission token from the web page for the assignment. You are allowed to submit your solutions multiple times, and we will take only the highest score into consideration.\n",
+ "\n",
+ "*Execute the following cell to grade your solution to the first part of this exercise.*"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "grader[1] = linearRegCostFunction\n",
+ "grader.grade()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n",
+ "### 1.3 Regularized linear regression gradient\n",
+ "\n",
+ "Correspondingly, the partial derivative of the cost function for regularized linear regression is defined as:\n",
+ "\n",
+ "$$\n",
+ "\\begin{align}\n",
+ "& \\frac{\\partial J(\\theta)}{\\partial \\theta_0} = \\frac{1}{m} \\sum_{i=1}^m \\left( h_\\theta \\left(x^{(i)} \\right) - y^{(i)} \\right) x_j^{(i)} & \\qquad \\text{for } j = 0 \\\\\n",
+ "& \\frac{\\partial J(\\theta)}{\\partial \\theta_j} = \\left( \\frac{1}{m} \\sum_{i=1}^m \\left( h_\\theta \\left( x^{(i)} \\right) - y^{(i)} \\right) x_j^{(i)} \\right) + \\frac{\\lambda}{m} \\theta_j & \\qquad \\text{for } j \\ge 1\n",
+ "\\end{align}\n",
+ "$$\n",
+ "\n",
+ "In the function [`linearRegCostFunction`](#linearRegCostFunction) above, add code to calculate the gradient, returning it in the variable `grad`. Do not forget to re-execute the cell containing this function to update the function's definition.\n",
+ "\n",
+ "\n",
+ "When you are finished, use the next cell to run your gradient function using theta initialized at `[1, 1]`. You should expect to see a gradient of `[-15.30, 598.250]`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "theta = np.array([1, 1])\n",
+ "J, grad = linearRegCostFunction(np.concatenate([np.ones((m, 1)), X], axis=1), y, theta, 1)\n",
+ "\n",
+ "print('Gradient at theta = [1, 1]: [{:.6f}, {:.6f}] '.format(*grad))\n",
+ "print(' (this value should be about [-15.303016, 598.250744])\\n')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "*You should now submit your solutions.*"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "grader[2] = linearRegCostFunction\n",
+ "grader.grade()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Fitting linear regression\n",
+ "\n",
+ "Once your cost function and gradient are working correctly, the next cell will run the code in `trainLinearReg` (found in the module `utils.py`) to compute the optimal values of $\\theta$. This training function uses `scipy`'s optimization module to minimize the cost function.\n",
+ "\n",
+ "In this part, we set regularization parameter $\\lambda$ to zero. Because our current implementation of linear regression is trying to fit a 2-dimensional $\\theta$, regularization will not be incredibly helpful for a $\\theta$ of such low dimension. In the later parts of the exercise, you will be using polynomial regression with regularization.\n",
+ "\n",
+ "Finally, the code in the next cell should also plot the best fit line, which should look like the figure below. \n",
+ "\n",
+ "\n",
+ "\n",
+ "The best fit line tells us that the model is not a good fit to the data because the data has a non-linear pattern. While visualizing the best fit as shown is one possible way to debug your learning algorithm, it is not always easy to visualize the data and model. In the next section, you will implement a function to generate learning curves that can help you debug your learning algorithm even if it is not easy to visualize the\n",
+ "data."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# add a columns of ones for the y-intercept\n",
+ "X_aug = np.concatenate([np.ones((m, 1)), X], axis=1)\n",
+ "theta = utils.trainLinearReg(linearRegCostFunction, X_aug, y, lambda_=0)\n",
+ "\n",
+ "# Plot fit over the data\n",
+ "pyplot.plot(X, y, 'ro', ms=10, mec='k', mew=1.5)\n",
+ "pyplot.xlabel('Change in water level (x)')\n",
+ "pyplot.ylabel('Water flowing out of the dam (y)')\n",
+ "pyplot.plot(X, np.dot(X_aug, theta), '--', lw=2);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n",
+ "## 2 Bias-variance\n",
+ "\n",
+ "An important concept in machine learning is the bias-variance tradeoff. Models with high bias are not complex enough for the data and tend to underfit, while models with high variance overfit to the training data.\n",
+ "\n",
+ "In this part of the exercise, you will plot training and test errors on a learning curve to diagnose bias-variance problems.\n",
+ "\n",
+ "### 2.1 Learning Curves\n",
+ "\n",
+ "You will now implement code to generate the learning curves that will be useful in debugging learning algorithms. Recall that a learning curve plots training and cross validation error as a function of training set size. Your job is to fill in the function `learningCurve` in the next cell, so that it returns a vector of errors for the training set and cross validation set.\n",
+ "\n",
+ "To plot the learning curve, we need a training and cross validation set error for different training set sizes. To obtain different training set sizes, you should use different subsets of the original training set `X`. Specifically, for a training set size of $i$, you should use the first $i$ examples (i.e., `X[:i, :]`\n",
+ "and `y[:i]`).\n",
+ "\n",
+ "You can use the `trainLinearReg` function (by calling `utils.trainLinearReg(...)`) to find the $\\theta$ parameters. Note that the `lambda_` is passed as a parameter to the `learningCurve` function.\n",
+ "After learning the $\\theta$ parameters, you should compute the error on the training and cross validation sets. Recall that the training error for a dataset is defined as\n",
+ "\n",
+ "$$ J_{\\text{train}} = \\frac{1}{2m} \\left[ \\sum_{i=1}^m \\left(h_\\theta \\left( x^{(i)} \\right) - y^{(i)} \\right)^2 \\right] $$\n",
+ "\n",
+ "In particular, note that the training error does not include the regularization term. One way to compute the training error is to use your existing cost function and set $\\lambda$ to 0 only when using it to compute the training error and cross validation error. When you are computing the training set error, make sure you compute it on the training subset (i.e., `X[:n,:]` and `y[:n]`) instead of the entire training set. However, for the cross validation error, you should compute it over the entire cross validation set. You should store\n",
+ "the computed errors in the vectors error train and error val.\n",
+ "\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": true
+ },
+ "outputs": [],
+ "source": [
+ "def learningCurve(X, y, Xval, yval, lambda_=0):\n",
+ " \"\"\"\n",
+ " Generates the train and cross validation set errors needed to plot a learning curve\n",
+ " returns the train and cross validation set errors for a learning curve. \n",
+ " \n",
+ " In this function, you will compute the train and test errors for\n",
+ " dataset sizes from 1 up to m. In practice, when working with larger\n",
+ " datasets, you might want to do this in larger intervals.\n",
+ " \n",
+ " Parameters\n",
+ " ----------\n",
+ " X : array_like\n",
+ " The training dataset. Matrix with shape (m x n + 1) where m is the \n",
+ " total number of examples, and n is the number of features \n",
+ " before adding the bias term.\n",
+ " \n",
+ " y : array_like\n",
+ " The functions values at each training datapoint. A vector of\n",
+ " shape (m, ).\n",
+ " \n",
+ " Xval : array_like\n",
+ " The validation dataset. Matrix with shape (m_val x n + 1) where m is the \n",
+ " total number of examples, and n is the number of features \n",
+ " before adding the bias term.\n",
+ " \n",
+ " yval : array_like\n",
+ " The functions values at each validation datapoint. A vector of\n",
+ " shape (m_val, ).\n",
+ " \n",
+ " lambda_ : float, optional\n",
+ " The regularization parameter.\n",
+ " \n",
+ " Returns\n",
+ " -------\n",
+ " error_train : array_like\n",
+ " A vector of shape m. error_train[i] contains the training error for\n",
+ " i examples.\n",
+ " error_val : array_like\n",
+ " A vecotr of shape m. error_val[i] contains the validation error for\n",
+ " i training examples.\n",
+ " \n",
+ " Instructions\n",
+ " ------------\n",
+ " Fill in this function to return training errors in error_train and the\n",
+ " cross validation errors in error_val. i.e., error_train[i] and \n",
+ " error_val[i] should give you the errors obtained after training on i examples.\n",
+ " \n",
+ " Notes\n",
+ " -----\n",
+ " - You should evaluate the training error on the first i training\n",
+ " examples (i.e., X[:i, :] and y[:i]).\n",
+ " \n",
+ " For the cross-validation error, you should instead evaluate on\n",
+ " the _entire_ cross validation set (Xval and yval).\n",
+ " \n",
+ " - If you are using your cost function (linearRegCostFunction) to compute\n",
+ " the training and cross validation error, you should call the function with\n",
+ " the lambda argument set to 0. Do note that you will still need to use\n",
+ " lambda when running the training to obtain the theta parameters.\n",
+ " \n",
+ " Hint\n",
+ " ----\n",
+ " You can loop over the examples with the following:\n",
+ " \n",
+ " for i in range(1, m+1):\n",
+ " # Compute train/cross validation errors using training examples \n",
+ " # X[:i, :] and y[:i], storing the result in \n",
+ " # error_train[i-1] and error_val[i-1]\n",
+ " .... \n",
+ " \"\"\"\n",
+ " # Number of training examples\n",
+ " m = y.size\n",
+ "\n",
+ " # You need to return these values correctly\n",
+ " error_train = np.zeros(m)\n",
+ " error_val = np.zeros(m)\n",
+ "\n",
+ " # ====================== YOUR CODE HERE ======================\n",
+ " \n",
+ "\n",
+ " \n",
+ " # =============================================================\n",
+ " return error_train, error_val"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "When you are finished implementing the function `learningCurve`, executing the next cell prints the learning curves and produce a plot similar to the figure below. \n",
+ "\n",
+ "\n",
+ "\n",
+ "In the learning curve figure, you can observe that both the train error and cross validation error are high when the number of training examples is increased. This reflects a high bias problem in the model - the linear regression model is too simple and is unable to fit our dataset well. In the next section, you will implement polynomial regression to fit a better model for this dataset."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "X_aug = np.concatenate([np.ones((m, 1)), X], axis=1)\n",
+ "Xval_aug = np.concatenate([np.ones((yval.size, 1)), Xval], axis=1)\n",
+ "error_train, error_val = learningCurve(X_aug, y, Xval_aug, yval, lambda_=0)\n",
+ "\n",
+ "pyplot.plot(np.arange(1, m+1), error_train, np.arange(1, m+1), error_val, lw=2)\n",
+ "pyplot.title('Learning curve for linear regression')\n",
+ "pyplot.legend(['Train', 'Cross Validation'])\n",
+ "pyplot.xlabel('Number of training examples')\n",
+ "pyplot.ylabel('Error')\n",
+ "pyplot.axis([0, 13, 0, 150])\n",
+ "\n",
+ "print('# Training Examples\\tTrain Error\\tCross Validation Error')\n",
+ "for i in range(m):\n",
+ " print(' \\t%d\\t\\t%f\\t%f' % (i+1, error_train[i], error_val[i]))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "*You should now submit your solutions.*"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "grader[3] = learningCurve\n",
+ "grader.grade()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "## 3 Polynomial regression\n",
+ "\n",
+ "The problem with our linear model was that it was too simple for the data\n",
+ "and resulted in underfitting (high bias). In this part of the exercise, you will address this problem by adding more features. For polynomial regression, our hypothesis has the form:\n",
+ "\n",
+ "$$\n",
+ "\\begin{align}\n",
+ "h_\\theta(x) &= \\theta_0 + \\theta_1 \\times (\\text{waterLevel}) + \\theta_2 \\times (\\text{waterLevel})^2 + \\cdots + \\theta_p \\times (\\text{waterLevel})^p \\\\\n",
+ "& = \\theta_0 + \\theta_1 x_1 + \\theta_2 x_2 + \\cdots + \\theta_p x_p\n",
+ "\\end{align}\n",
+ "$$\n",
+ "\n",
+ "Notice that by defining $x_1 = (\\text{waterLevel})$, $x_2 = (\\text{waterLevel})^2$ , $\\cdots$, $x_p =\n",
+ "(\\text{waterLevel})^p$, we obtain a linear regression model where the features are the various powers of the original value (waterLevel).\n",
+ "\n",
+ "Now, you will add more features using the higher powers of the existing feature $x$ in the dataset. Your task in this part is to complete the code in the function `polyFeatures` in the next cell. The function should map the original training set $X$ of size $m \\times 1$ into its higher powers. Specifically, when a training set $X$ of size $m \\times 1$ is passed into the function, the function should return a $m \\times p$ matrix `X_poly`, where column 1 holds the original values of X, column 2 holds the values of $X^2$, column 3 holds the values of $X^3$, and so on. Note that you don’t have to account for the zero-eth power in this function.\n",
+ "\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": true
+ },
+ "outputs": [],
+ "source": [
+ "def polyFeatures(X, p):\n",
+ " \"\"\"\n",
+ " Maps X (1D vector) into the p-th power.\n",
+ " \n",
+ " Parameters\n",
+ " ----------\n",
+ " X : array_like\n",
+ " A data vector of size m, where m is the number of examples.\n",
+ " \n",
+ " p : int\n",
+ " The polynomial power to map the features. \n",
+ " \n",
+ " Returns \n",
+ " -------\n",
+ " X_poly : array_like\n",
+ " A matrix of shape (m x p) where p is the polynomial \n",
+ " power and m is the number of examples. That is:\n",
+ " \n",
+ " X_poly[i, :] = [X[i], X[i]**2, X[i]**3 ... X[i]**p]\n",
+ " \n",
+ " Instructions\n",
+ " ------------\n",
+ " Given a vector X, return a matrix X_poly where the p-th column of\n",
+ " X contains the values of X to the p-th power.\n",
+ " \"\"\"\n",
+ " # You need to return the following variables correctly.\n",
+ " X_poly = np.zeros((X.shape[0], p))\n",
+ "\n",
+ " # ====================== YOUR CODE HERE ======================\n",
+ "\n",
+ "\n",
+ "\n",
+ " # ============================================================\n",
+ " return X_poly"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Now you have a function that will map features to a higher dimension. The next cell will apply it to the training set, the test set, and the cross validation set."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "p = 8\n",
+ "\n",
+ "# Map X onto Polynomial Features and Normalize\n",
+ "X_poly = polyFeatures(X, p)\n",
+ "X_poly, mu, sigma = utils.featureNormalize(X_poly)\n",
+ "X_poly = np.concatenate([np.ones((m, 1)), X_poly], axis=1)\n",
+ "\n",
+ "# Map X_poly_test and normalize (using mu and sigma)\n",
+ "X_poly_test = polyFeatures(Xtest, p)\n",
+ "X_poly_test -= mu\n",
+ "X_poly_test /= sigma\n",
+ "X_poly_test = np.concatenate([np.ones((ytest.size, 1)), X_poly_test], axis=1)\n",
+ "\n",
+ "# Map X_poly_val and normalize (using mu and sigma)\n",
+ "X_poly_val = polyFeatures(Xval, p)\n",
+ "X_poly_val -= mu\n",
+ "X_poly_val /= sigma\n",
+ "X_poly_val = np.concatenate([np.ones((yval.size, 1)), X_poly_val], axis=1)\n",
+ "\n",
+ "print('Normalized Training Example 1:')\n",
+ "X_poly[0, :]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "*You should now submit your solutions.*"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "grader[4] = polyFeatures\n",
+ "grader.grade()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 3.1 Learning Polynomial Regression\n",
+ "\n",
+ "After you have completed the function `polyFeatures`, we will proceed to train polynomial regression using your linear regression cost function.\n",
+ "\n",
+ "Keep in mind that even though we have polynomial terms in our feature vector, we are still solving a linear regression optimization problem. The polynomial terms have simply turned into features that we can use for linear regression. We are using the same cost function and gradient that you wrote for the earlier part of this exercise.\n",
+ "\n",
+ "For this part of the exercise, you will be using a polynomial of degree 8. It turns out that if we run the training directly on the projected data, will not work well as the features would be badly scaled (e.g., an example with $x = 40$ will now have a feature $x_8 = 40^8 = 6.5 \\times 10^{12}$). Therefore, you will\n",
+ "need to use feature normalization.\n",
+ "\n",
+ "Before learning the parameters $\\theta$ for the polynomial regression, we first call `featureNormalize` and normalize the features of the training set, storing the mu, sigma parameters separately. We have already implemented this function for you (in `utils.py` module) and it is the same function from the first exercise.\n",
+ "\n",
+ "After learning the parameters $\\theta$, you should see two plots generated for polynomial regression with $\\lambda = 0$, which should be similar to the ones here:\n",
+ "\n",
+ "\n",
+ " \n",
+ "  | \n",
+ "  | \n",
+ "
\n",
+ "
\n",
+ "\n",
+ "You should see that the polynomial fit is able to follow the datapoints very well, thus, obtaining a low training error. The figure on the right shows that the training error essentially stays zero for all numbers of training samples. However, the polynomial fit is very complex and even drops off at the extremes. This is an indicator that the polynomial regression model is overfitting the training data and will not generalize well.\n",
+ "\n",
+ "To better understand the problems with the unregularized ($\\lambda = 0$) model, you can see that the learning curve shows the same effect where the training error is low, but the cross validation error is high. There is a gap between the training and cross validation errors, indicating a high variance problem."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "lambda_ = 100\n",
+ "theta = utils.trainLinearReg(linearRegCostFunction, X_poly, y,\n",
+ " lambda_=lambda_, maxiter=55)\n",
+ "\n",
+ "# Plot training data and fit\n",
+ "pyplot.plot(X, y, 'ro', ms=10, mew=1.5, mec='k')\n",
+ "\n",
+ "utils.plotFit(polyFeatures, np.min(X), np.max(X), mu, sigma, theta, p)\n",
+ "\n",
+ "pyplot.xlabel('Change in water level (x)')\n",
+ "pyplot.ylabel('Water flowing out of the dam (y)')\n",
+ "pyplot.title('Polynomial Regression Fit (lambda = %f)' % lambda_)\n",
+ "pyplot.ylim([-20, 50])\n",
+ "\n",
+ "pyplot.figure()\n",
+ "error_train, error_val = learningCurve(X_poly, y, X_poly_val, yval, lambda_)\n",
+ "pyplot.plot(np.arange(1, 1+m), error_train, np.arange(1, 1+m), error_val)\n",
+ "\n",
+ "pyplot.title('Polynomial Regression Learning Curve (lambda = %f)' % lambda_)\n",
+ "pyplot.xlabel('Number of training examples')\n",
+ "pyplot.ylabel('Error')\n",
+ "pyplot.axis([0, 13, 0, 100])\n",
+ "pyplot.legend(['Train', 'Cross Validation'])\n",
+ "\n",
+ "print('Polynomial Regression (lambda = %f)\\n' % lambda_)\n",
+ "print('# Training Examples\\tTrain Error\\tCross Validation Error')\n",
+ "for i in range(m):\n",
+ " print(' \\t%d\\t\\t%f\\t%f' % (i+1, error_train[i], error_val[i]))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "One way to combat the overfitting (high-variance) problem is to add regularization to the model. In the next section, you will get to try different $\\lambda$ parameters to see how regularization can lead to a better model.\n",
+ "\n",
+ "### 3.2 Optional (ungraded) exercise: Adjusting the regularization parameter\n",
+ "\n",
+ "In this section, you will get to observe how the regularization parameter affects the bias-variance of regularized polynomial regression. You should now modify the the lambda parameter and try $\\lambda = 1, 100$. For each of these values, the script should generate a polynomial fit to the data and also a learning curve.\n",
+ "\n",
+ "For $\\lambda = 1$, the generated plots should look like the the figure below. You should see a polynomial fit that follows the data trend well (left) and a learning curve (right) showing that both the cross validation and training error converge to a relatively low value. This shows the $\\lambda = 1$ regularized polynomial regression model does not have the high-bias or high-variance problems. In effect, it achieves a good trade-off between bias and variance.\n",
+ "\n",
+ "\n",
+ " \n",
+ "  | \n",
+ "  | \n",
+ "
\n",
+ "
\n",
+ "\n",
+ "For $\\lambda = 100$, you should see a polynomial fit (figure below) that does not follow the data well. In this case, there is too much regularization and the model is unable to fit the training data.\n",
+ "\n",
+ "\n",
+ "\n",
+ "*You do not need to submit any solutions for this optional (ungraded) exercise.*"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n",
+ "### 3.3 Selecting $\\lambda$ using a cross validation set\n",
+ "\n",
+ "From the previous parts of the exercise, you observed that the value of $\\lambda$ can significantly affect the results of regularized polynomial regression on the training and cross validation set. In particular, a model without regularization ($\\lambda = 0$) fits the training set well, but does not generalize. Conversely, a model with too much regularization ($\\lambda = 100$) does not fit the training set and testing set well. A good choice of $\\lambda$ (e.g., $\\lambda = 1$) can provide a good fit to the data.\n",
+ "\n",
+ "In this section, you will implement an automated method to select the $\\lambda$ parameter. Concretely, you will use a cross validation set to evaluate how good each $\\lambda$ value is. After selecting the best $\\lambda$ value using the cross validation set, we can then evaluate the model on the test set to estimate\n",
+ "how well the model will perform on actual unseen data. \n",
+ "\n",
+ "Your task is to complete the code in the function `validationCurve`. Specifically, you should should use the `utils.trainLinearReg` function to train the model using different values of $\\lambda$ and compute the training error and cross validation error. You should try $\\lambda$ in the following range: {0, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10}.\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": true
+ },
+ "outputs": [],
+ "source": [
+ "def validationCurve(X, y, Xval, yval):\n",
+ " \"\"\"\n",
+ " Generate the train and validation errors needed to plot a validation\n",
+ " curve that we can use to select lambda_.\n",
+ " \n",
+ " Parameters\n",
+ " ----------\n",
+ " X : array_like\n",
+ " The training dataset. Matrix with shape (m x n) where m is the \n",
+ " total number of training examples, and n is the number of features \n",
+ " including any polynomial features.\n",
+ " \n",
+ " y : array_like\n",
+ " The functions values at each training datapoint. A vector of\n",
+ " shape (m, ).\n",
+ " \n",
+ " Xval : array_like\n",
+ " The validation dataset. Matrix with shape (m_val x n) where m is the \n",
+ " total number of validation examples, and n is the number of features \n",
+ " including any polynomial features.\n",
+ " \n",
+ " yval : array_like\n",
+ " The functions values at each validation datapoint. A vector of\n",
+ " shape (m_val, ).\n",
+ " \n",
+ " Returns\n",
+ " -------\n",
+ " lambda_vec : list\n",
+ " The values of the regularization parameters which were used in \n",
+ " cross validation.\n",
+ " \n",
+ " error_train : list\n",
+ " The training error computed at each value for the regularization\n",
+ " parameter.\n",
+ " \n",
+ " error_val : list\n",
+ " The validation error computed at each value for the regularization\n",
+ " parameter.\n",
+ " \n",
+ " Instructions\n",
+ " ------------\n",
+ " Fill in this function to return training errors in `error_train` and\n",
+ " the validation errors in `error_val`. The vector `lambda_vec` contains\n",
+ " the different lambda parameters to use for each calculation of the\n",
+ " errors, i.e, `error_train[i]`, and `error_val[i]` should give you the\n",
+ " errors obtained after training with `lambda_ = lambda_vec[i]`.\n",
+ "\n",
+ " Note\n",
+ " ----\n",
+ " You can loop over lambda_vec with the following:\n",
+ " \n",
+ " for i in range(len(lambda_vec))\n",
+ " lambda = lambda_vec[i]\n",
+ " # Compute train / val errors when training linear \n",
+ " # regression with regularization parameter lambda_\n",
+ " # You should store the result in error_train[i]\n",
+ " # and error_val[i]\n",
+ " ....\n",
+ " \"\"\"\n",
+ " # Selected values of lambda (you should not change this)\n",
+ " lambda_vec = [0, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10]\n",
+ "\n",
+ " # You need to return these variables correctly.\n",
+ " error_train = np.zeros(len(lambda_vec))\n",
+ " error_val = np.zeros(len(lambda_vec))\n",
+ "\n",
+ " # ====================== YOUR CODE HERE ======================\n",
+ "\n",
+ "\n",
+ "\n",
+ " # ============================================================\n",
+ " return lambda_vec, error_train, error_val"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "After you have completed the code, the next cell will run your function and plot a cross validation curve of error v.s. $\\lambda$ that allows you select which $\\lambda$ parameter to use. You should see a plot similar to the figure below. \n",
+ "\n",
+ "\n",
+ "\n",
+ "In this figure, we can see that the best value of $\\lambda$ is around 3. Due to randomness\n",
+ "in the training and validation splits of the dataset, the cross validation error can sometimes be lower than the training error."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "lambda_vec, error_train, error_val = validationCurve(X_poly, y, X_poly_val, yval)\n",
+ "\n",
+ "pyplot.plot(lambda_vec, error_train, '-o', lambda_vec, error_val, '-o', lw=2)\n",
+ "pyplot.legend(['Train', 'Cross Validation'])\n",
+ "pyplot.xlabel('lambda')\n",
+ "pyplot.ylabel('Error')\n",
+ "\n",
+ "print('lambda\\t\\tTrain Error\\tValidation Error')\n",
+ "for i in range(len(lambda_vec)):\n",
+ " print(' %f\\t%f\\t%f' % (lambda_vec[i], error_train[i], error_val[i]))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "*You should now submit your solutions.*"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "grader[5] = validationCurve\n",
+ "grader.grade()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 3.4 Optional (ungraded) exercise: Computing test set error\n",
+ "\n",
+ "In the previous part of the exercise, you implemented code to compute the cross validation error for various values of the regularization parameter $\\lambda$. However, to get a better indication of the model’s performance in the real world, it is important to evaluate the “final” model on a test set that was not used in any part of training (that is, it was neither used to select the $\\lambda$ parameters, nor to learn the model parameters $\\theta$). For this optional (ungraded) exercise, you should compute the test error using the best value of $\\lambda$ you found. In our cross validation, we obtained a test error of 3.8599 for $\\lambda = 3$.\n",
+ "\n",
+ "*You do not need to submit any solutions for this optional (ungraded) exercise.*"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 3.5 Optional (ungraded) exercise: Plotting learning curves with randomly selected examples\n",
+ "\n",
+ "In practice, especially for small training sets, when you plot learning curves to debug your algorithms, it is often helpful to average across multiple sets of randomly selected examples to determine the training error and cross validation error.\n",
+ "\n",
+ "Concretely, to determine the training error and cross validation error for $i$ examples, you should first randomly select $i$ examples from the training set and $i$ examples from the cross validation set. You will then learn the parameters $\\theta$ using the randomly chosen training set and evaluate the parameters $\\theta$ on the randomly chosen training set and cross validation set. The above steps should then be repeated multiple times (say 50) and the averaged error should be used to determine the training error and cross validation error for $i$ examples.\n",
+ "\n",
+ "For this optional (ungraded) exercise, you should implement the above strategy for computing the learning curves. For reference, the figure below shows the learning curve we obtained for polynomial regression with $\\lambda = 0.01$. Your figure may differ slightly due to the random selection of examples.\n",
+ "\n",
+ "\n",
+ "\n",
+ "*You do not need to submit any solutions for this optional (ungraded) exercise.*"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": true
+ },
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.4"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/utils.py b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/utils.py
new file mode 100644
index 000000000..b2340ad78
--- /dev/null
+++ b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/Exercise5/utils.py
@@ -0,0 +1,164 @@
+import sys
+import numpy as np
+from scipy import optimize
+from matplotlib import pyplot
+
+sys.path.append('..')
+from submission import SubmissionBase
+
+
+def trainLinearReg(linearRegCostFunction, X, y, lambda_=0.0, maxiter=200):
+ """
+ Trains linear regression using scipy's optimize.minimize.
+
+ Parameters
+ ----------
+ X : array_like
+ The dataset with shape (m x n+1). The bias term is assumed to be concatenated.
+
+ y : array_like
+ Function values at each datapoint. A vector of shape (m,).
+
+ lambda_ : float, optional
+ The regularization parameter.
+
+ maxiter : int, optional
+ Maximum number of iteration for the optimization algorithm.
+
+ Returns
+ -------
+ theta : array_like
+ The parameters for linear regression. This is a vector of shape (n+1,).
+ """
+ # Initialize Theta
+ initial_theta = np.zeros(X.shape[1])
+
+ # Create "short hand" for the cost function to be minimized
+ costFunction = lambda t: linearRegCostFunction(X, y, t, lambda_)
+
+ # Now, costFunction is a function that takes in only one argument
+ options = {'maxiter': maxiter}
+
+ # Minimize using scipy
+ res = optimize.minimize(costFunction, initial_theta, jac=True, method='TNC', options=options)
+ return res.x
+
+
+def featureNormalize(X):
+ """
+    Normalizes the features in X and returns a normalized version of X where the mean value of each
+ feature is 0 and the standard deviation is 1. This is often a good preprocessing step to do when
+ working with learning algorithms.
+
+ Parameters
+ ----------
+ X : array_like
+        A dataset which is a (m x n) matrix, where m is the number of examples,
+ and n is the number of dimensions for each example.
+
+ Returns
+ -------
+ X_norm : array_like
+ The normalized input dataset.
+
+ mu : array_like
+ A vector of size n corresponding to the mean for each dimension across all examples.
+
+ sigma : array_like
+ A vector of size n corresponding to the standard deviations for each dimension across
+ all examples.
+ """
+ mu = np.mean(X, axis=0)
+ X_norm = X - mu
+
+ sigma = np.std(X_norm, axis=0, ddof=1)
+ X_norm /= sigma
+ return X_norm, mu, sigma
+
+
+def plotFit(polyFeatures, min_x, max_x, mu, sigma, theta, p):
+ """
+ Plots a learned polynomial regression fit over an existing figure.
+ Also works with linear regression.
+ Plots the learned polynomial fit with power p and feature normalization (mu, sigma).
+
+ Parameters
+ ----------
+ polyFeatures : func
+        A function which generates polynomial features from a single feature.
+
+ min_x : float
+ The minimum value for the feature.
+
+ max_x : float
+ The maximum value for the feature.
+
+ mu : float
+ The mean feature value over the training dataset.
+
+ sigma : float
+ The feature standard deviation of the training dataset.
+
+ theta : array_like
+ The parameters for the trained polynomial linear regression.
+
+ p : int
+ The polynomial order.
+ """
+ # We plot a range slightly bigger than the min and max values to get
+ # an idea of how the fit will vary outside the range of the data points
+ x = np.arange(min_x - 15, max_x + 25, 0.05).reshape(-1, 1)
+
+ # Map the X values
+ X_poly = polyFeatures(x, p)
+ X_poly -= mu
+ X_poly /= sigma
+
+ # Add ones
+ X_poly = np.concatenate([np.ones((x.shape[0], 1)), X_poly], axis=1)
+
+ # Plot
+ pyplot.plot(x, np.dot(X_poly, theta), '--', lw=2)
+
+
+class Grader(SubmissionBase):
+ # Random test cases
+ X = np.vstack([np.ones(10),
+ np.sin(np.arange(1, 15, 1.5)),
+ np.cos(np.arange(1, 15, 1.5))]).T
+ y = np.sin(np.arange(1, 31, 3))
+ Xval = np.vstack([np.ones(10),
+ np.sin(np.arange(0, 14, 1.5)),
+ np.cos(np.arange(0, 14, 1.5))]).T
+ yval = np.sin(np.arange(1, 11))
+
+ def __init__(self):
+ part_names = ['Regularized Linear Regression Cost Function',
+ 'Regularized Linear Regression Gradient',
+ 'Learning Curve',
+ 'Polynomial Feature Mapping',
+ 'Validation Curve']
+ super().__init__('regularized-linear-regression-and-bias-variance', part_names)
+
+ def __iter__(self):
+ for part_id in range(1, 6):
+ try:
+ func = self.functions[part_id]
+ # Each part has different expected arguments/different function
+ if part_id == 1:
+ res = func(self.X, self.y, np.array([0.1, 0.2, 0.3]), 0.5)
+ elif part_id == 2:
+ theta = np.array([0.1, 0.2, 0.3])
+ res = func(self.X, self.y, theta, 0.5)[1]
+ elif part_id == 3:
+ res = np.hstack(func(self.X, self.y, self.Xval, self.yval, 1)).tolist()
+ elif part_id == 4:
+ res = func(self.X[1, :].reshape(-1, 1), 8)
+ elif part_id == 5:
+ res = np.hstack(func(self.X, self.y, self.Xval, self.yval)).tolist()
+ else:
+ raise KeyError
+                yield part_id, res
+            except KeyError:
+                yield part_id, 0
+
diff --git a/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/README.md b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/README.md
new file mode 100644
index 000000000..b0c8fefad
--- /dev/null
+++ b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/README.md
@@ -0,0 +1,10 @@
+## To be done for next week
+
+1. Complete all video lectures and quizzes of week 5 and week 6 of the [Coursera Course](https://www.coursera.org/learn/machine-learning).
+2. Complete the provided IPython notebooks for both exercise4 and exercise5 (nothing is optional, and the points for each part are mentioned in the notebooks).
+3. Although this week's lectures already have the best explanation of backpropagation available on the internet, you can use [this article](https://medium.com/@14prakash/back-propagation-is-very-simple-who-made-it-complicated-97b794c97e5c) as supplementary material.
+4. For a Python implementation of backpropagation, you can get help from [this post](https://medium.com/analytics-vidhya/neural-networks-for-digits-recognition-e11d9dff00d5).
+5. Add your Python files (if you are implementing on your own) or Jupyter notebook files and create a pull request.
+6. The parts of the assignment add up to a total of 100 points; the per-part breakdown is given in each notebook.
+7. Complete the assignment in Octave as well for the sake of your certification (this is optional and does not carry any points).
+8. Please try to submit the .ipynb file instead of a plain Python file so that we can check the results directly. You can use the magic command `%matplotlib inline` to plot the graphs directly in the notebook.
diff --git a/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/submission.py b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/submission.py
new file mode 100644
index 000000000..10113e47d
--- /dev/null
+++ b/Phase 3 - 2020 (Summer)/Week 5(Apr 26-May 02)/submission.py
@@ -0,0 +1,105 @@
+from urllib.parse import urlencode
+from urllib.request import urlopen
+import pickle
+import json
+from collections import OrderedDict
+import numpy as np
+import os
+
+
+class SubmissionBase:
+
+ submit_url = 'https://www-origin.coursera.org/api/' \
+ 'onDemandProgrammingImmediateFormSubmissions.v1'
+ save_file = 'token.pkl'
+
+ def __init__(self, assignment_slug, part_names):
+ self.assignment_slug = assignment_slug
+ self.part_names = part_names
+ self.login = None
+ self.token = None
+ self.functions = OrderedDict()
+ self.args = dict()
+
+ def grade(self):
+ print('\nSubmitting Solutions | Programming Exercise %s\n' % self.assignment_slug)
+ self.login_prompt()
+
+ # Evaluate the different parts of exercise
+ parts = OrderedDict()
+ for part_id, result in self:
+ parts[str(part_id)] = {'output': sprintf('%0.5f ', result)}
+ result, response = self.request(parts)
+ response = json.loads(response.decode("utf-8"))
+
+ # if an error was returned, print it and stop
+ if 'errorMessage' in response:
+ print(response['errorMessage'])
+ return
+
+ # Print the grading table
+ print('%43s | %9s | %-s' % ('Part Name', 'Score', 'Feedback'))
+ print('%43s | %9s | %-s' % ('---------', '-----', '--------'))
+ for part in parts:
+ part_feedback = response['partFeedbacks'][part]
+ part_evaluation = response['partEvaluations'][part]
+ score = '%d / %3d' % (part_evaluation['score'], part_evaluation['maxScore'])
+ print('%43s | %9s | %-s' % (self.part_names[int(part) - 1], score, part_feedback))
+ evaluation = response['evaluation']
+ total_score = '%d / %d' % (evaluation['score'], evaluation['maxScore'])
+ print(' --------------------------------')
+ print('%43s | %9s | %-s\n' % (' ', total_score, ' '))
+
+ def login_prompt(self):
+ if os.path.isfile(self.save_file):
+ with open(self.save_file, 'rb') as f:
+ login, token = pickle.load(f)
+ reenter = input('Use token from last successful submission (%s)? (Y/n): ' % login)
+
+ if reenter == '' or reenter[0] == 'Y' or reenter[0] == 'y':
+ self.login, self.token = login, token
+ return
+ else:
+ os.remove(self.save_file)
+
+ self.login = input('Login (email address): ')
+ self.token = input('Token: ')
+
+ # Save the entered credentials
+ if not os.path.isfile(self.save_file):
+ with open(self.save_file, 'wb') as f:
+ pickle.dump((self.login, self.token), f)
+
+ def request(self, parts):
+ params = {
+ 'assignmentSlug': self.assignment_slug,
+ 'secret': self.token,
+ 'parts': parts,
+ 'submitterEmail': self.login}
+
+ params = urlencode({'jsonBody': json.dumps(params)}).encode("utf-8")
+ f = urlopen(self.submit_url, params)
+ try:
+ return 0, f.read()
+ finally:
+ f.close()
+
+ def __iter__(self):
+ for part_id in self.functions:
+ yield part_id
+
+ def __setitem__(self, key, value):
+ self.functions[key] = value
+
+
+def sprintf(fmt, arg):
+ """ Emulates (part of) Octave sprintf function. """
+ if isinstance(arg, tuple):
+ # for multiple return values, only use the first one
+ arg = arg[0]
+
+ if isinstance(arg, (np.ndarray, list)):
+ # concatenates all elements, column by column
+ return ' '.join(fmt % e for e in np.asarray(arg).ravel('F'))
+ else:
+ return fmt % arg