Multivariate Linear Regression


What is multivariate linear regression and why do we need it?

Till now we've seen models where we use only one feature to determine the output or prediction. But in the real world, a prediction or event will usually depend on more than one factor. This is where multivariate linear regression comes in. Here, our output prediction will be made based on multiple input features. For example, to predict the weight of a person, we could use multiple features such as age, sex, height, etc.

Multiple linear regression is used to explain the relationship between one continuous dependent variable and two or more independent variables.

It's highly recommended you watch these two videos before moving forward:
[4.1] https://www.youtube.com/watch?v=Q4GNLhRtZNc
[4.2] https://www.youtube.com/watch?v=pkJjoro-b5c


Parameters and Shapes

So instead of having just one weight like we did in simple linear regression, we will now need one weight for each feature. Our equation for 3 features x1, x2, x3 will now look like

h(x) = w1·x1 + w2·x2 + w3·x3 + b

Here we can represent W and X as matrices.

import numpy as np

W = np.zeros(shape=(3, 1), dtype=np.float32)    # Initialize W to a matrix of 0's, one weight per feature
b = 0                                           # Setting our bias value to 0

Here the shape of W is (3,1) as we have 3 features (x1, x2, x3). For a single example, the shape of X will also be (3,1). Thus when we apply the formula to find our prediction-

h(x) = WᵀX + b

WᵀX will be of the form (1,3)*(3,1) = (1,1), which corresponds to a single number and not a matrix. Looking carefully at this, you'll realise that since WᵀX and b are both scalars, h(x) will also be a scalar and can then be compared to our real value Y.

Z = np.matmul(W.T, X) + b
#Where Z is h(x), and W.T gives us the transpose of our matrix W
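
To see these shapes in action, here's a minimal sketch with made-up numbers (the feature values below are arbitrary, purely for illustration):

import numpy as np

W = np.zeros(shape=(3, 1), dtype=np.float32)    # one weight per feature, shape (3, 1)
b = 0

X = np.array([[1.0], [2.0], [3.0]])             # one example with features x1, x2, x3, shape (3, 1)

Z = np.matmul(W.T, X) + b                       # (1, 3) x (3, 1) -> (1, 1)
print(Z.shape)                                  # (1, 1): a single prediction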

Issues and problems we need to look out for

Before we dive into the maths behind this, let's look at some things we should keep an eye out for while implementing multivariate linear regression.

Check linearity of the features with the output

We should make sure that we include only those features which are actually related to the output. For example, don't go ahead and use the name of the person as a feature while trying to predict his/her weight. They're completely unrelated.
(Or are they? Is this a special hint that you'll need later?)

Check collinearity between the features

We also have to make sure that no two features are strongly related to each other. For example, if one of our features was the person's height and another was the length of their legs, it wouldn't make sense, since someone with a greater height will usually have longer legs. One of these features thus becomes redundant.
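
A quick (and rough) way to check both of these points is to look at correlation coefficients. Here's a minimal sketch using np.corrcoef on made-up data; the numbers are invented purely for illustration, and a high correlation is only a hint of redundancy, not proof:

import numpy as np

# Made-up data: each row is a feature, each column is one person
X = np.array([[170.0, 160.0, 180.0, 175.0],    # height
              [95.0,  88.0,  102.0, 99.0],     # leg length
              [42.0,  38.0,  44.0,  43.0]])    # shoe size
Y = np.array([70.0, 58.0, 82.0, 76.0])         # weight (the output)

# Correlation of each feature with the output: should be reasonably far from 0
for i in range(X.shape[0]):
    print("feature", i, "vs output:", np.corrcoef(X[i], Y)[0, 1])

# Correlation between the features themselves: values near +/-1 suggest redundancy
print(np.corrcoef(X))                          # 3x3 matrix of pairwise feature correlations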

Training and self-correction

The optimization is very similar to simple linear regression. The only difference is that instead of updating one weight, we'll be applying the same procedure to every weight, and hence to the whole matrix.

NumPy lets us implement this easily by automatically applying the calculation of dw to each element in the matrix. These formulas come from differentiating the mean squared error cost with respect to W and b. Have a look at the functions to help you out-

dW = np.matmul(X, np.transpose(Z - Y)) * (1.0/m)   # X is (3, m), (Z - Y).T is (m, 1), so dW is (3, 1)
db = np.mean(Z - Y)                                # average error over all m examples

Note that here, dW is a matrix, a collection of different dw values, one for each weight in the system. You can imagine dW as
[[dw1], [dw2], [dw3]]

Whereas db is just a single number, since no matter how many weights we have, we use only 1 bias variable.

The next step is to update our parameters. Again, NumPy does the work for us. Instead of having to write 3 equations such as-

w1 = w1 - learning_rate * dw1
w2 = w2 - learning_rate * dw2
w3 = w3 - learning_rate * dw3

NumPy can do this directly in one equation by writing

W = W - learning_rate * dW
b = b - learning_rate * db

Note: Keep an eye on the capitalization of the variables. Anything in caps is a matrix, e.g. dW and W are matrices, whereas dw and w are individual scalar weights.
We don't need to make this distinction for b and B, as both are always single numbers; the same applies to db and dB. They are interchangeable.
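
Putting the pieces together, here's a minimal sketch of a single training step on made-up data (the numbers are invented for illustration; looping this over many iterations and feeding in a real dataset is exactly what's left to you below):

import numpy as np

# Made-up training data: 3 features, m = 4 examples
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 1.0, 0.0, 3.0],
              [0.0, 1.0, 1.0, 2.0]])           # shape (3, m)
Y = np.array([[3.0, 4.0, 4.0, 9.0]])           # shape (1, m)
m = X.shape[1]

W = np.zeros(shape=(3, 1), dtype=np.float32)
b = 0.0
learning_rate = 0.01

Z = np.matmul(W.T, X) + b                      # forward pass: predictions, shape (1, m)

dW = np.matmul(X, np.transpose(Z - Y)) * (1.0/m)   # gradient for the weights, shape (3, 1)
db = np.mean(Z - Y)                                # gradient for the bias, a scalar

W = W - learning_rate * dW                     # one gradient descent step
b = b - learning_rate * db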


That's pretty much all of the most important parts of the code you'll be needing. To recap, we know how to make a prediction and how to optimize our parameters.
What we are leaving to the reader is to join these methods and figure out how to loop them so that they cover multiple examples, and also how to input a dataset (which was taught last week with pandas). Although this may look difficult at first, if you've understood all of the above, it isn't that hard.
Make sure to keep trying, and if you get stuck somewhere, there are plenty of resources online. Go through documentation, videos, articles, etc. This is another important skill it's high time you learned. The quality and amount of work on ML you can find online is amazing.


Links you should check out -

  1. https://www.youtube.com/watch?v=dQNpSa-bq4M [Extremely detailed; you may skip parts not covered in this doc]
  2. https://www.youtube.com/watch?v=K_EH2abOp00

The two videos linked above are a must-watch.

Note: Some videos may use different notation or even slightly different formulas. Focus on the overall theory for now, and not on these small differences.