Neural Network From Scratch following Sentdex.
An attempt to construct a complete neural network using plain Python and NumPy operations, to help solidify the core ideas of a neural network.
A neural network consists of multiple layers of neurons; each neuron is connected to every neuron in the next layer via edges.
The output of every neuron works as an input for the neurons in the next layer. Each edge carries a "weight", which is multiplied with the input. The output of a neuron is generated by summing the products of the inputs and weights from every neuron of the previous layer and adding the "bias" of that neuron.
Weights and biases are initially chosen at random and are then tuned by the optimiser to fit the dataset; each has a different kind of impact on the final output of a neuron.
We will start with some hard-coded values for inputs, weights and biases.
inputs = [1, 2, 3, 2.5]
weights = [
[ 0.2, 0.8, -0.5, 1 ],
[ 0.5, -0.91, 0.26, -0.5 ],
[ -0.26, -0.27, 0.17, 0.87 ]
]
biases = [2, 3, 0.5]
Applying the simple logic: output_of_neuron = sum(all_inputs * respective_weights) + neuron_bias
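For example, for the first neuron: (1 * 0.2) + (2 * 0.8) + (3 * -0.5) + (2.5 * 1) + 2 = 2.8 + 2 = 4.8, which matches the first value of the output below.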
layer1_output = []
for (neuron_bias, neuron_weights) in zip(biases, weights):
    neuron_output = 0
    for (n_input, weight) in zip(inputs, neuron_weights):
        neuron_output = neuron_output + (n_input * weight)
    layer1_output.append(neuron_output + neuron_bias)
print(layer1_output)
Using the NumPy dot product for the same logic (numpy needs to be imported first):
import numpy as np
layer1_output = np.dot(weights, inputs) + biases
Output:
>>> [4.8 1.21 2.385]
Instead of using a single input vector, we can use batches, where each batch contains multiple input vectors (multiple instances). Batches help the optimiser generalise better and also let us make use of multiple cores in the system during training.
X = [[1, 2, 3, 2.5],
[2.0, 5.0, -1.0, 2.0],
[-1.5, 2.7, 3.3, -0.8]]
layer1_output = np.dot(X, np.array(weights).T) + biases
Here, instead of a single input vector, we use a matrix containing 3 input instances. The weights and biases do not change, but when we calculate the dot product we have to use the transpose. This is because in the current scenario the weights matrix has shape (3, 4) and the input matrix is also (3, 4). Matrix multiplication requires the number of columns of the first matrix to equal the number of rows of the second, hence the transpose.
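A quick shape check makes this concrete (a small sketch, assuming numpy is imported as np and X and weights are defined as above):
W = np.array(weights)
print(np.shape(X), W.shape)   # (3, 4) and (3, 4) -- the inner dimensions do not line up
print(np.dot(X, W.T).shape)   # (3, 3) -- one row per input instance, one column per neuron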
To add another layer we will need a new set of weights and biases. The output of the previous layer acts as the input for the new layer.
weights2 = [[ 0.1, -0.14, 0.5 ],
[ -0.5, 0.12, -0.33 ],
[ -0.44, 0.73, -0.13 ]]
biases2 = [-1, 2, -0.5]
layer2_output = np.dot(layer1_output, np.array(weights2).T) + biases2
Note that the number of columns in the weights of layer1 and layer2 differ, 4 and 3 respectively. This is because we assume that layer0 has 4 neurons (therefore 4 input values) while layer1 and layer2 have 3 neurons each.
To make constructing layers dynamic, we can use objects instead of hard-coding weights and biases for every individual layer.
class Layer_Dense:
    def __init__(self, n_inputs, n_neurons):
        # weights are shaped (n_inputs, n_neurons), so no transpose is needed in forward()
        self.weights = 0.1 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))
    def forward(self, inputs):
        self.output = np.dot(inputs, self.weights) + self.biases
Here,
- n_inputs - the number of inputs, i.e. the number of neurons in the previous layer
- n_neurons - the number of neurons in the current layer
Note: The weights are initialised with random values of shape (n_inputs, n_neurons); this is the transposed version of the weights matrix we were using in the previous version, so we don't have to transpose it during the dot product. Smaller weights can result in a model that is more stable and less likely to overfit the training dataset, so we scale the random values (which np.random.randn draws centred around 0, mostly between -1 and 1) by 0.1 to keep them small. Biases are initialised to 0, but in some cases this can lead to a "dead network", in which case we can initialise them with some non-zero value.
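A quick, illustrative look at the scale of this initialisation (a sketch; the exact numbers are random, and the seed is only there to make the illustration repeatable):
np.random.seed(0)                                 # seed only for reproducibility of this example
sample_weights = 0.1 * np.random.randn(4, 5)      # same initialisation Layer_Dense(4, 5) would use
print(sample_weights.shape)                       # (4, 5) -- already (n_inputs, n_neurons), no transpose needed
print(sample_weights.min(), sample_weights.max()) # small values centred around 0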
To create layers using the Layer_Dense class:
layer1 = Layer_Dense(4, 5) # output dimension can be any number
layer2 = Layer_Dense(5, 2) # note that the output dimension of the previous layer matches the input of this layer
layer1.forward(X)
layer2.forward(layer1.output)
print(layer2.output)
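Since X has 3 input instances and layer2 has 2 neurons, layer2.output comes out with shape (3, 2): one row per instance, one column per neuron.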
Why do we use activation functions?
If we use only weights and biases, the effective activation is linear, so the whole network behaves like a linear function and will not work for non-linear data. Hence, non-linear activation functions such as sigmoid, ReLU etc. are used.
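To see why, here is a small numerical sketch (layer sizes chosen arbitrarily) showing that two dense layers with no activation in between collapse into a single linear transformation:
np.random.seed(0)
x = np.random.randn(1, 4)                               # one sample with 4 features
W1, b1 = np.random.randn(4, 5), np.random.randn(1, 5)
W2, b2 = np.random.randn(5, 2), np.random.randn(1, 2)
two_layers = np.dot(np.dot(x, W1) + b1, W2) + b2        # layer1 then layer2, no activation
collapsed = np.dot(x, np.dot(W1, W2)) + (np.dot(b1, W2) + b2)   # one equivalent linear layer
print(np.allclose(two_layers, collapsed))               # True -- the second layer adds no expressive power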
- Step Function - result is either 0 or 1 (not used)
- Sigmoid Function - provides granularity (not used)
- ReLU - adds granularity and is simple and fast; widely used for hidden layers:
y = x if x > 0 else 0
class Activation_ReLU:
    def forward(self, inputs):
        self.output = np.maximum(0, inputs)
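A minimal sketch of plugging this activation into the forward pass, reusing layer1 and X from the snippets above:
activation1 = Activation_ReLU()
layer1.forward(X)                   # dense output can contain negative values
activation1.forward(layer1.output)
print(activation1.output)           # same shape as layer1.output, with negatives clipped to 0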
A spiral data file containing the spiral_data function was added to the base repo:
import numpy as np

def spiral_data(points, classes):
    X = np.zeros((points*classes, 2))
    y = np.zeros(points*classes, dtype='uint8')
    for class_number in range(classes):
        ix = range(points*class_number, points*(class_number+1))
        r = np.linspace(0.0, 1, points)  # radius
        t = np.linspace(class_number*4, (class_number+1)*4, points) + np.random.randn(points)*0.2
        X[ix] = np.c_[r*np.sin(t*2.5), r*np.cos(t*2.5)]
        y[ix] = class_number
    return X, y
This function will be used to create a dataset instead of the hard-coded values we have been using so far. It returns data points on a 2D plane that form a spiral when plotted (each data point has 2 features, its x and y coordinates). Parameters: "points" is the number of points per class, "classes" is the number of classes in the data.
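As a rough sketch of how this ties in with the classes above (the layer sizes here are just illustrative), the spiral points have 2 features, so the first dense layer must take 2 inputs:
X, y = spiral_data(points=100, classes=3)   # 300 samples, 2 features each, 3 class labels
dense1 = Layer_Dense(2, 5)                  # 2 input features -> 5 neurons
activation1 = Activation_ReLU()
dense1.forward(X)
activation1.forward(dense1.output)
print(activation1.output.shape)             # (300, 5)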