
NNFS

Neural Network From Scratch following Sentdex.

An attempt to construct a complete neural network using plain Python and NumPy operations, to help me solidify the core ideas of a neural network.

Zeroth Step - Understanding the basic structure of a Neural Network

A neural network consists of multiple layers of neurons; each neuron is connected to every neuron in the next layer via edges.

The output of every neuron works as an input for the neurons in the next layer. Each edge carries a "weight", which is multiplied with the input. The output of a neuron is generated by adding the products of inputs and weights from every neuron of the previous layer to the "bias" of that neuron.
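For instance, the output of a single neuron with three inputs can be computed as below (a minimal sketch; the values are made up for illustration):

inputs = [1.0, 2.0, 3.0]
weights = [0.2, 0.8, -0.5]
bias = 2.0

# output = sum(input_i * weight_i) + bias
output = sum(i * w for i, w in zip(inputs, weights)) + bias
print(output)  # 2.3 (up to float rounding)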

Weights and biases are initially chosen at random and then tuned by the optimiser to fit the dataset. They have different impacts on the final output of a neuron.

First Step - Creating Dummy Data

We will start with some hard-coded values for inputs, weights and biases.

inputs = [1, 2, 3, 2.5]
weights = [
        [ 0.2, 0.8, -0.5, 1 ],
        [ 0.5, -0.91, 0.26, -0.5 ],
        [ -0.26, -0.27, 0.17, 0.87 ]
    ]
biases = [2, 3, 0.5]

Second Step - Output logic for a layer in Neural Network

Applying the simple logic: output_of_neuron = sum(all_inputs * respective_weights) + neuron_bias

layer1_output = []
# Pair each neuron's bias with its weight vector
for neuron_bias, neuron_weights in zip(biases, weights):
    neuron_output = 0
    # Multiply each input by the matching weight and accumulate
    for n_input, weight in zip(inputs, neuron_weights):
        neuron_output += n_input * weight
    layer1_output.append(neuron_output + neuron_bias)

print(layer1_output)

Using NumPy's dot product for the same logic

import numpy as np

layer1_output = np.dot(weights, inputs) + biases

Output:

>>> [4.8   1.21  2.385]

Third Step - Using Batches

Instead of using a single input vector, we can use batches, where each batch contains multiple input vectors (multiple instances). This helps the optimiser generalise, and it also lets us make use of multiple cores in the system during training.

X = [[1, 2, 3, 2.5],
     [2.0, 5.0, -1.0, 2.0],
     [-1.5, 2.7, 3.3, -0.8]]
layer1_output = np.dot(X, np.array(weights).T) + biases

Here, instead of a single input vector, we use a matrix containing 3 input instances. The weights and biases do not change, but when we calculate the dot product we have to use the transpose. In the current scenario the weights matrix has shape (3, 4) and the input matrix is also (3, 4); matrix multiplication is only defined when the number of columns of the first matrix equals the number of rows of the second. Hence, the weights matrix is transposed.
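A quick shape check makes the transpose requirement concrete (a sketch assuming the X and weights defined above):

import numpy as np

a = np.array(X)        # shape (3, 4): 3 samples, 4 features
w = np.array(weights)  # shape (3, 4): 3 neurons, 4 weights each

# np.dot(a, w) would raise a shape error: (3, 4) x (3, 4) does not align.
# Transposing gives (4, 3), so (3, 4) x (4, 3) -> (3, 3).
print(a.shape, w.T.shape, np.dot(a, w.T).shape)  # (3, 4) (4, 3) (3, 3)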

Fourth Step - Multiple Layers

To add another layer we need a new set of weights and biases. The output of the previous layer acts as the input of the new layer.

weights2 = [[ 0.1, -0.14, 0.5 ],
            [ -0.5, 0.12, -0.33 ],
            [ -0.44, 0.73, -0.13 ]]
biases2 = [-1, 2, -0.5]
layer2_output = np.dot(layer1_output, np.array(weights2).T) + biases2

Note that the number of columns in the weights of layer1 and layer2 differ, 4 and 3 respectively. This is because we assume that layer0 has 4 neurons (therefore 4 input values) while layer1 and layer2 have 3 neurons each.

Using Objects to apply multiple layers

To make constructing a layer dynamic, we can use "objects" instead of hard-coding weights and biases for every individual layer.

class Layer_Dense:
    def __init__(self, n_inputs, n_neurons):
        # Small random weights of shape (n_inputs, n_neurons)
        self.weights = 0.1 * np.random.randn(n_inputs, n_neurons)
        # One bias per neuron, initialised to zero
        self.biases = np.zeros((1, n_neurons))

    def forward(self, inputs):
        self.output = np.dot(inputs, self.weights) + self.biases

Here,

  • n_inputs - the number of inputs, i.e. the number of neurons in the previous layer
  • n_neurons - the number of neurons in the current layer

Note: The weights are initialised with random values of shape (n_inputs, n_neurons); this is the transposed version of the weights matrix we were using in the previous version, so that we don't have to transpose it during the dot product. Smaller weights can result in a model that is more stable and less likely to overfit the training dataset, hence we scale the random values by 0.1 to keep them small and centred around zero. The bias values are initialised with 0, but in some cases this can lead to a "dead network", in which case we can initialise them with some non-zero value.
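As a rough illustration of both points, the snippet below (a sketch, not repo code) shows how the scaled random weights stay close to zero and how a non-zero bias initialisation would look:

import numpy as np

np.random.seed(0)  # only to make this illustration reproducible
w = 0.1 * np.random.randn(4, 3)
print(w.min(), w.max())  # small values centred on 0, typically within about +/-0.3

# If a layer "dies" (all outputs stuck at 0), a small non-zero bias can help:
biases = 0.01 * np.ones((1, 3))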

To create layers using the class Layer_Dense

layer1 = Layer_Dense(4, 5) # output dimension can be any number
layer2 = Layer_Dense(5, 2) # the output dimension of the previous layer matches the input of this layer
layer1.forward(X)
layer2.forward(layer1.output)
print(layer2.output)

Fifth Step - Activation Function

Why do we use activation function?

If we use only weights and biases, the network as a whole is effectively a linear function, which will not work on non-linear data. Hence, non-linear activation functions such as sigmoid, ReLU etc. are used.
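A quick sketch of why this is the case: without an activation in between, two linear layers collapse into one linear layer (the matrices below are random, purely for illustration):

import numpy as np

np.random.seed(0)
x = np.random.randn(5, 4)  # a batch of 5 inputs with 4 features
W1, b1 = np.random.randn(4, 3), np.random.randn(3)
W2, b2 = np.random.randn(3, 2), np.random.randn(2)

# Two linear layers with no activation in between...
two_layers = np.dot(np.dot(x, W1) + b1, W2) + b2
# ...equal a single linear layer with combined weights and bias.
one_layer = np.dot(x, np.dot(W1, W2)) + (np.dot(b1, W2) + b2)
print(np.allclose(two_layers, one_layer))  # True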

  • Step Function - Result is either 0 or 1 (not used)

  • Sigmoid Function - provides granularity (not used)

  • ReLU - It adds granularity and is simple and fast. Widely used for hidden layers.

    y = x if x > 0 else 0

class Activation_ReLU:
    def forward(self, inputs):
        self.output = np.maximum(0, inputs)
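
A short usage sketch, assuming the Layer_Dense class and the batch X defined above:

layer1 = Layer_Dense(4, 5)
activation1 = Activation_ReLU()

layer1.forward(X)                   # linear combination of inputs
activation1.forward(layer1.output)  # negatives clamped to 0
print(activation1.output)           # no value below zero remains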

Spiral Data File

A spiral data file containing the spiral_data function was added to the base repo.

import numpy as np

def spiral_data(points, classes):
    X = np.zeros((points*classes, 2))
    y = np.zeros(points*classes, dtype='uint8')
    for class_number in range(classes):
        ix = range(points*class_number, points*(class_number+1))
        r = np.linspace(0.0, 1, points)  # radius
        t = np.linspace(class_number*4, (class_number+1)*4, points) + np.random.randn(points)*0.2  # angle with noise
        X[ix] = np.c_[r*np.sin(t*2.5), r*np.cos(t*2.5)]
        y[ix] = class_number
    return X, y

This function will be used to create a dataset instead of the hard-coded values we were using up till now. It returns data points on a 2D plane that form a spiral when plotted (the data has 2 features, i.e. the x and y coordinates). Parameters: "points" corresponds to the number of points per class, and "classes" corresponds to the number of classes in the data.
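Putting the pieces together, a minimal forward pass over the spiral dataset (a sketch assuming the spiral_data, Layer_Dense and Activation_ReLU definitions above):

X, y = spiral_data(points=100, classes=3)  # 300 samples, 2 features each

dense1 = Layer_Dense(2, 5)  # 2 input features -> 5 neurons
activation1 = Activation_ReLU()

dense1.forward(X)
activation1.forward(dense1.output)
print(activation1.output.shape)  # (300, 5)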
