Neural networks are among the most powerful algorithms in machine learning. They work on large datasets and provide excellent results in certain cases. As a result, neural networks are widely used these days for various tasks, such as:

- Voice recognition
- Face recognition
- Text identification

In the past, it was difficult to train neural networks simply because hardware was quite expensive. Today, with recent advancements in hardware, GPUs, and cloud computing, it has become quite easy to train complicated neural networks. In this blog post, we will study how to train a simple neural network on one of the simplest problems: spam email classification.

## Problem Statement

The problem statement is as follows: given an email, classify it as a spam email or a normal email (not spam). We are also given a labeled dataset of emails where each email is labeled as 1 (spam) or 0 (no-spam). Let us first see an example of a spam email:

*Dear Mr. Joe, Congratulations!!! You’ve won a lottery prize of $1M. To claim, kindly revert with your bank account and debit card details!*

Clearly, the above email is fraudulent and should be labeled as spam. Now, we need to train an algorithm (a neural network) that can learn from the labeled dataset we have and can then label new emails as spam or no-spam. You can find a spam dataset here. The dataset description can be found here. As you can see, 57 features have been extracted for each email. These features include the frequencies of certain words that are characteristic of spam emails, as well as the frequencies of certain characters that are key indicators of spam.

Now, let us create a 3-layer neural network on this dataset. The architecture is described below.

As mentioned, there are 3 layers:

- Layer 1: This is the input layer and contains 57 nodes, corresponding to the size of the input vector (plus one bias node that we prepend in code).
- Layer 2: This layer is the hidden layer and it contains 4 nodes.
- Layer 3: This is the output layer and it contains exactly 1 node, whose value indicates whether or not the email is spam. The output of the 3rd layer gives us the confidence that the email is spam. For instance, an output value of 0.93 indicates that we are 93% sure that the email received is spam. To create a proper classifier, we will eventually threshold this output: if the output is more than 0.5, we will label the email as spam; if it is less than or equal to 0.5, we will consider it no-spam.

Below is the code for the same. We will study each part of the code.

```python
# 1. Sklearn imports
from sklearn import preprocessing
import numpy as np

# 2. Derivative of sigmoid (x is assumed to already be a sigmoid output)
def derivative(x):
    return x * (1.0 - x)

# 3. Sigmoid
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# 4. Initialization
X = []
Y = []

# 5. Read the training data
with open('Train.csv') as f:
    for line in f:
        curr = line.split(',')
        new_curr = [1]  # prepend a constant bias term
        for item in curr[:len(curr) - 1]:
            new_curr.append(float(item))
        X.append(new_curr)
        Y.append([float(curr[-1])])

X = np.array(X)

# 6. Feature scaling
X = preprocessing.scale(X)
Y = np.array(Y)

# 7. Training on the first 2,500 emails out of 3,000 emails
X_train = X[0:2500]
Y_train = Y[0:2500]

# 8. The remaining 500 emails will serve as testing data
X_test = X[2500:]
y_test = Y[2500:]

X = X_train
y = Y_train

# 9. dim1 is the size of the first layer (57 features + 1 bias)
dim1 = len(X_train[0])
dim2 = 4

# 10. Initialization of the weight matrices
np.random.seed(1)
weight0 = 2 * np.random.random((dim1, dim2)) - 1
weight1 = 2 * np.random.random((dim2, 1)) - 1

# 11. Executing for 25,000 iterations
for j in range(25000):
    layer_0 = X_train

    # 12. Get the output of layer 2
    layer_1 = sigmoid(np.dot(layer_0, weight0))

    # 13. Get the output of layer 3
    layer_2 = sigmoid(np.dot(layer_1, weight1))

    # 14. Calculate the error
    layer_2_error = Y_train - layer_2

    # 15. Backpropagation on the weights
    layer_2_delta = layer_2_error * derivative(layer_2)
    layer_1_error = layer_2_delta.dot(weight1.T)
    layer_1_delta = layer_1_error * derivative(layer_1)
    weight1 += layer_1.T.dot(layer_2_delta)
    weight0 += layer_0.T.dot(layer_1_delta)

# 16. Evaluation on the test dataset
layer_0 = X_test
layer_1 = sigmoid(np.dot(layer_0, weight0))
layer_2 = sigmoid(np.dot(layer_1, weight1))

correct = 0

# 17. Thresholding at 0.5
for i in range(len(layer_2)):
    if layer_2[i][0] > 0.5:
        layer_2[i][0] = 1
    else:
        layer_2[i][0] = 0
    if layer_2[i][0] == y_test[i][0]:
        correct += 1

# 18. Output
print("total =", len(layer_2))
print("correct =", correct)
print("accuracy =", correct * 100.0 / len(layer_2))
```

Let us understand the code step-by-step. We have added numbered comments in the code above; each numbered part is described below.

1) We have used the sklearn library in Python, which is an excellent library for writing machine learning algorithms. Other good libraries are TensorFlow and Keras. Another library that we've used is numpy, an excellent Python library for generic numerical processing. It is widely used by data scientists owing to its great flexibility in data processing.

```python
# 1. Sklearn imports
from sklearn import preprocessing
import numpy as np
```

2) We've used the fact that if y = sigmoid(x), then dy/dx = y * (1 - y). This can be derived using the usual rules of differentiation: d/dx [1 / (1 + e^(-x))] = e^(-x) / (1 + e^(-x))^2 = y * (1 - y). Note that, because of this identity, our derivative function expects an argument that is already a sigmoid output (such as layer_1 or layer_2), not the raw input x.

```python
# 2. Derivative of sigmoid (x is assumed to already be a sigmoid output)
def derivative(x):
    return x * (1.0 - x)
```

3) The sigmoid function is defined as sigmoid(x) = 1 / (1 + e^(-x)).

```python
# 3. Sigmoid
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))
```
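As a quick sanity check (not part of the original post), we can verify numerically that derivative(sigmoid(x)) matches a finite-difference estimate of the sigmoid's slope:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def derivative(x):
    # x is assumed to already be a sigmoid output
    return x * (1.0 - x)

x = 0.7
eps = 1e-6
# Central-difference estimate of d(sigmoid)/dx at x
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
# Analytic derivative via the y * (1 - y) identity
analytic = derivative(sigmoid(x))
print(abs(numeric - analytic) < 1e-8)  # True
```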

4) We have initialized X and Y to empty lists.

```python
# 4. Initialization
X = []
Y = []
```

5) In this part of the code, we read the training data from the file 'Train.csv'. Note that we prepend a constant 1 to each feature vector; this acts as the bias term for the network.

```python
# 5. Read the training data
with open('Train.csv') as f:
    for line in f:
        curr = line.split(',')
        new_curr = [1]  # prepend a constant bias term
        for item in curr[:len(curr) - 1]:
            new_curr.append(float(item))
        X.append(new_curr)
        Y.append([float(curr[-1])])

X = np.array(X)
```

6) Once the features have been read from the file, it is important to scale the feature vectors. This ensures the features are similar to each other in magnitude, which prevents a single feature from dominating the others.

```python
# 6. Feature scaling
X = preprocessing.scale(X)
Y = np.array(Y)
```
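To see what preprocessing.scale does, here is a tiny self-contained example (the matrix X_demo is made up for illustration): each column is shifted to zero mean and divided by its standard deviation.

```python
import numpy as np
from sklearn import preprocessing

# Toy feature matrix whose columns have very different magnitudes
X_demo = np.array([[1.0, 1000.0],
                   [2.0, 2000.0],
                   [3.0, 3000.0]])
X_scaled = preprocessing.scale(X_demo)

print(np.allclose(X_scaled.mean(axis=0), 0.0))  # True: each column has mean 0
print(np.allclose(X_scaled.std(axis=0), 1.0))   # True: each column has unit std
```

One caveat worth noting: a constant column (such as the bias column of ones we prepended) has zero variance, and preprocessing.scale maps it to all zeros, which effectively removes the bias term. Scaling the 57 features first and appending the bias column afterwards would avoid this.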

7) We have a total of 3,000 emails. We will train the neural network on 2,500 of these 3,000 data points.

```python
# 7. Training on the first 2,500 emails out of 3,000 emails
X_train = X[0:2500]
Y_train = Y[0:2500]
```

8) We will test the accuracy of the neural network on the remaining 500 emails. This will ensure that we don’t test on the training data itself.

```python
# 8. The remaining 500 emails will serve as testing data
X_test = X[2500:]
y_test = Y[2500:]

X = X_train
y = Y_train
```

9) dim1 is the size of our input vector: 57 features plus the bias entry we prepended while reading the file, i.e., 58. dim2 is the number of nodes in the hidden layer.

```python
# 9. dim1 is the size of the first layer (57 features + 1 bias)
dim1 = len(X_train[0])
dim2 = 4
```

10) In a neural network, it is important to initialize the weights with random values. The function np.random.random generates values in the range [0, 1), so 2 * np.random.random(...) - 1 generates values in the range [-1, 1).

```python
# 10. Initialization of the weight matrices
np.random.seed(1)
weight0 = 2 * np.random.random((dim1, dim2)) - 1
weight1 = 2 * np.random.random((dim2, 1)) - 1
```
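The range claim is easy to verify with a short snippet (the shape here is a small, arbitrary stand-in for the real weight matrices):

```python
import numpy as np

np.random.seed(1)
# np.random.random draws from [0, 1); the affine map 2*r - 1 shifts this to [-1, 1)
w = 2 * np.random.random((5, 4)) - 1

print(w.shape == (5, 4))  # True
print(w.min() >= -1.0)    # True
print(w.max() < 1.0)      # True
```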

11) We run the training loop for 25,000 iterations. We can choose to run it for more or fewer iterations depending on the accuracy we want and on the compute power available.

```python
# 11. Executing for 25,000 iterations
for j in range(25000):
    layer_0 = X_train
```

12) The first step of neural network training is to get the output on the training data for the current value of the weights.

```python
    # 12. Get the output of layer 2
    layer_1 = sigmoid(np.dot(layer_0, weight0))
```

13) This is similar to comment 12 where we are trying to get the final value of the output of the neural network.

```python
    # 13. Get the output of layer 3
    layer_2 = sigmoid(np.dot(layer_1, weight1))
```

14) Now, for each data point in the training data, we calculate the error the algorithm made in labeling it. For instance, if an email is spam (label 1) but the algorithm produced a score of 0.22, the error is 1.0 - 0.22 = 0.78.

```python
    # 14. Calculate the error
    layer_2_error = Y_train - layer_2
```

15) Based on the error calculated above, the backpropagation algorithm is executed and the weights are updated. Note that no explicit learning rate appears here; the gradient updates are applied directly, i.e., with an implicit learning rate of 1.

```python
    # 15. Backpropagation on the weights
    layer_2_delta = layer_2_error * derivative(layer_2)
    layer_1_error = layer_2_delta.dot(weight1.T)
    layer_1_delta = layer_1_error * derivative(layer_1)
    weight1 += layer_1.T.dot(layer_2_delta)
    weight0 += layer_0.T.dot(layer_1_delta)
```

16) Finally, we get the network's output on the new, unseen test data.

```python
# 16. Evaluation on the test dataset
layer_0 = X_test
layer_1 = sigmoid(np.dot(layer_0, weight0))
layer_2 = sigmoid(np.dot(layer_1, weight1))

correct = 0
```

17) As discussed above, if the score is > 0.5, we label the email as spam; otherwise, we label it as no-spam.

```python
# 17. Thresholding at 0.5
for i in range(len(layer_2)):
    if layer_2[i][0] > 0.5:
        layer_2[i][0] = 1
    else:
        layer_2[i][0] = 0
    if layer_2[i][0] == y_test[i][0]:
        correct += 1
```
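As an aside, the thresholding loop can also be written in vectorized NumPy form. The layer_2 scores and y_test labels below are small, made-up stand-ins for illustration:

```python
import numpy as np

# Hypothetical test-set scores and true labels (stand-ins for the real arrays)
layer_2 = np.array([[0.93], [0.12], [0.51], [0.49]])
y_test = np.array([[1.0], [0.0], [1.0], [1.0]])

predictions = (layer_2 > 0.5).astype(float)   # threshold at 0.5
correct = int(np.sum(predictions == y_test))  # count matching labels
accuracy = correct * 100.0 / len(layer_2)
print(correct, accuracy)  # 3 75.0
```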

18) We print the final output.

```python
# 18. Output
print("total =", len(layer_2))
print("correct =", correct)
print("accuracy =", correct * 100.0 / len(layer_2))
```

This simple algorithm, when executed, gives an accuracy of about 90%. The exact accuracy varies if the random initial weights change between runs (for example, when a different random seed is used).

Recent advancements in areas such as capsule neural networks show that neural networks are undoubtedly the machine learning algorithms of the future.
