What is a perceptron, and how does it work?

In Machine Learning, we often face situations where we need to label data as belonging to a particular class. A simple example is the spam classification problem: given a bunch of emails, we need to build an algorithm capable of labeling each of them as spam or non-spam (ham). Such problems are termed classification problems.

In classification problems, we are supposed to design a classifier that correctly predicts the class to which each data point belongs. Spam classification is a special case in which the classifier is a binary classifier; that is, there are only 2 classes, spam and non-spam. Other classifiers may assign data points to one of multiple classes. As an example, suppose we are given images of various flowers and are supposed to identify which flower each image contains. Since there are many flower species, each image can be mapped to one of many classes. This is an example of a multi-class classification problem.

## Linear Separability of Data

Observe the two datasets above. In each one, there are red points and there are blue points. However, there is one stark difference between the two datasets: in the first dataset, we can draw a straight line that separates the 2 classes (red and blue). This isn't possible in the second dataset.

Datasets where the 2 classes can be separated by a simple straight line are termed *linearly separable datasets*. Linearly separable datasets are easy to classify, and this is where perceptrons come into the picture.

In the above dataset example we saw 2-dimensional, linearly separable data. Such data can be separated by some straight line y = ax + b. If the data has more than 2 dimensions, we can extend this concept of linear separability. Let us assume that each data point X has n dimensions, so X = [x_{1}, x_{2}, …, x_{n}]. Also, assume that each data point can have only one of 2 labels, 0 or 1. In this case, the data is termed linearly separable if there exist some w = [w_{1}, w_{2}, …, w_{n}] and some real number b such that:

wX + b > 0 for all X that are of class 1

and

wX + b <= 0 for all X that are of class 0.

This means that the data can be separated by the hyperplane wX + b = 0.

Let us now talk about how we can use perceptron learning to find the values of w and b.

## Introduction to Perceptron

Perceptron learning is a supervised learning algorithm for classifying data in linearly separable datasets. The perceptron eventually learns a function f such that:

f(X) = 1 if wX + b > 0,

f(X) = 0 if wX + b <= 0

Observe here that the weight vector w and the real number b are unknowns that we need to find.
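Before we discuss how to learn them, here is the decision function f written out in code. The weight vector and bias below are purely illustrative values, not learned ones:

```python
import numpy as np

def f(X, w, b):
    """Perceptron decision function: 1 if wX + b > 0, else 0."""
    return 1 if np.dot(w, X) + b > 0 else 0

# Illustrative parameters (in practice, w and b are learned)
w = np.array([0.5, -0.2])
b = 0.1

print(f(np.array([1.0, 1.0]), w, b))   # 0.5 - 0.2 + 0.1 = 0.4 > 0   -> 1
print(f(np.array([-1.0, 1.0]), w, b))  # -0.5 - 0.2 + 0.1 = -0.6 <= 0 -> 0
```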

## Perceptron Learning Algorithm

We append the value '1' to each data point, so the vector X now looks like X = [x_{1}, x_{2}, …, x_{n}, 1]. The benefit is that what we previously wrote as wX + b can now be written as WX, where W = [w, b] (b appended to the vector w).

So now, to obtain the output, we simply compute WX instead of computing wX + b.
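This bias trick can be sketched in a few lines of NumPy; the parameter values here are illustrative:

```python
import numpy as np

# Original data point and parameters (illustrative values)
x = np.array([2.0, 3.0])   # X = [x1, x2]
w = np.array([0.5, -1.0])  # weight vector
b = 0.25                   # bias

# Append 1 to X, and append b to w
x_aug = np.append(x, 1.0)  # [2.0, 3.0, 1.0]
W = np.append(w, b)        # [0.5, -1.0, 0.25]

# wX + b and WX give the same value
print(np.dot(w, x) + b)    # 1.0 - 3.0 + 0.25 = -1.75
print(np.dot(W, x_aug))    # -1.75
```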

The perceptron learning algorithm is now simple. For each data point X_{i} in our training data, with true label Y_{i}, perform the following steps:

- Calculate the predicted output: y_{i} = f(X_{i}), i.e. y_{i} = 1 if WX_{i} > 0 and y_{i} = 0 otherwise
- Update W as W' = W + 𝛼(Y_{i} − y_{i}) * X_{i}
- Assign W = W' and go to step 1

Here 𝛼 is termed the "learning rate" of the algorithm. The value of 𝛼 has to be tuned manually; typically, we try values such as 0.001, 0.01, 0.1, 1.0, 10.0, etc.
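The update rule above can be implemented from scratch in a few lines. This is a minimal sketch on a made-up, linearly separable toy dataset; the stopping criterion (a full pass with no mistakes) and the epoch cap are common choices, not part of the algorithm statement above:

```python
import numpy as np

def train_perceptron(X, Y, alpha=0.1, max_epochs=100):
    """Train a perceptron on labels in {0, 1} using the update
    W' = W + alpha * (Y_i - y_i) * X_i, with the bias folded into W."""
    # Append 1 to every data point so W's last entry acts as the bias b
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    W = np.zeros(X_aug.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for x_i, Y_i in zip(X_aug, Y):
            y_i = 1 if np.dot(W, x_i) > 0 else 0   # predicted output
            if y_i != Y_i:
                W = W + alpha * (Y_i - y_i) * x_i  # perceptron update
                mistakes += 1
        if mistakes == 0:  # converged: every point classified correctly
            break
    return W

# Toy linearly separable 2-D dataset
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
Y = np.array([1, 1, 0, 0])
W = train_perceptron(X, Y)
preds = (np.hstack([X, np.ones((4, 1))]) @ W > 0).astype(int)
print(preds)  # [1 1 0 0], matching Y on this separable dataset
```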

Mathematically, it can be shown that if the data is linearly separable, the algorithm converges after a finite number of iterations. Even if the data is not completely linearly separable, the perceptron learning algorithm often converges to a decent value that gives a good estimate of the decision boundary. Take a look at the following example of perceptron learning applied to the iris flower dataset:

```python
# Import the required libraries/packages
from sklearn import datasets
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the dataset
# Iris is a simple flower dataset
iris = datasets.load_iris()

# Fetch the data from the dataset
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)

# Train the model (max_iter caps the number of passes over the data)
perceptron = Perceptron(max_iter=40, eta0=0.01)
perceptron.fit(X_train, y_train)

# Perform predictions on the model
y_pred = perceptron.predict(X_test)

# Print the accuracy of the model
print(accuracy_score(y_test, y_pred))
```

If you run the above code in Python, you will observe an accuracy of about 70%. For such a short piece of code, this accuracy is pretty good.

In the above code, we have used the famous iris dataset. The iris dataset comprises data for 150 flowers. For each flower, the following values are recorded:

- Length of sepal
- Width of sepal
- Length of petal
- Width of petal

All the values are in centimetres.

The idea is to identify the flower based on these 4 parameters. There are essentially 3 flowers:

- Setosa
- Versicolour
- Virginica
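The feature names and class names listed above can be read directly from the dataset object that scikit-learn provides:

```python
from sklearn import datasets

iris = datasets.load_iris()
print(iris.feature_names)  # the 4 measured parameters, in cm
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']
print(iris.data.shape)     # (150, 4): 150 flowers, 4 measurements each
```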

I hope that the above post helped you learn about the basics of Perceptron learning, and that you would be able to apply it to solve challenging Machine Learning problems!
