Regression is a statistical technique that is used to model the relationship of a dependent variable with respect to one or more independent variables. Regression is widely used in several statistical analysis problems and it is also one of the most important tools in Machine Learning.

## Regression Model Example

Let us understand a simple example of regression – predicting the price of a house in the United States. We call the price of the house (in USD) as Y. Clearly, Y depends on the following variables:

- Size of the house in square feet (X
_{1}) - Number of rooms in the house (X
_{2}) - City where the house is located (X
_{3}) - Age of the house (X
_{4})

…and many others

In such a case, the regression analysis problem tries to answer the following question – given the above variables, what would be the predicted price of a house in USD?

Clearly, as the size of the house increases, price increases. Same with the number of rooms. However, with an increase in age, the price decreases. A house in San Francisco would be far more expensive than a house in, say, Austin. Regression tries to model the underlying correlation between these variables to develop a framework to predict the price of a house.

## Types of Regression Models

There are two commonly used variations of regression:

**Linear Regression:**In linear regression, the variable Y is known to be a linear function of the variables X_{1}, X_{2}, X_{3}… linear regression is a widely used tool in Statistics and Machine Learning. The idea is to come up with the function h(X_{1}, X_{2}, X_{3}…) = a + bX_{1}+ cX_{2}+ dX_{3}+ … such that h is as close as possible to the actual underlying price determination function. The parameters a, b, c … are obtained by minimizing the cost function of the linear regression model.**Logistic Regression:**In logistic regression, the dependent variable Y can take any value. For instance, the price of a house predicted by the linear regression algorithm can be any real number. In logistic regression, however, the output variable Y can take only two values – 0 and 1. Whereas the purpose of linear regression is to predict a value, logistic regression tries to answer a “yes or no” question. For instance, is a given email spam (denoted as 1) or not spam (denoted as 0)?

Regression is quite a powerful tool and forms the building block of several complex machine learning algorithms. All those who aim to become seasoned data scientists should aim to understand both linear and logistic regression extremely well.

## Comments are closed.