# What is the Regression Equation?

First off, calm down because regression equations are super fun and informative. In statistics, the purpose of the regression equation is to come up with an equation-like model that represents the pattern or patterns present in the data. So let’s discuss what the regression equation is.

## The Variables

Essentially, we use the regression equation to predict values of a dependent variable. This dependent variable is called the outcome variable. It is the effect that we are interested in predicting, such as the amount of urea in urine.

I know it’s gross, but not everything in statistics is super clean 🙂

The independent variable for this gross data is called the predictor variable. In our example, the predictor is the osmotic pressure of the bladder. In other words, I suspect that the osmotic pressure of the bladder has some effect on the amount of urea in urine.

Let’s begin with a simple scatter plot of the data.

Just from the graphic, you should notice a slight upward trend in the data. This suggests that the osmotic pressure of the bladder does have a relationship with the amount of urea in urine.

Unfortunately, the graph itself is not a statistical model that we can use to describe the data. From the plot alone, we can’t offer a clear and reliable prediction of the amount of urea we should expect at a given osmotic pressure.

## The Regression Equation

A regression equation is a statistical model that describes the specific relationship between the predictor variable and the outcome variable. A good regression equation allows you to predict the outcome with a relatively small amount of error.

In this model, Yi represents an outcome variable and Xi represents its corresponding predictor variable. The equation also contains numerical relationships between the predictor and the outcome.

The term b0 represents the intercept of the model, which is the predicted outcome when the predictor equals zero. You could consider it something like a baseline or control point. The term bi represents the numerical relationship between the predictor variable and the outcome for the ith term. These terms are called regression coefficients.
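With a single predictor, the pieces described above are usually combined into something like the form below. Treat this as a sketch in conventional notation rather than the exact equation from the original figure: the error term $\varepsilon_i$ is my assumption, and with only one predictor the single slope coefficient is written $b_1$.

```latex
Y_i = b_0 + b_1 X_i + \varepsilon_i
```

Here $b_0$ is the intercept, $b_1$ is the change in the predicted outcome for a one-unit change in the predictor, and $\varepsilon_i$ is whatever the line fails to explain for the ith observation.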

Using the method of least squares, you can determine the line of best fit for a series of data. I have taken the liberty of finding the line of best fit for our urea and osmotic pressure example in the graphic below.
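If you are curious what "finding the line of best fit" looks like in practice, here is a minimal Python sketch of least squares for one predictor. The function names `fit_line` and `predict` are mine, and the numbers are made up purely for illustration; they are not the urea and osmotic pressure data from this post.

```python
def fit_line(xs, ys):
    """Return (b0, b1) that minimize the sum of squared residuals for y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: forces the line through the point (mean_x, mean_y)
    b0 = mean_y - b1 * mean_x
    return b0, b1


def predict(b0, b1, x):
    """Plug a predictor value into the fitted line."""
    return b0 + b1 * x


# Made-up illustrative numbers, NOT the data from this post
pressures = [200.0, 400.0, 600.0, 800.0, 1000.0]
urea = [110.0, 190.0, 320.0, 390.0, 510.0]

b0, b1 = fit_line(pressures, urea)
print(f"intercept b0 = {b0:.2f}, slope b1 = {b1:.4f}")
print(f"prediction at x = 1100: {predict(b0, b1, 1100.0):.1f}")
```

The slope formula is the standard closed-form solution for a single predictor. With more predictors you would normally reach for a statistics library instead of hand-rolled sums.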

Now you have a mathematical model! Since there is a line, we can create an equation to describe and predict data points.

The mathematical model is

In this case…

The benefit of a statistical model is that you are now able to predict your outcome variable with greater precision. For example, how many milligrams of urea would be produced at an osmotic pressure of 1100 mmHg? This value is not on the graph, so plugging values into the model will predict the answer.

So, we know that about 562.8 mg of urea should be present if the bladder’s osmotic pressure is 1100 mmHg. However, you must always consider how good the model is using tests for goodness of fit, which are similar to the chi-square test. These tests indicate how well your model works and how trustworthy it is.
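One common way to summarize how well a line fits is the coefficient of determination, R². That is my choice of illustration and not necessarily the test I had in mind above. The sketch below uses made-up numbers (not this post's data) and a hypothetical helper named `r_squared`.

```python
def r_squared(xs, ys, b0, b1):
    """Share of the variation in ys that the line y = b0 + b1 * x accounts for."""
    mean_y = sum(ys) / len(ys)
    # Variation left over after using the line
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation around the mean of the outcome
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Made-up illustrative numbers, NOT the data from this post
xs = [200.0, 400.0, 600.0, 800.0, 1000.0]
ys = [110.0, 190.0, 320.0, 390.0, 510.0]

# Least-squares intercept and slope for the made-up points above
b0, b1 = 4.0, 0.5

r2 = r_squared(xs, ys, b0, b1)
print(f"R^2 = {r2:.3f}")
```

A value close to 1 means the line accounts for most of the variation in the outcome, while a value near 0 means the line does little better than simply guessing the mean.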

Regression models are useful for predicting outcomes based on measurable variables. It is even possible to have more than one predictor variable with linear regression. However, multiple linear regression is a topic for another post.