So you have data, do you? That’s awesome, because anyone who loves statistics loves data! And data begs to be analyzed. Most of the time, you should start with a graph and some type of linear regression. But once you have the equation, what do you do? That’s where simple regression analysis comes in.

## Key Parts of Simple Regression Analysis

Before we delve too deep into what the analysis looks like, let’s review what makes a regression. **Simple regression analysis** refers to the interpretation and use of the regression equation. Recall that the regression equation looks like

$$Y_{i} = b_{0} + b_{1}X_{i} + \varepsilon_{i}$$

There is not a lot there, but it is a lot to take in.

Y_{i} represents the **dependent variable** in our equation. This is the effect or outcome that we are interested in. X_{i} represents the **independent variable**. This is the variable that we think predicts the outcome.

The other things are a little more… interesting. The b_{1} is the slope: it tells you how much the dependent variable is expected to change when the independent variable increases by one unit. The b_{0} is the intercept, and it is more of a theoretical term than a practical one. It technically means that when the independent variable is equal to zero, the predicted value of the dependent variable is b_{0}. Sometimes this has practical meaning, but often zero lies outside the range of the data, so the intercept is not interpreted on its own.

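The last part is ε_{i}. This is the error term: the difference between an observed value of the dependent variable and the value the line gives for it. When the equation is used to make predictions, the error term is usually left out, because the errors are assumed to average out to zero. That makes the equation a little easier to use.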
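To make the pieces of the equation concrete, here is a minimal Python sketch. The coefficient values and the single observation are made up purely for illustration, and the name `predict` is hypothetical.

```python
# Hypothetical coefficients, chosen only for illustration.
b0 = 2.0   # intercept: predicted Y when X is 0
b1 = 0.5   # slope: change in predicted Y per one-unit increase in X


def predict(x):
    """Return the predicted value of Y for a given X (no error term)."""
    return b0 + b1 * x


# One made-up observation (X_i, Y_i).
x_i, y_i = 4.0, 4.6

y_hat = predict(x_i)     # the point on the line for this X
residual = y_i - y_hat   # how far the observation sits from the line

print("predicted:", y_hat)
print("residual:", residual)
```

The residual is the sample counterpart of ε_{i}: it is what is left over after the line has made its prediction.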

## What Does It All Mean?

In the image above, you can see a lot of dots and a line. Each one of those dots represents a data point with an independent and dependent value. Using a fairly involved equation…

…we can come up with a line that represents the data. This line is not a perfect fit for the data. If it were, then all the data points would be on the line. However, it is a good prediction. After all, that is what the regression analysis is for: to predict a dependent variable based on the independent variable.
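The line-fitting step can be sketched in a few lines of Python. This assumes the line is found by ordinary least squares, the usual choice for simple regression, where the slope is the sum of (X − mean X)(Y − mean Y) divided by the sum of (X − mean X)², and the intercept is mean Y − slope × mean X. The function name `fit_line` and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of cross-deviations and sum of squared deviations of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all X values are identical; the slope is undefined")

    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1


# Made-up data points for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
print(f"Y = {b0:.2f} + {b1:.2f} * X")
```

Because the dots do not fall exactly on a line, the fitted values of b_{0} and b_{1} are the ones that make the squared vertical distances between the dots and the line as small as possible.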

But how good will that prediction be?

## Beware of Correlation Confusion

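When you calculate the regression equation for a data set, you should also calculate the **correlation coefficient**. This is a value between -1 and +1 that represents the strength and direction of the linear relationship between the two variables. It is often represented by *r*. The closer the correlation coefficient is to -1 or +1, the stronger the relationship, with exactly -1 or +1 meaning that every data point falls on the line. A value near 0 means there is little or no linear relationship. One common way to write the formula is

$$r = \frac{\sum (X_{i} - \bar{X})(Y_{i} - \bar{Y})}{\sqrt{\sum (X_{i} - \bar{X})^{2}}\,\sqrt{\sum (Y_{i} - \bar{Y})^{2}}}$$

where the bars denote the means of the independent and dependent variables.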

The correlation coefficient is different from the **coefficient of determination**, *r*^{2}. The coefficient of determination is more explanatory. It tells you what proportion of the variability in the outcome is accounted for by the predictor.

In essence, a high coefficient of determination means that most of the variance in the dependent variable is explained by the independent variable. On the other hand, a low coefficient of determination means that there is a lot of variance that the model doesn’t explain.
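As a rough illustration of how the two numbers relate, the sketch below computes *r* with the standard Pearson formula and then squares it to get *r*^{2}, which is valid for simple regression with one predictor. The function name `correlation` and the data are hypothetical.

```python
import math


def correlation(xs, ys):
    """Return the Pearson correlation coefficient r for two lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable does not vary")

    return sxy / math.sqrt(sxx * syy)


# Made-up data points for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = correlation(xs, ys)
r_squared = r ** 2   # share of the variance in Y explained by X

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

Note that squaring throws away the sign: an *r* of -0.9 and an *r* of +0.9 give the same *r*^{2}, so the coefficient of determination tells you how much is explained but not in which direction the relationship runs.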

But perhaps the most important thing to remember about correlation in linear regression is…

### Correlation is not Causation!

It is important to note that a pattern between the independent and dependent variables does not mean that the independent variable *causes* the dependent variable. Regression analysis only looks for the pattern between two variables. It is neither designed for nor good at determining whether one thing causes another.

Performing a regression analysis involves looking at the components of the regression equation in terms of your particular situation. For example, if you want to know the relationship between the number of romance scenes in a movie and box office sales, then the number of romance scenes is the independent variable and box office sales are the dependent variable. The b_{1} tells you how much box office sales are expected to change for each additional romance scene.
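Here is how that interpretation might look with numbers. The movie data below is entirely invented for illustration (sales are in millions of dollars), and the fit assumes ordinary least squares.

```python
# Invented data: romance scenes per movie and box office sales ($ millions).
scenes = [0, 2, 4, 6, 8]
sales = [50, 54, 61, 63, 72]

n = len(scenes)
mean_x = sum(scenes) / n
mean_y = sum(sales) / n

# Least squares slope and intercept.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scenes, sales))
sxx = sum((x - mean_x) ** 2 for x in scenes)
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

# Forecast for a hypothetical movie with 5 romance scenes.
predicted = b0 + b1 * 5

print(f"Each extra romance scene goes with about {b1:.2f} million more in sales.")
print(f"A movie with no romance scenes is predicted to earn {b0:.2f} million.")
print(f"A movie with 5 romance scenes is predicted to earn {predicted:.2f} million.")
```

Read the slope as "goes with", not "causes": the fitted line describes the pattern in these made-up numbers and says nothing about why the pattern exists.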

Ultimately, simple regression analysis merely involves understanding how the components work together to give you the ability to predict an outcome. Bear in mind that it is not perfect and does not establish causation, but it is a useful tool for forecasting changes and outcomes. Happy statistics!
