In a perfect statistical world, all of our data would always follow a nice straight line with no error at all. Of course, real statistics doesn’t work that way. The data can be all over the place and follow a rhyme or reason we didn’t predict. That’s why we look for patterns in the data using regressions and test statistics.

Of course, to use those statistics, we usually need to meet an assumption that our data is homoskedastic. This means that the variance of the error term is constant across all values of the independent variables. It also means that the data is *not* heteroskedastic. There are a few ways to test for heteroskedasticity.

## Visual Test

The easiest way to test for heteroskedasticity is to get a good look at your data. Ideally, the points in a scatter plot stay in an even band around a line, but sometimes they don’t. The quickest way to identify heteroskedastic data is to see the shape that the plotted data take. For example, a scatter plot that fans out into a cone shape follows a general heteroskedastic pattern.

You should not run an ordinary linear regression on cone-shaped data and trust its test statistics, because the variance is not constant. You know the variance varies because the points get further from the line of best fit as you move along the x-axis.
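If you want a number to go along with the eyeball check, here is a minimal sketch that simulates cone-shaped data, fits a line, and compares how spread out the residuals are in the lower and upper halves of x. The data, the seed, and the helper names (`fit_line`, `residual_spread_by_half`) are all made up for illustration, and the noise is assumed to grow in proportion to x.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residual_spread_by_half(x, y):
    """Standard deviation of the residuals for the lower and upper half of x."""
    intercept, slope = fit_line(x, y)
    # Sort the residuals by their x value so the list can be split in the middle.
    pairs = sorted((xi, yi - (intercept + slope * xi)) for xi, yi in zip(x, y))
    residuals = [resid for _, resid in pairs]
    mid = len(residuals) // 2
    return statistics.pstdev(residuals[:mid]), statistics.pstdev(residuals[mid:])


# Simulated cone-shaped data (assumption: noise grows in proportion to x).
rng = random.Random(42)
x = [rng.uniform(1, 10) for _ in range(500)]
y = [2.0 + 3.0 * xi + rng.gauss(0, 0.5 * xi) for xi in x]

low_spread, high_spread = residual_spread_by_half(x, y)
print(f"residual spread, low x:  {low_spread:.2f}")
print(f"residual spread, high x: {high_spread:.2f}")
```

A much larger spread in one half than the other is the numeric version of the cone shape. It is only a rough check, not a formal test.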

## Breusch-Pagan Test

The Breusch-Pagan test is a quick and dirty way to determine statistically whether your data is heteroskedastic. The actual math is pretty straightforward:

χ^{2} = n · *R*^{2}

In this case, n is the sample size and *R*^{2} is the coefficient of determination from an auxiliary linear regression of the squared residuals of your original model on the independent variables. The degrees of freedom, k, is the number of independent variables rather than anything based on the sample size. This test is interpreted like a normal chi-squared test. A significant result means that the data *is* heteroskedastic.
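To make the steps concrete, here is a rough sketch of the n · *R*^{2} version of the test for the simplest case of one independent variable, so k = 1. It uses only the Python standard library. The function names and the simulated data are my own inventions for illustration, and the p-value shortcut only works for one degree of freedom.

```python
import math
import random
import statistics


def centered_sum(a, b):
    """Sum of (a_i - mean(a)) * (b_i - mean(b))."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))


def breusch_pagan_one_predictor(x, y):
    """Return (statistic, p_value) for the n * R^2 form with one predictor."""
    # Step 1: fit the original regression y = intercept + slope * x.
    slope = centered_sum(x, y) / centered_sum(x, x)
    intercept = statistics.fmean(y) - slope * statistics.fmean(x)
    squared_resid = [(yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)]

    # Step 2: auxiliary regression of the squared residuals on x.
    # With a single predictor, R^2 is just the squared correlation.
    r_squared = centered_sum(x, squared_resid) ** 2 / (
        centered_sum(x, x) * centered_sum(squared_resid, squared_resid)
    )

    # Step 3: compare n * R^2 to a chi-squared distribution with k = 1 degree
    # of freedom, whose upper tail probability is erfc(sqrt(s / 2)).
    statistic = len(x) * r_squared
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Simulated data (assumptions): one cone-shaped set, one with constant spread.
rng = random.Random(0)
x = [rng.uniform(1, 10) for _ in range(500)]
cone_y = [2.0 + 3.0 * xi + rng.gauss(0, 0.5 * xi) for xi in x]
flat_y = [2.0 + 3.0 * xi + rng.gauss(0, 2.0) for xi in x]

cone_stat, cone_p = breusch_pagan_one_predictor(x, cone_y)
flat_stat, flat_p = breusch_pagan_one_predictor(x, flat_y)
print(f"cone-shaped data:     chi2 = {cone_stat:.2f}, p = {cone_p:.4g}")
print(f"constant-spread data: chi2 = {flat_stat:.2f}, p = {flat_p:.4g}")
```

The cone-shaped data should come back clearly significant, while the constant-spread data usually will not. With more than one independent variable you would need a proper multiple regression and a general chi-squared tail function, which is where a statistics package earns its keep.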

A word of caution: if the error terms are not normally distributed, then the Breusch-Pagan test may give you a misleading result. Check that assumption.

## White’s Test

White’s test for heteroskedasticity is a more general test of whether the variance is equal across your data, and it does not rely on the errors being normally distributed. The math is a little much for this post, but many statistical programs will calculate it for you.

It is interpreted the same way as a chi-square test. What I mean is that you are still testing the null hypothesis that the variance of the errors is constant. If the test is significant, then the data is heteroskedastic.

It still determines whether the variances are all equal across the data; however, because the test is so general, it has less power to pick up any one specific pattern and can sometimes give false negatives.
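For the curious, here is a hedged sketch of the idea in the simplest possible case, one independent variable. With a single x, the auxiliary regression uses x and x squared as predictors of the squared residuals, and n · *R*^{2} is compared to a chi-squared distribution with 2 degrees of freedom. The function names and simulated data are hypothetical, and a real analysis with several variables (and their cross-products) is better left to a statistics package.

```python
import math
import random
import statistics


def centered_sum(a, b):
    """Sum of (a_i - mean(a)) * (b_i - mean(b))."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))


def white_test_one_predictor(x, y):
    """Return (statistic, p_value) for White's test with a single predictor."""
    # Step 1: fit the original regression and collect the squared residuals.
    slope = centered_sum(x, y) / centered_sum(x, x)
    intercept = statistics.fmean(y) - slope * statistics.fmean(x)
    w = [(yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)]

    # Step 2: auxiliary regression of the squared residuals on x and x^2,
    # solved through the 2x2 normal equations on centered sums.
    u, v = x, [xi ** 2 for xi in x]
    suu, svv, suv = centered_sum(u, u), centered_sum(v, v), centered_sum(u, v)
    suw, svw, sww = centered_sum(u, w), centered_sum(v, w), centered_sum(w, w)
    det = suu * svv - suv ** 2
    b_u = (suw * svv - svw * suv) / det
    b_v = (svw * suu - suw * suv) / det
    r_squared = (b_u * suw + b_v * svw) / sww

    # Step 3: n * R^2 against chi-squared with 2 degrees of freedom,
    # whose upper tail probability is exp(-s / 2).
    statistic = len(x) * r_squared
    p_value = math.exp(-statistic / 2)
    return statistic, p_value


# Simulated cone-shaped data (assumption: noise grows in proportion to x).
rng = random.Random(1)
x = [rng.uniform(1, 10) for _ in range(500)]
y = [2.0 + 3.0 * xi + rng.gauss(0, 0.5 * xi) for xi in x]

white_stat, white_p = white_test_one_predictor(x, y)
print(f"White's test: chi2 = {white_stat:.2f}, p = {white_p:.4g}")
```

The extra squared term is what lets the test catch patterns that are not a simple straight-line trend in the variance, and it is also why the test uses up more degrees of freedom.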

## The Takeaways

Determining whether your data is heteroskedastic is essential for deciding if you can run typical regression models on it. There are three primary ways to test for heteroskedasticity: you can check visually for cone-shaped data, use the simple Breusch-Pagan test when the errors are normally distributed, or use White’s test as a more general option. I hope that this post helps clarify some things. I look forward to seeing any questions that you have below. Happy statistics!
