One of my favorite things to say to my students is “Adults look for solutions, not excuses.” Well, in statistics, we often use the error of a particular method as an excuse for why our model is different from the data. But there is a way to explain the error (a solution, if you will). Sometimes other variables explain some of the error. We can analyze the influence of these terms using a method called **analysis of covariance** or **ANCOVA**.

## What is ANCOVA?

There are many ways to determine the effect that an independent variable has on a dependent variable. For example, linear regression can help determine the effect that multiple predictors have on an outcome. ANOVA even helps determine whether multiple treatments are truly different. Although each of those has benefits, they also have a couple of limitations.

Linear regression is best when you have continuous variables like age and income. ANOVA is great if you have a categorical variable, like level of education, and want to know its effect on income. But what if you wanted to know the effect of age *and* level of education on income? That’s where ANCOVA really shines.

Analysis of covariance is a statistical method that determines how much an independent variable (or its treatment levels) explains an outcome and how much **covariates** explain the error. A covariate is a variable that you think has an effect on the outcome but, unlike the independent variable, is not the effect you are trying to test.
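One common way to write this idea down, assuming a single covariate and standard textbook notation (the symbols below are my own labels, not something defined earlier in this post), is:

$$
y_{ij} = \mu + \tau_i + \beta\,(x_{ij} - \bar{x}) + \varepsilon_{ij}
$$

Here $y_{ij}$ is the outcome for person $j$ at treatment level $i$, $\mu$ is the overall mean, $\tau_i$ is the effect of level $i$, $x_{ij}$ is that person's covariate value, $\bar{x}$ is the overall covariate mean, $\beta$ is the slope relating the covariate to the outcome, and $\varepsilon_{ij}$ is the error that is left over. Without the $\beta$ term, this is the ordinary one-way ANOVA model.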

### The Covariate

For example, let’s say that I want to know the effect that level of education (like high school, bachelor’s, master’s, and doctorate) has on income. With four levels of the independent variable, it seems ripe for ANOVA. But age may also make a difference since people tend to earn more money as they get older. Age would be an example of a *covariate*. I think that it has an effect on income, but it is not the effect I am trying to test.

The role of the covariate is to help explain away some of the error in our analysis. In essence, the original analysis is to determine if the level of education is responsible for differences in income. As with any statistical analysis, there will be some error within each educational group: people with the same level of education do not all earn the same income.

Including age as a covariate, however, explains some of that error. Age is not the effect I am testing, but it explains why incomes within the same education group may be a little off from the group average. This way, I am explaining some of the error in the overall method. Since age is related to income, I include it as a covariate to explain some more of the variance in the outcome without messing with the predictor.
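To make the idea of "explaining some of the error" concrete, here is a minimal sketch of a one-way ANCOVA with a single covariate, written in plain Python. It uses the classic sums-of-squares approach. The data, the function names (`sums`, `one_way_ancova`), and the result keys are all made up for illustration; none of it comes from a real study.

```python
from statistics import fmean

# Hypothetical data: (age in years, income in $1000s) for each education level.
data = {
    "high school": [(25, 30), (35, 38), (45, 44), (55, 47)],
    "bachelor's":  [(24, 40), (34, 50), (44, 57), (54, 61)],
    "master's":    [(26, 48), (36, 57), (46, 66), (56, 70)],
    "doctorate":   [(30, 55), (38, 66), (48, 75), (58, 80)],
}

def sums(points, x_bar, y_bar):
    """Return (Sxx, Sxy, Syy): sums of squares and products around the given means."""
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    syy = sum((y - y_bar) ** 2 for _, y in points)
    return sxx, sxy, syy

def one_way_ancova(groups):
    """One-way ANCOVA with one covariate x and outcome y."""
    all_points = [p for points in groups.values() for p in points]
    n, k = len(all_points), len(groups)

    # Total sums of squares and products, around the grand means.
    grand_x = fmean(x for x, _ in all_points)
    grand_y = fmean(y for _, y in all_points)
    txx, txy, tyy = sums(all_points, grand_x, grand_y)

    # Within-group sums, around each group's own means.
    wxx = wxy = wyy = 0.0
    for points in groups.values():
        x_bar = fmean(x for x, _ in points)
        y_bar = fmean(y for _, y in points)
        sxx, sxy, syy = sums(points, x_bar, y_bar)
        wxx, wxy, wyy = wxx + sxx, wxy + sxy, wyy + syy

    sse_anova = wyy                        # error if we ignore the covariate
    sse_ancova = wyy - wxy ** 2 / wxx      # error left after adjusting for the covariate
    sst_adjusted = tyy - txy ** 2 / txx    # total variation after adjusting for the covariate
    ss_effect = sst_adjusted - sse_ancova  # adjusted effect of the groups

    df_effect, df_error = k - 1, n - k - 1
    f_ratio = (ss_effect / df_effect) / (sse_ancova / df_error)
    return {
        "sse_anova": sse_anova,
        "sse_ancova": sse_ancova,
        "df_effect": df_effect,
        "df_error": df_error,
        "f_ratio": f_ratio,
    }

result = one_way_ancova(data)
print(result)
```

Comparing `sse_anova` with `sse_ancova` shows how much of the within-group error the covariate soaks up. In practice you would let a statistics package do this for you; the sketch is only meant to show where the numbers come from.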

Of course, this assumes that age is independent of the level of education. What that means is that the average age should be about the same in each education group, so that adjusting for age does not also remove part of the education effect in the first place. There may be some relationship between age and education level, but it should not be a lot (i.e., a correlation of less than 0.3).

## How Do I Interpret ANCOVA?

Ok, you’re reading a research article and you stumble across this little gem…

*One-way ANCOVA determined that there is a significant effect of levels of education on income, F(3, 26) = 4.96, p < 0.05, after controlling for age.*

There is a lot of valuable information in this one little sentence.

The first part, *effect of levels of education on income*, states the independent variable (education) and the dependent variable (income). This statement comes from the hypothesis that if the level of education has an effect, then average income will differ across the education groups.

The second part contains the actual test statistic, *F(3, 26) = 4.96, p < 0.05*. The *F* means that we are using the F-ratio as our test statistic. The 3 refers to the degrees of freedom for our independent variable, which is the number of levels of the independent variable minus 1 (four education levels, so 4 − 1 = 3). The 26 refers to the error degrees of freedom, which is the number of participants minus the number of groups minus the number of covariates. The 4.96 is the actual value of the ratio, and the *p < 0.05* tells us that the result is significant at the 0.05 level.
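As a quick check on those degrees of freedom, here is a small sketch that assumes the standard formulas for a one-way ANCOVA. The function name `ancova_df` is made up for illustration, and the sample size of 31 is not stated in the sentence above; it is simply what an error degrees of freedom of 26 implies with four groups and one covariate.

```python
def ancova_df(n_participants, n_groups, n_covariates=1):
    """Return (effect df, error df) for a one-way ANCOVA."""
    df_effect = n_groups - 1
    df_error = n_participants - n_groups - n_covariates
    return df_effect, df_error

# Four education levels, one covariate (age), and an assumed 31 participants.
df_effect, df_error = ancova_df(n_participants=31, n_groups=4, n_covariates=1)
print(df_effect, df_error)
```

Each covariate you add costs one error degree of freedom, which is one reason to include only covariates you really expect to matter.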

The third part, *after controlling for age*, is important. This part tells us the covariate. Furthermore, it states that once we control for it, the effect of education level is still significant. This means that age explains some of the error, even though it is a continuous variable, and that education still makes a difference once age is accounted for.

## Take Away

ANCOVA is a method that allows you to take into account that some of the error in your analysis is measurable. Not only that, but that source of error has an effect on the dependent variable. This is beneficial because it gives you more control over an experimental or nonexperimental study. It has advantages over plain ANOVA or a simple linear regression because it can incorporate both categorical and continuous variables to determine an overall effect. I look forward to any questions you may have. Happy statistics!
