In statistics, you often want to compare several groups to see if there are differences between them. These comparisons rely on **variance analysis**. Statistical comparisons can involve numerous groups and variables, but luckily there are several tests designed specifically for analyzing the variance between these groups.

## Variance Analysis Basics

In its most basic form, variance analysis involves comparing what is expected to what is actually observed. Recall that variance is the difference between the observed values and the model values. In the case of *univariate data*, the models are the measures of central tendency. In the case of simple correlations, or bivariate data, this would mean the difference between the line of best fit and the actual values measured.

However, in the case of *multivariate data*, which is three or more variables, the challenge increases. This is because you are comparing an empirical model to a theoretical model. The easiest way to do this is to compare groups to an outcome. But the number of groups and variables determines the best method to use. Let’s explore some.

## Comparing Two Groups

The simplest setup is to compare two groups on a single outcome. To do this, the best method is a t-test. A t-test compares the same outcome for two groups. The benefit of the test is that it can be set up in two different ways.

A *dependent sample* t-test compares the same outcome for two groups with the same participants. It is most useful in a before-and-after scenario. For example, say you want to compare student ACT scores before and after a special study session. You would compare the group’s scores before the session to the same group’s scores after the session. The t-test analyzes the variance before and after the study session to see if the scores are significantly different.
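As a sketch of the idea (using made-up ACT scores and a hand-rolled helper rather than a library call), the paired t statistic is simply the mean of the before/after differences divided by its standard error:

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Dependent-sample t statistic: mean difference over its standard error."""
    diffs = [a - b for b, a in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical ACT scores for five students, before and after the session
before = [21, 24, 19, 26, 22]
after = [24, 25, 21, 27, 25]
t_stat = paired_t(before, after)  # ≈ 4.47; compare against a t distribution with n-1 df
```

In practice you would use something like `scipy.stats.ttest_rel`, which also reports the p-value, but the statistic itself is just this ratio.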

An *independent sample* t-test compares two groups based on the same outcome. This type of setup is most common in experimental studies. Like a dependent sample t-test, the test compares the variance in one group to the variance in the other. For example, let’s say that you want to determine if a new drug helps headaches. One group (the control) would receive a placebo while the other group (the treatment) would receive the actual medication. The t-test compares the outcome of pain relief of both groups to determine if there is a significant difference in pain relief. The benefit of an independent sample t-test is that you don’t need the same participants, or even the same number of participants, in both groups.
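Here is a minimal sketch of the independent-sample version, using invented pain-relief scores and a pooled-variance estimate (one common variant; Welch’s t-test is another):

```python
import math
from statistics import mean, variance

def independent_t(x, y):
    """Independent-sample t statistic with a pooled variance estimate."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))

# Hypothetical pain-relief scores (higher = more relief)
placebo = [2, 3, 2, 3]
treatment = [5, 6, 5, 6]
t_stat = independent_t(treatment, placebo)  # ≈ 7.35; the groups need not match in size
```

Note that nothing in the formula requires the two samples to have the same length, which is exactly the flexibility described above.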

## Comparing Multiple Groups

It is also possible to compare three or more groups. This requires a little more thought in setting up your research, but is incredibly useful. For example, let’s say that you want to compare which dosage of pain reliever helps headaches the most. Now, you would want to compare three groups: the control and two treatment groups. For the purpose of this example, let’s say that you want to determine if 400 mg of ibuprofen is better than 200 mg, and whether either of them is better than none.

It is tempting to conduct variance analysis using a series of t-tests, but this is not the best approach. Each t-test has error associated with it, so running multiple t-tests compounds that error. Instead, we use a special technique, **analysis of variance**, or ANOVA.

### Analysis of Variance

ANOVA is a special method for comparing three or more groups. Instead of analyzing the variance of two groups, it analyzes the variance *and* the error that occurs in a group and between groups. It compares the groups in the context of a single outcome. In our example, the outcome is pain relief.

For an effective experimental setup, the three groups would be the control (which receives a placebo), the 200 mg group, and the 400 mg group. We would measure the pain relief for each group and then consider not only whether the variance *within* each group is different enough but also whether the variance *between* the groups is large enough.

You analyze the variance using the *F-ratio*. The F-ratio is a ratio of the error variance between the groups compared to the error variance within the groups. Like the t-test, the F-ratio compares actual values to a distribution of values that helps determine significance. In this way, one can see if there is a significant difference between the groups.
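The F-ratio can be computed directly from the group data. Here is a sketch with invented pain-relief scores for the three dosage groups (a library routine such as `scipy.stats.f_oneway` would do the same arithmetic and add the p-value):

```python
from statistics import mean

def f_ratio(*groups):
    """One-way ANOVA F: between-group variance over within-group variance."""
    grand = mean(v for g in groups for v in g)
    k = len(groups)
    n = sum(len(g) for g in groups)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical pain-relief scores for the three dosage groups
control = [1, 2, 3]
mg200 = [2, 3, 4]
mg400 = [4, 5, 6]
f = f_ratio(control, mg200, mg400)  # 7.0 for this toy data
```

A large F means the groups differ from each other much more than the participants within each group differ from their own group mean.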

A word of caution: the F-test only reveals if there is a significant difference. It does not determine *which* group is different nor *how* it is different. For that, you need *post hoc* tests.

### Analysis of Covariance

Naturally, there are some things that may affect how effective ibuprofen is from person to person. For example, weight influences dosage, so someone who weighs 300 lbs may not get the same amount of pain relief as someone who weighs 100 lbs. Variables like these, which can affect the outcome but are *not* the treatment, are called *covariates*, and the technique for handling them is **analysis of covariance**, or ANCOVA.

ANCOVA accounts for a treatment effect and other mitigating variables, like weight or age, in an experimental trial. In this case, the error variance of the covariates is included in the analysis. ANCOVA is generally used when looking for an effect rather than looking for a model to predict outcomes. The benefit of ANCOVA is that you get a much clearer picture of the effect that predictors have when used with covariates.
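To see the intuition, here is a rough illustration of covariate adjustment (not a full ANCOVA, which estimates the treatment and covariate effects jointly in one model): fit a simple regression of the outcome on the covariate, then look at what is left over. The weights and relief scores below are invented.

```python
from statistics import mean

def remove_covariate(y, x):
    """Subtract the fitted linear effect of covariate x from outcome y.

    A rough illustration of covariate adjustment; a real ANCOVA fits
    treatment and covariate together rather than in two steps.
    """
    mx, my = mean(x), mean(y)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return [yi - slope * (xi - mx) for xi, yi in zip(x, y)]

# Hypothetical data where relief depends entirely on body weight,
# so adjusting for weight leaves nothing for the treatment to explain
weight = [100, 150, 200, 300]
relief = [8, 7, 6, 4]
adjusted = remove_covariate(relief, weight)  # every value becomes 6.25 here
```

After adjustment, any remaining group differences in `adjusted` are the ones the covariate cannot account for, which is the clearer picture ANCOVA is after.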

## Comparing Multiple Outcomes

Now, what if we want to analyze the variance of more than one outcome? Back to our ibuprofen example, let’s say that you want to analyze the effect of dosage on both pain relief *and* a particular side effect like nausea? In this case, you want to analyze the variance of two dependent variables. This is a case of **multivariate analysis of variance** or MANOVA.

The essential goal of MANOVA is to determine the effect that one or more groups have on multiple dependent variables. The real power of MANOVA lies in the fact that it compares the variance of the independent variables to the variance *between* the outcome variables.

Think of it this way: ibuprofen should relieve pain but can simultaneously cause nausea. The most effective dosage would be one that increases pain relief while also minimizing nausea. So, you are looking at two outcomes, or dependent variables. You still want to assess the effect of dosage, but now you also consider the connection between the two outcomes as well.
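A from-scratch MANOVA is beyond a blog sketch, so here is how the setup might look with statsmodels. The data frame, the column names (`dose`, `relief`, `nausea`), and the values are all invented for illustration:

```python
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Hypothetical trial data: dosage group plus two outcomes per participant
df = pd.DataFrame({
    "dose": [0, 0, 0, 200, 200, 200, 400, 400, 400],
    "relief": [2, 3, 2, 5, 6, 5, 7, 8, 6],
    "nausea": [1, 1, 2, 2, 3, 2, 4, 5, 4],
})

# Both outcomes sit on the left-hand side of the formula and are modeled together
fit = MANOVA.from_formula("relief + nausea ~ C(dose)", data=df)
print(fit.mv_test())  # reports Wilks' lambda and related multivariate statistics
```

The key point is the formula: listing `relief + nausea` together is what makes the test multivariate, so the result reflects the joint pattern in both outcomes rather than two separate ANOVAs.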

## When Does It Matter?

So far, we have used a teaching example with ibuprofen and an experimental setup, but it raises the question: when will I ever use this?

One prime example of variance analysis is in business. Recall that the fundamental purpose of variance analysis is to determine the difference between a predicted model and the actual data. One business example is to explain the difference between predicted sales and actual sales. Variance analysis provides you with a way to analyze the difference while explaining what is affecting the outcome.

Another example is cognitive behavioral therapy, which aims to change thoughts *and* actions. MANOVA can compare multiple types of cognitive and behavioral therapy on those two dependent variables, taking the relationship between the outcomes into account.

## The Takeaways

Variance analysis comes in all shapes and sizes, but it has one underlying goal: examine the difference between what is observed and what is expected. For univariate data, it is an examination of observed deviation from a measure of central tendency. For bivariate data, it is the difference between the model value (or the line of best fit) and the observed value. For multivariate data, it involves comparing groups and their effect on one or more outcomes.

I hope that this post helps shed some light on variance analysis. I look forward to any questions that you may have below. Happy statistics!
