Statistics is all about coming up with models to explain what is going on in the world. But how good are we at that? I mean, numbers are only good for so many things, right? How do we know if they are telling the right story?

Enter the famous world of **test statistics**.

The goal of a test statistic is to determine how well the model fits the data. Think of it a little like clothing. When you are in the store, the mannequin tells you how the clothes are supposed to look (the theoretical model). When you get home, you try them on and see how they actually look (the data-based model). The test statistic tells you whether the difference between them (because I definitely do *not* look like the mannequin) is significant.

In another post, I discussed the nature of correlational and experimental research. Linear regression, multiple regression, and logistic regression are all types of linear models that correlate variables that occur simultaneously. However, *experimental* models are concerned with cause-effect models, or at least models that state a significant difference between cases.

*Test statistics* calculate whether there is a significant difference between groups. Most often, test statistics are used to see if the model that you come up with is different from the ideal model of the population. For example, do the clothes look significantly different on the mannequin than they do on you? Let’s take a look at the two most common types of test statistics: the ***t*-test** and the ***F*-test**.

# t-Test and Comparing Means

The *t*-test is a test statistic that compares the means of two different groups. There are a bunch of cases in which you may want to compare group performance such as test scores, clinical trials, or even how happy different types of people are in different places. Of course, different types of groups and setups call for different types of tests. The type of *t*-test that you may need depends on the type of sample that you have.

If your two groups are the same size *and* you are running a sort of before-and-after experiment, then you will conduct what is called a **Dependent** or **Paired Sample *t*-test**. If the two groups are different sizes or you are comparing two separate event means, then you conduct an **Independent Sample *t*-test**.

### Dependent or Paired Sample t-Test

I am a fairly introverted person. I’m so introverted that I have extreme anxiety in social situations, which warrants a therapy dog by the name of Chloe. And she’s pretty adorable.

Now, a lot of people have therapy dogs in order to relieve anxiety. Let’s say that you measure people’s anxiety *without* their therapy dogs and *with* their therapy dogs on a scale from 1 (low) to 5 (high) to determine if therapy dogs significantly lower anxiety for people like me. For the sake of convenience, you get the following data:

At first glance, it seems that there is a clear difference between people’s level of anxiety with and without their therapy dogs. You want to jump to the conclusion that our model (they do make a difference) is different from the null hypothesis (they don’t). But wait, you want to have some statistical data to back that claim up. So you perform a *t*-test.

A *t*-test is a form of statistical analysis that compares the measured mean to the population mean, or a baseline mean, in terms of standard deviation. Since we are dealing with the same group of people in a before-and-after situation, you want to conduct a dependent *t*-test. You can think of the *without* scenario as a baseline to the *with* scenario.

The traditional *t*-test equation looks like

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

The null hypothesis states there should be no difference between the two sample means. So that means μ_{1} – μ_{2} = 0, giving us

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

But what do you do with this number? Well, you consult the mystical *t* Table. Along the top of the table is the *probability* of error that you are willing to accept. In other words, what is the chance that you are wrong? Along the side of the table are the degrees of freedom. For a paired test, the degrees of freedom are the number of pairs minus one, so with 24 participants you have 23 degrees of freedom.

The *t* Table states that the critical value for 23 degrees of freedom at the 0.05 level is 2.069. Your calculated *t*-value is above that, which indicates that your means are significantly different. Based on my completely random, fictitious data, the lower mean of anxiety people show *with* their therapy dogs is different enough to be meaningful, otherwise known as statistically significant.
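The whole paired analysis above can be sketched in a few lines with `scipy`. Note that the scores below are made up for illustration (a small sample of 8 people, not the post’s data), so the resulting numbers are not the ones discussed above:

```python
from scipy import stats

# Made-up anxiety scores (1 = low, 5 = high) for 8 people,
# measured first without and then with their therapy dogs.
without_dog = [4, 5, 3, 4, 5, 4, 3, 5]
with_dog = [2, 3, 1, 2, 3, 2, 2, 3]

# Dependent (paired) t-test: the same participants, before and after.
t_stat, p_value = stats.ttest_rel(without_dog, with_dog)

# Critical value for df = n - 1 = 7 at the 0.05 two-tailed level.
critical = stats.t.ppf(0.975, df=7)

print(f"t = {t_stat:.2f}, critical value = {critical:.3f}, p = {p_value:.4f}")
```

If the calculated `t_stat` exceeds `critical` (equivalently, if `p_value` is below 0.05), the means differ significantly, which is exactly the table lookup described above done in code.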

I guess Chloe is good for me, lol.

### Independent Sample t-Test

The case for independent sample tests is a little different. This style of test is best suited to experimental designs, or those designs that compare groups with different sets of participants. The benefit is that the groups do not have to be equal sizes. Let’s check another statistical example.

Let’s pretend for a moment that you (for some crazy reason) want to know if people are more anxious in statistics class than in another, let’s say English, class. So you find some willing volunteers and measure their heart rates during each class. It’s important to note that the two classes do not share any participants. Your data looks a little like this:

There is a difference, but is it enough of a difference? When you calculate the *t*-value, you find it to be 1.92. Comparing this to the *t* Table at 40 degrees of freedom, you notice that it is below the critical value of 2.021. This means that while there is a difference, it is not a *significant* difference.
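An independent-samples comparison looks almost identical in code. The heart rates below are invented for illustration, and the groups deliberately have different sizes, which this style of test allows:

```python
from scipy import stats

# Invented resting heart rates (beats per minute) during class.
stats_class = [80, 85, 90, 75, 82]   # 5 participants
english_class = [78, 80, 76, 74]     # a different 4 participants

# Independent samples t-test: different people in each group.
t_stat, p_value = stats.ttest_ind(stats_class, english_class)

print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
if p_value > 0.05:
    print("The difference is not statistically significant.")
```

As with the blog’s example, a visible difference in the raw means is not enough on its own; here the *p*-value stays above 0.05, so the difference is not significant.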

Huh, I guess statistics isn’t too stressful after all.

The role of the t-test is to determine whether two groups are different from each other. Just remember that dependent t-tests are best used for groups that have the same participants, while independent t-tests are for groups with different ones.

## F-Test Statistic

*But John, what if I want to test something else? Like a model?*

That is a fantastic question!

Sometimes we want to compare a model that we have calculated to a mean. For example, let’s say that you have calculated a linear regression model. Remember that the mean is also a model that can be used to explain the data.

The *F*-Test is a way that we compare the model that we have calculated to the overall mean of the data. Similar to the *t*-test, if it is higher than a critical value then the model is better at explaining the data than the mean is.

Before we get into the nitty-gritty of the *F*-test, we need to talk about the **sum of squares**. Let’s take a look at an example of some data that already has a line of best fit on it.

The *F*-test compares what is called the **mean sum of squares** for the *residuals* of the model and the overall mean of the data. Fun fact: the residuals are the difference between the actual, or observed, data point and the predicted data point.

In the case of graph (a), you are looking at the residuals of the data points and the overall sample mean. In the case of graph (c), you are looking at the residuals of the data points and the model that you calculated from the data. But in graph (b), you are looking at the residuals of the *model* and the overall sample mean.

The sum of squares is a measure of how the residuals compare to the model or the mean, depending on which one we are working with. There are three that we are concerned with.

The *sum of squares of the residuals* (SS_{R}) is the sum of the squared residuals between the data points and the actual regression line, like graph (c). They are squared to compensate for the negative values. SS_{R} is calculated by

$$SS_R = \sum_{i} (Y_i - \hat{Y}_i)^2$$

The *sum of squares of the total* (SS_{T}) is the sum of the squared residuals between the data points and the mean of the sample, like graph (a). They are squared to compensate for the negative values. SS_{T} is calculated by

$$SS_T = \sum_{i} (Y_i - \bar{Y})^2$$

It is important to note that while the equations may look the same at first glance, there is an important distinction. The SS_{R} equation involves the predicted value, so the second Y has a little carrot over it (pronounced Y-hat). The SS_{T} equation involves the sample mean, so the second Y has a little bar over it (pronounced Y-bar). Don’t forget this very important distinction.

The difference between the two (SS_{M} = SS_{T} – SS_{R}) will tell you the overall sum of squares for the model itself, like graph (b). This is what we are after in order to finally start to calculate the actual *F* value.

These sum of squares values give us a sense of how much the model varies from the observed values, which comes in handy in determining if the model is really any good for prediction. The next step in the *F*-test process is to calculate the **mean of squares** for the residuals and for the model.

To calculate the *mean of squares of the model*, or MS_{M}, you need to know the degrees of freedom for the model. Thankfully, it is pretty straightforward. The degrees of freedom for the model is the number of variables in the model! Then follow the formula MS_{M} = SS_{M} ÷ df_{model}

To calculate the *mean of squares of the residuals*, or MS_{R}, you need to know the residual degrees of freedom. This is the sample size minus the number of parameters you estimated, or N – k – 1 for a model with k predictors. Then simply follow the formula MS_{R} = SS_{R} ÷ df_{residuals}

Ok, you have done a whole lot of math so far. I’m proud of you because I know that it is not super fun. But it is *super* important to know where these values come from because it helps you understand how they work. Now we are actually going to see how the *F*-statistic is calculated!

The *F*-statistic is simply the ratio F = MS_{M} ÷ MS_{R}. This calculation gives you a ratio of the model’s prediction to the regular mean of the data. Then you compare this ratio to an *F*-distribution table as you would the *t*-statistic. If the calculated value exceeds the critical value in the table, then the model is significantly different from the mean of the data, and therefore better at explaining the patterns in the data.
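The entire pipeline, from sum of squares to the *F* ratio, can be sketched with `numpy` and `scipy`. Every number below is invented for illustration; the point is the sequence of steps, not the specific data:

```python
import numpy as np
from scipy import stats

# Invented data with a roughly linear trend.
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])

# Fit a simple linear regression (one predictor, so k = 1).
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

ss_r = np.sum((y - y_hat) ** 2)     # residuals vs. the regression line
ss_t = np.sum((y - y.mean()) ** 2)  # residuals vs. the sample mean
ss_m = ss_t - ss_r                  # sum of squares for the model itself

k = 1                               # number of predictors
n = len(y)
ms_m = ss_m / k                     # mean square for the model
ms_r = ss_r / (n - k - 1)           # mean square for the residuals
f = ms_m / ms_r

# Critical value from the F-distribution at the 0.05 level.
critical = stats.f.ppf(0.95, dfn=k, dfd=n - k - 1)
print(f"F = {f:.1f}, critical value = {critical:.2f}")
```

Because the invented data is very close to linear, the model explains far more variation than the plain mean does, and the calculated *F* comes out well above the critical value.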

Test statistics are vital to determining if a model is good at explaining patterns in data. The simplest test statistic is the **t-test**, which determines if two means are significantly different. For more complex models, the **F-statistic** determines if a whole model is statistically different from the mean. Both cases are essential for telling a good model from a bad one. Happy statistics!
