No one can be right all the time, and that’s just as true for a statistical hypothesis test as it is for people! Sometimes the conclusion you draw from a hypothesis test is incorrect, and you won’t even know it. In this article, we’ll take a closer look at one way the conclusion can be wrong, called a Type I error (or Type 1 error).
Type I Error Definition
The null hypothesis is usually a statement of “no change” or “no difference between groups.” A hypothesis test is done to decide if there is enough evidence of change to reject the null hypothesis.
If we incorrectly think we have significant evidence (evidence strong enough to reject the null), we will conclude that there actually is a change, or a difference between the groups. Rejecting the null hypothesis when it is actually true is called a Type I error.
Type I errors are also called false positives, because we think we found a significant result, even though the truth of the matter is that there was nothing there to find.
(A Type II error or false negative is the opposite: we incorrectly think we don’t have significant evidence, so we fail to reject the null hypothesis even though it is actually false!)
So far, so good. Let’s look at an example!
Type I Error Example
Suppose you’re on your school’s basketball team, and you want to know if wearing your lucky socks helps you win games. Being statistically minded, you decide to alternate wearing your lucky socks with not wearing your lucky socks. Of course, you keep track of how often your team wins in each case.
Let’s say you win 55% of the games when you aren’t wearing the socks, and 60% of the games when you are. The exact percentages don’t matter: even if the lucky socks made no difference at all, it’s unlikely you’d see exactly the same win percentage in both cases. Some difference between the two samples is almost guaranteed.
The real question is this: is the difference in win percentages due to sampling variability (random chance), or is it representative of some actual difference between the two populations? In other words, does wearing the lucky socks really make a difference?
The hypothesis test is going to try to answer that question: if the P-value is small enough, then we will conclude that there is a statistically significant difference between the populations, and decide that yes, wearing the lucky socks does make a difference.
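To make that concrete, here is a minimal sketch of such a test in Python. The game counts are invented for illustration (20 games in each condition), and the pooled two-proportion z-test is just one reasonable choice of test:

```python
import math

# Hypothetical season: these game counts are made up for illustration.
socks_wins, socks_games = 12, 20   # 60% win rate with the lucky socks
plain_wins, plain_games = 11, 20   # 55% win rate without them

# Pooled two-proportion z-test of H0: both win probabilities are equal.
p1 = socks_wins / socks_games
p2 = plain_wins / plain_games
pooled = (socks_wins + plain_wins) / (socks_games + plain_games)
se = math.sqrt(pooled * (1 - pooled) * (1 / socks_games + 1 / plain_games))
z = (p1 - p2) / se

# Two-sided P-value from the standard normal CDF, built from math.erf.
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(f"z = {z:.3f}, P-value = {p_value:.3f}")
```

With made-up data this small, the P-value comes out far above any usual significance level, so there is no evidence the socks matter. A Type I error is the unlucky case where the P-value comes out small even though the null hypothesis is true.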
If the truth is that the lucky socks have no effect, either positive or negative, on the outcomes of your basketball games, then our conclusion that they do is wrong—a Type I error. We’ve concluded that there is a difference, even when none really exists.
How did it happen? Why was the win percentage higher with the socks even though the truth was the socks made no difference? That was just due to random chance.
Just like a fair coin doesn’t have to land on heads exactly 50% of the time in a given set of tosses, your team just happened to win a few more games while you were wearing the lucky socks. This variation is called sampling variability, and it can lead us to make a Type I error.
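You can watch sampling variability happen with a short simulation. Assuming nothing beyond Python’s standard library, flipping a perfectly fair coin in many 20-toss “seasons” gives observed heads rates that scatter around 50%, even though the truth never changes:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Simulate a fair coin (true heads probability exactly 0.5) many times.
# Each trial is a "season" of 20 tosses; record the observed heads rate.
rates = []
for _ in range(1000):
    heads = sum(random.random() < 0.5 for _ in range(20))
    rates.append(heads / 20)

# The truth is always 50%, but the observed rates wander around it.
print(f"lowest rate: {min(rates):.0%}, highest rate: {max(rates):.0%}")
exact_half = sum(r == 0.5 for r in rates) / len(rates)
print(f"fraction of seasons landing on exactly 50%: {exact_half:.0%}")
```

Only a minority of seasons land on exactly 50%; most show some spurious “effect” in one direction or the other, purely by chance.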
Consequences of Type I Errors
In our example, the consequences of a Type I error are not very severe. You might end up wearing the same pair of socks for every single basketball game, in the mistaken belief that they improve your chances of winning.
And maybe some of the other players tease you about it. No big deal! Even if you’re wrong, nothing serious is at stake.
On the other hand, Type I errors can have more serious consequences. Imagine you’re running a drug research trial, and you find evidence that a new, expensive treatment is more effective than the old one. If your evidence is wrong, you could be making a very expensive Type I error.
But it could be even worse…
If your new treatment has some negative side effects, but doctors start using it because they think the benefits outweigh those side effects, then you could be making a very harmful Type I error.
How to Avoid Type I Errors
Hypothesis tests are always better with more data, but be careful about what a bigger sample actually buys you. The probability of a Type I error is set by your significance level, not by your sample size. What a larger sample mainly does is reduce the chance of a Type II error: it needs to be large enough that a practically important difference can actually be detected.
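Here is a rough simulation of what sample size buys. All the numbers are invented: a true win rate of 60% (so the null of 50% really is false), tested two-sided at the usual 5% level with a normal-approximation cutoff:

```python
import random

random.seed(3)

def detection_rate(n_games, true_p=0.6, trials=5000):
    """How often a two-sided test at the 5% level detects a true
    win probability of true_p (a real effect, since H0 says 50%)."""
    cutoff = 1.96 * (n_games * 0.25) ** 0.5   # 1.96 sd of wins under H0
    detected = 0
    for _ in range(trials):
        wins = sum(random.random() < true_p for _ in range(n_games))
        if abs(wins - n_games / 2) > cutoff:
            detected += 1
    return detected / trials

small = detection_rate(50)
large = detection_rate(500)
print(f"true 60% effect detected in {small:.0%} of 50-game seasons, "
      f"but {large:.0%} of 500-game seasons")
```

With 50 games the real effect slips through undetected most of the time (a Type II error); with 500 games it is caught almost always, while the Type I error rate stays pinned at the 5% significance level either way.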
If you really want to avoid Type I errors, good news. You can control the likelihood of a Type I error by changing the level of significance (α, or “alpha”). The probability of a Type I error is equal to α, so if you want to avoid them, lower your significance level—maybe from 5% down to 1%.
(Why can’t we reduce the chance of a Type I error to 0%? If the significance level is 0%, then no P-value will ever be small enough, since P-values can’t be zero. Then you will never reject the null hypothesis, even when it’s wrong, making your hypothesis tests pretty useless.)
So, you can lower α to reduce the chance of a Type I error. But the bad news is, there is a price for this improvement. Changing the significance level will have the opposite effect on the chance of a Type II error!
- Lowering α makes Type I errors less likely, and Type II errors more likely.
- Raising α makes Type I errors more likely, and Type II errors less likely.
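This trade-off is easy to check by simulation. In this sketch the setup is invented: each experiment is 100 coin flips, the null hypothesis says the probability is 50%, and the Type II runs use a true probability of 65%. Lowering α from 5% to 1% cuts the Type I rate but inflates the Type II rate:

```python
import random

random.seed(7)

# Normal-approximation cutoffs for a two-sided test on 100 flips:
# the sd of the heads count under H0 is sqrt(100 * 0.25) = 5.
CUTOFFS = {0.05: 1.96 * 5, 0.01: 2.576 * 5}

def rejection_rate(true_p, cutoff, trials=5000):
    """Fraction of simulated experiments that reject H0: p = 0.5."""
    rejections = 0
    for _ in range(trials):
        heads = sum(random.random() < true_p for _ in range(100))
        if abs(heads - 50) > cutoff:
            rejections += 1
    return rejections / trials

results = {}
for alpha, cutoff in CUTOFFS.items():
    type1 = rejection_rate(0.5, cutoff)       # null true: rejections are Type I errors
    type2 = 1 - rejection_rate(0.65, cutoff)  # null false: non-rejections are Type II errors
    results[alpha] = (type1, type2)
    print(f"alpha = {alpha:.0%}: Type I rate ~ {type1:.3f}, Type II rate ~ {type2:.3f}")
```

The simulated Type I rate tracks α in each case, and shrinking it visibly pushes the Type II rate up, exactly the trade-off described above.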
To choose an appropriate significance level, first consider the consequences of both types of errors. If the consequences of both are equally bad, then the conventional significance level of 5% strikes a balance between the two. Raise or lower the significance level as needed for your situation.
Summary
A Type I error is a particular kind of wrong conclusion in a hypothesis test: one where we have mistakenly rejected the null hypothesis, even when it is actually true (we just don’t know it).
It means we found a significant result, even though there is no real result to find.
The probability of a Type I error is equal to the significance level of the hypothesis test, so you can control how likely it is—but lowering the chance of a Type I error will increase the chance of a Type II error.