When you run a hypothesis test, you are making a judgment about whether a hypothesis is true. In the world of statistics, though, hypothesis tests aren’t perfect: sometimes your conclusion is incorrect! In this article, we’ll look at one specific way your conclusion might be wrong, called a **Type II error** (or Type 2 error).

If you’d like to learn more about hypothesis testing and other topics from statistics, our statistics video lessons can help you out!

## Error Type Definitions

There are two possibilities in the real world: either the null hypothesis is right, or it’s wrong. We don’t know which one is the case—that’s why we are doing a hypothesis test in the first place. And sure, the hypothesis test will give you the correct result most of the time… but when the result is wrong, it’s important to know *how* it can be wrong and what that means.

- If the real situation is **the null hypothesis is right**, then either your test works (you correctly fail to reject the null hypothesis) or your test doesn’t work (you incorrectly reject the null hypothesis). This type of mistake is called a “Type I error.”
- If the real situation is **the null hypothesis is wrong**, then either your test works (you correctly reject the null hypothesis) or your test doesn’t work (you incorrectly fail to reject the null). This is called a “Type II error.”

Let’s take a closer look at these Type II errors: how they happen, what they mean, and what can be done about them.

## Type II Error Example

Let’s say we are testing a new shampoo to see if it makes hair grow faster. We know that the average rate of hair growth is 0.5 inches per month. We give our shampoo to the test subjects and measure their hair growth. Maybe the average hair growth among our test subjects is 0.6 inches per month. That’s a little bit higher than the average, but it’s close enough that we have to ask—is this difference due to random chance? Or is it something we can attribute to our new shampoo?

The null hypothesis, in this case, is that the shampoo *does not* help increase hair growth—that the hair growth rate among all people who use our shampoo would be no different than the hair growth rate among all people. (The null hypothesis can usually be boiled down to “no change” or “no difference.”)

Now, let’s suppose that the real truth is that our shampoo **actually does make hair grow faster**, because we are shampoo-making geniuses. We don’t *know* that it works, but we run a hypothesis test using our collected data to try to find out if it works or not.

If the P-value from our test is small enough, we will reject the null hypothesis. That’s great! We’ve correctly rejected the null and concluded that our shampoo makes hair grow faster. In that case:

- We market the shampoo to a big-name company, then
- A celebrity Retweets our research, after which
- Tons of orders roll in, and so
- We make millions of dollars and the world has faster growing hair. Everyone wins!

On the other hand, it might be the case that the P-value isn’t small enough to reject the null hypothesis. If the reality is that the shampoo actually does make hair grow faster, and we miss it by failing to reject the null hypothesis, then we have made a **Type II error**.
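To get a feel for how often this kind of miss can happen, here’s a small simulation sketch in Python. The numbers are hypothetical, not from the article: we assume the shampoo’s true mean growth is 0.6 inches per month (versus the null’s 0.5), with a known standard deviation of 0.2, samples of 20 subjects, and a one-sided z-test at alpha = 0.05.

```python
import math
import random

def simulate_type2_rate(true_mean=0.6, null_mean=0.5, sigma=0.2,
                        n=20, z_alpha=1.645, trials=10_000, seed=42):
    """Estimate the Type II error rate by simulation.

    Hypothetical setup: the shampoo really does raise mean growth to
    `true_mean` in/month, but the null claims `null_mean`. For each
    simulated experiment we run a one-sided z-test (sigma assumed known)
    and count how often we *fail* to reject the null -- each such miss
    is a Type II error.
    """
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        z = (xbar - null_mean) / (sigma / math.sqrt(n))
        if z <= z_alpha:  # not enough evidence -> we miss the real effect
            misses += 1
    return misses / trials

print(f"Estimated Type II error rate: {simulate_type2_rate():.3f}")
```

With these particular (made-up) numbers, the test misses the real effect in roughly a quarter of the simulated experiments, even though the shampoo genuinely works.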

What does this mean for us? Well, we might think (incorrectly) that our shampoo doesn’t work the way we want it to. We might throw away that million-dollar formula and go back to the drawing board.

Clearly Type II errors can have serious consequences!

## Why Do Type II Errors Happen?

A few factors can contribute to a Type II error. They are more likely **when the actual change in the population parameter is small**—for example, if the shampoo increases the rate of hair growth, but only by a small amount. A small change is harder to spot than a dramatic difference, and can more easily be missed.

In our example, the change in hair growth rate (from 0.5 to 0.6 inches per month) was small enough that it could plausibly be attributed to random chance.

Type II errors are also more likely with a **small sample size**. If the number of subjects in the experiment is not big enough, a real change can still lead to a P-value that’s too large to reject the null, which is a Type II error.
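The sample-size effect can be made concrete with the standard formula for a one-sided z-test, where the Type II error rate is the normal probability that the test statistic falls below the rejection cutoff under the alternative. The specific numbers here are hypothetical (true effect 0.1 in/month, known sigma of 0.2, alpha = 0.05):

```python
import math

def type2_rate(n, delta=0.1, sigma=0.2, z_alpha=1.645):
    """Analytical Type II error rate for a one-sided z-test.

    Hypothetical numbers: true effect `delta` = 0.1 in/month over the
    null, known sigma = 0.2, alpha = 0.05 (z_alpha = 1.645). Under the
    alternative, beta = P(Z < z_alpha - delta * sqrt(n) / sigma).
    """
    def phi(x):
        # Standard normal CDF via the error function
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return phi(z_alpha - delta * math.sqrt(n) / sigma)

for n in (5, 10, 20, 40, 80):
    print(f"n = {n:3d}  ->  Type II error rate = {type2_rate(n):.3f}")
```

Running this shows the Type II error rate shrinking steadily as the sample size grows, which is exactly why underpowered studies miss real effects.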

It’s also possible that we just got unlucky—due to random variability in our subjects, a real change that should have been apparent was not there, or not big enough to distinguish it from random variability.

## Can Type II Errors Be Avoided?

Some techniques can help avoid either Type I or Type II errors: repeat your experiment several times, or use a larger group of subjects. These improvements are often limited by practicality. How many rounds of testing can we afford? How many samples of shampoo can we ship out?

Another way to specifically reduce Type II errors is to raise the P-value threshold below which you reject the null hypothesis (called the **alpha level**, or α). A typical choice for alpha is 0.05: a P-value below 0.05 leads to rejecting the null, and a P-value above 0.05 leads to not rejecting it.

Raising the alpha level to 0.10 means you will reject the null more often, so the chance of a Type II error is reduced.

However, **there is a trade-off** with these two types of errors. If you increase the alpha level, you are going to reject the null hypothesis more often. That’s great if the null hypothesis is actually wrong, but sometimes it’s right. If it’s right and you *incorrectly* reject it, then you’ve committed a Type I error.

The chance of a Type I error is equal to the alpha level. If you raise the alpha threshold to reduce the chance of a Type II error, you are also increasing the chance of a Type I error.

So, in practice, you have to decide which error type is more dangerous, and try to avoid that one.

- If the consequences of a Type II error are worse than a Type I error, you might decide alpha should be a little higher, like 0.10.
- If the consequences of a Type I error are worse, set alpha lower, maybe 0.01.
- If the consequences are about the same either way, choose alpha somewhere in the middle, maybe 0.05.

For example, with the hair-growing shampoo, we definitely don’t want to miss out on millions of dollars, so we don’t like Type II errors… but a Type I error is bad too. If our product doesn’t work but we claim it does, we could be in trouble with our customers when they don’t see results, and maybe in trouble with the government for false advertising!

## Summary

A Type II error is sometimes referred to as a “false negative.” It occurs when the hypothesis test fails to reject the null hypothesis even though the null should have been rejected.

It means we missed finding a significant change somewhere.

You can reduce the chance of a Type II error, but be careful—raising the alpha level will reduce the chance of a Type II error *and* increase the chance of a Type I error.
