I drink a lot of coffee (because it’s delicious) and I am very particular about how I make my coffee. First cream, then coffee. My wife is always surprised that I can tell if she poured the coffee or the cream first. It is a peculiar habit, but also a statistically significant one. In statistics, significance is paramount to coming up with a viable model that explains patterns in the data. The **significance level** is how much room we are willing to leave for chance as an explanation of a result.

## Quick Refresher

Recall that, with any statistical analysis, we start with a hypothesis about a pattern in the data and then use statistical methods to test it. The *null hypothesis* is that there is no real pattern and chance alone explains what we see, while the *hypothesis* (often called the alternative hypothesis) is that the pattern is real.

Rejecting the null hypothesis means concluding that the data favor the hypothesis, though never with complete certainty. The significance level is the risk of being wrong that we agree to tolerate when we reject the null hypothesis. The confidence level is its complement: a 5% significance level corresponds to 95% confidence.

## What Does Significance Mean?

Let’s use my coffee-tasting skills as an example. I think that I can tell whether the coffee was poured into the cream or the other way around. So my wife sets up two coffee cups, one of each version, and asks me to taste them. Even by guessing, I have a 50% chance of determining which cup is which. Let’s say I guess correctly.

*But is that significant?*

Is a 50% chance of getting the answer right enough for me to claim I *can* taste the difference? Probably not. In practical terms, we say this is not significantly different from guessing. In statistical terms, we say that the probability of getting the right answer by chance alone (the p-value) is too high, well above any significance level we would accept.

In this particular setup, pure guessing gives the right answer 50% of the time and the wrong answer 50% of the time. In statistics, a result that chance produces that easily is not strong enough evidence to claim a pattern.

Let’s take my coffee tasting a step further. My wife makes six cups of coffee for me. Three of them have coffee poured into the cream and three have cream poured into the coffee. There are 20 ways to choose which three of the six cups are which, so mixing them up randomizes the chances a bit.

I taste them all and correctly pick the three cups that had the cream poured into the coffee. What are the chances of that happening?

### Significance and Probability

When I said that there are 20 ways for the cups to be arranged, that means that pure guessing has a one in twenty chance of identifying every cup correctly. This means that the probability of getting my result by chance (the p-value) is 5%, which just meets a significance level of 5%.
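To see where a figure like 20 can come from, here is a minimal sketch that counts the arrangements, assuming six cups split three and three (one split that yields exactly 20). The function name `chance_of_perfect_guess` and its parameters are made up for illustration.

```python
from math import comb


def chance_of_perfect_guess(total_cups: int, cups_of_one_kind: int) -> float:
    """Probability of labelling every cup correctly by pure guessing.

    Assumes the taster knows how many cups of each kind there are, so a
    guess amounts to choosing which cups belong to one of the two kinds.
    """
    arrangements = comb(total_cups, cups_of_one_kind)
    return 1 / arrangements


# Assumed design: 6 cups, 3 prepared each way.
arrangements = comb(6, 3)
p_value = chance_of_perfect_guess(6, 3)
print(arrangements, p_value)  # 20 0.05
```

The same function covers other designs. For example, `chance_of_perfect_guess(8, 4)` gives 1 in 70, so eight cups split four and four would make a perfect score much harder to reach by luck.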

What this means is that, if I were simply guessing, there would be only a 5% *chance* of getting every cup right. Generally, in statistics, we are willing to accept that level of chance. After all, it is very unlikely that I *chanced* upon the right order. Strictly speaking, this does not make me 95% sure that I can taste the difference. It means my result would be rare if I could *not*, so I reject the idea that I was only guessing.

So, put another way, the significance level is the largest probability of a chance result that we are willing to tolerate while still ruling chance out as the explanation.
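One way to get a feel for "attributing results to chance" is to let a computer guess at random many times and count how often it gets every cup right. This is only a sketch under assumptions: six cups split three and three, a taster who knows the split, and a hypothetical helper named `simulate_guessing`.

```python
import random


def simulate_guessing(trials: int = 100_000, seed: int = 0) -> float:
    """Fraction of random guesses that identify all three target cups."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    cups = list(range(6))
    truth = {0, 1, 2}  # assumed: the cups that had cream poured into coffee
    hits = 0
    for _ in range(trials):
        guess = set(rng.sample(cups, 3))  # pick three cups at random
        if guess == truth:
            hits += 1
    return hits / trials


rate = simulate_guessing()
print(f"Perfect guesses by luck: {rate:.3f}")
```

The printed rate should land close to 0.05, which is the same one-in-twenty chance arrived at by counting arrangements.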

## Why 5%, Why Not 0%?

The significance level that most statisticians are willing to accept is 5%, or a probability of 0.05. Sometimes, they select a significance level of 0.01 or 0.001. The lower the level of chance you accept, the less likely you are to reject the null hypothesis when it is actually true.

*But why not just select 0% chance?*

Well, let’s take a look at what that means. That would mean never allowing for any possibility that a result came about by chance. It would mean that the probability of wrongly rejecting the null hypothesis is 0.00. That is just not feasible over hundreds or thousands of trials. There is always some small chance of error present in any experiment or study.

Also, the smaller the degree of chance we are willing to accept, the greater the chance that we retain the null hypothesis falsely and miss a real effect. This is called a Type II error. Rejecting the null hypothesis falsely is a Type I error, and the significance level is exactly the probability of that mistake that we agree to tolerate. To balance the two, researchers usually accept a small amount of chance at the 5%, 1%, or 0.1% levels.
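The tug-of-war between the two kinds of mistake can be sketched numerically. A Type I error is rejecting a null hypothesis that is true; a Type II error is retaining one that is false. The setup below is entirely hypothetical: 20 separate cups, each a 50/50 guess if I have no skill, and an assumed true skill of 75% per cup. The helper names are invented for this example.

```python
from math import comb


def tail_prob(n: int, k: int, p: float) -> float:
    """P(X >= k) when X counts successes in n tries with success chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_value(n: int, alpha: float) -> int:
    """Fewest correct cups that pure guessing reaches with chance <= alpha."""
    for k in range(n + 1):
        if tail_prob(n, k, 0.5) <= alpha:
            return k
    return n + 1  # no achievable score is rare enough


def error_rates(n: int, alpha: float, true_skill: float):
    """Return (cutoff, Type I error rate, Type II error rate)."""
    k = critical_value(n, alpha)
    type_1 = tail_prob(n, k, 0.5)             # guesser passes the test anyway
    type_2 = 1 - tail_prob(n, k, true_skill)  # skilled taster fails the test
    return k, type_1, type_2


# Assumed scenario: 20 cups, true skill of 75% per cup.
results = {alpha: error_rates(20, alpha, 0.75) for alpha in (0.05, 0.01, 0.001)}
for alpha, (k, t1, t2) in results.items():
    print(f"alpha={alpha}: need {k}/20, Type I={t1:.4f}, Type II={t2:.4f}")
```

As the significance level shrinks, the number of cups I must get right goes up, the Type I rate falls, and the Type II rate climbs. That is the price of demanding ever less room for chance.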

## The Takeaway

Mathematically, the significance level is a cutoff probability: if the chance of getting a result at least as extreme as ours by chance alone falls at or below it, we reject the null hypothesis. Conceptually, the significance level is the risk of a false alarm that we are willing to run when we claim a difference between the model and random chance. 5% is usually the highest significance level that statisticians and researchers are willing to accept, though it can be less. I look forward to seeing your questions below. Happy statistics!

P.S. I ran this experiment one day based on a book called *The Lady Tasting Tea* by David Salsburg. Give it a read for some great statistics history and trivia.
