You’ve heard of mean, median, and mode, and you want to know what the deal is. What’s the difference? What are they for, and how do you find the mean, the median, and the mode of a set of numbers?
One of the purposes of statistics is to describe a whole group of data with a single number. Often, we want to give a “typical” value, also called an “average.” That way, we can give a number that represents what’s going on with the whole group.
For example, maybe you read that the average salary for a registered nurse is $72,000 per year. Yes, registered nurses can make more or less than that – but now we have a single number to represent the whole group. This can be useful when you’re trying to compare two different groups. Instead of having to look at all the individual salaries, you get a single average salary for nurses, and you can compare that to the average salary of physician assistants, or dentists, or whatever other group you like.
What many people don’t know is that there are different statistics you can calculate and still call it an “average.” The most common types of averages – which are sometimes referred to as measures of center – are the mean, median, and mode. Each has its own advantages and disadvantages, its own method of calculating, and its own interpretation.
Let’s take a closer look at these three statistics!
1. The Mean
This one is the big kahuna. The mean – also called the “arithmetic mean” – is the one people usually are talking about when they refer to an “average.” To find the mean of a set of numbers, add up all the numbers and then divide by how many numbers there are. For example, I might want to know the average (mean) number of puppies in a dog litter. I get some data, maybe from a local dog breeder, and find out that
- The last five litters they know of had 4, 6, 5, 6, and 7 puppies, respectively.
- Adding up all those puppies, we see a total of 28 puppies in the five litters.
- The mean number of puppies per litter would be the 28 puppies divided by the 5 litters: 28 ÷ 5 = 5.6.
- There was a mean of 5.6 puppies per litter.
A good way to think about what this number represents is to imagine putting all the puppies in a big (adorable!) pile, and then sorting them out into five equal-size stacks (five, because there were five litters – that is, five data values). The mean represents how big each litter would be if the puppies were divided equally between the litters. In general, the mean is how big each data value would be if the sum were split evenly between the data values.
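If you’d like to check the arithmetic yourself, here is a minimal Python sketch of the "add them up, then divide by how many" recipe. The variable names (litters, total, count, mean) are just illustrative choices, not anything standard.

```python
# Litter sizes from the dog breeder example.
litters = [4, 6, 5, 6, 7]

total = sum(litters)    # add up all the puppies
count = len(litters)    # how many litters (data values) there are
mean = total / count    # split the total evenly between the litters

print(f"{total} puppies / {count} litters = {mean} puppies per litter")
```

Running this should report 28 puppies across 5 litters, for a mean of 5.6 puppies per litter.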
One nice thing about the mean is that it takes all the data values into account – every data value contributes to the total sum, so if you change any of the numbers in the data set, you change the mean, too.
That can also be a disadvantage, because the mean is sensitive to outliers. If one of the litters had an unusually small or large number of puppies, like 101 dalmatian puppies, the mean would change drastically. Changing the 7 puppies to 101 means our new total is 122, so the mean is 122 ÷ 5 = 24.4. The mean number of puppies per litter is now 24.4. But that number doesn’t really represent a “typical” litter – it’s much bigger than 80% of the numbers in the group! So, it is misleading to report the mean as if it represents the overall data set.
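To see how hard a single outlier pulls on the mean, here is a small sketch using the mean function from Python’s built-in statistics module. The list names and the below_mean counter are made up for this illustration.

```python
from statistics import mean

original = [4, 6, 5, 6, 7]
with_outlier = [4, 6, 5, 6, 101]   # the litter of 7 replaced by 101

new_mean = mean(with_outlier)

# Count how many litters are smaller than the new mean.
below_mean = sum(1 for size in with_outlier if size < new_mean)

print("mean without outlier:", mean(original))
print("mean with outlier:", new_mean)
print(f"{below_mean} of {len(with_outlier)} litters are below the new mean")
```

Four of the five litters (80%) fall below the new mean, which is why it no longer looks like a "typical" litter.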
Think of what this would look like with salaries – if a small number of registered nurses get paid a lot more than everyone else, the mean salary would be dragged higher, and it wouldn’t really represent the whole group any more. Instead, when there are outliers in a set of data, it’s more responsible to use a different measure of center: the median.
2. The Median
The median is the number in the very middle of the group, when the numbers are arranged from smallest to largest (that is, in numerical order). With the litters of puppies, putting the numbers in numerical order would give us 4, 5, 6, 6, and 7. Since there are five numbers, it’s easy to see where the center is – there’s a 6 in the middle, so the median is 6.
When the data set has an even number of values, you take the two values that are in the middle and average them together. (That’s the “mean” style of average as described above!)
For example, if we had another litter of 3 puppies, our numbers would be 3, 4, 5, 6, 6, 7. The two numbers in the middle are 5 and 6, so the median is (5+6)/2 = 5.5.
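Here is one way the "sort, then find the middle" steps might look in Python. The function name middle_value is hypothetical, chosen for this sketch; in practice, the median function in Python’s statistics module does the same job.

```python
from statistics import median

def middle_value(values):
    """Return the median: the middle of the sorted values."""
    ordered = sorted(values)          # put the numbers in numerical order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: there is exactly one value in the middle.
        return ordered[mid]
    # Even count: average the two values in the middle.
    return (ordered[mid - 1] + ordered[mid]) / 2

five_litters = [4, 6, 5, 6, 7]
six_litters = [4, 6, 5, 6, 7, 3]

print(middle_value(five_litters))   # odd number of values
print(middle_value(six_litters))    # even number of values

# The built-in version should agree with the hand-written one.
print(median(five_litters), median(six_litters))
```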
What about outliers? Notice that if we replace the litter of 7 puppies in our original list with a litter of 101 puppies, the median doesn’t change. The number in the middle of the list (4, 5, 6, 6, 101) is still 6. This is what we mean when we say that the median is “resistant to outliers.” It’s why the median is a better choice to represent a group of data when the data contains values that are extremely large or small when compared to the rest of the group.
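A quick sketch makes the contrast concrete. It uses the mean and median functions from Python’s statistics module on the puppy data; the dictionary name datasets and its labels are only for illustration.

```python
from statistics import mean, median

datasets = {
    "original": [4, 6, 5, 6, 7],
    "with outlier": [4, 6, 5, 6, 101],   # 7 replaced by 101
}

for label, litters in datasets.items():
    print(f"{label}: mean = {mean(litters)}, median = {median(litters)}")
```

The mean jumps when the 101 is swapped in, while the median stays put at 6.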
As for the interpretation of the median, think of it this way. The median is the number that divides the group of data into two equal-size groups. For example, suppose you read that the median salary of dentists is $154,000 per year. That means that roughly half of all dentists earn less than that, and roughly half earn more.
3. The Mode
There’s one other statistic that is commonly mentioned alongside the mean and the median. The mode is the most frequently occurring number in a data set. The easiest way to find the mode with a small data set is to put the numbers in numerical order, just like you do for the median. Then scan for any repeated numbers, and see which number is repeated the most.
In our example with the puppies, the litters had 4, 5, 6, 6, and 7 puppies – only one number is repeated, so the mode is 6.
It’s possible for a data set to have more than one mode, if there is a tie for which number is the most frequent. For example, take a look at this data set:
52, 59, 61, 62, 62, 65, 65, 65, 68, 71, 71, 71, 74, 77, 78, 78, 80, 120
With the data in numerical order, it’s easier to see when there are repeats. Here we can see that 65 is repeated three times, as is the number 71. No other number is repeated more times than that. So this data set has two modes: 65 and 71.
It’s also possible for a data set to have no mode, if the numbers all appear with the same frequency.
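Counting repeats by eye gets tedious with bigger data sets, so here is a rough Python sketch that tallies each value with collections.Counter. The function name find_modes is hypothetical, and returning an empty list when every value ties is an assumption made to match the "no mode" convention described above. (Python’s own statistics.multimode takes the other approach and returns every value in that case.)

```python
from collections import Counter

def find_modes(values):
    """Return a sorted list of the most frequent value(s), or [] if none."""
    counts = Counter(values)              # value -> how many times it appears
    if not counts:
        return []                         # empty data set: nothing to report
    highest = max(counts.values())
    # Assumed convention: if several different values all appear equally
    # often, treat the data set as having no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

puppies = [4, 5, 6, 6, 7]
scores = [52, 59, 61, 62, 62, 65, 65, 65, 68, 71, 71, 71,
          74, 77, 78, 78, 80, 120]
all_different = [1, 2, 3]

print(find_modes(puppies))         # one mode
print(find_modes(scores))          # a tie, so two modes
print(find_modes(all_different))   # every value ties, so no mode
```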
The mode is useful when you need to know which value is the most common. For example, if you want to know which shoe size to be sure to keep in stock in your shoe store, the mode would tell you the most popular shoe size. It wouldn’t do you any good to average the shoe sizes together, but the mode could be useful.
The word “average” can mean different things, but the three most common types of averages are the mean, median, and mode.
- The mean is the traditional “average” – add numbers and divide by how many you have. It takes all the numbers in the data set into account, but isn’t great to use if the data set has extreme values (outliers).
- The median is the middle number when the data set is arranged in numerical order. It’s more representative when the data set has outliers.
- The mode is the most frequently occurring number in the data set. It’s useful when you need to know which number is the most common.
To learn more about mean, median, and mode, check out our statistics video lessons.