# Understanding Measures of Central Tendency

I love to read. Of course, a lot of people do too; that is why there are bookstores after all. But when I go to the bookstore to buy a new book, I don’t read the entire book before I buy it. Instead, I read the synopsis and maybe a couple of reviews of the book. Measures of Central Tendency are the same kind of thing. They are a synopsis of the data so that you don’t have to look at every value. In statistics, the three measures of central tendency are the mean, median, and mode. The range measures spread rather than center, but it is usually introduced alongside them, so it is covered here too.

## The Measures

Let’s say that you are curious about the number of blue M&Ms (they’re your favorite after all) in the little snack bags. So you open 15 bags, count the blue ones, and get the following data:

7, 8, 8, 6, 5, 2, 4, 1, 4, 12, 5, 7, 6, 3, 6

### Mean

The first, and some say most important, measure of central tendency is the mean. The mean is a value that represents all the data in a single number. This is one way to model the data.

The mean is the arithmetic average of the data. To find it, add up all the values and then divide by the number of values in the dataset. The mean for our data set is

7 + 8 + 8 + 6 + 5 + 2 + 4 + 1 + 4 + 12 + 5 + 7 + 6 + 3 + 6 = 84

84 ÷ 15 = 5.6
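The same arithmetic can be sketched in a few lines of Python. The variable names here (`blue_counts`, `total`, `average`) are just illustrative choices, and `statistics.mean` from the standard library is shown only as a cross-check.

```python
from statistics import mean

# Blue M&M counts from the 15 snack bags in the example above.
blue_counts = [7, 8, 8, 6, 5, 2, 4, 1, 4, 12, 5, 7, 6, 3, 6]

total = sum(blue_counts)   # add up all the values
n = len(blue_counts)       # number of values in the dataset
average = total / n        # the mean, computed by hand

print(total, n, average)   # 84 15 5.6
print(mean(blue_counts))   # 5.6, the same result from the standard library
```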

This is a model of the data because it represents how many blue M&Ms you are likely to find in a mini snack bag. But there are some caveats about the mean that we need to keep in mind.

First, this is just a model of the data. This means that the mean is a theoretical value. You probably aren’t going to find 0.6 of an M&M, but you may find a number close to 6. Second, the mean can be influenced by outliers. An outlier is an unusually high or low value, your 12 for example. Outliers pull the mean toward themselves, so the mean may not describe a typical bag very well if there are serious outliers.
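To get a feel for how much one outlier can move the mean, here is a small sketch that recomputes it with the 12 left out. Dropping the value is done purely for illustration; it is not a recommendation to delete outliers from real data.

```python
from statistics import mean

blue_counts = [7, 8, 8, 6, 5, 2, 4, 1, 4, 12, 5, 7, 6, 3, 6]

# Remove the single largest count (the 12) to see its effect on the mean.
without_outlier = sorted(blue_counts)[:-1]

print(mean(blue_counts))                # 5.6
print(round(mean(without_outlier), 2))  # 5.14
```

One value out of fifteen shifts the mean by almost half an M&M.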

### Median

The median is a useful and descriptive term. It is the value that is in the middle of the data when the values are put in numerical order. So, for your data, sorted from lowest to highest:

1, 2, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 8, 8, 12

There are 15 values, so the middle one is the eighth, which makes the median 6. At first, it doesn’t seem like this value is particularly useful. But it is. The median tells you how the middle of the data compares to the mean, which gives you a hint about whether the data are skewed. As a rule of thumb, if the median is greater than the mean, then you probably have what is called negative skew. If the median is less than the mean, then you probably have positive skew. When they are approximately the same, the distribution is roughly symmetric, as a normal distribution is.

Figure: How the median helps us tell the shape of a distribution. (Image by Richard Beck)
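Finding the median by hand means sorting and then picking the middle value. The sketch below does that for the M&M data and compares the result with `statistics.median`. The indexing shortcut only works because there is an odd number of values; with an even count, the median is conventionally the average of the two middle values, which `statistics.median` handles for you.

```python
from statistics import mean, median

blue_counts = [7, 8, 8, 6, 5, 2, 4, 1, 4, 12, 5, 7, 6, 3, 6]

ordered = sorted(blue_counts)
middle_index = len(ordered) // 2       # index 7, i.e. the 8th of 15 values
middle_value = ordered[middle_index]   # valid shortcut for an odd count only

print(ordered)              # [1, 2, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 8, 8, 12]
print(middle_value)         # 6
print(median(blue_counts))  # 6

# Comparing the two measures gives a rough hint about skew.
print(mean(blue_counts) < median(blue_counts))  # True
```

Here the mean (5.6) sits just below the median (6). By the rule of thumb above, that hints at a slight negative skew, although with a gap this small the hint is a weak one.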

### Mode

The mode of the data is the value that is most common in the dataset. For example, in your data set, 6 occurs three times, which is more often than any other value. This is the most common value, which makes it the mode. Like the median, the mode can be compared to the mean to give you an indication of how symmetric the data are. That is to say, the mode helps you determine if your data are skewed.

It is possible to have two (or more) modes when several values tie for the highest count. With two modes, you can say that the data are bimodal; with more than two, multimodal. The mode is a handy way to “see” the data before you analyze it.
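Counting how often each value occurs is easy to sketch with the standard library. `collections.Counter` tallies the values, and `statistics.multimode` (available in Python 3.8 and later) returns every value tied for the highest count. The `two_peaks` list is made-up data, included only to show a bimodal case.

```python
from collections import Counter
from statistics import multimode

blue_counts = [7, 8, 8, 6, 5, 2, 4, 1, 4, 12, 5, 7, 6, 3, 6]

tally = Counter(blue_counts)
print(tally.most_common(1))    # [(6, 3)] -> the value 6 occurs three times
print(multimode(blue_counts))  # [6]

# Hypothetical data with two values tied for most common.
two_peaks = [2, 2, 5, 5, 7]
print(multimode(two_peaks))    # [2, 5] -> bimodal
```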

### Range

The range is another value that gives you a better picture of the data. The range is simply the numerical distance between the highest value and the lowest value. It is a measure of how variable the data are. For your data, the range is

12 – 1 = 11

This is a pretty big range when you consider that the average is 5.6. A large range indicates a lot of variability in the data, while a small range (as a rough guide, less than the mean) indicates less variable, or more precise, measurements.

As with the mean, the range is easily influenced by outliers. So, when you look at the range, keep the outliers in mind.
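Here is a short sketch of the range calculation, followed by the same calculation with the 12 left out to show how strongly a single outlier drives it. As before, dropping the value is only for illustration.

```python
blue_counts = [7, 8, 8, 6, 5, 2, 4, 1, 4, 12, 5, 7, 6, 3, 6]

# Range: highest value minus lowest value.
data_range = max(blue_counts) - min(blue_counts)
print(data_range)      # 11

# Same calculation without the single largest count (the 12).
trimmed = sorted(blue_counts)[:-1]
trimmed_range = max(trimmed) - min(trimmed)
print(trimmed_range)   # 7
```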

## When to Use What

You should always take a good look at your data before you start any sort of inferential analysis. But it can be hard to decide which kind of measure to look at first. When it comes to measures of central tendency, the best measure to use depends on the variable type.

If your variable is nominal, like gender, then you are limited in the measure that you can use. The best measure of central tendency to use for nominal variables is mode. Recall that nominal variables are those that do not have a specific quantitative value, so you couldn’t calculate the mean or median anyway.

If your variable is ordinal (i.e., it has an order or ranking) then the most useful measure is median. This is because the median gives you a sense of how the data falls on its particular scale. Just as a side note, some statisticians use a mean of ordinal data, like a ranking system, for analysis.

The mean is best used with interval or ratio data. This is because both of those types of variables are measured on a numeric scale with equal spacing between values, so adding and dividing them is meaningful. Of course, you have to keep in mind that the mean is influenced by outliers, so if the data are skewed, then the median would be best to use.
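The pairing of variable type and measure can be sketched with Python’s `statistics` module. The `favorite_colors` and `ratings` lists are made-up examples, not data from this article. `median_low` is used for the ordinal case on the assumption that you want the answer to be one of the actual ratings rather than an average of two.

```python
from statistics import mean, median_low, mode

# Nominal (hypothetical data): categories with no order, so use the mode.
favorite_colors = ["blue", "red", "blue", "green", "blue", "red"]
print(mode(favorite_colors))   # blue

# Ordinal (hypothetical data): ratings from 1 = poor to 5 = excellent.
ratings = [4, 2, 5, 3, 4, 4, 1]
print(median_low(ratings))     # 4

# Ratio: the blue M&M counts from earlier, so the mean is meaningful.
blue_counts = [7, 8, 8, 6, 5, 2, 4, 1, 4, 12, 5, 7, 6, 3, 6]
print(mean(blue_counts))       # 5.6
```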

## Skewness

Mean, median, and mode generally measure the center of the data. But data do not always fall evenly around the center point. Sometimes the majority of the data hovers towards higher or lower values. Sometimes the outliers influence the measures a little too much. In these cases, the data are skewed from the center.

Let’s take a look at some skew… We can use the measures of central tendency to describe data, but we can also use them to determine skew.

In the case of negative skew, the mean is influenced by the presence of outliers at the lower end of the data’s range. This means that the mean would typically be less than the median and mode. It may seem counterintuitive to call this negative skew when the majority of the data is on the higher side. One mnemonic that I use is that the long tail points to the negative side of an x-axis.

For positive skew, the case is backward. The mean is heavily influenced by the outliers at the upper part of the data’s range. In this case, the mean is greater than the mode or median. The mnemonic that I use is that the long tail is skewed toward the positive end of the x-axis.
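Skew can also be put into a single number. The sketch below uses one common definition, the moment-based coefficient (the third central moment divided by the second central moment raised to the power 1.5); other definitions exist, and the function name `sample_skewness` is a hypothetical helper, not something from a library. A positive result points to a longer upper tail, a negative one to a longer lower tail.

```python
from statistics import mean, median

def sample_skewness(values):
    """Moment-based skewness: m3 / m2**1.5, using divide-by-n moments.

    Raises ZeroDivisionError if every value is identical (no spread).
    """
    n = len(values)
    center = mean(values)
    m2 = sum((x - center) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - center) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

blue_counts = [7, 8, 8, 6, 5, 2, 4, 1, 4, 12, 5, 7, 6, 3, 6]

skew = sample_skewness(blue_counts)
print(round(skew, 2))                           # 0.46
print(mean(blue_counts), median(blue_counts))   # 5.6 6
```

For the M&M data the coefficient comes out positive, because the 12 creates a long upper tail, even though the mean (5.6) is slightly below the median (6). That is a reminder that comparing the mean with the median is a rule of thumb, and the two approaches can disagree on small samples.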

Skewness is a property that affects further statistical analyses. Many inferential methods, like ANOVA or linear regression, assume normality (strictly speaking, that the residuals are normally distributed). If you were to use these methods on data that are too skewed, then the results of your analysis could be misleading.

## Overall…

The measures of central tendency give a good description of the data. The real trick is knowing how each of them works and when to use them. Most especially, you should look at these measures first in order to get a sense of your data before you analyze it too deeply. Happy statistics!