First, here are some challenging practice questions:
1) Set S has a mean of 10 and a standard deviation of 1.5. We are going to add two additional numbers to Set S. Which pair of numbers would decrease the standard deviation the most?

(A) {2, 10}
(B) {10, 18}
(C) {7, 13}
(D) {9, 11}
(E) {16, 16}
2) Set Q consists of the following five numbers: Q = {5, 8, 13, 21, 34}. Which of the following sets has the same standard deviation as Set Q?
I. {35, 38, 43, 51, 64}
II. {10, 16, 26, 42, 68}
III. {46, 59, 67, 72, 75}

(A) I only
(B) I & II
(C) I & III
(D) II & III
(E) I, II, & III
3) Consider the following sets:
L = {3, 4, 5, 5, 6, 7}
M = {2, 2, 2, 8, 8, 8}
N = {15, 15, 15, 15, 15, 15}
Rank those three sets from least standard deviation to greatest standard deviation.

(A) L, M, N
(B) M, L, N
(C) M, N, L
(D) N, L, M
(E) N, M, L
Do these three questions make your head spin? You have found a good blog article to help you! Explanations to these will appear at the end.
Spread
When we are summarizing a list of numbers, typically we want to know the center and the spread. (If we are doing an advanced analysis, we would also want to know the shape of the distribution: that can come into play in IR questions.)
The two most typical measures of center are mean and median. Center gives us an idea of where the middle of the distribution of numbers falls.
Measures of spread give us an idea of the spacing of the numbers, how much they are “spread” out from each other. A relatively crude measure of spread is the range, which really only tells us about the extreme high and the extreme low, not all the data points in the middle. A more sophisticated measure of spread is the standard deviation.
Standard Deviation
Every list of numbers has a mean. Therefore, every number on the list has a deviation from the mean: that is how far that number is from the mean.
deviation from the mean = (value) – (mean)
Technically, numbers below the mean have a negative deviation from the mean, and numbers above the mean have a positive deviation from the mean. In the list {2, 4, 6, 8, 10}, the mean = 6, so 8 has a deviation from the mean of +2, and 2 has a deviation from the mean of –4. So, parallel to this first list is a second list, the list of deviations from the mean. (It’s a good exercise to convince yourself why this second list always has a mean of zero.)
Here is the technical procedure for calculating the standard deviation. We already have List #1, original data set, and List #2, deviations from the mean for each value in List #1. Now, List #3 will be the List #2 squared — the squared deviations from the mean. This is the list we average: that average is something called the “variance.” Then, to undo the effects of squaring, we take a square root, and that final answer is the standard deviation. The OG explains this procedure in the Math Review. If you understand and remember this, great, but chances are good that you don’t need to know it in all its gory detail if you know the rough and ready facts below.
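For readers who like to see the procedure spelled out, here is a minimal Python sketch of the three-list recipe just described. The function name population_sd is made up for illustration, and the sketch assumes the "average" of List #3 means dividing by the number of values, as the paragraph above describes.

```python
import math

def population_sd(values):
    """Root of the average squared deviation from the mean."""
    mean = sum(values) / len(values)          # mean of List #1
    deviations = [v - mean for v in values]   # List #2: deviations from the mean
    squared = [d ** 2 for d in deviations]    # List #3: squared deviations
    variance = sum(squared) / len(squared)    # the average of List #3
    return math.sqrt(variance)                # undo the squaring

data = [2, 4, 6, 8, 10]
print(population_sd(data))  # the square root of 8, roughly 2.83
```

Notice that the result lands between the smallest deviation (0) and the largest (4), which is what we would expect from a "typical" deviation.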
Rough and ready facts about standard deviation
1) The standard deviation gives us an estimate of the size of a typical deviation from the mean. It’s a way of “averaging” the deviations from the mean, though it is not strictly the mean of that list.
2) If every element in the data set is equal, they all equal the mean, each deviation from the mean is zero, and the standard deviation is zero. This is the lowest possible standard deviation for any set to have. (That’s an excellent GMAT shortcut to know!)
3) If you add the same number to every number on the list, or if you subtract the same number from every number on a list, or if you subtract each number on the list from the same number, all of the new lists produced would have exactly the same standard deviation as the original. Addition and subtraction slides values up and down the number line, but does not change any of the spacing between the numbers.
4) If you multiply every number on a list by the same value (other than ±1), that changes the standard deviation, unless it was zero to begin with. Raising the numbers on a list to a power generally changes it too. Multiplying changes the spacing on the list. In particular, if you multiply each number by k, then you multiply the standard deviation by the absolute value of k.
5) If all the numbers on the list are the same distance from the mean, that distance is the standard deviation. For example, in the set {17, 17, 17, 23, 23, 23}, the mean = 20, and each number is exactly 3 units from the mean, so the standard deviation is 3.
6) If you do anything that “bunches the numbers together”, that decreases the standard deviation. If you do anything that “pulls the numbers further apart”, that increases the standard deviation.
7) If you include new numbers in the set — that is tricky, because adding in most numbers will change the mean of the entire set, which will change the deviation from the mean for each number on the list, which changes the standard deviation. If you include an additional number or a few additional numbers that are far away from the other numbers, this inclusion will wildly increase the standard deviation.
8) If you include two new numbers that are symmetrical around the mean, then that will not change the mean. If the distance of these two numbers from the mean is greater than the standard deviation, adding them will increase the standard deviation (there’s a larger “average” distance from the mean). If the distance of these two numbers from the mean is less than the standard deviation, adding them will decrease the standard deviation (there’s a smaller “average” distance from the mean).
9) This is an extreme instance of the last case discussed in the previous point. If you include two new numbers equal to the mean (and therefore, with a deviation from the mean of zero), of course that decreases the standard deviation, but we can say more than that. Of all possible new numbers you could include in a set, the new numbers that will most decrease the overall standard deviation of the set are new entries equal to the mean. That is the single most efficient way to decrease the standard deviation of a set by including new entries to the list.
I realize that’s a great deal of information. The more you understand how standard deviation works, the more you will understand the interconnection of these “rough and ready” facts, which will make the entire list easier to remember.
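If you want to convince yourself of these facts rather than memorize them, a quick check in Python can help. This is only a sketch: it uses the standard library's statistics.pstdev (the divide-by-n standard deviation), the set from fact 5, and shift, reflection, and scale values chosen arbitrarily for illustration.

```python
from statistics import pstdev
import math

base = [17, 17, 17, 23, 23, 23]   # the set from fact 5, mean = 20
sd = pstdev(base)

shifted = [x + 30 for x in base]      # fact 3: add the same number
reflected = [100 - x for x in base]   # fact 3: subtract each from the same number
scaled = [-2 * x for x in base]       # fact 4: multiply by k = -2

checks = {
    "fact 5: SD is the common distance 3": math.isclose(sd, 3),
    "fact 3: shifting keeps the SD": math.isclose(pstdev(shifted), sd),
    "fact 3: reflecting keeps the SD": math.isclose(pstdev(reflected), sd),
    "fact 4: scaling by k multiplies SD by |k|": math.isclose(pstdev(scaled), 2 * sd),
    "fact 8: a close symmetric pair lowers the SD": pstdev(base + [19, 21]) < sd,
    "fact 8: a distant symmetric pair raises the SD": pstdev(base + [15, 25]) > sd,
    "fact 9: two copies of the mean lower it more": pstdev(base + [20, 20]) < pstdev(base + [19, 21]),
}

for label, ok in checks.items():
    print(label, "->", ok)
```

Every line should print True. Trying other shifts and multipliers is a good way to make the facts stick.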
At this point, you may want to go back to the three practice questions at the beginning of this post, and see if you have any insights.
Practice problem solutions
1) This is a very tricky problem. Starting list has mean = 10 and standard deviation of 1.5.
(A) {2, 10}: these two don’t have a mean of 10, so adding them will change the mean; further, one number is “far away”, which will wildly decrease the mean, increasing the deviations from the mean of almost every number on the list, and therefore increasing the standard deviation. WRONG
(B) {10, 18}: these two don’t have a mean of 10, so adding them will change the mean; further, one number is “far away”, which will wildly increase the mean, increasing the deviations from the mean of almost every number on the list, and therefore increasing the standard deviation. BTW, (A) & (B) are essentially the same change: add the mean and add one number eight units from the mean. WRONG
(C) {7, 13}: centered on 10, so this will not change the mean. Both of these are a distance of 3 units from the mean, and this is larger than the standard deviation, so it increases the size of the typical deviation from the mean. WRONG
(D) {9, 11}: centered on 10, so this will not change the mean. Both of these are a distance of 1 unit from the mean, and this is less than the standard deviation, so it decreases the size of the typical deviation from the mean. RIGHT
(E) {16, 16}: these are two values far away from everything else, so this will wildly increase the standard deviation. WRONG
Answer = D
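The problem never tells us what is in Set S, so we can't compute the exact new standard deviations. Still, we can try the reasoning on a made-up set. The sketch below assumes a hypothetical S = {8.5, 8.5, 8.5, 11.5, 11.5, 11.5}, which has mean 10 and standard deviation 1.5, and adds each answer pair in turn.

```python
from statistics import pstdev

# Hypothetical Set S (an assumption): mean 10, every value 1.5 from the mean.
S = [8.5, 8.5, 8.5, 11.5, 11.5, 11.5]

choices = {
    "A": [2, 10],
    "B": [10, 18],
    "C": [7, 13],
    "D": [9, 11],
    "E": [16, 16],
}

# Standard deviation of S after each pair is added.
new_sd = {label: pstdev(S + pair) for label, pair in choices.items()}
best = min(new_sd, key=new_sd.get)

for label, value in new_sd.items():
    print(label, round(value, 3))
print("largest decrease:", best)
```

For this particular S, only (D) brings the standard deviation below 1.5. A different S with the same mean and standard deviation would give different numbers, but the reasoning in the explanations does not depend on which set you pick.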
2) Original set: Q = {5, 8, 13, 21, 34}.
Notice that Set I is just every number in Q plus 30. When you add the same number to every number in a set, you simply shift it up without changing the spacing, so this doesn’t change the standard deviation at all. Set I has the same standard deviation as Q.
Notice that Set II is just every number in Q multiplied by 2. Multiplying by a number does change the spacing, so this does change the standard deviation. Set II does not have the same standard deviation as Q.
This one is very tricky, and probably is at the outer limit of what the GMAT could ever expect you to see. The spacing between the numbers in Set III, from right to left, is the same as the spacing between the numbers in Q from left to right. Another way to say that is: every number in Set III is a number in Q subtracted from 80. Again, that would be very hard to “notice”, but once you see it, you know that adding or subtracting the same number, or subtracting every number from the same number, doesn’t change the standard deviation. Set III has the same standard deviation as Q.
The correct combination is I and III, so the answer is C.
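You would never compute these on the test, but a short Python check with the standard library's statistics.pstdev confirms the answer. The comparison uses math.isclose to sidestep any floating-point rounding.

```python
from statistics import pstdev
import math

Q = [5, 8, 13, 21, 34]
candidates = {
    "I": [35, 38, 43, 51, 64],     # Q + 30
    "II": [10, 16, 26, 42, 68],    # Q * 2
    "III": [46, 59, 67, 72, 75],   # 80 - Q
}

# Keep the sets whose standard deviation matches that of Q.
same = [name for name, values in candidates.items()
        if math.isclose(pstdev(values), pstdev(Q))]
print(same)  # ['I', 'III']
```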
3) OK, well first of all, set N has six numbers that are all the same. When all the members of a set are identical, the standard deviation is zero, which is the smallest possible standard deviation. So, automatically, N must have the lowest. Right away, we can eliminate (A) & (B) & (C). In fact, even if we could do nothing else in this problem, we could guess randomly from the remaining two answers, and the odds would be in our favor. See this post for more on that strategy.
Now we have to compare the standard deviations of Set L and Set M. In Set L, the mean is clearly 5: two of the entries equal 5, so they have a deviation from the mean of zero, and no entry is more than two units from the mean. By contrast, in Set M, the mean is also 5, and here, every number is 3 units away from the mean, so the standard deviation of M is 3. No number in Set L is as much as 3 units away from the mean, so whatever the standard deviation of L is, it absolutely must be less than 3. That means, Set L has the second largest standard deviation, and Set M has the largest of the three. N, L, M in increasing order. Answer = D.
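Again, the point of the problem is that no calculation is needed, but here is a small Python sketch (using statistics.pstdev) for anyone who wants to see the actual values behind the ranking.

```python
from statistics import pstdev

sets = {
    "L": [3, 4, 5, 5, 6, 7],
    "M": [2, 2, 2, 8, 8, 8],
    "N": [15, 15, 15, 15, 15, 15],
}

# Sort the set names from least to greatest standard deviation.
ranking = sorted(sets, key=lambda name: pstdev(sets[name]))

for name in ranking:
    print(name, round(pstdev(sets[name]), 3))
```

The order that comes out is N, L, M, with the standard deviation of L falling between 0 and 3 as argued above.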
Hello ,
In the first question, do we check the distance from the given mean (i.e., mean = 10), or do we calculate the mean of the new list?
In the first question, you need to calculate the mean of the list given in the answer choice, that is, the list as it would be if the two numbers from each given answer choice were added into the set. As you can see in Mike’s explanations here, some answer choices change the mean, while other answer choices create a new list that still has the same mean of 10 that the original list had. In either case, you need to carefully see what mean is derived from each answer choice, then calculate the distance from the mean for each answer choice.
Hello Mike ,
I trust you are doing good.
The 2nd question seems to be tricky. I am not able to figure out how the 3rd option can have the same standard deviation as the set in the question.
Can you please explain ?
Regards
Bharti
Dear Bharti,
I’m happy to respond. 🙂 The first set is {35, 38, 43, 51, 64}. Think about the spaces between the numbers (after all, SD is about how the numbers are spaced apart). The spaces, from left to right are
_3_5_8_13_
Between the 35 and the 38 is a space of 3; between the 38 and the 43 is a space of 5, and so forth. The original set given in the prompt has this same set of spaces.
Now, do the same thing for the third set, {46, 59, 67, 72, 75}. From left to right, these spaces are
_13_8_5_3_
Notice that this is simply the mirror image of the other set of spaces. The patterns of spaces doesn’t change if we simply reflect it in a mirror, and that’s all we have done here. That’s why the SD is the same for the third option.
Does this make sense?
Mike 🙂
Dear Mike,
thank you very much for this post, it is so helpful.
The only thing I could not figure out is how exactly I could find the final “Standard Deviation”. I know, that I have to build a second set from the original set to have the deviation. But from there, I don’t know how to figure out one number to be the standard deviation.
Could you explain this with an example?
Thank you so much in advance.
Dear Heike,
I’m happy to respond. 🙂 Unfortunately, your question is very unclear. You could not figure out how to find the “final” SD — what final SD? Which problem number are you discussing? Of exactly what do you want an example?
My friend, I will remind you that one of the habits of excellence is asking excellent questions, making sure that you are meticulously clear in your own question so that what you are asking is crystal clear to someone trying to help you. Practicing this exercise will help you come to greater clarity in your own understanding. Does this make sense?
Make your question clear, and I will be happy to answer. 🙂
Mike 🙂
Although these are hard to answer, these are easy 🙂
i failed question 1 :(((
These questions are highly conceptual.
Wish i could do better.
Dear Paul,
I’m happy to respond! 🙂 First of all, the three questions at the top are three HARD questions. Many of the folks preparing for the GMAT would get these wrong. Don’t be discouraged by that. Secondly, the tips in the article may be hard to remember without reviewing them, but once you master them, you will understand Standard Deviation much more easily. Finally, don’t allow yourself to get discouraged by mistakes. If you aspire to GMAT success, you need an heroic attitude. See this GRE blog that I wrote:
http://magoosh.com/gre/2013/goodigotitwrong/
I see you are a Magoosh user. Watch the lesson videos carefully, and watch the video explanations for every question you get wrong. Don’t allow yourself to be discouraged. Instead, be energetic in your efforts to improve your understanding at each stage of the process. Ambition is a fire that no cold water can extinguish! That’s the drive you need to bring to your studies!
Does all this make sense?
Mike 🙂
Hi ,
From given facts,
If every element in the data set is equal , they all equal the mean. Does this mean that the set is evenly spaced (2,4,6,8) or a set having same numbers.(4,4,4,4,4)
What i have seen in SD video is A set of identical numbers SD = 0.
Dear Anusha,
I’m happy to respond. 🙂 If all the elements are equal, all the same, then they all equal the mean & the median & the mode. Each one has a “deviation” of zero from the mean because it equals the mean, so the SD = 0. For example, if Set A = {4, 4, 4, 4}, then SD = 0.
Equally spaced elements is a very different story. In Set B = {2, 4, 6, 8}, the mean = 5 (which is also the median). Not a single element of that set equals the mean, so they all have a nonzero deviation from the mean. The elements 4 and 6 are each 1 away from the mean, and the elements 2 and 8 are each 3 away from the mean. Now, calculating the exact value of the SD for this set is beyond what you need to know, but we can see, first of all, that the SD is definitely NOT zero, and in fact, the SD must be larger than the smallest deviation from the mean and must be smaller than the largest deviation from the mean — we can see that 1 < SD < 3.
Does all this make sense?
Mike 🙂
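For the curious, the bound in the reply above (1 < SD < 3 for Set B) can be checked with a few lines of Python. This is just an illustrative sketch using the standard library; the exact value is not something the test would ask for.

```python
from statistics import mean, pstdev

A = [4, 4, 4, 4]   # identical elements
B = [2, 4, 6, 8]   # equally spaced elements

# Distance of each element of B from the mean of B.
distances = [abs(x - mean(B)) for x in B]

print(pstdev(A))                        # 0.0
print(min(distances), max(distances))   # 1 3
print(round(pstdev(B), 3))              # roughly 2.236, between 1 and 3
```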
Hi,
For Question 2, can we use this approach:
1) Original list i.e. Set Q. Find the difference between neighboring elements
i.e. 8 – 5, 13 – 8 and so on
This gives us {3,5,8,13}
Set I {3,5,8,13}
Set II {6,10,16,26}
Set III {13,8,5,3}
As Set I and Set III have the same elements, their standard deviation is the same.
Hence Ans is C.
Dear Nik,
I’m happy to respond. 🙂 Yes, finding the differences is one approach. It happens to work here: I don’t know that this would often be a successful approach, but it’s one good trick to have up your sleeve when looking at problems of this sort.
Mike 🙂
Hey Mike
I got a question on the real exam (quant comparison) that compared the standard deviation of a list with 0. The range was given as 0 and the average of the list was 10.8
I remember in the magoosh blog, the bigger the range, the bigger the SD, and the opposite holds, so I assumed since the range is 0, then the SD was also 0
Any thoughts or material I can look at?
Dear Aisha,
I’m happy to respond. 🙂 The range and the standard deviation are both “measures of spread”: that is, they are numbers that indicate how “spread out” the numbers on a list are. It’s true, as a general rough rule, that as one gets bigger, the other usually gets bigger, but it’s not a strict linear relationship. It certainly is true that if all the numbers on a list are the same, then both the range & standard deviation equal zero.
Two sets can have the same range and different SD’s. For example, if List #1 = {2, 2, 2, 8, 8, 8} and List #2 = {2, 5, 5, 5, 5, 8}, then both of those lists have the same range: (max) – (min) = 8 – 2 = 6; but, the first list has a SD = 3, because every point is 3 units from the mean = 5, and the second list has a much smaller SD, because four of its six points equal the mean.
If you know all the facts on this page, then you will know absolutely everything the GMAT will ask you about standard deviation.
Does this make sense?
Mike 🙂
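The two lists in the reply above make a nice test case. Here is a short Python sketch, using statistics.pstdev, that computes the range and standard deviation of each.

```python
from statistics import pstdev

list_1 = [2, 2, 2, 8, 8, 8]
list_2 = [2, 5, 5, 5, 5, 8]

for name, values in (("List #1", list_1), ("List #2", list_2)):
    value_range = max(values) - min(values)
    print(name, "range =", value_range, "SD =", round(pstdev(values), 3))
```

Both lists have a range of 6, but the first has SD = 3 and the second has an SD of about 1.73.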
Mike,
This is an excellent post on SD.
Concepts are well explained through three challenging practice questions.
Just to add a point to the post……
If all the elements of the set are increased/decreased by x%, how does this change affect the mean and SD?
Let us consider the data set S = { 3, 4, 5, 6, 7, 8, 9 }
The mean and SD of the set are 6 and 2 respectively.
As an example, say each element is increased by 20%.
Then the modified data set is S’ = { 3.6, 4.8, 6.0, 7.2, 8.4, 9.6, 10.8}
The mean and SD of S’ are 6*1.2=7.2 and 2*1.2=2.4.
Hence, the mean has shifted to the right on the number line and the gaps between the elements have widened.
Consider another example, say each element is decreased by 20%.
Then the modified data set is S’ = { 2.4, 3.2, 4.0, 4.8, 5.6, 6.4, 7.2}
The mean and SD of S’ are 6*0.8=4.8 and 2*0.8=1.6
Hence, the mean has shifted to the left on the number line and the gaps between the elements have narrowed.
It may be noted that percentage increase/decrease is same as increase/decrease of mean and SD by a multiplying factor. In other words, if the elements are increased or decreased by x%, then mean and SD are multiplied by 1 + or – (x/100).
Thanks.
Arun,
Because of your comment, I added the last line of text to the post in item #4. I believe that is all most GMAT test takers will need to know. What you say is 100% true, but in the 1000s of practice GMAT questions I have seen, I have never seen one in which the numbers on a list are all changed by a percent and then the question asks for the new SD. Even some of the points I mention above are at the limit of what might appear on the GMAT, and I think what you share, though completely true, is beyond what the GMAT would test. Thanks for sharing — because of your contribution, I added a sentence that I think improves the post.
Mike 🙂
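Arun's percent example can be reproduced in a few lines of Python. The sketch below uses the standard library's statistics module and compares with math.isclose, because multiplying by 1.2 or 0.8 introduces tiny floating-point rounding.

```python
from statistics import mean, pstdev
import math

S = [3, 4, 5, 6, 7, 8, 9]        # mean 6, SD 2

up = [x * 1.2 for x in S]        # every element increased by 20%
down = [x * 0.8 for x in S]      # every element decreased by 20%

print(round(mean(up), 2), round(pstdev(up), 2))      # 7.2 2.4
print(round(mean(down), 2), round(pstdev(down), 2))  # 4.8 1.6

# Mean and SD are both multiplied by the same factor as the elements.
print(math.isclose(pstdev(up), 1.2 * pstdev(S)))     # True
print(math.isclose(pstdev(down), 0.8 * pstdev(S)))   # True
```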
Mike,
This is such a brilliant post. I’ve done most of the Magoosh Quant videos at least twice, but these posts are taking my understanding to a different level. Unfortunately, I’m discovering these just a few days prior to my test. Nevertheless, this article will definitely help tackle the S.D. mess that I was dreading. The picture is so much clearer now. I can visualize better. The example on subtraction from 80 was a beauty. Thanks much Mike.
Pranay
Pranay,
Thank you for your kind words. Best of luck to you on your upcoming test!
Mike 🙂
Can we add the following to the list of rough facts: if we multiply every element by a number X, then the std dev will be multiplied by the absolute value of X? Thanks!
Dear MensaMember,
Yes, that’s absolutely 100% true. I suspect that’s a level of detail beyond what the GMAT is going to expect, but nevertheless, it’s excellent to have this kind of detailed understanding of SD. Thanks for pointing this out, and best of luck to you.
Mike 🙂
Hi Mike,
Thanks for your reply. First off, I forgot to mention this last time: this is an excellent, excellent post by you!
I have one more quick question about a small rule I have observed about statistics tested on the GRE that I think will be really handy for every GRE/GMAT taker. Here it is:
“If X is added to or subtracted from every element of a set, all 3 measures of central tendency (mean, median, mode) will be increased or decreased by X, whereas measures of dispersion (range, interquartile range and standard deviation) will be unaffected. On the other hand, if every element is multiplied by X, the measures of central tendency will be multiplied by X and the measures of dispersion by the absolute value of X.”
Does this rule look good to you? Thanks!
Dear Mensa Member,
Yes, that’s a very efficient way to sum up correctly a variety of information in one fell swoop. Not everyone will be able to absorb all of that at once, but for folks who get that rule, it’s very powerful. Thanks for sharing it.
Mike 🙂
Dear Mike, Thanks! I am glad that you also think, it is a powerful rule. Happy to contribute in this small way as I have benefited tremendously by the wealth of information/insights shared so graciously by you and others here! 🙂
Dear Mensa Member,
You are quite welcome, my friend. Best of luck to you.
Mike 🙂
hi
For Q3,
is it correct if I simply compare the sum of the differences between neighboring numbers in a set?
L = {3, 4, 5, 5, 6, 7}
M = {2, 2, 2, 8, 8, 8}
N = {15, 15, 15, 15, 15, 15}
L = {1,1, 0, 1, 1} => sum : 4
M = {0,0,6,0,0} => sum : 6
N ={ 0,0,0,0,0} => sum : 0
=> N, L, M
Dear Marco,
That’s an approach that won’t always work. For example, {3, 4, 5, 5, 6, 7} and {3, 3, 4, 5, 5, 6, 7, 7} have the same “difference sum” but not the same SD. SD is a very tricky thing.
Does this make sense?
Mike 🙂
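Here is a small Python sketch of the counterexample in the reply above. The helper name difference_sum is invented for illustration; it adds up the gaps between neighboring numbers, which is the quantity Marco proposed comparing.

```python
from statistics import pstdev

def difference_sum(values):
    """Sum of the gaps between neighboring numbers in sorted order."""
    ordered = sorted(values)
    return sum(b - a for a, b in zip(ordered, ordered[1:]))

first = [3, 4, 5, 5, 6, 7]
second = [3, 3, 4, 5, 5, 6, 7, 7]

print(difference_sum(first), difference_sum(second))      # 4 4
print(round(pstdev(first), 3), round(pstdev(second), 3))  # roughly 1.291 and 1.5
```

The "difference sum" is really just (max) – (min), the range, so it cannot tell apart two sets with the same extremes.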
So the proper way to tackle this is first get the mean and then find the “difference sum”, like the informal definition of SD?
Well, one has to be flexible in one’s approach, because SD is a tricky topic, but it’s crucial to keep in mind that, when we say “standard deviation”, we are talking about “deviation” that can be regarded as “standard”, and this “deviation” is a deviation from the mean. That’s absolutely essential to understanding what SD is.
Mike 🙂
I have one more question.
For example: 4,5,4 and 4, 5, 5, how do we know which one has bigger SD ? What method would you use ?
Marco,
They would have to have identical SD’s. One way to say it is: in each set, two numbers are the same and the third differs from them by 1. They have the exact same configuration. Another way to say it: take the set (4, 5, 5) and subtract each number in that set from 9. That will result in (5, 4, 4), which has the same members as the first set. These two sets are related in the same way as I & III in question #2 above.
Does this make sense?
Mike 🙂
Hi Mike,
I can’t understand your approach. Why do u ask us to subtract each number in the set from 9 (specifically)??is it because 5 and 4 sum up to 9?? even so i don’t get the logic here
Look at “Rough & Ready Facts” #3 in the article. I was looking for a way to show that those two sets were equivalent. If we can take one set, and subtract each number from a fixed number to produce a second set, those two will have the same SD.
Mike 🙂
wow!!!!!!!!!!!!!!!!!!This post is awesome.
Dear Taru,
Thank you very much for your kind words.
Mike 🙂
Hello Mike
1. Can SD be negative?
2. This is regarding Question 2.
Set III can be created by subtracting each element of (5, 8, 13, 21, 34) from 80. For that to be possible we will need to multiply the above set by ‘–1’, thereby creating (–5, –8, –13, –21, –34). We can now add 80 to the elements and then create (75, 72, 67, 59, 46).
Wouldn’t the final SD of set III be (–1)(SD), since we had to multiply the original set by –1 to achieve the target set?
Please help me understand. Thanks a ton!
Dear KC,
I’m happy to help! Fundamentally, the standard deviation is a distance: it’s a way of averaging the distance of each data point from the mean. Of course, distance is never negative, so the SD is never negative. The sets (5, 8, 13, 21, 34) and (–5, –8, –13, –21, –34) have exactly the same SD. Does all this make sense?
Mike 🙂