A distribution is a graph that shows which values of a variable are more or less common in a population. Where the graph is higher, there are more people; where its height is close to zero, there are fewer.

By far, the most famous and most useful distribution is the Normal Distribution, a.k.a. the “bell curve.” It shows up *everywhere*, with an almost eerie universality. Suppose you were to measure one genetically determined bodily measurement (e.g. thumb length, distance between pupils, etc.) for every single human being on the planet, and then graphed the distribution: it would be very close to a normal distribution. The same holds for any genetically determined bodily measurement of an animal or a plant: measure it for every member of that species, and the result would be very close to a normal distribution. The normal distribution is the shape of the distribution of any naturally occurring variable of any natural population. (Something like blood pressure might not be as normally distributed, because cultural and social factors impinge on blood pressure; it’s not purely natural, unadulterated by culture.)

## Properties of the Normal Distribution

All normal distributions on earth, from giraffe height to ant height, share certain fundamental properties.

It’s important to appreciate that any Normal Distribution comes with its own “yardstick”, and that yardstick is the standard deviation. You can read more about standard deviation here. The very center of the Normal Distribution is the mean and median and mode all in one. We use the standard deviation to measure distances from the mean.

If we go out a length of one standard deviation from the mean on either side, that always includes 68% of the population, a little over two-thirds. This means that on either side, there is 34% of the population, very close to one-third: there’s 34% between the mean and one standard deviation below the mean, and there’s another 34% between the mean and one standard deviation above the mean.

If we go two standard deviations from the mean in either direction, that always includes 95% of the population. You are somewhat uncommon if you are more than two standard deviations from the mean.

If we go out to three standard deviations from the mean in either direction, that includes 99.7% of the population, with only 0.15% (i.e. 15 people out of 10,000) falling in each tail beyond this. The folks who are more than three standard deviations above the mean are the true outliers (the major league baseball hitters, the world-famous violinists, the brilliant scientists and researchers): they truly stand out from the population at large.

If you simply remember these two numbers:

**68%** within one standard deviation of the mean (which means 34% on each side)

**95%** within two standard deviations of the mean

then you will have the ability to figure out any GRE Math question that addresses the Normal Distribution.
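For readers who like to see where these percentages come from, here is a minimal sketch in Python. It is not something the GRE asks for. It relies on the standard result that the fraction of a normal population within k standard deviations of the mean is erf(k/√2); the function name `fraction_within` is just an illustrative choice.

```python
import math

def fraction_within(k: float) -> float:
    """Fraction of a normal population lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# Compare the exact values with the rounded figures used in this post.
for k, rounded in ((1, "68%"), (2, "95%"), (3, "99.7%")):
    print(f"within {k} SD: {fraction_within(k):.2%} (rule of thumb: {rounded})")
```

The exact values come out slightly above 68% and 95%, which is why the rounded figures are safe to memorize.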

Hi Mike,

I had a question. Suppose it is given that the 75th percentile has a value 150 and the 95th percentile has a value 170. So how will the 85th percentile compare to 160?

It’s less than 160.

If there are 8 or 10 integers, what will the quartiles be in each case?

For example, 1, 2, 3, 4, 5, 6, 7, 8 and 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.

Dear Anik,

For the set {1, 2, 3, 4, 5, 6, 7, 8}, the median, 4.5, divides the list into a lower list, {1, 2, 3, 4}, and an upper list, {5, 6, 7, 8}. Q1 is the median of the former, 2.5, and Q3 is the median of the latter, 6.5.

For the set {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, the median, 5.5, divides the list into a lower list, {1, 2, 3, 4, 5}, and an upper list, {6, 7, 8, 9, 10}. Q1 is the median of the former, 3, and Q3 is the median of the latter, 8.
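The median-of-halves procedure described here can be sketched in a few lines of Python. The function name `quartiles` is hypothetical. For a list with an odd number of elements, this sketch leaves the middle element out of both halves, which is one common convention and an assumption on my part, since the two examples above both have an even number of elements.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]               # the lower list
    upper = data[len(data) - half:]   # the upper list
    return median(lower), median(data), median(upper)

print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))         # (2.5, 4.5, 6.5)
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))  # (3, 5.5, 8)
```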

Does all this make sense?

Mike 🙂

For the set {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, the median, 5.5, divides the list into an upper list, {1, 2, 3, 4, 5} and a lower list {6, 7, 8, 9, 10}.

Q1 is the median of the former, 3, and Q3 is the median of the latter, 7.

How is the latter median 7?

It should be 8, right?

Dear Ashish,

Yes, you are quite right! 🙂 Thank you for catching that typo! I just corrected it. Best of luck to you, my friend!

Mike 🙂

good to know

Dear Anik,

I’m glad you found this helpful. Best of luck to you.

Mike 🙂

Hi Mike,

One doubt: if every value in a set is the same, like {3, 3, 3, 3}, can it be plotted with the box-and-whisker method? Or is that not possible, since Q1, the median, and Q3 are all the same?

Dear Akanksha,

I’m happy to respond. 🙂 The purpose of quartiles, box-and-whisker plots, etc. is to give a simple breakdown of the distribution of a large number of data points. If there’s a set of thousands or millions or billions of data points, these statistical tools are helpful to reduce the entire distribution to a few numbers. For a set with fewer than 10 elements, there is absolutely no point in finding the quartiles, etc., because you can simply see all the numbers directly. Furthermore, if every number in a set is the same, that’s simply not an interesting set. It’s much easier to describe such a list by saying “Each of the 200 elements on the list equals 7.” That’s a complete description of the list. There’s no need to use the other fancy tools unless there’s variation on the list, as there would be for any real list of data. Does all this make sense?

Mike 🙂

I am confused as to the answer and explanation for the QC question above:

Variable x is normally distributed and the values 650 and 850 are the 60th and 90th percentiles of the distribution.

Quantity A: The value at the 75th percentile of the distribution

Quantity B: 750

We are attempting to determine if quantity A is greater than quantity B, if quantity B is greater than quantity A, or if they are equal.

You said that the correct answer is A (quantity A is greater than quantity B), but later agreed with a commenter that the correct answer is B (quantity B is greater than quantity A), and then again agreed with a commenter that the answer is A (because the 70th percentile is 750, meaning the 75th percentile is a value greater than 750, so quantity A is greater than quantity B).

Which is the correct answer and why?

You said that, “Therefore, the halfway percentile, the 75th percentile, has to be to the left of X = 750, in other words, has to have an X-value that’s less than 750. Answer = A”.

Later, a commenter said, “The 70th percentile will be 750. Hence, the 75th percentile is greater. Am I right?” and you confirmed that he is indeed right.

How is 750 the 70th percentile when 650 is the 60th percentile (10 away from the 70th percentile) and 850 is the 90th percentile (20 away from the 70th percentile); would that not make 850 the 80th percentile?

Could you please clarify how to arrive at the correct answer and how the 70th percentile is 750, if that is accurate? Thank you so much!

Cat,

I apologize for any confusion. This question is essentially a visualization question, and it’s confusing to try to negotiate a solution in text-only format.

The answer to the question in the form you ask it, and in the form in which it appears below, is (B). Whether similar questions appear in other forms, I do not know.

Also, apparently I misread what Srinivas wrote when I responded, so I went back and edited that response. Without advanced statistics (leagues beyond GRE math), there’s no way to know what the 70th percentile would be. All we know is that the 75th percentile is not halfway between: it must be closer to the 60th percentile.

Mike 🙂

You’re wrong. The correct answer is B

Here’s why

https://www.facebook.com/photo.php?fbid=793979250723072&set=p.793979250723072&type=1&theater

68% within one standard deviation of the mean (which means, 34% on each side)

95% within two standard deviations of the mean

What does the above tell us significantly?

Dear Siddharth,

I must say, your question is vague, and this makes it hard to answer. First of all, I don’t know whether, by “significantly”, you mean importance in general or the very technical idea of statistical significance, which is well beyond the GRE.

I’ll assume you mean the former. This is important because, periodically, the GRE will give a question involving a Normal Distribution, and it expects you to know these basic rules about it.

Mike 🙂

I shall keep the visualization bit in mind, I do have a question, it is from the OG (1e), page 331, Q9

A random variable Y is normally distributed with a mean of 200 and a standard deviation of 10

Column A: The probability of the event that the value of Y is greater than 220

Column B: 1/6

The solution says there is a chance of under 5% it would be greater 220,

From whatever I have come to understand of S.D. (all of it through this post and the one by Chris) I just wished to confirm, it is a 2.5% chance to be over 220, isn’t it?

Thanking You,

Sid

wished to confirm the exact value*

Sid: Yes, precisely. By the rule of thumb in this post, the probability of being more than 2 S.D. above the mean (or more than 2 S.D. below the mean) is 2.5%.

Mike 🙂

thank you so much, Mike, Standard Deviation is finally not as daunting 🙂

Regards

Sid

You are quite welcome. I’m glad my words were helpful.

Mike 🙂

Hi Mike,

Can you explain how you arrived at 2.5%?

Dear Sid,

Well, since 95% of the distribution is *within* two standard deviations of the mean, that means that the other 5% is outside of that region, i.e. further than two standard deviations from the mean, out in the distant tails of the Bell Curve. Well, the Bell Curve is completely symmetrical, and symmetry is always a hugely important math property to appreciate. Since the Bell Curve is symmetrical, that 5% out in the tails must be exactly split in half between the left tail and the right tail, which means there is (5%)/2 = 2.5% in each tail in the region that is more than two standard deviations from the mean. Does this make sense?
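As a quick check of this symmetry argument, here is a small Python sketch for the OG question quoted earlier in this thread (mean 200, standard deviation 10, probability that Y exceeds 220). It uses `statistics.NormalDist` from the standard library; the variable names are only illustrative.

```python
from statistics import NormalDist

# Rule-of-thumb tail: 95% within 2 SD, the remaining 5% split evenly.
rule_of_thumb_tail = (1 - 0.95) / 2

# Tail computed from the normal curve itself: P(Y > 220).
y = NormalDist(mu=200, sigma=10)
exact_tail = 1 - y.cdf(220)

print(f"rule of thumb: {rule_of_thumb_tail:.3f}")
print(f"computed:      {exact_tail:.4f}")
print("less than 1/6?", exact_tail < 1 / 6)
```

The computed tail is a little under the 2.5% estimate, because 95% is itself a rounded figure. Either way it is far below 1/6, which is all the comparison needs.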

Mike 🙂

Yes, it makes sense now 🙂

Thanks a lot 🙂

Sid,

You are quite welcome. Best of luck to you.

Mike 🙂

Hi,

Can you suggest an alternate approach to problem 5 from the “ETS Official Guide to the GRE Revised”, Set 3 discrete questions: Hard, quantitative comparison? I am unable to spot a similar problem in our practice sets.

It goes like the random variable x is normally distributed and values 650 and 850 are 60th and 90th percentiles of distribution.

quantity a: the value at the 75th percentile of distribution of x

quantity b: 750

Thanks,

praveen

Praveen: First of all, when you ask a question from the OG, please give the *page number*. This problem is p. 156 in both the OG(1e) & the OG(2e).

We know 650 is the 60th percentile: it is just above the center hump of the Bell Curve (center hump = mean = median = mode = 50th percentile). 850 is the 90th percentile, way out on the arm of the Bell Curve. The height of the Bell Curve declines precipitously as we walk from X = 650 to X = 850. Suppose we walk halfway, out to X = 750. The question is: of the slice of Bell Curve between 650 and 850, between the 60th percentile and the 90th percentile, is more than half before or after X = 750? Well, the curve is declining precipitously in this region, so the height of the curve is *much higher* before X = 750 than after X = 750. Another way to say it is: the Bell Curve is densest toward its center. Again, considering the slice between 650 and 850, more than half will be toward the center, to the left of X = 750. Therefore, the halfway percentile, the 75th percentile, has to be to the left of X = 750; in other words, it has to have an X-value that’s less than 750. Quantity B is greater, so the answer is (B).
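None of the following is needed on the GRE, but the visual argument can be checked numerically. The sketch below, in Python with the standard-library `statistics.NormalDist`, recovers the mean and standard deviation from the two given percentiles and then looks up the 75th percentile. The helper name `fit_normal` is hypothetical.

```python
from statistics import NormalDist

def fit_normal(x_lo, p_lo, x_hi, p_hi):
    """Find the normal distribution whose p_lo and p_hi percentiles are x_lo and x_hi."""
    standard = NormalDist()  # mean 0, standard deviation 1
    z_lo, z_hi = standard.inv_cdf(p_lo), standard.inv_cdf(p_hi)
    sigma = (x_hi - x_lo) / (z_hi - z_lo)  # X-units per standard deviation
    mu = x_lo - z_lo * sigma
    return NormalDist(mu, sigma)

dist = fit_normal(650, 0.60, 850, 0.90)
p75 = dist.inv_cdf(0.75)

print(f"mean = {dist.mean:.1f}, standard deviation = {dist.stdev:.1f}")
print(f"75th percentile = {p75:.1f}")
print("below 750?", p75 < 750)
```

Under these assumptions the 75th percentile lands noticeably below 750 (roughly 732), which agrees with the picture: Quantity B is the larger one.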

The question is a deep *visualization* question.

Does all that make sense? Please let me know if you have any further questions.

Mike 🙂

Makes perfect sense.

Thank you so much Mike 🙂

You are quite welcome. Best of luck to you.

Mike 🙂

thank you for the explanation, this was indeed a confusing problem, and I can only hope they do not surface often on the test

A small typo there, however, the correct choice is B

Sid:

Well, this question is in the OG, so it’s fair game for the GRE. This question is specific to the Normal Distribution, but *visualization* is a powerful strategy that can help you throughout the Quant section.

Mike 🙂

Hello Mike,

The 70th percentile will be 750. Hence 75th percentiles is greater.

Am I right?

Dear Srinivas,

We have no way of knowing (without advanced statistics) what the 70th percentile is. All we can say is that the 75th percentile will not be halfway between the 60th & 90th: it will be closer to the former.

Mike 🙂

Hello Mike,

Since the 70th percentile also has a higher height and a denser area on the bell curve graph, could we also say it’s also closer to the 60th percentile than is the 70th percentile?

Dear Ji Won,

The word “denser” is a misleading word, a word ripe for misinterpretation with respect to the Normal Distribution. I would say the best way, and perhaps the only non-technical way, to think about this has to do with the areas under the curve.

Mike

Mike, am I just re-wording your approach here or can we say something like this:

The 60th percentile will be within 1 SD of whatever the mean is. The 90th will be within 2 SD. So there are almost 3x as many numbers under the part of the curve that represents the 50th-84th percentiles (the first S.D. being 34% of the curve) as under the 84th-98th percentiles (the second S.D. being only about 14% of the curve), where the 850 falls. So if we are doing averages, it’s something like 650 + 650 + 650 vs 800, which pulls the whole thing much closer to the left.

Dennis,

Yes, that’s a very good way to think about it. Best of luck to you.

Mike 🙂

Sir

regarding normal distributions is it enough to just know the 2 percentages (68,95) and its behaviour on deviating by 2 standard deviations either side?

Praneeth: Yes, those are the only two numbers you need to know from memory. I will say, though, the GRE might expect you to *do* things with those numbers. For example, what percent of the population is *between* the mean and 1 S.D. above the mean? That’s 68/2 = 34%. How much of the population is above a score 1 S.D. below the mean? Well, that’s the 34% from that place up to the mean, and then the whole 50% above the mean, so that’s 84%. Do you see what I mean? The only numbers you have to have memorized are the 68 and the 95, but you could be expected to divide those in half and add or subtract pieces to get specified regions under the Bell Curve. Does all that make sense?
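The divide-in-half and add-or-subtract arithmetic described here can be written out explicitly. This is only an illustrative Python sketch built on the two memorized numbers; the variable names are made up for the example.

```python
# The only two numbers to memorize (percent of the population).
WITHIN_1_SD = 68.0
WITHIN_2_SD = 95.0

mean_to_1_sd = WITHIN_1_SD / 2                   # between the mean and 1 SD on one side
one_to_two_sd = (WITHIN_2_SD - WITHIN_1_SD) / 2  # between 1 SD and 2 SD on one side
beyond_2_sd = (100 - WITHIN_2_SD) / 2            # in one tail beyond 2 SD

# Everything above a score 1 SD below the mean:
# the 34% up to the mean plus the whole 50% above the mean.
above_minus_1_sd = mean_to_1_sd + 50

print(mean_to_1_sd, one_to_two_sd, beyond_2_sd, above_minus_1_sd)
# 34.0 13.5 2.5 84.0
```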

Mike 🙂

Excellent explanation of the concepts. Thanks!!

Thanks for the compliments. Best of luck to you.

Mike 🙂

Thanks a lot for quick response.

Yes, the question is from a GRE prep source. But now, as you have said I won’t go in that much detail.

Thanks!

You are quite welcome. Best of luck to you!

Mike 🙂

hiii..

I was trying to solve this question :

The scores of an IQ test are normally distributed. The mean is 81 and standard deviation is 6.3. The probability that a person who takes the test will score between 68.4 and 87.3 will be _________% (round off to one digit after the decimal point)….

I just calculated that 87.3 is at 1 unit of sd from mean and 68.4 is 2 units of SD below mean

=> 34% + 34% + 14% = 82%

but the answer is 81.8%…. values taken are 34.1 and 13.6

I am confused which values to take for solving the question. Please clarify.

Neha: Is that question from a GRE prep source? So far as I know, the GRE does not expect you to do normal distribution calculations so precise that you will have to distinguish between 82% and 81.8%. Yes, on a GRE question, 81.8% might be listed as the answer, but estimating 82% would be enough to get that answer. Whatever source is advocating knowing normal distribution values to the tenths place is simply going over the top. That’s flamboyantly unnecessary. Stick with the values given in this post.
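To see how close the rule-of-thumb estimate is, here is a short Python sketch that computes the probability from the question (mean 81, standard deviation 6.3, scores between 68.4 and 87.3) with the standard-library `statistics.NormalDist`. The variable names are illustrative.

```python
from statistics import NormalDist

iq = NormalDist(mu=81, sigma=6.3)

# 68.4 is 2 SD below the mean; 87.3 is 1 SD above it.
computed = iq.cdf(87.3) - iq.cdf(68.4)

# Rule of thumb from this post: 34% + 34% + (95% - 68%)/2.
estimate = 34 + 34 + (95 - 68) / 2

print(f"computed: {computed:.2%}")
print(f"estimate: {estimate}%")
```

The estimate of about 81.5% to 82% and the computed value differ by only a fraction of a percentage point, which is well within what a GRE answer choice would distinguish.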

Mike 🙂

Shouldn’t the percentage within two standard deviations from the mean be 96% instead of 95% ?

Dear Hicham: Technically, it’s about 95.45%, so that’s closer to 95%. Also, 95% is a nice round number to remember.

Mike 🙂

Great Stuff. Thanks Sir!

Thank you. Best of luck to you, sir.

Mike 🙂

I want to ask whether normal distribution and standard deviation are a little different. I saw in books a graph related to probability distributions. What is that? Is that the normal distribution?

Ali: Excellent question. The Normal Distribution is a *shape*, and the standard deviation is a *number.* The Normal Distribution is a shape, a curve, that shows at what values of the variable you will find the most people. Any particular Normal Distribution is a curve with its own particular center (the mean) and its own particular spread, or width. For example, adult elephant mass has a higher mean & higher spread than adult ant mass. The standard deviation is a measure of spread, a measure of how far the individual data points are from the mean, on average. For example, the income distribution among folks with similar job descriptions in the same company would likely have a small standard deviation: those numbers would all be relatively close together. The income distribution of everyone in a major city (business people, homeless, everyone) would have a HUGE standard deviation, because the numbers on the list deviate so wildly from one another. You can calculate the standard deviation to find the spread of any distribution, but the standard deviation is designed to work best with the Normal Distribution: the standard deviation of a particular Normal Distribution is, as it were, the built-in yardstick that comes with that distribution. You can calculate the standard deviation of other distributions (t-distributions, F-distributions, chi-squared, etc.), but that’s getting into advanced statistics, realms far beyond what you need for the GRE. Does all this make sense? Please let us know if you have any further questions.
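To make the income example concrete, here is a tiny Python sketch using the standard-library `statistics` module. The two income lists are invented purely for illustration and are not real data.

```python
from statistics import mean, pstdev

# Hypothetical incomes, made up for this illustration only.
same_job_salaries = [58_000, 60_000, 61_000, 62_000, 64_000]
whole_city_incomes = [0, 12_000, 45_000, 90_000, 2_500_000]

for label, incomes in (("same job", same_job_salaries),
                       ("whole city", whole_city_incomes)):
    # pstdev treats the list as the entire population.
    print(f"{label}: mean = {mean(incomes):,.0f}, "
          f"standard deviation = {pstdev(incomes):,.0f}")
```

The first list is tightly clustered, so its standard deviation is small relative to its mean; the second list is spread out wildly, so its standard deviation is enormous.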

Mike 🙂

You are great.

thanks

Thank you.

Mike 🙂

Very Useful article

Thank you. You are quite welcome. Best of luck to you.

Mike 🙂

Hello,

Should one of the sentence parts read “below the mean”? Or am I misreading it?

“there’s 34% between the mean and one standard deviation above the mean, and there’s another 34% between the mean and one standard deviation above the mean.”

Yes, that is a mistake! Great catch. We’ll fix that. Thanks for pointing it out. 🙂