The post F-distribution Explained appeared first on Magoosh Statistics Blog.

A probability distribution, like the normal distribution, is a means of determining the probability of a set of events occurring. This is true for the F-distribution as well.

The F-distribution is a skewed distribution of probabilities similar to a chi-squared distribution. But where the chi-squared distribution deals with the degrees of freedom of a single set of variables, the F-distribution deals with multiple levels of events that have different degrees of freedom. This means that there are several versions of the F-distribution, one for each combination of degrees of freedom.

Each curve represents different degrees of freedom. This means that the area required for the test to be significant is different. If you are feeling mathematically adventurous, the actual equation for the F-distribution curve is
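In standard notation, with *d*_{1} numerator and *d*_{2} denominator degrees of freedom, the density is:

```latex
f(x;\, d_1, d_2) \;=\;
  \frac{\Gamma\!\left(\frac{d_1+d_2}{2}\right)}
       {\Gamma\!\left(\frac{d_1}{2}\right)\,\Gamma\!\left(\frac{d_2}{2}\right)}
  \left(\frac{d_1}{d_2}\right)^{d_1/2}
  x^{d_1/2-1}
  \left(1 + \frac{d_1}{d_2}\,x\right)^{-(d_1+d_2)/2},
  \qquad x > 0
```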

Since degrees of freedom are in the equation, it’s pretty easy to see that the curve changes as the degrees of freedom change.

You rarely have to deal with constructing an actual curve because statistical software does that for you. However, you will have to use the curve concept in certain experimental setups.

The F-test, which uses this distribution, compares multiple levels of an independent variable across multiple groups. This is commonly found in ANOVA and factorial ANOVA.

Let’s consider that you are testing a new drug for heart disease called X. In this case you want to determine the significant effects of different dosages. So, being the great statistician that you are, you set up trials of 0 mg, 50 mg, and 100 mg of X in three randomly selected groups of 30 each. This is a case for ANOVA, which utilizes the F-distribution.

Anytime you are comparing more than two groups, you will need the F-distribution for the F-test.

The F-distribution is used for (surprise, surprise) the F-test. The F-test involves calculating an F-score based on the variances of the three (or more) levels that you are testing, relative to the sample sizes. The actual F-score is calculated using the much simpler equation

This compares the variance *within* a group (all the 100 mg participants for example) to the variance *between* the groups (comparing the three groups). When you run this equation, you get an F-score.

To determine if this value is high enough to be significant, you compare it to an F-distribution table like this one

You basically find the value at which your degrees of freedom intersect. If your calculated value is higher than the value in the table, then your samples are significantly different. If the calculated value is lower, then the groups are not different enough to be significant.
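To make the comparison concrete, here is a minimal sketch of the F-score calculation in plain Python. The group values are made-up numbers standing in for the 0 mg, 50 mg, and 100 mg dosage groups, not real trial data:

```python
def f_score(*groups):
    # F = (variance between groups) / (variance within groups)
    means = [sum(g) / len(g) for g in groups]
    all_vals = [x for g in groups for x in g]
    grand_mean = sum(all_vals) / len(all_vals)
    k, n = len(groups), len(all_vals)
    # Between-group sum of squares, df = k - 1
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares, df = n - k
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ssb / (k - 1)) / (ssw / (n - k))

# Hypothetical response values for the 0 mg, 50 mg, and 100 mg groups
print(f_score([1, 2, 3], [2, 3, 4], [3, 4, 5]))  # → 3.0
```

If the resulting F-score exceeds the table's critical value for (k − 1, n − k) degrees of freedom at your chosen significance level, the group means differ significantly.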

The F-distribution is a method of obtaining the probabilities of specific sets of events occurring. The F-statistic is often used to assess whether a theoretical model of the data differs significantly from the data itself. Once the F-statistic is calculated, you compare it to a table of critical values that serve as minimum cutoffs for significance. I hope this post helped to clarify some things regarding the F-distribution. I look forward to seeing your questions below. Happy statistics!


The post Measures of Position Explained appeared first on Magoosh Statistics Blog.

What do you do when you’re lost? You use tools like a compass and GPS to figure out where you are and how to get where you are going. Well, in statistics there are ways to figure out where a data point or set falls. These are called **measures of position**. Once we know where a data set or model is, we can figure out what to do with it. Let’s discuss how we find out where data is and what that means.

**Percentiles** are common measures of position. To get a percentile, the data is divided into 100 regions. A specific data point will fall in one of those regions and then you assign a percentile to indicate how much data is *below* that specific data point.

For example, I recently took one of my foster children to the doctor and they measured her height (3’4″ if you’d like to know). Once they had her height, they compared it to the national average. In the case of my foster daughter, she is at about the 50th percentile of the national data. This means that she is taller than 50% of girls her age, which puts her right at the average.

Percentiles are a good way to express the measure of position for large datasets. Many national assessments, such as height and ACT scores, use percentiles as a way to convey where specific scores fall because they are easily interpreted.

Quartiles are a nifty way to determine where data fall. Quartiles essentially divide the data into four regions. The first region comprises the lowest point in the data to the median of the lower half of the data. The second quartile region is from the median of the lower half of the data to the median of the entire data set. The third region makes up the data from the median of the entire data set to the median of the upper half of the data. The final region is made up of the data from the median of the upper half to the greatest data point.

The key region is the *interquartile range*. This region represents the middle 50% of the data. Knowing which quartile a datum falls in gives you a sense of how it compares to the rest of the data. It is also a great way to identify outliers, the points that are excessively high or low.
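The median-of-halves approach described above can be sketched in a few lines of Python; the function name and the sample data are just for illustration:

```python
from statistics import median

def quartiles(data):
    s = sorted(data)
    mid = len(s) // 2
    lower = s[:mid]                          # lower half (median excluded for odd n)
    upper = s[mid + 1:] if len(s) % 2 else s[mid:]
    return median(lower), median(s), median(upper)

data = [1, 3, 5, 7, 9, 11, 13]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)   # → 3 7 11
print(q3 - q1)      # interquartile range → 8
```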

Z-scores are the most amazing way to identify how a data point differs from the mean. Essentially, a z-score is a measure of how much the datum or model differs from a standardized mean. Once you calculate a z-score, you can determine whether it is different enough to be significant. A z-score is calculated as
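In symbols, *z* = (*x* – μ) ⁄ σ, where *x* is the datum, μ is the mean, and σ is the standard deviation. A quick sketch (the numbers here are made up):

```python
def z_score(x, mean, std_dev):
    # Number of standard deviations by which x differs from the mean
    return (x - mean) / std_dev

print(z_score(75, 70, 2.5))  # → 2.0
```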

Since *z* is in terms of standard deviations, it is possible to determine significant difference. For example, a datum or model with a z-score of ± 1.2 differs from the mean by 1.2 standard deviations. If the z-score is ± 2.6, then the datum or model is 2.6 standard deviations from the mean. This means that the datum or model is statistically different from the mean and represents a significant result.

There are three distinctly different measures of position that you can use to determine the placement of data in a sample. Percentiles represent how much of the data is below a certain point. Quartiles are used to determine how the data falls in comparison to the medians of different sections of the data. Finally, z-scores represent how much the data differs from the mean of the population or sample. I hope that this helps clarify measures of position for you. I look forward to seeing your questions below. Happy statistics!


The post Understanding Discrete Probability Distribution appeared first on Magoosh Statistics Blog.

The opposite of a discrete distribution is a **continuous distribution**. An example of a continuous distribution would be the distribution of heights of all New Yorkers. It is possible for a New Yorker to be 70 inches tall, or to be 71 inches tall, or to be any height in between. Someone might be 70.2 inches tall, 70.25 inches tall, or 70.25478 inches tall (assuming you have a very precise measuring stick!) The point is, a height could theoretically take **any** value between 70 and 71 inches. As we’ve already seen, this is not the case with rolling a die.

This article will focus exclusively on understanding discrete probability distributions. For more about the continuous distribution, and the differences between discrete and continuous, check out this previous article of mine.

Let’s consider the possible outcomes of four consecutive coin flips. They are listed in the following table. The probability of any one *particular* outcome (HTHT, TTHH, etc.) is always (½)^{4} (see this article on independent events if you’re unsure why). We then multiply this probability by the total number of possible combinations for a given outcome to get the following:

Outcome | Possible Combinations | Probability |
---|---|---|
4 heads | HHHH | (½)^{4}=0.0625 |
3 heads, 1 tail | HHHT, HHTH, HTHH, THHH | 4×(½)^{4}=0.25 |
2 heads, 2 tails | HHTT, HTHT, HTTH, TTHH, THTH, THHT | 6×(½)^{4}=0.375 |
1 head, 3 tails | HTTT, THTT, TTHT, TTTH | 4×(½)^{4}=0.25 |
4 tails | TTTT | (½)^{4}=0.0625 |

The table above is a discrete probability distribution. However, to illustrate it a bit better, let’s graph the probabilities to give a visual sense of the distribution.

We can see the discrete nature of this distribution visually because there is a *jump* from the height of one bar to the next. This is because, as we know, it’s impossible to toss the coin four times and get 2.5 heads and 1.5 tails. The outcomes shown above are **exhaustive**, meaning they form a complete list of all the possible outcomes of four coin tosses.
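You can reproduce the probabilities in the table with a short script, counting the combinations for each outcome with the binomial coefficient:

```python
from math import comb

# P(exactly k heads in 4 fair flips) = C(4, k) * (1/2)^4
for k in range(5):
    print(f"{k} heads: {comb(4, k) * 0.5 ** 4}")
```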

What happens to the distribution if we toss the coin 50 times? We end up with the following distribution:

Note that as long as the number of coin flips is **finite**, we will have a discrete probability distribution, with jumps from bar to bar. But, for those of you who are familiar with the **normal distribution**, you can see that as the number of coin flips gets larger, the distribution above is approaching the **normal bell curve**.

Next, let’s look at an example and find the **expected value** of a discrete probability distribution.

John is a student in an introductory statistics class. He has an upcoming exam, and the only possible grades on the exam are multiples of 10: 10, 20, 30, etc., all the way up to 100. Let’s assume that the discrete probability distribution for the grade John will get on his next exam looks as follows. (We are assuming the probability he gets below a 60 is zero.)

Grade | Probability |
---|---|
60 | 0.1 |
70 | 0.25 |
80 | 0.4 |
90 | 0.2 |
100 | 0.05 |

Expressing this graphically, we have:

From this probability distribution, we can now answer the question: **what is John’s expected score on the next exam?** We might guess that he should get an 80, since this has the highest probability of occurring. However, this would be incorrect, as the expected value of the distribution will be a **weighted average** of all possible outcomes:

*E(X)* = Σ*x_{i}P(x_{i})*

*E(X) = 60×0.1 + 70×0.25 + 80×0.4 + 90×0.2 + 100×0.05 = 78.5*

Therefore, John’s expected score is 78.5—slightly less than an 80. This is because he has a greater probability of scoring 60 and 70 than he does of scoring 90 or 100. This skews the expected value towards the left end of the distribution.
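The weighted average above can be sketched in a couple of lines (the dictionary name is arbitrary):

```python
# Grade distribution from the table above: grade -> probability
grades = {60: 0.1, 70: 0.25, 80: 0.4, 90: 0.2, 100: 0.05}

# Expected value: weighted average of outcomes, E(X) = sum of x * P(x)
expected = sum(grade * p for grade, p in grades.items())
print(round(expected, 2))  # → 78.5
```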

Still have questions on understanding discrete probability distribution? Ask them below!


The post What is the geometric distribution formula? appeared first on Magoosh Statistics Blog.

Consider a basketball player taking a foul shot. Let’s say that his probability of making the foul shot is *p* = 0.7, and that each foul shot can be considered an **independent trial**. Making the foul shot will be our definition of success, and missing it will be failure. We’ll let *X* represent the number of the shot (trial) on which the basketball player makes his first basket. Therefore, *X* can be any counting number:

*Range(X)* = {1,2,3,4,…}

Now let’s calculate the probability that the basketball player mentioned above misses the first two throws, but makes the third free throw; that is, *P(X = 3)*. Because the probability that the player makes the shot is p = 0.7, then the probability of failure will be the complement *q* = 1 − *p* = 0.3.

We can calculate this probability long-hand, by multiplying the probabilities of the three independent events:

*P(X = 3)* = *P(failure)*×*P(failure)*×*P(success)* = 0.3 × 0.3 × 0.7 = 0.063

How about if the player misses the first three shots, but makes the fourth? We could calculate as follows:

*P(X = 4) = P(failure)*³×*P(success)* = (0.3)³×0.7 = 0.0189

Do you notice a pattern? With each additional failure before the first success, we multiply by another factor of 0.3, which in turn reduces the overall probability. We can summarize this trend in the following table:
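The trend can be tabulated with a short loop (same *p* = 0.7 and *q* = 0.3 as above); for example, *k* = 4 reproduces the 0.0189 computed earlier:

```python
p, q = 0.7, 0.3

# P(X = k) = q^(k-1) * p: k - 1 missed shots, then one make
for k in range(1, 6):
    print(k, round(p * q ** (k - 1), 4))
```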

(For the math buffs out there: the probabilities generated in the right-hand column are a **geometric sequence** with common ratio *q*, hence why this distribution is called *geometric*.)

We can also graph the values above to give a visual sense of the geometric distribution:

We can now generalize the trend we saw in the previous example. We have now seen the notation *P(X = k)*, where *k* is the number of the shot on which the basketball player makes his first basket. We can define it more generally as follows:

*P(X = k)* = *P(*first *k*−1 trials are failures, *k*th trial is a success)

*P(X = k) = p(1-p)*^{k-1}

Because (1−*p*) is the complement to *p*, it thus represents the probability of failure. From my previous article on complementary events, we often see (1−*p*) written as *q*. Thus the formula above becomes:

*P(X = k) = pq*^{k-1}

For a deeper look at this formula, including derivations, check out these lecture notes from the University of Florida.

Just as with other types of distributions, we can calculate the expected value for a geometric distribution. In the example we’ve been using, the expected value is the number of shots we expect, on average, the player to take before successfully making a shot. While we won’t go into the derivation here, we can define the expected value as:

*E(X) = 1 ⁄ p*

To illustrate expected value, let’s consider a dice-rolling game in which you win when you roll a five, and you lose in all other cases. Our probability of success is therefore p = ⅙. We can compute expected value as follows:

*E(X) = 1 ⁄ (1/6) = 6*

Therefore, we should expect that, on average, we’ll have to roll the die six times before we see a single five rolled.
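You can sanity-check this with a truncated version of the underlying series *E(X)* = Σ *k·p·q*^{k-1}:

```python
p = 1 / 6

# Truncate the infinite series E(X) = sum over k of k * p * (1-p)^(k-1);
# the terms shrink geometrically, so 500 terms is more than enough
approx = sum(k * p * (1 - p) ** (k - 1) for k in range(1, 500))
print(round(approx, 6))  # → 6.0
```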

Still have questions on the geometric probability formula? Check out our statistics blog and videos!


The post Understanding Geometric Probability Distribution appeared first on Magoosh Statistics Blog.

Consider the following scenario: “A newlywed couple plans to have children, and they will keep having children until the first girl is born. What is the probability that there are zero boys before the first girl, one boy before the first girl, two boys before the first girl, and so on?”

Keep in mind that the probability of having a girl is 0.5, and that the sex of each baby is independent of the last. Thus, we should expect that as the couple has more and more children, the likelihood that each successive child is a boy becomes smaller and smaller. (The situation here is analogous to flipping a coin multiple times; because each toss is independent, we expect that with each successive toss, it’s likelier we will see a mixture of some heads and some tails, rather than a long string of only heads.) We can illustrate this with the following **geometric distribution**:

Thus, as the number of children, *n*, increases, the probability that all the kids are boys decreases. Let’s now consider this from another angle.

“Cumulative” means “adding up.” We can illustrate the situation we’ve described so far by asking the question in a slightly different way. Instead of asking “What is the probability the couple has n boys in a row?” (the above chart), let’s ask, “What is the probability that the couple has a girl on their *nth* try?” Such a question will yield the following **cumulative geometric distribution**.

We can see that as the couple continues having kids, the probability of having a girl on the *nth* try approaches 1. Now that we have an idea of what geometric probability refers to, let’s consider it in a bit more depth.

We can now define the geometric distribution a bit more formally. The geometric distribution is a **discrete probability distribution**, in that it involves a **discrete number of trials**. As with the **binomial distribution**, the outcome of any trial is *binary*, resulting in either **success** or **failure**. In the above example, success was defined as “having a girl,” but we can define success in any number of ways. Here are a few more examples of situations that could be modeled with a geometric distribution:

• The probability that a basketball player makes a free throw is *p* = 0.333. What is the probability the player makes his first free throw after n throws? What is the expected number of throws he will make before sinking his first shot?

• A gambler rolls a six-sided die, and he wins when he rolls a 3. What is the probability that he rolls a 3 after n rolls? What is the expected number of rolls he has to make before seeing a 3?

For answers to these questions, and how to solve, check out my other article on Understanding the Geometric Distribution Formula.

From the above examples, we can summarize the geometric probability as follows.

• The geometric distribution involves a discrete number of successive trials.

• Each trial is independent of the last, with only two possible outcomes, designated success and failure.

• The probability of success, *p*, is the same for each trial.

• The geometric distribution models the probability of seeing no success in the first *n* trials.

• The cumulative geometric distribution models the probability of achieving success within the first *n* trials.
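The last two bullets can be seen side by side in a short sketch, using *p* = 0.5 as in the girl/boy example (variable names are just for illustration):

```python
p = 0.5  # probability each child is a girl

for n in range(1, 6):
    all_boys = (1 - p) ** n     # no girl among the first n children
    by_n = 1 - (1 - p) ** n     # at least one girl within the first n children
    print(n, all_boys, by_n)
```

Note that the two columns are complements: as *n* grows, the first shrinks toward 0 and the second climbs toward 1.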

For more practice with the geometric distribution, check out our statistics blog and videos!


The post Continuous Probability Distribution Explained appeared first on Magoosh Statistics Blog.

On a fretted bass, we have a **discrete distribution** of notes; a bassist can play an F or an F-sharp, but she cannot play the note in between these tones. On the other hand, a fretless bass features a **continuous distribution** of notes; not only can she play an F and an F-sharp, but she can play any tone in between. Let’s now see how this applies to probability.

Consider the roll of two standard, 6-sided dice. The outcome of the roll can be described by a **discrete probability distribution**, which looks as follows:

In the probability distribution above, just like on the fretted bass, **only certain values are possible**. For example, when you roll two dice, you can roll a 4, or you can roll a 5, but **you cannot roll a 4.5.** The fact that this is a **probability** distribution refers to the fact that different outcomes have different likelihoods of occurring. For example (as any craps player knows), a 7 is the most common roll with two dice; we see this reflected in the distribution above, as 7 has the highest peak.

Now, consider the random variable of, say, the length of a giraffe’s neck. This variable can be described by a **continuous probability distribution** because the length of a giraffe’s neck could be 4 feet, or 5 feet, or 4.5 feet, or 4.2384 feet. The point is, the length of a giraffe’s neck could, in theory, take any value between zero and infinity. Of course, just like with the dice above, the probability of different values will involve different likelihoods–and the likelihood that a giraffe has a neck taller than the Empire State Building will be close to nil.

A major difference between discrete and continuous probability distributions is that for discrete distributions, we can find the probability for an **exact value**; for example, the probability of rolling a 7 is 1/6. However, for a continuous probability distribution, we must specify a **range** of values. That is to say, we cannot ask, “What is the probability that a giraffe has a neck of 5 feet?” On the other hand, we **can** ask the question, “What is the probability that a giraffe has a neck between 4.5 and 5.5 feet?” The difference is in the type of distribution: discrete versus continuous.
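As a sketch of that range-based question, here is a hypothetical giraffe-neck calculation assuming the lengths are normally distributed; the mean of 4.8 ft and standard deviation of 0.5 ft are invented for illustration:

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    # P(X <= x) for a normal variable, via the error function
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Hypothetical giraffe-neck distribution: mean 4.8 ft, SD 0.5 ft (invented numbers)
prob = normal_cdf(5.5, 4.8, 0.5) - normal_cdf(4.5, 4.8, 0.5)
print(round(prob, 3))  # → 0.645
```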

For more info on continuous probability distributions, check out this page from Columbia’s PreMBA site. Otherwise, we’ll conclude by considering the normal distribution, which is the most familiar and commonly used continuous probability distribution.

Chances are, even if you haven’t heard the term “normal probability distribution,” you have probably heard the term “bell curve,” which refers to the same thing. It looks as follows:

The normal probability distribution is often an accurate assumption for a host of random variables out in the real world. For example, the heights and weights of any large adult population will be distributed normally (follow the bell curve). This means that the majority of the population has a height or weight close to the central mean (the peak of the distribution). Extreme heights and weights are rare, or have a low probability of being found (the shallow tails of the distribution). Of course, heights and weights are also other examples of **continuous variables**, since they can, in theory, take any value.

Are you still confused about discrete versus continuous probability distributions? If so, check out our statistics blog and videos!


The post Understanding Binomial Probability Distribution appeared first on Magoosh Statistics Blog.

Let’s flesh these concepts out a bit. For example, let’s say you’re a basketball player hoping to make a foul shot. For you, success would be “you make the shot” and failure would be “you miss the shot.” In this example, each foul shot is considered a trial.

Another example involves rolling a standard 6-sided die. Let’s say that we hope to roll a five. In this case, we define success as “rolling a 5” and failure as “not rolling a 5.” In this case, each roll of the die would be a trial.

Note that however we define success and failure, the two events must be **mutually exclusive** and **complementary**; that is, they cannot occur at the same time (mutually exclusive), and the sum of their probabilities is 100% (complementary).

Generally, we define the probability of success as *p*, and the probability of failure as *q*. Because the two events are **complementary**, *q* = 1 – *p*. In our example of the 6-sided die, our probability of success (rolling a 5) is *p* = ⅙; our probability of failure (not rolling a 5) is *q* = 1 – ⅙ = ⅚.

For the binomial distribution to be applied, each successive trial must be **independent** of the last; that is, the outcome of a previous trial has no bearing on the probabilities of success on subsequent trials. For the roll of a die, we know this to be true: just because a five was rolled the last time does not change the probability of rolling a 5 on future rolls; the probability of success remains unchanged at 1 in 6.

Lastly, the binomial distribution is a **discrete** probability distribution. This means that the possible outcomes are distinct and non-overlapping. (For example, when you roll a die, you can roll a 3, and you can roll a 4, but you cannot roll a 3.5. For more on discrete versus continuous distributions, check out this other post on the normal distribution.)

Now that we’ve covered the basic definitions involved for a binomial distribution, let’s briefly summarize them, before looking at an example. A random variable *X* follows a binomial probability distribution if:

1) There are a finite number of trials, *n*.

2) Each trial is independent of the last.

3) There are only two possible outcomes of each trial, success and failure. The probability of success is *p* and the probability of failure is *q*.

4) Success and failure are mutually exclusive (cannot occur at the same time) and complementary (the sum of their probabilities is 100%; *q* = 1 – *p*).

Let’s assume we are flipping a coin 6 times. We’ll bet on heads, so success for us is “the coin lands heads” and failure is “the coin lands tails.” In this case, the probability of success and failure are both 0.5:

*p* = 0.5

*q* = 0.5

Now, how could we calculate the probability that the coin comes up heads on 5 out of the 6 trials? There are 6 possible desired outcomes, shown below:

HHHHHT

HHHHTH

HHHTHH

HHTHHH

HTHHHH

THHHHH

Because each trial is independent, the probability of any one of these outcomes occurring is (½)^{6}, or 1/64. Since each outcome is equally probable, and there are six desired outcomes, the probability of 5 out of 6 heads would be 6×(1/64) = 6/64 ≈ 0.094. This is visualized below, in the second bar from the right:

Starting from the left, the other bars show the probability of getting 0 heads, 1 heads, 2 heads, etc., all the way up to 6 heads on the far right. The smooth line represents the normal curve. You can get a sense from this graph that *as the number of trials increases, the binomial distribution will approach the normal distribution*. To illustrate this further, let’s see what happens to the graph when we increase the number of coin flips further, up to 16 and then 160.

We can get an even clearer view here of the binomial distribution approaching the normal distribution as the number of trials, *n*, gets larger and larger. (Note that this will only be the case when the probabilities of success and failure are both equal to 0.5. When p diverges from 0.5, the peak of the distribution will skew either to the left or to the right.)

Now let’s consider the binomial distribution for 100 trials, or 100 coin flips. What if we actually wanted to calculate the probability of flipping a coin 100 times, and getting heads 52 times? You might be thinking to yourself, *calculating that must take forever*! And it would, if we approached it by listing out all the possible desired outcomes with H’s and T’s, like we did previously, when the number of trials was only 6.

Luckily, there is a faster way to calculate binomial probabilities for large numbers of trials. We can apply the following formula, where *X* is our random variable, *n* is our number of trials, *k* is our number of successes, and *p* is the probability of success:

*P(X = k) = _{n}C_{k} p^{k}(1-p)^{n-k}*

From combinatorics, the expression _{n}C_{k} expands to:

*_{n}C_{k} = n! ⁄ (k!(n-k)!)*

The next example will apply this formula, and it will be a bit more technical. The more general reader may feel inclined to skip to the conclusion!

Let’s go back to our previous question. What is the probability that we flip a coin 100 times, and it lands heads exactly 52 out of the 100 flips? Let’s first define the value of each term in our formula:

- Since we are flipping the coin 100 times, we have 100 independent trials, and *n* = 100.
- Since we are defining success to be “the coin lands heads,” and we are calculating the probability of getting 52 heads, *k* = 52.
- As always, the probability of success, or the coin landing heads, is *p* = 0.5.

Plugging into our formula, we get:
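Evaluating that expression with *n* = 100, *k* = 52, and *p* = 0.5 takes only a couple of lines:

```python
from math import comb

# C(100, 52) * (0.5)^52 * (0.5)^48 = C(100, 52) * (0.5)^100
p = comb(100, 52) * 0.5 ** 100
print(round(p, 4))  # → 0.0735
```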

While knowing the formula is useful, there are also all kinds of useful software programs and websites that can perform these calculations for you. If you’re curious, check out this binomial calculator from Vassar Stats. It allows you to plug in different values of n, k, and p, and instantaneously calculates the probabilities for you.

- The binomial distribution is a discrete probability distribution used when there are only two possible outcomes for a random variable: success and failure.
- Success and failure are mutually exclusive; they cannot occur at the same time.
- The binomial distribution assumes a finite number of trials, *n*.
- Each trial is independent of the last. This means that the probability of success, *p*, does not change from trial to trial.
- The probability of failure, *q*, is equal to 1 – *p*; therefore, the probabilities of success and failure are complementary.

Do you need more practice with the binomial distribution? Check out our statistics videos and blog posts here!


The post Understanding Normal Distribution appeared first on Magoosh Statistics Blog.

As you can see from the picture, the normal distribution is dense in the middle, and tapers out in both tails. If you’ve ever had a teacher or professor “curve” the exam grades in a class, what this means is to fit the exam scores to the bell curve, or the normal distribution. When fitting to the bell curve, the grades are centered around the **mean** score (the tallest, central point on the curve, typically signified by the Greek letter μ), which becomes the equivalent of a C grade; the rest of the scores then fall somewhere around this central mean.

Because of the shape of the distribution, the bulk of the exam scores will be found in the fatter, middle portion of the curve. Thus, when fitting to a curve, the majority of students in the class will end up scoring a B, C, or D; because of the thinner tails, fewer students will be found further out from the mean, so A grades and failing grades will occur less frequently than middle-of-the-road scores.

This is the hallmark of the normal distribution–it is a distribution where the middle, the average, the mediocre, is the most common, and where extremes show up much more rarely. Because so many random variables in nature follow such a pattern, the normal distribution is extremely useful in inferential statistics. Let’s consider an example.

To reiterate, a normal distribution can describe variables where values near the mean predominate, and extreme values are rare. Let’s take the heights of American women as an example. According to Columbia University Statistics, the average height for a woman in the US is 63.1 inches (or about 5 feet 3 inches), with a standard deviation of 2.7 inches.

Because height, like so many variables found in nature, is normally distributed, we can reasonably expect that most American women we will encounter in our lives will more likely have a height closer to 5 feet 3 inches than, say, 7 feet. For height, like all normally distributed variables, the mean predominates, and extreme values are rare. Experience confirms this: We know that giants and dwarves are much less commonly encountered than those of average or near-average height. But just how *much* more common is the middle-ground than the extremes? Let’s explore this question in a bit more depth.

Recall that **probability distributions** are visual plots of how frequently certain values occur. In the past, you may have seen **discrete probability distributions**, which are displayed as **histograms**. The following is a *discrete* probability distribution showing the probabilities of every possible roll (from 2 to 12) of two standard 6-sided dice.

Looking at the above distribution, we can see that the probability of rolling a 7 (tallest, middle bar) is 1 in 6 (right-hand axis). The reason we call this distribution *discrete* is because *only certain values are possible*. For example, you can roll a 3 or a 4, but it is impossible to roll a 3.5.

However, our previous variable of height is *continuous*, because heights *can* take any value. A woman might be 63.1 inches tall, but she might also be 63.2 inches tall, or 63.05 inches tall. There is no restriction on how fine our gradation can be; thus, the variable is *continuous*.

**The normal distribution is a continuous probability distribution function**

Now we are ready to consider the normal distribution as a **continuous probability distribution function.** Unlike with discrete probability distributions, where we could find the probability of a single value, for a continuous distribution we can only find the probability of encountering a *range* of values.

For example, using the normal distribution, we *cannot* answer the question, “What is the probability that a random woman in New York City is 63.1 inches tall?” This is because the distribution is continuous and not discrete; we cannot specify values. However, we *can* answer the question, “What’s the probability that a random woman in New York City is *between* 60.4 and 65.8 inches tall?” (Answer: 68%). Let’s look at an example.

The 68-95-99.7 Rule says that for any normally distributed random variable, 68% of values fall within one standard deviation of the mean, 95% fall within two standard deviations, and 99.7% fall within three standard deviations.

Let’s apply this to our height example. Earlier, we encountered the fact that the mean height of women in the US is 63.1 inches, and the standard deviation is 2.7 inches. According to the 68-95-99.7 rule, 68% of all women should have heights within one standard deviation, or 2.7 inches, of the mean. We can calculate this interval as follows:

63.1 ± 2.7 = {60.4,65.8}

Therefore, we expect 68% of women in the US to have heights between 60.4 and 65.8 inches. The 68-95-99.7 rule is a useful, fast rule of thumb for determining probabilities under the curve. Note that the *entire* area under the curve equals 100%–this will always be the case for any probability distribution function, since all the probabilities for all possible values must add to 100%. For more information on calculating more precise probabilities under the normal curve, check out this post on z-scores.
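A quick check of the 68% figure, computing the area under the normal curve between 60.4 and 65.8 inches via the error function (here `phi` is the standard normal CDF):

```python
from math import erf, sqrt

def phi(z):
    # Standard normal CDF via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 63.1, 2.7
# P(60.4 < height < 65.8) = P(-1 < z < 1)
prob = phi((65.8 - mu) / sigma) - phi((60.4 - mu) / sigma)
print(round(prob, 3))  # → 0.683
```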

So far, we’ve been talking about the normal curve as if it is a static thing. However, it might be more accurate to talk of normal *curves*, plural, as the curve can broaden or narrow, depending on the **variance** of the random variable. No matter the shape of the curve, however, three things will always be true: the curve is symmetric about the mean; the mean, median, and mode all sit at the central peak; and the total area under the curve equals 100%.

Now that we know what is common to all normal curves, let’s explore what causes them to broaden or narrow. Generally, if a variable has a higher **variance** (that is, if a wider spread of values is possible), then the curve will be broader and shorter. However, if the variance is small (where most values occur very close to the mean), the curve will be narrow and tall in the middle. Check out the following graphic for a visual.

The normal distribution, or bell curve, is broad and dense in the middle, with shallow, tapering tails. Often, a random variable that tends to clump around a central mean and exhibits few extreme values (such as heights and weights) is normally distributed. Because of the sheer number of variables in nature that exhibit normal behavior, the normal distribution is a commonly used distribution in inferential statistics.

For more information and practice with the normal distribution, check out our statistics videos and lessons!

