In statistics, you have to determine whether the pattern you identify in the data is significantly different from no pattern at all. There are many ways to do this, but the most common is to use probability distributions. A probability distribution lets you determine the chance that the pattern you see could have arisen by random chance alone. Among the several distributions in common use, the **F-distribution** comes up *a lot*. So let’s take a glance at the F-distribution.

## What is the F-distribution

A probability distribution, like the normal distribution, is a means of determining the probability of a set of events occurring. This is true for the F-distribution as well.

The F-distribution is a skewed probability distribution, similar in shape to a chi-squared distribution. But where the chi-squared distribution is defined by a single degrees-of-freedom parameter for one set of variables, the F-distribution arises as the ratio of two chi-squared variables and therefore has *two* degrees-of-freedom parameters: one for the numerator and one for the denominator. This means there is a whole family of F-distribution curves, one for each pair of degrees of freedom.

Each curve represents a different pair of degrees of freedom, which means the area required for the test to be significant differs from curve to curve. If you are feeling mathematically adventurous, the actual equation for the F-distribution curve is

$$f(x;\, d_1, d_2) = \frac{1}{x\, B\!\left(\frac{d_1}{2}, \frac{d_2}{2}\right)} \sqrt{\frac{(d_1 x)^{d_1}\, d_2^{d_2}}{(d_1 x + d_2)^{d_1 + d_2}}}, \qquad x > 0,$$

where $d_1$ and $d_2$ are the numerator and denominator degrees of freedom and $B$ is the beta function.

Since degrees of freedom are in the equation, it’s pretty easy to see that the curve changes as the degrees of freedom change.
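You can see this directly with a quick sketch using SciPy’s `scipy.stats.f` (the degrees-of-freedom pairs below are arbitrary, chosen just for illustration): evaluating the density at the same point under different degrees of freedom gives clearly different values.

```python
from scipy.stats import f

# Density of the F-distribution at x = 1.0 for a few
# (numerator df, denominator df) pairs -- the curve clearly
# changes as the degrees of freedom change.
pdf_values = {(d1, d2): f.pdf(1.0, d1, d2)
              for (d1, d2) in [(2, 27), (5, 50), (10, 100)]}

for (d1, d2), value in pdf_values.items():
    print(f"df=({d1}, {d2}): pdf(1.0) = {value:.4f}")
```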

## When to use the F-distribution

You rarely have to deal with constructing an actual curve because statistical software does that for you. However, you will have to use the curve concept in certain experimental setups.

The F-test, which uses this distribution, compares multiple groups defined by the levels of an independent variable. This is commonly found in ANOVA and factorial ANOVA.

Let’s say you are testing a new drug for heart disease called X, and you want to determine whether different dosages have significantly different effects. So, being the great statistician that you are, you set up trials of 0 mg, 50 mg, and 100 mg of X in three randomly selected groups of 30 participants each. This is a case for ANOVA, which uses the F-distribution.
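The scenario above can be sketched with SciPy’s one-way ANOVA, `scipy.stats.f_oneway`. The outcome scores below are simulated — the group means and spread are made up purely for illustration, not real drug-trial data.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)

# Simulated outcome scores for three dosage groups of 30 participants
# each (means and spread are invented for illustration only).
placebo = rng.normal(50, 10, 30)   # 0 mg
low     = rng.normal(55, 10, 30)   # 50 mg
high    = rng.normal(62, 10, 30)   # 100 mg

# One-way ANOVA: is at least one group mean different from the others?
f_stat, p_value = f_oneway(placebo, low, high)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

A small p-value here would suggest that dosage has a real effect on the outcome.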

Anytime you are comparing more than two groups, you will need the F-distribution for the F-test.

## How to use the F-distribution

The F-distribution is used for (surprise, surprise) the F-test. The F-test involves calculating an F-score from the variances of the three (or more) groups that you are testing. The actual F-score is calculated using the much simpler equation

$$F = \frac{MS_{\text{between}}}{MS_{\text{within}}} = \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)},$$

where $k$ is the number of groups and $N$ is the total sample size.

This compares the variance *between* the groups (how far apart the three group means sit from each other) to the variance *within* each group (how spread out all the 100 mg participants are around their own mean, for example). When you run this equation, you get an F-score.
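Here is the calculation written out by hand, checked against SciPy’s built-in ANOVA. The three small groups of numbers are made up just to have something to compute.

```python
import numpy as np
from scipy.stats import f_oneway

# Three hypothetical groups (the numbers are invented for illustration).
groups = [np.array([48., 52., 50., 47., 53.]),
          np.array([55., 58., 54., 57., 56.]),
          np.array([63., 60., 65., 61., 62.])]

k = len(groups)                      # number of groups
N = sum(len(g) for g in groups)     # total sample size
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: how far each group mean is from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

f_score = (ss_between / (k - 1)) / (ss_within / (N - k))
print(f"F = {f_score:.2f}")

# Sanity check against SciPy's one-way ANOVA
f_ref, _ = f_oneway(*groups)
```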

To determine whether this value is high enough to be significant, you compare it to a table of critical F-values for your chosen significance level (usually 0.05).

You basically find the value at which your two degrees of freedom intersect. If your calculated F-score is higher than the value in the table, then at least one of your groups is significantly different from the others. If the calculated value is lower, then the groups are not different enough to be significant.
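In practice, statistical software replaces the printed table: SciPy’s `f.ppf` (the inverse CDF) looks up the same critical value directly. The degrees of freedom below follow the hypothetical drug trial: three groups of 30 give $k - 1 = 2$ and $N - k = 87$.

```python
from scipy.stats import f

alpha = 0.05
df_between, df_within = 2, 87   # 3 groups of 30: k-1 = 2, N-k = 87

# Critical value: the F-score above which the top 5% of the curve lies
critical = f.ppf(1 - alpha, df_between, df_within)
print(f"Critical F at alpha={alpha}: {critical:.2f}")
```

Any calculated F-score above this cutoff would be significant at the 0.05 level.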

## The Takeaways

The F-distribution is a means of obtaining the probabilities of specific sets of events occurring. The F-statistic is often used to assess whether a theoretical model explains significantly more of the variation in the data than chance alone. Once the F-statistic is calculated, you compare the value to a table of critical values that serve as minimum cutoffs for significance. I hope this post helped to clarify some things regarding the F-distribution. I look forward to seeing your questions below. Happy statistics!
