**Sampling and sampling distributions** are the foundation of all inferential statistics. To conduct inferential statistics, you have to compare a statistic from your sample to some sort of reference distribution.

## Sampling

The overall goal of statistics is to identify patterns in a *sample* that reflect patterns that may exist in the *population*. The sample is a group of participants that reflects the makeup of the population. To build such a sample, researchers use several types of sampling methods.

The gold standard of sampling techniques is the **random sample**. The goal of random sampling is to select individual participants from the population at random. Both logical argument and simulation show that random samples limit *bias* and make it possible to estimate the error that is inherent in all statistics.

Of course, random does not mean that you select individuals haphazardly. Instead, it takes planning. First, define the population that you want to study. Second, identify every member of the population. Third, select members in such a way that every member has an equal chance of being chosen.
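The three steps above can be sketched in Python. This is a minimal illustration, assuming the population has already been listed in full: the hypothetical `population` list stands in for that complete list of members, and `simple_random_sample` is an illustrative helper, not a standard function. It relies on the standard library's `random.Random.sample`, which draws without replacement so that every member has the same chance of being selected.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; each member has an equal chance of selection."""
    rng = random.Random(seed)  # seeded generator so the same draw can be reproduced
    return rng.sample(population, n)

# Hypothetical sampling frame (step two): a complete list of every member of the population.
population = [f"student_{i}" for i in range(1, 501)]

sample = simple_random_sample(population, 20, seed=42)
print(sample)
```

Seeding the generator is optional; it simply lets someone else reproduce exactly which members were drawn.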

Another type of sampling is a **stratified random sample**. This kind of sampling accounts for differences between subgroups (strata) of the population that may affect your analysis.

For example, let’s say that you want a random sample of a high school that is 25% seniors, 30% juniors, 23% sophomores, and 22% freshmen. The best way to get a random sample that reflects these differences is to make sure that your sample has the same percentages of each class. So a 100-person sample would have 25 seniors, 30 juniors, 23 sophomores, and 22 freshmen, each randomly selected from their respective classes. This kind of sample gives a much clearer picture of the overall population.
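The high school example can be sketched as proportional allocation followed by a simple random sample within each class. This is a minimal illustration under assumptions: the rosters are hypothetical placeholder names for a school of 1,000 students split 25/30/23/22 percent, and `stratified_sample` is an illustrative helper, not a library function.

```python
import random

# Hypothetical class rosters sized to match the 25/30/23/22 percent split
# of a school with 1,000 students (names are placeholders).
strata = {
    "senior": [f"senior_{i}" for i in range(250)],
    "junior": [f"junior_{i}" for i in range(300)],
    "sophomore": [f"sophomore_{i}" for i in range(230)],
    "freshman": [f"freshman_{i}" for i in range(220)],
}

def stratified_sample(strata, n, seed=None):
    """Sample each stratum in proportion to its share of the whole population."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(n * len(members) / total)   # proportional allocation for this stratum
        sample[name] = rng.sample(members, k)  # simple random sample within the stratum
    return sample

sample = stratified_sample(strata, 100, seed=1)
print({name: len(chosen) for name, chosen in sample.items()})
# {'senior': 25, 'junior': 30, 'sophomore': 23, 'freshman': 22}
```

Here the percentages divide 100 evenly. With other proportions, rounding can make the stratum sizes add up to slightly more or less than `n`, so a real design would need a rule for distributing the remainder.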

## Sampling Distributions

A **sampling distribution** represents the distribution of a statistic (such as the mean) calculated from many samples of the same size drawn from the same population.

For example, a sampling distribution of the mean shows how frequently specific values of the sample mean occur across repeated samples. In other words, it maps out the frequency of each value. You can also create distributions of other statistics, such as the variance. Below is an example of a sampling distribution for the mean.
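A sampling distribution of the mean can also be built by simulation: draw many samples of the same size, record each sample's mean, and tabulate how often each value occurs. The sketch below assumes a hypothetical, deliberately skewed population of 10,000 values (generated with `random.expovariate`), samples of size 30, and 2,000 repetitions. All of these numbers are illustrative choices, and `sampling_distribution_of_mean` is a hypothetical helper.

```python
import random
import statistics
from collections import Counter

def sampling_distribution_of_mean(population, n, reps, seed=None):
    """Return the means of `reps` random samples of size n drawn from population."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, n)) for _ in range(reps)]

# Hypothetical skewed population of 10,000 values with an expected mean of about 70.
rng = random.Random(0)
population = [rng.expovariate(1 / 70) for _ in range(10_000)]

means = sampling_distribution_of_mean(population, n=30, reps=2_000, seed=1)

# Count how often the sample means fall into each 5-point bin and print a text histogram.
bins = Counter(int(m // 5) * 5 for m in means)
for lower in sorted(bins):
    print(f"{lower:>4}-{lower + 5:<4} {'#' * (bins[lower] // 10)}")
```

Even though the population itself is skewed, the histogram of sample means should come out roughly bell-shaped and centered near the population mean, which is what makes comparisons with theoretical distributions possible.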

The shape of the curve allows you to compare the *empirical* distribution of values to a *theoretical* distribution of values. A theoretical distribution is derived from mathematical equations rather than from empirical data. Two common theoretical distributions are Student’s t and the F-distribution.
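For reference, these are the standard textbook results for the statistics discussed here. They assume independent samples of size $n$ from a normal population with mean $\mu$ and standard deviation $\sigma$ (for the mean, the first result also holds approximately for large $n$ from other populations). $\bar{X}$ is the sample mean, $S$ the sample standard deviation, and the subscripts 1 and 2 refer to two independent samples.

```latex
\bar{X} \sim \mathcal{N}\!\left(\mu,\ \frac{\sigma^2}{n}\right),
\qquad
t = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1},
\qquad
F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1-1,\; n_2-1}
```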

The benefit of creating distributions is that empirical distributions can be compared with theoretical ones to identify differences or to assess how well a model fits. That is the ultimate goal of statistics: to build an empirical model that explains patterns in the data that differ significantly from what the theoretical model predicts.
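One way to make this comparison concrete is sketched below under stated assumptions. The code simulates many sample means from a hypothetical normal population with a known mean and standard deviation (100 and 15, chosen only for illustration). It then compares the deciles of those simulated means with the deciles of the theoretical distribution $\mathcal{N}(\mu, \sigma^2/n)$, computed with Python's `statistics.NormalDist` (Python 3.8+). The normal distribution stands in for Student’s t or F here because the standard library does not provide those distributions.

```python
import random
import statistics
from statistics import NormalDist

# Hypothetical population parameters, sample size, and number of repetitions.
MU, SIGMA, N, REPS = 100.0, 15.0, 25, 5_000

rng = random.Random(7)
# Empirical distribution: means of many simulated samples of size N.
means = [statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(N)]) for _ in range(REPS)]

# Theoretical distribution the sample mean should follow: N(MU, SIGMA^2 / N).
theory = NormalDist(mu=MU, sigma=SIGMA / N ** 0.5)

# Compare the two distributions at their deciles.
empirical_q = statistics.quantiles(means, n=10)              # 9 cut points
theoretical_q = [theory.inv_cdf(p / 10) for p in range(1, 10)]
for p, (e, t) in enumerate(zip(empirical_q, theoretical_q), start=1):
    print(f"{p * 10:>3}th percentile: empirical {e:7.2f}  theoretical {t:7.2f}")
```

When the empirical and theoretical columns agree closely, the theoretical model fits. A real analysis would replace the simulated means with observed data and use a formal test rather than eyeballing the deciles.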

## The Takeaways

Sampling involves selecting participants from a population in order to identify possible patterns that exist in the data. There are several types of sampling, but the gold standard is random sampling. Sampling distributions represent the patterns that exist in the data. These patterns are then compared to theoretical ones to determine whether they differ significantly from the theoretical models.

I hope that this post helps clarify sampling and sampling distributions. I look forward to seeing any questions that you have below. Happy statistics!
