The study of statistics contains two main branches: descriptive statistics and inferential statistics. When doing research and experiments, you will often use both together, so it will be useful to first describe each branch and how they differ from one another.
Descriptive statistics involves describing and summarizing a set of data, and analyzing it for any patterns that might appear. The goal of descriptive statistics is to help visualize a particular data set, and not to draw any conclusions about the wider population from that data set.
For example, let’s say you are a college professor, and you want to visualize how your class of 100 students did on your most recent exam.
To do this, you could take all of their exam scores (your data set) and calculate various parameters that describe these data, such as the mean, median, and mode exam scores for the class. These measures of central tendency are just one way of describing your data.
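As a minimal sketch of these measures of central tendency, the Python snippet below uses the standard-library `statistics` module on a short list of hypothetical exam scores (ten invented values standing in for the class of 100; they are not real data).

```python
import statistics

# Hypothetical exam scores (invented for illustration, not real class data).
scores = [72, 85, 91, 85, 64, 78, 85, 93, 70, 88]

mean_score = statistics.mean(scores)      # arithmetic average of all scores
median_score = statistics.median(scores)  # middle value once the scores are sorted
mode_score = statistics.mode(scores)      # most frequently occurring score

print(f"mean={mean_score}, median={median_score}, mode={mode_score}")
# prints: mean=81.1, median=85.0, mode=85
```

With an even number of scores, `statistics.median` averages the two middle values, which is why it returns `85.0` rather than `85`.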
You could also calculate the range, quartiles, variance, and standard deviation of the exam scores; these measures describe how widely the scores are spread out (the variance and standard deviation, in particular, measure their spread around your mean score).
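The spread measures can be sketched the same way. This assumes the same ten hypothetical scores; because the class is treated as the entire population, it uses `statistics.pvariance` and `statistics.pstdev` (divide by N) rather than the sample versions (divide by N - 1). Note that `statistics.quantiles` supports more than one interpolation method, so other tools may report slightly different quartiles.

```python
import statistics

# Hypothetical exam scores (invented for illustration, not real class data).
scores = [72, 85, 91, 85, 64, 78, 85, 93, 70, 88]

# Range: distance between the highest and the lowest score.
score_range = max(scores) - min(scores)

# Quartiles: three cut points that split the sorted scores into four groups.
# statistics.quantiles uses its default "exclusive" interpolation method here.
q1, q2, q3 = statistics.quantiles(scores, n=4)

# The class is the whole population, so use the population formulas.
variance = statistics.pvariance(scores)
std_dev = statistics.pstdev(scores)

print(f"range={score_range}, quartiles=({q1}, {q2}, {q3})")
print(f"variance={variance:.2f}, standard deviation={std_dev:.2f}")
```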
To help illustrate this further, you could then construct box plots and histograms to visually show the spread of the data. (Indeed, any kind of chart, table, or graph, including, but not limited to, frequency tables, stem-and-leaf plots, and pie graphs, would qualify as descriptive statistics.)
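A frequency table is one of the simplest of these displays. The sketch below, again using ten hypothetical scores, groups them into 10-point bins and prints a rough text histogram; the bin width and the helper name `frequency_table` are illustrative choices, not a standard API.

```python
from collections import Counter

# Hypothetical exam scores (invented for illustration, not real class data).
scores = [72, 85, 91, 85, 64, 78, 85, 93, 70, 88]
BIN_WIDTH = 10  # each bin covers 10 points, e.g. 70-79 for integer scores


def frequency_table(values, bin_width=BIN_WIDTH):
    """Map each bin's lower bound to the count of values in [lower, lower + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


table = frequency_table(scores)

# A crude text histogram: one '#' per student in each bin.
for lower, count in table.items():
    print(f"{lower}-{lower + BIN_WIDTH - 1}: {'#' * count} ({count})")
```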
No matter how you choose to visualize your data, your goal is to describe the data set of your entire population (in this case, every student in your class). Thus, descriptive statistics is limited to yielding information about only those individuals you’ve sampled (that is, gathered data on).
The parameters you calculate cannot then be used to draw conclusions about students in classes other than the one measured (that is, the broader student population). Drawing such conclusions is beyond the purview of descriptive statistics. Nonetheless, we often do need to draw such conclusions, or make inferences, about a broader population based on a smaller sample of that population. That's where inferential statistics comes in.
In the case of our class of 100 students, those 100 students accounted for our entire population—everyone whom we wished to measure.
However, what if our goal was broader—such as measuring the mean weight of every adult in an entire country? It would be impractical and probably impossible to weigh everyone. And yet, it is still possible to calculate a close estimate of the mean weight of such a population. We can do this using inferential statistics.
Often, a population is too large to measure everyone in it. In such cases, we need to collect data from a random sample of individuals within that population. We can then apply inferential statistics to these data to draw conclusions about the overall population.
Inferential statistics relies upon gathering data on a sample of individuals within a much larger, often impossible-to-measure population (everyone you wish to know about).
In our example of estimating the mean adult weight of everyone in a country, an experimenter might select a random sample of adults from the population of interest and then weigh everyone in that sample. The experimenter could then calculate the mean weight of the sample (using descriptive statistics!) and, from that, infer that the true mean weight of the entire population falls within a specified interval of values.
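The sampling step can be simulated to make it concrete. The sketch below assumes a hypothetical population of 100,000 synthetic adult weights (normally distributed with mean 75 kg and standard deviation 12 kg, values invented for illustration), draws a simple random sample of 500 people, and computes the sample mean. In a real study the population mean would be unknown; the simulation only lets us compare the two.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is reproducible

# Hypothetical population: 100,000 synthetic adult weights in kg
# (normal with mean 75 and SD 12; invented values, not real data).
population = [rng.gauss(75, 12) for _ in range(100_000)]

# Simple random sample of 500 people, drawn without replacement.
sample = rng.sample(population, 500)

# A descriptive statistic of the sample, used as an estimate for the population.
sample_mean = statistics.fmean(sample)

# Unknown in a real study; computed here only because the data are simulated.
population_mean = statistics.fmean(population)

print(f"sample mean: {sample_mean:.2f} kg, population mean: {population_mean:.2f} kg")
```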
The experimenter must report an interval of values, rather than an exact value for the mean weight of the population, because no sample is a perfect representation of the entire population, so every experiment involves some sampling error. As a result, the results gained through inferential statistics (unlike those obtained through descriptive statistics, when you are able to measure everyone you wish to know about) will always contain some uncertainty.
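Sampling error is easy to see by repeating the experiment. Under the same kind of hypothetical synthetic population (invented values, not real data), each random sample of 500 yields a slightly different mean, and none is expected to match the population mean exactly.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is reproducible

# Hypothetical population of synthetic adult weights in kg (not real data).
population = [rng.gauss(75, 12) for _ in range(100_000)]
true_mean = statistics.fmean(population)

# Five independent random samples of 500 people each.
sample_means = [statistics.fmean(rng.sample(population, 500)) for _ in range(5)]

# Sampling error: how far each sample mean lands from the population mean.
sampling_errors = [m - true_mean for m in sample_means]

for m, err in zip(sample_means, sampling_errors):
    print(f"sample mean = {m:.2f} kg, sampling error = {err:+.2f} kg")
```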
In general, the larger the sample size (in this case, the more people you weigh), the more the experimenter can reduce the uncertainty in the results. As the sample size grows and approaches the size of the entire population, that uncertainty approaches zero, and inferential statistics (where we can't measure everyone we're interested in) gives way to descriptive statistics (where we can measure everyone we're interested in).
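One common way to quantify this uncertainty is the standard error of the sample mean. Assuming simple random sampling without replacement and a known population standard deviation σ (both illustrative assumptions), it includes a finite population correction: SE = (σ / √n) · √((N − n) / (N − 1)), where N is the population size. The sketch below shows the standard error shrinking as n grows and reaching zero when n = N; the values of N and σ are hypothetical.

```python
import math


def standard_error(sigma, n, population_size):
    """Standard error of the sample mean for a simple random sample drawn
    without replacement, including the finite population correction."""
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return sigma / math.sqrt(n) * fpc


N = 100_000    # hypothetical population size
SIGMA = 12.0   # hypothetical population standard deviation in kg

for n in (100, 1_000, 10_000, 100_000):
    print(f"n = {n:>7,}: standard error = {standard_error(SIGMA, n, N):.3f} kg")
```

When n equals N, the correction factor is zero, matching the point above: once everyone is measured, no sampling uncertainty remains.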
Methods of inferential statistics
There are two main methods of inferential statistics. The first, as mentioned in the weight example above, is estimating the parameters of a population (such as its mean, median, mode, and standard deviation) from the corresponding values calculated for a sample of that population. These estimates are often reported as confidence intervals: ranges of values, calculated from the sample, that are constructed to contain the true population parameter with a stated level of confidence (for example, 95%).
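A minimal sketch of a confidence interval for a mean, assuming a large random sample so that the normal approximation applies (for small samples, a critical value from Student's t distribution is the usual choice instead). The sample is synthetic and the helper name `mean_confidence_interval` is hypothetical; `statistics.NormalDist` supplies the z critical value (about 1.96 for 95%).

```python
import math
import random
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Large-sample (normal-approximation) confidence interval for a population mean."""
    n = len(sample)
    center = statistics.fmean(sample)
    # The sample standard deviation stands in for the unknown population SD.
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return center - z * std_err, center + z * std_err


rng = random.Random(1)
# Hypothetical sample of 500 adult weights in kg (synthetic, not real data).
sample = [rng.gauss(75, 12) for _ in range(500)]
low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean weight: {low:.1f} kg to {high:.1f} kg")
```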
The second method of inferential statistics is hypothesis testing, also known as significance testing. Often, this involves determining whether the difference between the means of two samples is statistically significant. Such testing is often used by pharmaceutical companies that wish to learn whether a new drug is more effective at combating a particular symptom than no drug at all.
Because it would be impossible (and unethical) to try out the new drug on every person showing the symptom in question, random samples must be used. From the experimental results, inferences can be drawn about the drug's effectiveness in the population at large.
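A hypothesis test of this kind can be sketched with a permutation test, which is swapped in here because it needs only the Python standard library; textbook treatments more often use a two-sample t-test. The idea: if the drug had no effect, the group labels would be arbitrary, so we shuffle them many times and count how often a difference in means at least as large as the observed one appears. The symptom scores and the helper name `permutation_test` are hypothetical.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.
    Returns (observed difference, estimated p-value)."""
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical symptom-severity scores (lower is better); invented for illustration.
drug = [3, 4, 2, 5, 3, 2, 4, 3, 2, 3]
placebo = [5, 6, 4, 7, 5, 6, 5, 4, 6, 5]
diff, p = permutation_test(drug, placebo)
print(f"difference in means = {diff:.2f}, estimated p-value = {p:.4f}")
```

A small p-value means that differences this large rarely arise from random relabeling alone, which is the sense in which the difference is called statistically significant.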
- Descriptive statistics involves describing, summarizing, and visualizing a set of data through measures of central tendency (mean, median, mode), spread (range, variance, standard deviation), and graphs (such as box plots, histograms, and pie charts). Descriptive statistics can only tell you about the individuals in your sample; it cannot be used to draw conclusions about the larger population.
- Inferential statistics is used to draw educated conclusions about a population that is likely too large to measure completely. This is done by taking a random sample of individuals within the population of interest and measuring them. From these measurements, various parameters of the overall population can be estimated. Because inferential statistics does not measure everyone in a population, the results will always contain some level of uncertainty. Uncertainty can be reduced by increasing the size of your sample. For more on inferential statistics, check out this overview from Purdue University.
- Confidence intervals are a tool used in inferential statistics to estimate a parameter (often the mean) of an entire population.
- Hypothesis testing is a tool used in inferential statistics to determine the effectiveness of an experimental treatment. This is done by determining whether the treatment yields results that are significantly different from those obtained from a sample given no treatment at all.