So you have decided to study statistics, eh? I don’t blame you. Statistics is pretty wicked awesome. But before you embark on this journey of toil and strife, ahem, I mean joy and wonder, there are a few things that you may want to get under your belt first. The goal of this post is to help you figure out the very basics of stats so that the rest of the mountain you’ll be climbing won’t be so daunting.
The Fun Fundamentals of Statistics
Statistics can be confusing – I mean you probably wouldn’t be here if you didn’t have a question or two – because it is full of terms, theorems, assumptions, and so on that seem like a foreign language. However, everything about statistics boils down to a few fundamental ideas.
Statistics Basics: Statistical Models
First, every data set has something to say.
“But wait, John, how does data say anything?” you ask.
What a fabulous question! Well, when you set out to use data to answer a question, you are looking for patterns in the data. In statistics, those patterns are called statistical models.
Say you want to know the relationship between the number of coffees people drink and the time of day. After asking a bunch of people about their coffee habits, you look to see if there is a pattern in their answers. Once you find a pattern, you have a model that helps other people understand the relationship between the number of coffees people drink and the time of day.
Statistics Basics: Population and Samples
Now, you don’t want to go through the trouble of all that time and math for just any one person. Scientists want their work to refer to everyone or everything in a given population. For example, if a chemist can make a battery out of one magnesium atom, then they should be able to make it out of any magnesium atom.
A population is the larger group that the statistician is interested in studying, while the sample is the smaller part of the population that they are able to study. When we studied our coffee drinkers, we wanted to know about all people who drink coffee and when they drink it (the population), even though we could only ask a few of them (the sample) about their habits.
The goal of gathering data from a sample is to make an inference about the entire population. Statistical models allow you to make these inferences in a reliable and valid way.
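To make the population and sample idea concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the population is simulated with random numbers, and the sizes (10,000 coffee drinkers, 50 people asked) are assumptions, not real survey data.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch gives the same numbers each run

# Hypothetical population: cups of coffee per day for 10,000 coffee drinkers
population = [random.randint(0, 6) for _ in range(10_000)]

# In real life we can only ask a few of them, say 50 people
sample = random.sample(population, 50)

# The sample mean is our inference about the population mean
print("population mean:", mean(population))
print("sample mean:    ", mean(sample))
```

The two means should land close to each other but will rarely match exactly, which is the whole reason the ideas further down (deviation and standard error) exist.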
Statistics Basics: Mean
The simplest statistical model, upon which almost all other models rest, is the mean or average value of a particular data set. The mean is like looking at a pointillist painting. You get a grander view of the whole thing without having to focus on each individual dot. The mean serves as a summary of the data so that we don’t have to mind-numbingly pore over each discrete value.
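As a quick sketch of the mean in action, here is how you might compute it in Python with the standard library. The coffee numbers are hypothetical, invented just for this example.

```python
from statistics import mean

# Hypothetical data: cups of coffee per day reported by eight people
cups_per_day = [1, 2, 2, 3, 3, 3, 4, 6]

# Add everything up and divide by how many values there are
average_cups = mean(cups_per_day)
print(average_cups)  # 24 cups / 8 people = 3
```

One number now stands in for all eight answers, even though only three of the eight people actually drink exactly three cups.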
Statistics Basics: Deviation
If the mean is a little bit wrong, then I don’t wanna be right. Well, the mean is always a little wrong. That’s because numerous measurements make up the mean. Some will be a little higher and some will be a little lower. The amount that a single measure varies from the mean is called its deviation. The standard deviation summarizes the typical size of those deviations for a population or sample as a whole.
Image by David M. Lane
If the standard deviation is small (red), then the values are all close to the mean and the mean represents the data well. If the standard deviation is large (blue), then the values are spread out widely and the mean is a rougher summary. The majority of statistical analyses rely on comparing means and deviations to determine the strength of a model.
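Here is a small sketch of that idea in Python, using two made-up groups of coffee drinkers that share the same mean but have very different spreads. Note that `statistics.stdev` computes the sample standard deviation (it divides by n - 1); `statistics.pstdev` is the version for a whole population.

```python
from statistics import mean, stdev

# Hypothetical data: both groups average 3 cups per day
tight = [2, 3, 3, 3, 3, 4]  # everyone drinks about the same amount
wide = [0, 1, 2, 4, 5, 6]   # habits are all over the place

for name, cups in (("tight", tight), ("wide", wide)):
    # Deviation of each value from the group mean
    deviations = [x - mean(cups) for x in cups]
    print(name, "mean:", mean(cups), "deviations:", deviations)
    print(name, "standard deviation:", round(stdev(cups), 2))
```

The means are identical, so the mean alone can’t tell these two groups apart. The standard deviation can.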
Statistics Basics: Standard Error
I have trouble appreciating the differences between professional and high school basketball because I don’t understand the nuances of the game. Now, I don’t claim to understand all the nuances of statistics, but there is one nuance in particular that makes a lot of difference in analyses: standard error.
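The standard error is essentially how far your sample mean is likely to be from the population mean. Remember that the mean is a statistical model that isn’t necessarily a real value in the data. Standard deviation describes how different a single measure typically is from the mean of the sample, while standard error describes how much the sample mean would vary from one sample to the next, and therefore how far it typically sits from the true mean of the population. It is usually estimated by dividing the sample standard deviation by the square root of the sample size, so bigger samples give smaller standard errors.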
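A minimal sketch of the usual estimate, the sample standard deviation divided by the square root of the sample size, might look like this in Python. The helper name `standard_error` and the coffee data are made up for illustration.

```python
from math import sqrt
from statistics import stdev

def standard_error(sample):
    """Estimate the standard error of the mean: s / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))

# Hypothetical data: cups of coffee per day reported by eight people
cups_per_day = [1, 2, 2, 3, 3, 3, 4, 6]

se = standard_error(cups_per_day)
print(round(se, 3))
```

Because the sample size sits under a square root in the denominator, you need roughly four times as many people to cut the standard error in half, assuming the spread of the answers stays about the same.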
Standard error is most valuable for calculating confidence intervals. A confidence interval is the give and take around an estimate: a range of values, built from the sample mean and its standard error, that is likely to contain the population mean. The smaller the standard error, the narrower the interval.
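As a rough sketch, a 95% confidence interval for the mean can be approximated as the sample mean plus or minus 1.96 standard errors. The 1.96 multiplier is an assumption that comes from the normal distribution. For a sample as small as the hypothetical one below, a multiplier from the t distribution would be more appropriate and would give a somewhat wider interval.

```python
from math import sqrt
from statistics import mean, stdev

def confidence_interval_95(sample):
    """Approximate 95% interval for the mean (normal approximation)."""
    m = mean(sample)
    se = stdev(sample) / sqrt(len(sample))  # standard error of the mean
    margin = 1.96 * se                      # the "give and take"
    return m - margin, m + margin

# Hypothetical data: cups of coffee per day reported by eight people
cups_per_day = [1, 2, 2, 3, 3, 3, 4, 6]

low, high = confidence_interval_95(cups_per_day)
print(round(low, 2), round(high, 2))
```

The interval is centered on the sample mean, and its width is driven entirely by the standard error.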
All of these fundamentals (statistical models, population and sample, mean, deviation, and standard error) are pivotal for the most important part of statistics: hypothesis testing, which we’ll cover in another post.