
The overall goal of statistics is to determine patterns represented in a *sample* that reflect patterns that may exist in the *population*. A sample is a group of participants that reflects the makeup of the population. To accomplish this, several types of sampling methods are used.

The gold standard of sampling techniques is the **random sample**. The goal of random sampling is to randomly select individual participants from the population. According to both logic and simulation studies, random samples limit the degree of *bias* and help account for the error that is inherent in all statistics.

Of course, random does not mean that you arbitrarily select individuals. Instead, it takes planning. First, define the population that you want to study. Second, identify every member of the population. Third, select members in such a way that every member has an equally likely chance of being chosen.
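Those three steps can be sketched in a few lines of Python. The population here is hypothetical (a list of made-up student IDs); `random.sample` draws without replacement, so every member has an equal chance of selection:

```python
import random

# Steps 1 and 2: define the population and identify every member.
# (The IDs here are hypothetical, just for illustration.)
population = [f"student_{i}" for i in range(1, 1001)]

# Step 3: select members so each has an equally likely chance of being chosen.
random.seed(42)  # seeded only to make the example reproducible
sample = random.sample(population, k=100)

print(len(sample))       # 100
print(len(set(sample)))  # 100 -- no member was selected twice
```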

Another type of sampling is a **stratified random sample**. This kind of sampling accounts for differences in the population that may affect your analysis.

For example, let’s say that you want a random sample of a high school that is 25% seniors, 30% juniors, 23% sophomores, and 22% freshmen. The best way to get a random sample that reflects these differences is to make sure that your sample has the same percentages of each class. So a 100-person sample would have 25 seniors, 30 juniors, 23 sophomores, and 22 freshmen randomly selected from their respective classes. This kind of sample gives a much clearer picture of the overall population.
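One way to draw such a stratified sample is to sample from each class separately, in proportion to its share of the school. A minimal sketch with hypothetical rosters:

```python
import random

random.seed(0)
# Hypothetical rosters for a 1,000-student school matching the percentages above.
strata = {
    "seniors":    [f"sr_{i}" for i in range(250)],
    "juniors":    [f"jr_{i}" for i in range(300)],
    "sophomores": [f"so_{i}" for i in range(230)],
    "freshmen":   [f"fr_{i}" for i in range(220)],
}
total = sum(len(members) for members in strata.values())
sample_size = 100

stratified_sample = []
for name, members in strata.items():
    # Each stratum contributes in proportion to its share of the population.
    k = round(sample_size * len(members) / total)
    stratified_sample.extend(random.sample(members, k))

print(len(stratified_sample))  # 100: 25 seniors, 30 juniors, 23 sophomores, 22 freshmen
```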

A **sampling distribution** represents the distribution of a statistic computed across many samples.

For example, a sampling distribution of the mean indicates the frequency with which specific sample means occur. This means that the frequency of values is mapped out. You can also create distributions of other statistics, like the variance. Below is an example of a sampling distribution for the mean.
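You can build an empirical sampling distribution of the mean yourself: repeatedly draw samples from a population and record each sample's mean. A minimal sketch, using a made-up population of exam scores:

```python
import random
import statistics

random.seed(1)
# Hypothetical population: 10,000 exam scores centered near 70.
population = [random.gauss(70, 10) for _ in range(10_000)]

# Draw many samples of size 30; the collection of their means is an
# empirical sampling distribution of the mean.
sample_means = [statistics.mean(random.sample(population, 30))
                for _ in range(2_000)]

print(round(statistics.mean(sample_means), 1))   # lands close to the population mean
print(round(statistics.stdev(sample_means), 2))  # much smaller than the population SD
```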

The shape of the curve allows you to compare the *empirical* distribution of values to a *theoretical* distribution of values. A theoretical distribution is a distribution that is based on equations instead of empirical data. Two common theoretical distributions are Student’s t and the F-distribution.

The benefit of creating distributions is that the empirical ones can be compared to theoretical ones to identify differences or goodness of fit for the model. That is the ultimate goal of statistics, to create an empirical model that explains patterns in the data that differ significantly from the theoretical model.

Sampling involves selecting participants from a population in order to identify possible patterns that exist in the data. There are several types of sampling, but the gold standard is random sampling. Sampling distributions represent the patterns that exist in the data. These patterns are then compared to theoretical ones to determine if they differ significantly from the theoretical models.

I hope that this post helps clarify sampling and sampling distributions. I look forward to seeing any questions that you have below. Happy statistics!

The post Sampling and Sampling Distributions appeared first on Magoosh Statistics Blog.


Sample size is super important in statistics. It is hard to make a generalization about behavior from one person. For example, I like horror books, but that doesn’t mean everyone does. And sometimes it is just too impractical to run experiments to collect data, like rolling a die 500 times. To get around this problem, statisticians and students create **simulation statistics**.

Simulation statistics means using artificially generated data to test out a hypothesis or statistical method. Whenever a new statistical method is developed or used, there are assumptions that need to be tested and confirmed. Statisticians use simulated data to test them.

There are several advantages to using simulated data. First, it is cheap, because it uses randomly generated numbers rather than data collected in the field. Second, it is much faster than traditional data collection, so tests can be run more quickly. Best of all, if the hypothesis or model is pretty solid, then the results of simulation statistics can approximate real results.

The best part of simulation statistics is also one of its disadvantages: it only approximates real-world results, so the findings should be taken with a grain of salt.

Although different statistical tests require slightly different methods to generate simulation statistics, all simulation models follow the same general steps. Let’s take rolling a single, fair die as an example. How likely is it that I roll a six?

Before we even move the mouse on our data generator, we have to define which outcomes we expect. In our example, we can get any one of six outcomes: A = {1, 2, 3, 4, 5, 6}.

In many cases in statistics, a probability can be tricky to calculate. In all cases, we have to come up with a probability of the desired outcome. Based on our question, the desired outcome is rolling a 6. To calculate that probability, I take the number of desired outcomes and divide by the number of total possible outcomes.

P(6) = 1 ÷ 6 ≈ 0.167

In simulation, randomness is actually desirable, because if there is a pattern, it sticks out. To simulate data, we generate values randomly within our parameters. You can choose any size dataset to create, but for the sake of expediency let’s generate a measly 500 data points using a random number generator.

Now you need to observe the random numbers and record how many times the desired outcome occurs.

At this point, you have your data set and you have noted the number of desired outcomes. It is now possible to calculate the empirical probability.

In the case of your simulated data, the number 6 occurs 73 times. If you calculate the empirical probability, you get

P(6) = 73 ÷ 500 = 0.146

This is reasonably close to our theoretical probability of 0.167, and the simulated finding supports the theoretical one. This means that you really do have about a 1/6 chance of getting a 6 when rolling a fair die.
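The whole process fits in a few lines of Python. `random.randint(1, 6)` stands in for the fair die; your count of sixes will differ a bit from run to run, which is exactly the point of simulation:

```python
import random

random.seed(7)  # seeded only so the run is reproducible
rolls = [random.randint(1, 6) for _ in range(500)]  # simulate 500 fair-die rolls
sixes = rolls.count(6)

theoretical = 1 / 6
empirical = sixes / 500
print(round(theoretical, 3))  # 0.167
print(round(empirical, 3))    # should land near 0.167
```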

Simulation statistics is a quick, efficient, and cost-effective method of generating and analyzing data. There are advantages and disadvantages to using simulated data, such as the fact that it can only approximate real results. Though there are different simulation techniques, the overall process is the same. I hope to see any questions you have below. Happy statistics!

The post Simulation Statistics Explained appeared first on Magoosh Statistics Blog.


Analysis of variance, more commonly called ANOVA, is a statistical method that is designed to compare means of different samples. Essentially, it is a way to compare how different samples in an experiment differ from one another if they differ at all. It is similar to a *t*-test except that ANOVA is generally used to compare more than two samples.

*But John, why can’t we just do a bunch of t-tests?*

I like your thinking! But I want you to recall that every statistical method has some error associated with it. Each time you do a *t*-test, you actually compound the error. This means that the error gets larger for every test you do. What starts off as a 5% error for one test can turn into a 14% error for three tests, well above the acceptable limit for most research.
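That compounding is easy to verify. Assuming each test independently carries a 5% chance of a false positive, the chance of at least one false positive across k tests is 1 − 0.95^k:

```python
# Familywise error rate for k independent tests at alpha = 0.05.
alpha = 0.05
for k in (1, 2, 3):
    familywise = 1 - (1 - alpha) ** k
    print(k, round(familywise, 2))
# 1 0.05
# 2 0.1
# 3 0.14
```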

ANOVA is a method that takes these little details into account by comparing the samples not only to each other but also to an overall *Grand Mean*, using the *Sum of Squares (SS)* and *Mean Square (MS)*. It also compares the error within the groups to the error between the groups. ANOVA tests the hypothesis that the means of the different samples differ from one another.

This can be a lot to take in, so let’s take a look at some of the little details before we work on an example.

When you use ANOVA, you are testing whether a *null hypothesis* is true, just like regular hypothesis testing. The difference is that the null hypothesis states that the means of each group are equal. You would state it something like μ_{1} = μ_{2} = μ_{3}. ANOVA tells you whether at least one of those means differs from the others.

ANOVA relies on something called the *F-distribution*. In short, the F-distribution compares how much variance there is *within* the groups to how much variance there is *between* the groups. If the null hypothesis is true, then the two should be about equal; we use an *F-table* of critical values, much as with a *t*-test, to decide whether the ratio is small enough to retain the null hypothesis.

Analysis of variance compares means, but to compare them all to each other we need to calculate a *Grand Mean*. The Grand Mean, GM, is the mean of all the scores. It doesn’t matter what group they belong to; we need a total mean for comparison.

The *Sum of Squares, SS,* is what you get when you add up the squared deviations from the mean. We use this value to calculate something called the *Mean Square of Treatment, MS _{treat}*, which is the treatment sum of squares divided by its degrees of freedom (the number of groups minus one). It tells you the amount of variability between the groups.

The final detail that we are going to talk about is the *Error Sum of Squares, SS _{error},* which refers to the variability within the samples. Remember that variance tells you how precise your data are. SS_{error} captures how much the individual scores vary within their own groups.

Now, that is a lot of terms, so let’s see them in action.

Suppose that you are interested in the best way to relax after stats class. After some cursory research, you settle on either reading a book or playing video games as possible choices for relaxation. But which, if either, is best? This is a case for an experiment. You would measure stress for a few students right after stats class, right after relaxing by reading a book, and right after playing video games.

Let’s do that right now! BAM! I have your data ready for you in the table below. We measured stress on a scale from 1 (low) to 10 (high) for a few students under each condition. By the way, these conditions are typically called *treatments*. I have also squared and summed some values for future use. *You’re welcome.*

ANOVA determines the differences in the means by comparing the mean square of the treatments to the mean square of the errors. It uses this equation: F = MS_{treat} ÷ MS_{error}.

But these values have to be calculated from still other things! Gosh darn, there’s a lot of calculating. But that is part of the fun of statistics. Let’s calculate the MS_{treat} first.

MS_{treat} refers to the variation that occurs *between* the groups. So it is calculated using the mean of each group and the Grand Mean. It also utilizes the degrees of freedom based on the number of groups in the study. In your case, there are three groups, so there are two degrees of freedom. The treatment sum of squares looks like SS_{treat} = Σn_{t}(X̄_{t} – GM)^{2}.

In this equation, the *t* stands for the treatment. For each treatment, you take the treatment mean minus the Grand Mean, square it, and multiply by the number of participants *for that treatment*, then sum across treatments. This is how we know it compares between groups. In your case, the value is 60.67.

To get MS_{treat}, we need to take it one step further. We need to divide by the degrees of freedom for the treatment groups. In this case, there are two degrees of freedom since there are three groups. Meaning MS_{treat} = 60.67 ÷ 2 = 30.33. We will use this value later.

Now for the Mean Square Error, MS_{error}. This gives you a sense of the variability *within* each group. You will be using a form of the variance formula that uses the squared values of your measurements (hence, their inclusion in the table). The formula looks a little like SS_{error} = Σ_{t}[ΣX^{2} – (ΣX)^{2} ÷ N_{t}], where the outer sum runs over the treatment groups.

Once again, the *t* here refers to the specific treatment. N_{t} refers to the number of measurements in each treatment. Your data give the solution 24.25 if you would like to work it out. But you’re really after MS_{error}, the amount of variation within the groups. To get this, divide SS_{error} by the degrees of freedom for error, which is N – (the number of groups), or 12 – 3 = 9. Your value for MS_{error} should be 24.25 ÷ 9 = 2.69.

You’ve made it through the theory, the data, and the math! Because you’re awesome! Now comes the time to run the final check to see if the means are significantly different. Compare MS_{treat} to MS_{error} using the F formula from earlier: F(2, 9) = 30.33 ÷ 2.69 ≈ 11.28.

Now you have an F-statistic! The numbers in parentheses refer to the degrees of freedom in the numerator and the denominator, respectively. You will need all three of these values when you look up the critical value in the F-table.

The F-table shows a distribution of critical values based on various degrees of freedom for both MS_{treat} and MS_{error}. These critical values represent the highest ratio for which the null hypothesis should still be retained, at the usual 5% error threshold.

In the case of 2 degrees of freedom for treatment and 9 degrees of freedom for error, the critical value is 4.26. Since your calculated value is higher than that, you can reject the null hypothesis and conclude that the means are significantly different. In your experiment, different types of relaxation DO provide different amounts of stress relief!
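The whole one-way ANOVA pipeline can be sketched in plain Python. The stress scores below are invented for illustration (they will not reproduce the 60.67 and 24.25 above), but the steps (grand mean, SS_treat, SS_error, degrees of freedom, F) are exactly the ones from the worked example:

```python
import statistics

# Hypothetical stress scores (1 = low, 10 = high) for three treatments.
groups = {
    "no relaxation": [8, 7, 9, 8],
    "book":          [4, 5, 3, 4],
    "video games":   [5, 6, 5, 4],
}

scores = [x for g in groups.values() for x in g]
grand_mean = statistics.mean(scores)

# Between-groups: SS_treat = sum of n_t * (treatment mean - grand mean)^2
ss_treat = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
               for g in groups.values())
# Within-groups: SS_error = squared deviations from each group's own mean
ss_error = sum((x - statistics.mean(g)) ** 2
               for g in groups.values() for x in g)

df_treat = len(groups) - 1            # 3 groups -> 2
df_error = len(scores) - len(groups)  # 12 scores - 3 groups -> 9
f_stat = (ss_treat / df_treat) / (ss_error / df_error)
print(df_treat, df_error, round(f_stat, 1))  # 2 9 26.0
```

With F far above the 4.26 critical value for (2, 9) degrees of freedom, this made-up data would also lead you to reject the null hypothesis.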

Although analysis of variance is a wicked awesome statistical method, there are a few things that you absolutely must keep in mind when running these analyses.

First, ANOVA results alone *do not* tell you which means differ from which. For example, your results show that the groups differ, but is reading better than playing video games? To answer that you need to conduct **post hoc** tests for significant differences. There are a few, like the *Tukey Honestly Significant Difference Test*, but those are for another post.

Second, ANOVA can be a great tool for coming up with cause-effect results. But the cause-effect conclusion can only be used if the participants were randomly assigned to groups. If you are using pre-selected groups or non-random assignment (like gender), then avoid the cause-effect conclusion.

Third, the data should have a reasonably normal distribution. If the data are too skewed, the variance estimates will distort your calculations.

Finally, analysis of variance comes in many forms (like analysis of covariance [ANCOVA] and multivariate analysis of variance [MANOVA]), but they all have one thing in common: the independent variable is categorical (the treatment groups) while the dependent variable is continuous. So consider ANOVA if your groups are categorical.

Ultimately, analysis of variance, ANOVA, is a method that allows you to distinguish if the means of three or more groups are significantly different from each other. This method was practically made for experimental setups and can yield wonderful results. It also gives valuable information about the way that one group differs from another within an experimental or quasi-experimental setup. ANOVA requires careful calculation and interpretation, but it opens up a whole new realm of research possibilities. Happy statistics!

The post Analysis of Variance Explained appeared first on Magoosh Statistics Blog.


Enter the famous world of **test statistics**.

The goal of a test statistic is to determine how well the model fits the data. Think of it a little like clothing. When you are in the store, the mannequin tells you how the clothes are supposed to look (the theoretical model). When you get home, you test them out and see how they actually look (the data-based model). The test statistic tells you if the difference between them (because I definitely do *not* look like the mannequin) is significant.

In another post, I discussed the nature of correlational and experimental research. Linear regression, multiple regression, and logistic regression are all types of linear models that correlate variables that occur simultaneously. However, *experimental* models are concerned with cause-effect models, or at least models that state a significant difference between cases.

*Test statistics* calculate whether there is a significant difference between groups. Most often, test statistics are used to see if the model that you come up with is different from the ideal model of the population. For example, do the clothes look significantly different on the mannequin than they do on you? Let’s take a look at the two most common types of test statistics: the **t-test** and the **F-test**.

The *t*-test is a test statistic that compares the means of two different groups. There are a bunch of cases in which you may want to compare group performance such as test scores, clinical trials, or even how happy different types of people are in different places. Of course, different types of groups and setups call for different types of tests. The type of *t*-test that you may need depends on the type of sample that you have.

If your two groups are the same size *and* you are running a sort of before-and-after experiment, then you will conduct what is called a **Dependent** or **Paired Samples t-test**. If the two groups are different sizes or you are comparing the means of two separate groups, then you conduct an **Independent Samples t-test**.

I am a fairly introverted person. I’m so introverted that I have extreme anxiety in social situations that warrants a therapy dog by the name of Chloe. And she’s pretty adorable.

Now, a lot of people have therapy dogs in order to relieve anxiety. Let’s say that you measure people’s anxiety *without* their therapy dogs and *with* their therapy dogs on a scale from 1 (low) to 5 (high) to determine if therapy dogs do significantly lower anxiety for people like me. For the sake of convenience, you get the following data

At first glance, it seems that there is a clear difference between people’s level of anxiety with and without their therapy dogs. You want to jump to the conclusion that our model (they do make a difference) is different from the null hypothesis (they don’t). But wait, you want to have some statistical data to back that claim up. So you perform a *t*-test.

A *t*-test is a form of statistical analysis that compares the measured mean to the population mean, or a baseline mean, in terms of standard deviation. Since we are dealing with the same group of people in a before-and-after kind of situation, you want to conduct a dependent *t*-test. You can think of the without scenario as a baseline to the with scenario.

The traditional *t*-test equation looks like t = [(X̄_{1} – X̄_{2}) – (μ_{1} – μ_{2})] ÷ s_{X̄_{1} – X̄_{2}}

The null hypothesis states there should be no difference between the two sample means. So that means μ_{1} – μ_{2} = 0, giving us t = (X̄_{1} – X̄_{2}) ÷ s_{X̄_{1} – X̄_{2}}

But what do you do with this number? Well, you consult the *t*-table. Along the top of the table is the *probability* of error that you are willing to accept. In other words, what is the possibility that you are wrong? Along the side of the table are the degrees of freedom. Since this is a dependent test with 24 pairs of scores, you have 24 – 1 = 23 degrees of freedom.

The *t*-table states that the critical value for 23 degrees of freedom at the 0.05 error level is 2.069. Your calculated *t*-value is above that, which indicates that your means are significantly different. Based on my completely random, fictitious data, the lower mean anxiety people show *with* their therapy dogs is different enough to be meaningful, otherwise known as statistically significant.
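Here is how that paired calculation works in practice, using a small set of made-up anxiety scores (not the post's actual data). The dependent test reduces to the mean of the pairwise differences divided by its standard error:

```python
import math
import statistics

# Hypothetical anxiety scores (1 = low, 5 = high) for the same eight people.
without_dog = [4, 5, 3, 4, 5, 4, 3, 5]
with_dog    = [2, 3, 2, 3, 3, 2, 2, 3]

diffs = [a - b for a, b in zip(without_dog, with_dog)]
n = len(diffs)

# Dependent (paired) t: mean difference over its standard error.
t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
df = n - 1  # pairs minus one
print(df, round(t, 2))  # 7 8.88
```

A t of 8.88 with 7 degrees of freedom is well past the 0.05 critical value, so this invented data would also count as significant.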

I guess Chloe is good for me, lol.

The case for independent sample tests is a little different. This style of test is best suited to experimental designs, or those designs that compare groups with different sets of participants. The benefit is that the groups do not have to be equal sizes. Let’s check another statistical example.

Let’s pretend for a moment that you (for some crazy reason) want to know if people are more anxious in statistics class than in another, let’s say English, class. So you find some willing volunteers and measure their heart rates during each class. It’s important to note that neither class will have the same participants. Your data looks a little like this

There is a difference, but is it enough of a difference? When you calculate the *t*-value, you find it to be 1.92. Comparing this to the *t*-table at 40 degrees of freedom, you notice that it is below the critical value of 2.021. This means that while there is a difference, it is not a *significant* difference.
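For reference, the independent-samples version pools the two group variances before dividing. A sketch with invented heart-rate data (not the post's numbers):

```python
import math
import statistics

# Hypothetical heart rates (bpm) from different participants in each class.
stats_class   = [78, 82, 75, 80, 77, 84]
english_class = [74, 76, 73, 79, 75, 72]

n1, n2 = len(stats_class), len(english_class)
v1, v2 = statistics.variance(stats_class), statistics.variance(english_class)

# Pooled variance weights each group's variance by its degrees of freedom.
pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
se = math.sqrt(pooled * (1 / n1 + 1 / n2))
t = (statistics.mean(stats_class) - statistics.mean(english_class)) / se
df = n1 + n2 - 2
print(df, round(t, 2))  # 10 2.66
```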

Huh, I guess statistics isn’t too stressful after all.

The role of the t-test is to determine whether two groups are different from each other. Just remember that dependent t-tests are best used for groups that have the same participants, while independent t-tests are for groups with different ones.

*But John, what if I want to test something else? Like a model?*

That is a fantastic question!

Sometimes we want to compare a model that we have calculated to a mean. For example, let’s say that you have calculated a linear regression model. Remember that the mean is also a model that can be used to explain the data.

The *F*-Test is a way that we compare the model that we have calculated to the overall mean of the data. Similar to the *t*-test, if it is higher than a critical value then the model is better at explaining the data than the mean is.

Before we get into the nitty-gritty of the *F*-test, we need to talk about the **sum of squares**. Let’s take a look at an example of some data that already has a line of best fit on it.

The *F*-test compares what is called the **mean sum of squares** for the *residuals* of the model and the overall mean of the data. Fun fact: the residuals are the difference between the actual, or observed, data point and the predicted data point.

In the case of graph (a), you are looking at the residuals of the data points and the overall sample mean. In the case of graph (c), you are looking at the residuals of the data points and the model that you calculated from the data. But in graph (b), you are looking at the residuals of the *model* and the overall sample mean.

The sum of squares is a measure of how the residuals compare to the model or the mean, depending on which one we are working with. There are three that we are concerned with.

The *sum of squares of the residuals* (SS_{R}) is the sum of the squared residuals between the data points and the regression line, like graph (c). They are squared to compensate for the negative values. SS_{R} is calculated by SS_{R} = Σ(Y_{i} – Ŷ_{i})^{2}

The *sum of squares of the total* (SS_{T}) is the sum of the squared residuals between the data points and the mean of the sample, like graph (a). They are squared to compensate for the negative values. SS_{T} is calculated by SS_{T} = Σ(Y_{i} – Ȳ)^{2}

It is important to note that while the equations may look the same at first glance, there is an important distinction. The SS_{R} equation involves the predicted value, so the second Y has a little hat over it (pronounced Y-hat). The SS_{T} equation involves the sample mean, so the second Y has a little bar over it (pronounced Y-bar). Don’t forget this very important distinction.

The difference between the two (SS_{T} – SS_{R}) tells you the overall sum of squares for the model itself, SS_{M}, like graph (b). This is what we are after in order to finally start to calculate the actual *F* value.

These sum of squares values give us a sense of how much the model varies from the observed values, which comes in handy in determining if the model is really any good for prediction. The next step in the *F*-test process is to calculate the **mean of squares** for the residuals and for the model.

To calculate the *mean of squares of the model*, or MS_{M}, you need to know the degrees of freedom for the model. Thankfully, it is pretty straightforward: the degrees of freedom for the model is the number of predictor variables in the model! Then follow the formula MS_{M} = SS_{M} ÷ df_{model}

To calculate the *mean of squares of the residuals*, or MS_{R}, you need to know the degrees of freedom for the residuals. This is the sample size minus the number of parameters estimated, so for a simple regression it is N – 2. Then simply follow the formula MS_{R} = SS_{R} ÷ df_{residuals}

Ok, you have done a whole lot of math so far. I’m proud of you because I know that it is not super fun. But it is *super* important to know where these values come from because it helps you understand how they work. Because now we are going to see how the *F*-statistic is actually calculated: F = MS_{M} ÷ MS_{R}

This calculation gives you a ratio of the model’s prediction to the regular mean of the data. Then you compare this ratio to an F-distribution table as you would the t-statistic. If the calculated value exceeds the critical value in the table, then the model is significantly different from the mean of the data, and therefore better at explaining the patterns in the data.
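Putting all the pieces together (SS_{T}, SS_{R}, SS_{M}, the two mean squares, and F) for a simple linear regression, with invented (x, y) data:

```python
import statistics

# Hypothetical data that lies close to a straight line.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]

mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * x for x in xs]

ss_t = sum((y - mean_y) ** 2 for y in ys)                # data vs. the mean
ss_r = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # data vs. the model
ss_m = ss_t - ss_r                                       # model vs. the mean

df_model = 1            # one predictor variable
df_resid = len(ys) - 2  # N minus the two estimated parameters
f_stat = (ss_m / df_model) / (ss_r / df_resid)
print(round(slope, 2), f_stat > 7.71)  # slope near 2; F far above the F(1, 4) critical value
```

Because this made-up data hugs the line so tightly, the F ratio is enormous and the model explains the data far better than the mean does.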

Test statistics are vital to determining if a model is good at explaining patterns in data. The simplest test statistic is the **t-test**, which determines if two means are significantly different. For more complex models, the **F-statistic** determines if a whole model is statistically different from the mean. Both cases are essential for telling a good model from a bad one. Happy statistics!

The post T-Test and F-Test: Fundamentals of Test Statistics appeared first on Magoosh Statistics Blog.


Yay!

In statistics, there are two types of research: correlational and experimental. The type of research you use determines the type of answer you’ll get for your problem.

Our dating example is correlational research. In this type of research, you look at the changes in one variable as another variable changes without imposing a change.

In our dating example, you would want to know how the frequency of dating changes as height in inches changes. When we look at the data, we may see that as the height increases, so does the approximate number of dates. There may be something about tall people. But, as with all correlational research…

*WE CANNOT ASSUME THAT ONE IS THE CAUSE OF THE OTHER*

Since we’re measuring height and dating simultaneously, there is no way to logically state that being tall gets you more dates. It could also be, absurdly, that more dates make you taller, or that some third variable drives both. We know that there is a relationship, but it is not clear what the CAUSE of the relationship is. Since we can’t determine the cause, it is not a true experimental design.

Comic by Randall Munroe

Our second example, the effectiveness of the hard-core diet, is fertile ground for a causal research – otherwise called experimental design in statistics.

Say you’re a personal trainer with a bunch of clients that are serious about fitness and weight loss. You have a pretty fantastic workout, but you want to test out a new diet plan. So, you give one group a new hard-core diet to go with their workout. The other group only has your awesome workout.

After a few months you tally the average weight loss for each group and notice that the hard-core diet group lost more weight. Based on this setup, we can say logically that the change in diet CAUSED the difference in weight loss. Since you kept basically everything except the diet the same, the only logical explanation for the difference in the weight loss is the only thing that was different between the two groups—the hard-core diet.

This is an experimental design because we are statistically determining whether a change in one variable, called a **treatment**, causes an effect in the other variable, sometimes called the **effect**. Unlike correlational variables, which occur simultaneously, in causal experimental designs, one variable occurs before the other and (drum roll) causes the other to change.

There are numerous ways to set up an experiment. Some methods involve randomly separating participants into a group that gets the change, the treatment group, and one that does not, the control group. Other methods use pre-existing groups, giving one group the treatment and the other a control; this is called quasi-experimental design.

In every case, the kicker for experimental design in statistics is that there must be at least two groups that are the same in every respect, but one group gets a change so that the researcher can compare two, potentially different, outcomes.

The post Experimental Design in Statistics appeared first on Magoosh Statistics Blog.
